Discourse Markers

TRUNAJOD discourse markers module.

This module is based on the discourse marker lexicon described in [iAMC05]. At the time of writing, the lexicon is available at https://cs.famaf.unc.edu.ar/~laura/shallowdisc4summ/discmar/. TRUNAJOD comes bundled with this lexicon, and this module exposes it as the following sets of constants:

Type of discourse marker – Discourse markers

cause – dado que, porque, debido a, gracias a, por si, por, por eso, en conclusión, así que, como consecuencia, para, para que, por estar razón, por tanto, en efecto

context – teniendo en cuenta, después, antes, originalmente, a condición de, durante, mientras, a no ser que, cuando, donde, de acuerdo con, lejos de, tan pronto como, por el momento, entre, hacia, hasta, mediante, según, en cualquier caso, entonces, respecto a, en ese caso, si, siempre que, sin duda, a la vez

equality – en resumen, concretamente, en esencia, en comparación, en otras palabras, en particular, es decir, por ejemplo, precisamente, tal como, por último, por un lado, por otro lado, a propósito, no sólo, sino también, en dos palabras, además, también, aparte, aún es más, incluso, especialmente, sobretodo

highly polysemic – como, desde, sobre, antes que nada, para empezar

revision – a pesar de, aunque, excepto, pese a, no obstante, sin embargo, en realidad, de hecho, al contrario, el hecho es que, es cierto que, pero, con todo, ahora bien, de todos modos

vague meaning closed class words – y, e, ni, o, u, que, con, sin, contra, en, a

The following module constants are defined as sets: CAUSE_DISCOURSE_MARKERS, CONTEXT_DISCOURSE_MARKERS, EQUALITY_DISCOURSE_MARKERS, HIGHLY_POLYSEMIC_DISCOURSE_MARKERS, REVISION_DISCOURSE_MARKERS, and VAGUE_MEANING_CLOSED_CLASS_WORDS.
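Because the constants are plain Python sets, membership tests and set operations apply to them directly. The following is a minimal sketch of that idea, not TRUNAJOD code: the `*_EXCERPT` sets, the `CATEGORIES` mapping, and the `categories_of` helper are hypothetical names introduced for illustration, and the sets hold only a few markers copied from the lexicon above.

```python
# Hypothetical excerpts of the bundled lexicon; the real sets are larger
# and live in TRUNAJOD.discourse_markers under the constant names above.
CAUSE_EXCERPT = {"dado que", "porque", "por tanto"}
REVISION_EXCERPT = {"aunque", "sin embargo", "no obstante"}
CONTEXT_EXCERPT = {"durante", "mientras", "cuando"}

# Hypothetical mapping from a category name to its set of markers.
CATEGORIES = {
    "cause": CAUSE_EXCERPT,
    "revision": REVISION_EXCERPT,
    "context": CONTEXT_EXCERPT,
}


def categories_of(marker: str) -> list:
    """Return the sorted names of the categories whose set contains the marker."""
    normalized = marker.lower().strip()
    return sorted(name for name, markers in CATEGORIES.items() if normalized in markers)


# Set union gives one combined lexicon across categories.
all_markers = set().union(*CATEGORIES.values())

print(categories_of("Sin embargo"))  # ['revision']
print(len(all_markers))  # 9
```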

TRUNAJOD.discourse_markers.find_matches(text: str, list: List[str]) → int

Return the number of matches of the words in list within a target text.

Given a text and a list of possible matches (in this module, a list of discourse markers), returns the number of matches found in the text. Matching ignores case.

Hint

For non-Spanish users: you can use this function with your own list of discourse markers if you need to compute this metric. In that case, call the function as find_matches(YOUR_TEXT, ["dm1", "dm2", ...]).

Parameters
  • text (string) – Text to be processed

  • list (Python list of strings) – list of discourse markers

Returns

Number of occurrences

Return type

int
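The counting behaviour described above can be approximated with the standard library. This is a sketch under assumptions, not the TRUNAJOD implementation: `count_marker_matches` is a hypothetical stand-in for find_matches, and the choice to match whole words only (so that "por" does not match inside "porque") is an assumption the documentation does not confirm.

```python
import re
from typing import List


def count_marker_matches(text: str, markers: List[str]) -> int:
    """Count case-insensitive, whole-word occurrences of each marker in text."""
    total = 0
    for marker in markers:
        # The lookarounds reject matches glued to other word characters,
        # so "por" is not counted inside "porque" (an assumed behaviour).
        pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


sample = "Llegó tarde porque llovía. Sin embargo, terminó el trabajo."
print(count_marker_matches(sample, ["porque", "sin embargo", "por"]))  # 2
```

Note that in this sketch a marker contained in a longer one (for example "por" and "por eso") is counted once for each entry that matches; whether the library handles such overlaps the same way is not stated here.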

TRUNAJOD.discourse_markers.get_cause_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with cause.

Parameters

text (spaCy Doc) – The text to be analyzed

Returns

Average of cause discourse markers over sentences

Return type

float
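The returned value is an average over sentences, that is, the number of matched markers divided by the number of sentences. The sketch below illustrates that arithmetic with the standard library only. It is not the TRUNAJOD implementation: `split_sentences` is a naive, hypothetical stand-in for spaCy's sentence segmentation, `average_markers_per_sentence` is a hypothetical helper, and returning 0.0 for empty input is an assumption.

```python
import re
from typing import Iterable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!' and '?'; a stand-in for spaCy sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def average_markers_per_sentence(text: str, markers: Iterable[str]) -> float:
    """Total whole-word marker matches divided by the number of sentences."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0  # assumption: empty input yields 0.0 rather than an error
    total = 0
    for marker in markers:
        pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total / len(sentences)


# Hypothetical excerpt of the cause markers listed in the lexicon.
cause_excerpt = {"porque", "por tanto", "así que"}
sample = "No salió porque llovía. Estaba cansado, así que durmió. Hoy hace sol."
# Two matches ("porque", "así que") over three sentences.
print(average_markers_per_sentence(sample, cause_excerpt))
```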

TRUNAJOD.discourse_markers.get_closed_class_vague_meaning_count(text: spacy.tokens.doc.Doc) → float

Count words that have vague meaning.

Parameters

text (spaCy Doc) – The text to be analyzed

Returns

Average of vague meaning words over sentences

Return type

float

TRUNAJOD.discourse_markers.get_context_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with context.

Parameters

text (spaCy Doc) – The text to be analyzed

Returns

Average of context discourse markers over sentences

Return type

float

TRUNAJOD.discourse_markers.get_equality_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with equality.

Parameters

text (spaCy Doc) – The text to be analyzed

Returns

Average of equality discourse markers over sentences

Return type

float

TRUNAJOD.discourse_markers.get_overall_markers(text: spacy.tokens.doc.Doc) → float

Count all types of discourse markers.

Parameters

text (spaCy Doc) – The text to be analyzed

Returns

Average discourse markers over sentences

Return type

float
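An overall figure of this kind can be pictured as the sum of matches across every category divided by the sentence count. The following is a sketch under assumptions rather than the library's code: `overall_marker_average` and the `lexicon` mapping are hypothetical, the sentences are passed in already split, and a marker that appears in more than one category would be counted once per category here, which may differ from what TRUNAJOD does.

```python
import re
from typing import Dict, List, Set


def overall_marker_average(sentences: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Sum matches over every category, then divide by the sentence count."""
    if not sentences:
        return 0.0  # assumption: no sentences yields 0.0
    text = " ".join(sentences)
    total = 0
    for markers in lexicon.values():
        for marker in markers:
            pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
            total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total / len(sentences)


# Hypothetical excerpt with a few markers per category.
lexicon = {
    "cause": {"porque"},
    "revision": {"aunque", "sin embargo"},
    "equality": {"además"},
}
sentences = ["Aunque llovía, salió porque quería correr.", "Además, no hacía frío."]
print(overall_marker_average(sentences, lexicon))  # 1.5
```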

TRUNAJOD.discourse_markers.get_polysemic_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers that are highly polysemic.

Parameters

text (spaCy Doc) – The text to be analyzed

Returns

Average of highly polysemic discourse markers over sentences

Return type

float

TRUNAJOD.discourse_markers.get_revision_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with revisions.

Parameters

text (spaCy Doc) – The text to be analyzed

Returns

Average of revision discourse markers over sentences

Return type

float

iAMC05

Laura Alonso i Alemany, Irene Castellón Masalles, and Lluís Padró Cirera. Representing discourse for automatic text summarization via shallow NLP techniques. Unpublished PhD thesis, Universitat de Barcelona, 2005.