Discourse Markers

TRUNAJOD discourse markers module.

This discourse markers module is based on the lexicon described in [iAMC05]. At the time of writing, the lexicon is available at https://cs.famaf.unc.edu.ar/~laura/shallowdisc4summ/discmar/. TRUNAJOD comes with this lexicon bundled, and this module exposes it through the following set ‘constants’:

Type of discourse marker	Discourse marker
cause	dado que, porque, debido a, gracias a, por si, por, por eso, en conclusión, así que, como consecuencia, para, para que, por estar razón, por tanto, en efecto
context	teniendo en cuenta, después, antes, originalmente, a condición de, durante, mientras, a no ser que, cuando, donde, de acuerdo con, lejos de, tan pronto como, por el momento, entre, hacia, hasta, mediante, según, en cualquier caso, entonces, respecto a, en ese caso, si, siempre que, sin duda, a la vez
equality	en resumen, concretamente, en esencia, en comparación, en otras palabras, en particular, es decir, por ejemplo, precisamente, tal como, por último, por un lado, por otro lado, a propósito, no sólo, sino también, en dos palabras, además, también, aparte, aún es más, incluso, especialmente, sobretodo
highly polysemic	como, desde, sobre, antes que nada, para empezar
revision	a pesar de, aunque, excepto, pese a, no obstante, sin embargo, en realidad, de hecho, al contrario, el hecho es que, es cierto que, pero, con todo, ahora bien, de todos modos
vague meaning closed class words	y, e, ni, o, u, que, con, sin, contra, en, a

The following module constants are defined as sets: CAUSE_DISCOURSE_MARKERS, CONTEXT_DISCOURSE_MARKERS, EQUALITY_DISCOURSE_MARKERS, HIGHLY_POLYSEMIC_DISCOURSE_MARKERS, REVISION_DISCOURSE_MARKERS, and VAGUE_MEANING_CLOSED_CLASS_WORDS.

TRUNAJOD.discourse_markers.find_matches(text: str, list: List[str]) → int

Return matches of words in list in a target text.

Given a text and a list of possible matches (in this module, a list of discourse markers), returns the number of matches found in the text. Matching ignores case.

Hint

For non-Spanish users: you can use this function with your own list of discourse markers if you need to compute this metric for another language. In that case, call the function as find_matches(YOUR_TEXT, ["dm1", "dm2", ...]).

Parameters

text (string) – Text to be processed
list (Python list of strings) – List of discourse markers to search for

Returns

Number of occurrences

Return type

int
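
To make the matching behaviour described above concrete, here is a minimal sketch of case-insensitive marker counting with a custom, non-Spanish list. The helper name count_marker_matches is hypothetical, and the use of whole-word matching is an assumption made for this illustration; it is not taken from TRUNAJOD's source and may differ from what find_matches actually does.

```python
import re
from typing import List


def count_marker_matches(text: str, markers: List[str]) -> int:
    """Count case-insensitive, whole-word occurrences of each marker in text.

    Hypothetical helper for illustration only; not TRUNAJOD's own code.
    """
    total = 0
    for marker in markers:
        # re.escape guards against markers that contain regex metacharacters.
        # The \b anchors (an assumption of this sketch) keep a short marker
        # such as "por" from matching inside a longer word such as "porque".
        pattern = r"\b" + re.escape(marker) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


text = (
    "However, the plan failed. "
    "We tried again because it mattered; HOWEVER, it failed again."
)
custom_markers = ["however", "because", "in other words"]
print(count_marker_matches(text, custom_markers))  # 3
```

Note that in this sketch a short marker is also counted when it appears inside a longer multi-word marker: with the markers "por" and "por eso", the phrase "por eso" contributes two matches.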

TRUNAJOD.discourse_markers.get_cause_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with cause.

Parameters: text (spaCy Doc) – The text to be analyzed
Returns: Average of cause discourse markers over sentences
Return type: float
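
The get_*_dm_count functions in this module return an average over sentences rather than a raw count. The sketch below illustrates one way such a value could be computed: total marker matches divided by the number of sentences. It assumes the text has already been split into sentence strings (standing in for the sentence boundaries a spaCy Doc provides); the helper name average_markers_per_sentence, the whole-word matching, and the 0.0 result for empty input are all assumptions of this example, not TRUNAJOD's implementation.

```python
import re
from typing import List


def average_markers_per_sentence(sentences: List[str], markers: List[str]) -> float:
    """Return total marker matches divided by the number of sentences.

    Hypothetical helper for illustration only; not TRUNAJOD's own code.
    """
    if not sentences:
        # Assumption of this sketch: an empty input yields 0.0 instead of
        # raising ZeroDivisionError.
        return 0.0
    total = 0
    for sentence in sentences:
        for marker in markers:
            pattern = r"\b" + re.escape(marker) + r"\b"
            total += len(re.findall(pattern, sentence, flags=re.IGNORECASE))
    return total / len(sentences)


sentences = [
    "No fui porque llovía.",
    "Llegué tarde debido a la lluvia.",
    "El tren salió puntual.",
    "Gracias a eso, llegamos.",
]
# A small excerpt of the cause markers listed in the table above.
cause_markers = ["porque", "debido a", "gracias a"]
print(average_markers_per_sentence(sentences, cause_markers))  # 3 matches / 4 sentences = 0.75
```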

TRUNAJOD.discourse_markers.get_closed_class_vague_meaning_count(text: spacy.tokens.doc.Doc) → float

Count words that have vague meaning.

Parameters: text (spaCy Doc) – The text to be analyzed
Returns: Average of vague meaning words over sentences
Return type: float

TRUNAJOD.discourse_markers.get_context_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with context.

Parameters: text (spaCy Doc) – The text to be analyzed
Returns: Average of context discourse markers over sentences
Return type: float

TRUNAJOD.discourse_markers.get_equality_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with equality.

Parameters: text (spaCy Doc) – The text to be analyzed
Returns: Average of equality discourse markers over sentences
Return type: float

TRUNAJOD.discourse_markers.get_overall_markers(text: spacy.tokens.doc.Doc) → float

Count all types of discourse markers.

Parameters: text (spaCy Doc) – The text to be analyzed
Returns: Average of discourse markers over sentences
Return type: float

TRUNAJOD.discourse_markers.get_polysemic_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers that are highly polysemic.

Parameters: text (spaCy Doc) – The text to be analyzed
Returns: Average of highly polysemic discourse markers over sentences
Return type: float

TRUNAJOD.discourse_markers.get_revision_dm_count(text: spacy.tokens.doc.Doc) → float

Count discourse markers associated with revisions.

Parameters: text (spaCy Doc) – The text to be analyzed
Returns: Average of revision discourse markers over sentences
Return type: float

iAMC05: Laura Alonso i Alemany, Irene Castellón Masalles, and Lluís Padró Cirera. Representing discourse for automatic text summarization via shallow NLP techniques. Unpublished PhD thesis, Universitat de Barcelona, 2005.