Discourse Markers
TRUNAJOD discourse markers module.
This discourse markers module is based on the lexicon described in [iAMC05]. At the time of writing, the lexicon is available at https://cs.famaf.unc.edu.ar/~laura/shallowdisc4summ/discmar/. TRUNAJOD comes bundled with this lexicon, and the module exposes it as the following sets of constants:
| Type of discourse marker | Discourse markers |
|---|---|
| cause | dado que, porque, debido a, gracias a, por si, por, por eso, en conclusión, así que, como consecuencia, para, para que, por estar razón, por tanto, en efecto |
| context | teniendo en cuenta, después, antes, originalmente, a condición de, durante, mientras, a no ser que, cuando, donde, de acuerdo con, lejos de, tan pronto como, por el momento, entre, hacia, hasta, mediante, según, en cualquier caso, entonces, respecto a, en ese caso, si, siempre que, sin duda, a la vez |
| equality | en resumen, concretamente, en esencia, en comparación, en otras palabras, en particular, es decir, por ejemplo, precisamente, tal como, por último, por un lado, por otro lado, a propósito, no sólo, sino también, en dos palabras, además, también, aparte, aún es más, incluso, especialmente, sobretodo |
| highly polysemic | como, desde, sobre, antes que nada, para empezar |
| revision | a pesar de, aunque, excepto, pese a, no obstante, sin embargo, en realidad, de hecho, al contrario, el hecho es que, es cierto que, pero, con todo, ahora bien, de todos modos |
| vague meaning closed class words | y, e, ni, o, u, que, con, sin, contra, en, a |
The following module constants are defined as sets: CAUSE_DISCOURSE_MARKERS, CONTEXT_DISCOURSE_MARKERS, EQUALITY_DISCOURSE_MARKERS, HIGHLY_POLYSEMIC_DISCOURSE_MARKERS, REVISION_DISCOURSE_MARKERS, and VAGUE_MEANING_CLOSED_CLASS_WORDS.
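Because the constants are sets of strings, membership checks are a natural way to use them. The sketch below illustrates the idea with two small subsets copied from the table. The names CAUSE_SAMPLE, REVISION_SAMPLE, CATEGORIES, and category_of are hypothetical and are not part of TRUNAJOD; in real use you would presumably import the full constants from TRUNAJOD.discourse_markers instead.

```python
# Small illustrative subsets copied from the lexicon table; the constants
# bundled with TRUNAJOD hold the full lists (these names are hypothetical).
CAUSE_SAMPLE = {"dado que", "porque", "debido a", "por tanto"}
REVISION_SAMPLE = {"a pesar de", "aunque", "sin embargo", "no obstante"}

# Hypothetical lookup table, not part of TRUNAJOD.
CATEGORIES = {"cause": CAUSE_SAMPLE, "revision": REVISION_SAMPLE}


def category_of(marker):
    """Return the category name of a marker, or None if it is not listed."""
    normalized = marker.lower().strip()
    for name, markers in CATEGORIES.items():
        if normalized in markers:
            return name
    return None


print(category_of("Sin embargo"))  # revision
print(category_of("porque"))       # cause
print(category_of("hola"))         # None
```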
- TRUNAJOD.discourse_markers.find_matches(text: str, list: List[str]) → int
Return matches of words in list in a target text.
Given a text and a list of possible matches (in this module, a list of discourse markers), returns the number of matches found in the text. Matching ignores case.
Hint
For non-Spanish users: you can use this function with your own list of discourse markers if you need to compute this metric for another language. In that case, call the function as follows:
find_matches(YOUR_TEXT, ["dm1", "dm2", etc])
- Parameters
text (string) – Text to be processed
list (Python list of strings) – list of discourse markers
- Returns
Number of occurrences
- Return type
int
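To make the hint above concrete, here is a minimal, self-contained sketch of case-insensitive counting with a custom (English) marker list. The function count_marker_matches is a hypothetical stand-in written for illustration; it assumes whole-word regular-expression matching and is not claimed to be how find_matches is implemented in TRUNAJOD.

```python
import re
from typing import List


def count_marker_matches(text: str, markers: List[str]) -> int:
    """Hypothetical stand-in for find_matches: case-insensitive count."""
    total = 0
    for marker in markers:
        # re.escape protects markers that contain regex metacharacters;
        # \b (assumed behavior) keeps "so" from matching inside "also".
        pattern = r"\b" + re.escape(marker) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


english_markers = ["because", "however", "for example"]
sample = "However, it failed because of rain. Because of that, we left."
print(count_marker_matches(sample, english_markers))  # 3
```

With the real library, the equivalent call would pass the same two arguments to find_matches.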
- TRUNAJOD.discourse_markers.get_cause_dm_count(text: spacy.tokens.doc.Doc) → float
Count discourse markers associated with cause.
- Parameters
text (spaCy Doc) – The text to be analyzed
- Returns
Average of cause discourse markers over sentences
- Return type
float
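The "average over sentences" returned here can be read as the number of marker matches divided by the number of sentences. The sketch below illustrates that reading without spaCy: split_sentences is a naive stand-in for the sentence boundaries a spaCy Doc would provide, and markers_per_sentence and CAUSE_SAMPLE are hypothetical names, so treat this as an assumption about the metric rather than the library's code.

```python
import re

# Small subset of the cause markers, for illustration only.
CAUSE_SAMPLE = ["porque", "por tanto", "dado que"]


def split_sentences(text):
    """Naive splitter standing in for spaCy sentence segmentation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def markers_per_sentence(text, markers):
    """Hypothetical helper: total marker matches / number of sentences."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    matches = sum(
        len(re.findall(r"\b" + re.escape(m) + r"\b", text, flags=re.IGNORECASE))
        for m in markers
    )
    return matches / len(sentences)


text = "No salimos porque llovía. Por tanto, vimos una película. Fue divertido."
# Two matches ("porque", "Por tanto") over three sentences.
print(markers_per_sentence(text, CAUSE_SAMPLE))
```

Note that with this counting scheme a short marker such as por would also be counted inside a longer one such as por tanto, so overlapping entries in a marker list can inflate the figure.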
- TRUNAJOD.discourse_markers.get_closed_class_vague_meaning_count(text: spacy.tokens.doc.Doc) → float
Count words that have vague meaning.
- Parameters
text (spaCy Doc) – The text to be analyzed
- Returns
Average of vague meaning words over sentences
- Return type
float
- TRUNAJOD.discourse_markers.get_context_dm_count(text: spacy.tokens.doc.Doc) → float
Count discourse markers associated with context.
- Parameters
text (spaCy Doc) – The text to be analyzed
- Returns
Average of context discourse markers over sentences
- Return type
float
- TRUNAJOD.discourse_markers.get_equality_dm_count(text: spacy.tokens.doc.Doc) → float
Count discourse markers associated with equality.
- Parameters
text (spaCy Doc) – The text to be analyzed
- Returns
Average of equality discourse markers over sentences
- Return type
float
- TRUNAJOD.discourse_markers.get_overall_markers(text: spacy.tokens.doc.Doc) → float
Count all types of discourse markers.
- Parameters
text (spaCy Doc) – The text to be analyzed
- Returns
Average discourse markers over sentences
- Return type
float
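One plausible way to obtain an overall figure is to add up the per-category averages. The sketch below assumes exactly that aggregation, which the documentation does not confirm. SAMPLE_LEXICON, count_matches, and overall_per_sentence are hypothetical names, and the sentence count is passed in directly rather than taken from a spaCy Doc.

```python
import re

# Tiny illustrative subsets of three categories (hypothetical structure).
SAMPLE_LEXICON = {
    "cause": ["porque", "por tanto"],
    "revision": ["aunque", "sin embargo"],
    "equality": ["es decir", "por ejemplo"],
}


def count_matches(text, markers):
    """Case-insensitive whole-word count of all markers in text."""
    return sum(
        len(re.findall(r"\b" + re.escape(m) + r"\b", text, flags=re.IGNORECASE))
        for m in markers
    )


def overall_per_sentence(text, n_sentences, lexicon):
    """Assumed aggregation: sum of the per-category averages."""
    return sum(
        count_matches(text, markers) / n_sentences
        for markers in lexicon.values()
    )


text = "Aunque llovía, salimos. Es decir, nos mojamos porque no había paraguas."
# One match per category over two sentences: 0.5 + 0.5 + 0.5
print(overall_per_sentence(text, 2, SAMPLE_LEXICON))  # 1.5
```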
- TRUNAJOD.discourse_markers.get_polysemic_dm_count(text: spacy.tokens.doc.Doc) → float
Count discourse markers that are highly polysemic.
- Parameters
text (spaCy Doc) – The text to be analyzed
- Returns
Average of highly polysemic discourse markers over sentences
- Return type
float
- TRUNAJOD.discourse_markers.get_revision_dm_count(text: spacy.tokens.doc.Doc) → float
Count discourse markers associated with revisions.
- Parameters
text (spaCy Doc) – The text to be analyzed
- Returns
Average of revision discourse markers over sentences
- Return type
float
- iAMC05
Laura Alonso i Alemany, Irene Castellón Masalles, and Lluís Padró Cirera. Representing discourse for automatic text summarization via shallow NLP techniques. Unpublished PhD thesis, Universitat de Barcelona, 2005.