Semantic Measures
TRUNAJOD methods for semantic measures.
The dimensions defined in this module require external knowledge. For example, synonym overlap measurement requires knowledge from a word ontology, and semantic measurements require word vectors (word embeddings) obtained from corpus semantics.
- TRUNAJOD.semantic_measures.avg_w2v_semantic_similarity(docs, N)
Compute average semantic similarity between adjacent sentences.
This uses word2vec [MCC+13] word vectors through the spaCy implementation. The semantic similarity follows the approach of [FKL98] for computing text coherence.
- Parameters
docs (Doc Generator) – Generator of Doc objects provided by the spaCy API
N (int) – Number of sentences
- Returns
Average sentence similarity (cosine)
- Return type
float
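As a rough illustration of this measure, the sketch below computes the mean cosine similarity between adjacent sentences in plain Python. It assumes each sentence vector is the average of its word vectors (spaCy's default for Doc.vector, which Doc.similarity compares by cosine), uses a hypothetical toy embedding table instead of a trained word2vec model, and divides by the N − 1 adjacent pairs. The names sentence_vector, cosine and avg_adjacent_similarity are illustrative only, not part of TRUNAJOD, and the library's exact normalization may differ.

```python
import math
from typing import Dict, List, Sequence

# Word -> embedding vector; a toy table here, trained word2vec vectors in practice.
Vectors = Dict[str, List[float]]


def sentence_vector(tokens: Sequence[str], vectors: Vectors) -> List[float]:
    """Average the vectors of known tokens (assumes `vectors` is non-empty).

    Returns a zero vector when no token of the sentence is in the table.
    """
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def avg_adjacent_similarity(sentences: Sequence[Sequence[str]], vectors: Vectors) -> float:
    """Mean cosine similarity over the N - 1 pairs of adjacent sentences."""
    sent_vecs = [sentence_vector(s, vectors) for s in sentences]
    pairs = list(zip(sent_vecs, sent_vecs[1:]))
    if not pairs:
        return 0.0  # fewer than two sentences: nothing to compare
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy = {"cat": [1.0, 0.0], "dog": [1.0, 0.0], "car": [0.0, 1.0]}
    print(avg_adjacent_similarity([["cat"], ["dog"], ["car"]], toy))  # 0.5
```

In the usage example, the first pair ("cat", "dog") has cosine 1.0 and the second ("dog", "car") has cosine 0.0, so the average over the two adjacent pairs is 0.5.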
- TRUNAJOD.semantic_measures.get_synsets(lemma, synset_dict)
Return the synonym set of a given word lemma.
The function requires synset_dict to be passed in. For this purpose, we provide downloadable models built from the Multilingual Central Repository (MCR) [GALR12]. If the lemma is not found in synset_dict, the function returns a set containing only the lemma.
- Parameters
lemma (string) – Lemma to look up in the synset dictionary
synset_dict (Python dict) – key-value pairs, lemma to synset
- Returns
The set of synonyms of a given lemma
- Return type
Python set of strings
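The fallback behaviour described above amounts to a dictionary lookup with a default value. The following is a minimal sketch, assuming synset_dict maps each lemma to a Python set of strings; the function name lookup_synsets and the small Spanish toy dictionary (whose entries include the lemma itself) are hypothetical stand-ins for an MCR-derived model, not TRUNAJOD's own code or data.

```python
from typing import Dict, Set

# Hypothetical toy dictionary standing in for an MCR-derived synset model.
TOY_SYNSETS: Dict[str, Set[str]] = {
    "casa": {"casa", "hogar", "vivienda"},
    "perro": {"perro", "can"},
}


def lookup_synsets(lemma: str, synset_dict: Dict[str, Set[str]]) -> Set[str]:
    """Return the synonym set for `lemma`, or {lemma} if it is not in the dictionary."""
    return synset_dict.get(lemma, {lemma})


if __name__ == "__main__":
    print(sorted(lookup_synsets("casa", TOY_SYNSETS)))  # ['casa', 'hogar', 'vivienda']
    print(lookup_synsets("gato", TOY_SYNSETS))  # {'gato'}
```

Falling back to {lemma} means an unknown word can still match itself when synonym sets are later compared across sentences.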
- TRUNAJOD.semantic_measures.overlap(lemma_list_group, synset_dict)
Compute average overlap in a text.
Computes semantic synset overlap (synonyms) based on a lemma list group and a dictionary containing synsets. Note that the result is divided by the number of text segments considered, which matches the TAACO implementation. For more details about this measurement, refer to [CKM16].
- Parameters
lemma_list_group (List of List of strings) – List of tokenized and lemmatized sentences
synset_dict (Python dict) – Key-value pairs mapping each lemma to its synonyms
- Returns
Average overlap between sentences
- Return type
float
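One way to picture adjacent-sentence synonym overlap is sketched below. It is a minimal sketch under stated assumptions, not TRUNAJOD's or TAACO's implementation: a lemma in sentence i + 1 counts as overlapping when it appears in the union of synonym sets of the lemmas in sentence i, and the summed count is divided by the number of sentences, which is one reading of "number of text segments considered". The names lookup_synsets and adjacent_synonym_overlap and the toy dictionary are hypothetical.

```python
from typing import Dict, List, Set


def lookup_synsets(lemma: str, synset_dict: Dict[str, Set[str]]) -> Set[str]:
    """Synonym set for `lemma`, falling back to {lemma} when it is unknown."""
    return synset_dict.get(lemma, {lemma})


def adjacent_synonym_overlap(
    lemma_list_group: List[List[str]], synset_dict: Dict[str, Set[str]]
) -> float:
    """Summed synonym hits between adjacent sentences, divided by the sentence count.

    Assumption: dividing by the number of sentences (segments) is one reading
    of the normalization described above; the real library may differ.
    """
    n_segments = len(lemma_list_group)
    if n_segments == 0:
        return 0.0
    total = 0
    for prev, curr in zip(lemma_list_group, lemma_list_group[1:]):
        # Union of the synonym sets of every lemma in the previous sentence.
        prev_synonyms: Set[str] = set()
        for lemma in prev:
            prev_synonyms |= lookup_synsets(lemma, synset_dict)
        # Count lemmas in the current sentence that fall inside that union.
        total += sum(1 for lemma in curr if lemma in prev_synonyms)
    return total / n_segments


if __name__ == "__main__":
    toy = {"casa": {"casa", "hogar"}, "perro": {"perro", "can"}}
    sentences = [["casa", "grande"], ["hogar", "perro"], ["can", "correr"]]
    print(adjacent_synonym_overlap(sentences, toy))  # 2 hits / 3 sentences, about 0.667
```

Here "hogar" matches the synonyms of "casa" in the first pair and "can" matches the synonyms of "perro" in the second, giving two hits over three segments. Dividing by the number of adjacent pairs instead would give 1.0 for the same input.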
- CKM16
Scott A. Crossley, Kristopher Kyle, and Danielle S. McNamara. The tool for the automatic analysis of text cohesion (TAACO): automatic assessment of local, global, and text cohesion. Behavior Research Methods, 48(4):1227–1237, 2016.
- FKL98
Peter W. Foltz, Walter Kintsch, and Thomas K. Landauer. The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25(2–3):285–307, 1998.
- GALR12
Aitor Gonzalez-Agirre, Egoitz Laparra, and German Rigau. Multilingual central repository version 3.0. In LREC, pages 2525–2529, 2012.
- MCC+13
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, I. Sutskever, and G. Zweig. Word2vec. URL https://code.google.com/p/word2vec, 2013.