Entity Grids
Entity grid module for TRUNAJOD.
In this module, entity grid based features are implemented. On one side, an entity grid [BL08] implementation is provided. We also provide an implementation of the entity graph coherence modeling [GS13].
Danger
These features depend heavily on the accuracy of the dependency parser, which in turn depends on the corpus the parser was trained on. There is no guarantee that they will work with all types of text. Moreover, the implementation is simple: we do not perform any coreference resolution for noun phrases and rely on simple heuristics instead.
Note also that the entity grid only considers two-sentence sequences, and the API currently provides no hyperparameter to change this.
-
class TRUNAJOD.entity_grid.EntityGrid(doc, model_name='spacy')
Entity grid class.
Creates an entity grid from a doc, which is the output of applying a loaded spaCy pipeline to a text (nlp(text)). This class therefore depends on the spacy module. It only supports entity grids with 2-transitions.
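To make the notion of 2-transitions concrete, the following sketch counts two-sentence role transitions over a toy grid and normalizes them into ratios, in the spirit of [BL08]. It does not call TRUNAJOD. The grid layout (a dict mapping each entity to its per-sentence roles), the transition_ratios helper, and the normalization by the total number of transitions are all assumptions made for illustration.

```python
from collections import Counter

# Toy grid (hypothetical layout): one row per entity, one role per sentence.
# Roles: S (subject), O (object), X (other), - (absent).
TOY_GRID = {
    "dog": ["S", "O", "-"],
    "ball": ["O", "-", "X"],
    "park": ["-", "X", "X"],
}


def transition_ratios(grid):
    """Return the ratio of each observed 2-transition in the grid."""
    counts = Counter()
    for roles in grid.values():
        # Pair each sentence's role with the role in the next sentence.
        for previous, current in zip(roles, roles[1:]):
            counts[previous + current] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {transition: n / total for transition, n in counts.items()}


ratios = transition_ratios(TOY_GRID)
for transition in sorted(ratios):
    print(transition, round(ratios[transition], 3))
```

Under these assumptions, a getter such as get_so_transitions would correspond to looking up the "SO" key, with 0.0 for transitions that never occur.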
-
get_egrid() → dict
Return obtained entity grid (for debugging purposes).
- Returns
entity grid represented as a dict
- Return type
dict
-
get_nn_transitions() → float
Get -- transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_no_transitions() → float
Get -O transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_ns_transitions() → float
Get -S transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_nx_transitions() → float
Get -X transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_on_transitions() → float
Get O- transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_oo_transitions() → float
Get OO transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_os_transitions() → float
Get OS transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_ox_transitions() → float
Get OX transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_sentence_count() → int
Return sentence count obtained while processing.
- Returns
Number of sentences
- Return type
int
-
get_sn_transitions() → float
Get S- transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_so_transitions() → float
Get SO transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_ss_transitions() → float
Get SS transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_sx_transitions() → float
Get SX transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_xn_transitions() → float
Get X- transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_xo_transitions() → float
Get XO transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_xs_transitions() → float
Get XS transitions.
- Returns
Ratio of transitions
- Return type
float
-
get_xx_transitions() → float
Get XX transitions.
- Returns
Ratio of transitions
- Return type
float
-
-
TRUNAJOD.entity_grid.dependency_mapping(dep: str) → str
Map dependency tag to entity grid tag.
We consider the notation provided in [BL08]:
EGrid Tag    Dependency Tag
S            nsubj, csubj, csubjpass, dsubjpass
O            iobj, obj, pobj, dobj
X            Any other dependency tag
- Parameters
dep (string) – Dependency tag
- Returns
EGrid tag
- Return type
string
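The mapping above amounts to two set-membership checks with a fallback. The sketch below is an illustration rather than TRUNAJOD's source: map_dependency and the two tag sets are hypothetical names, the tags are copied from the table, and nsubj is the label spaCy uses for nominal subjects.

```python
# Tag sets taken from the table above (names are illustrative).
SUBJECT_TAGS = {"nsubj", "csubj", "csubjpass", "dsubjpass"}
OBJECT_TAGS = {"iobj", "obj", "pobj", "dobj"}


def map_dependency(dep):
    """Map a dependency tag to an entity grid tag (S, O or X)."""
    if dep in SUBJECT_TAGS:
        return "S"
    if dep in OBJECT_TAGS:
        return "O"
    # Every other dependency tag falls into the catch-all role.
    return "X"


for tag in ("nsubj", "dobj", "amod"):
    print(tag, "->", map_dependency(tag))
```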
-
TRUNAJOD.entity_grid.get_local_coherence(egrid: TRUNAJOD.entity_grid.EntityGrid) → [float, float, float, float]
Get local coherence from entity grid.
This function computes the coherence value using all the approaches described in [GS13]. These include:
local_coherence_PU
local_coherence_PW
local_coherence_PACC
local_coherence_PU_dist
local_coherence_PW_dist
local_coherence_PACC_dist
- Parameters
egrid (EntityGrid) – An EntityGrid object.
- Returns
Local coherence based on different heuristics
- Return type
tuple of floats
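As a rough illustration of the graph-based measures in [GS13], the sketch below builds the sentence projection graph implicitly and averages the outgoing edge weights per sentence. It is not TRUNAJOD's implementation. The grid layout, the local_coherence helper, and its mode and use_distance parameters are assumptions. The edge weights follow one reading of the paper: PU uses 1 if two sentences share an entity, PW uses the number of shared entities, PACC uses the sum of products of syntactic role weights, and the distance variants divide each weight by the sentence distance.

```python
from itertools import combinations

ROLE_WEIGHTS = {"S": 3, "O": 2, "X": 1, "-": 0}

# Toy grid (hypothetical layout): entity -> role in each sentence.
TOY_GRID = {
    "dog": ["S", "O", "-"],
    "ball": ["O", "-", "X"],
    "park": ["-", "X", "X"],
}


def local_coherence(grid, mode="PU", use_distance=False):
    """Average outgoing edge weight of the sentence projection graph."""
    if not grid:
        return 0.0
    n_sentences = len(next(iter(grid.values())))
    if n_sentences == 0:
        return 0.0
    total = 0.0
    # Edges go from an earlier sentence i to a later sentence k.
    for i, k in combinations(range(n_sentences), 2):
        shared = [r for r in grid.values() if r[i] != "-" and r[k] != "-"]
        if mode == "PU":
            weight = 1.0 if shared else 0.0
        elif mode == "PW":
            weight = float(len(shared))
        elif mode == "PACC":
            weight = float(
                sum(ROLE_WEIGHTS[r[i]] * ROLE_WEIGHTS[r[k]] for r in shared)
            )
        else:
            raise ValueError(f"unknown mode: {mode}")
        if use_distance:
            weight /= k - i
        total += weight
    return total / n_sentences


for mode in ("PU", "PW", "PACC"):
    print(mode, local_coherence(TOY_GRID, mode),
          local_coherence(TOY_GRID, mode, use_distance=True))
```

Dividing by the distance k - i makes links between adjacent sentences count more than links between distant ones, which is the intuition behind the _dist variants listed above.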
-
TRUNAJOD.entity_grid.weighting_syntactic_role(entity_role: str) → int
Return weight given an entity grammatical role.
Weighting scheme for syntactic role of an entity. This uses the heuristic from [GS13], which is:
EGrid Tag    Weight
S            3
O            2
X            1
- (dash)     0
- Parameters
entity_role (string) – Entity grammatical role (S, O, X, -)
- Returns
Role weight
- Return type
int
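The weighting scheme is a fixed lookup table. The sketch below mirrors the table above with a plain dict. The role_weight helper is a hypothetical name, and raising an error for unknown roles is an assumption, since the behavior for other inputs is not documented here.

```python
# Weights copied from the table above.
ROLE_WEIGHTS = {"S": 3, "O": 2, "X": 1, "-": 0}


def role_weight(entity_role):
    """Return the weight of a grammatical role (S, O, X or -)."""
    try:
        return ROLE_WEIGHTS[entity_role]
    except KeyError:
        # Assumption: unknown roles are rejected rather than given a default.
        raise ValueError(f"unknown entity role: {entity_role!r}") from None


# Example: an entity that is a subject in one sentence and an object in
# another contributes 3 * 2 to a role-weighted edge between the two.
pair_weight = role_weight("S") * role_weight("O")
print(pair_weight)
```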
[BL08] Regina Barzilay and Mirella Lapata. Modeling local coherence: an entity-based approach. Computational Linguistics, 34(1):1–34, 2008.
[GS13] Camille Guinaudeau and Michael Strube. Graph-based local coherence modeling. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 93–103. 2013.