Syllabizer
Syllabizer module.
This syllabizer is for Spanish texts. It is based on http://sramatic.tripod.com/silabas.html
and on mabodo’s implementation: https://github.com/mabodo/sibilizador
Strong vowels are
a-e-o
Weak vowels are
i-u
The following rules are applied:
Rule | Description |
---|---|
v | The smallest syllable is formed by one vowel. |
V+ - V+ | Two vowels are separated if both are strong vowels. |
V-V+ and V-V- | Two vowels are not separated if one is strong and the other is weak, nor if both are weak. |
CV | The most common syllable in Spanish is the one that has a consonant and a vowel. |
C-C | Two adjacent consonants are usually separated. |
CC/C = l,r | Two adjacent consonants are kept together if the second is an l or r. |
CC/ = ch,ll,rr | Two consonants are kept together if they represent the sounds ch, ll, rr. |
C-CC | If three consonants are adjacent, the first one is separated from the rest. |
CC-C/CsC | With three adjacent consonants, the first two are separated from the last one if the middle one is an s. |
CC-CC | If four consonants are adjacent, they are split in half. |
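The consonant rules in the table can be sketched as a small function that decides where a consonant cluster between two vowels is cut. This is only an illustration that follows the table literally; the name `split_cluster` is hypothetical and is not part of TRUNAJOD, and clusters longer than four consonants are rejected because the table does not cover them.

```python
from typing import Tuple


def split_cluster(cluster: str) -> Tuple[str, str]:
    """Hypothetical helper: split a consonant cluster found between two vowels.

    Returns (coda, onset): the consonants that stay with the previous
    syllable and those that open the next one.
    """
    n = len(cluster)
    if n <= 1:
        # CV: a single consonant goes with the following vowel.
        return "", cluster
    if n == 2:
        # CC/ = ch,ll,rr and CC/C = l,r: the pair is kept together.
        if cluster in ("ch", "ll", "rr") or cluster[1] in ("l", "r"):
            return "", cluster
        # C-C: otherwise the two consonants are separated.
        return cluster[0], cluster[1]
    if n == 3:
        # CC-C/CsC: the first two stay together when the middle one is an s.
        if cluster[1] == "s":
            return cluster[:2], cluster[2:]
        # C-CC: otherwise only the first consonant is separated.
        return cluster[0], cluster[1:]
    if n == 4:
        # CC-CC: four consonants are split in half.
        return cluster[:2], cluster[2:]
    raise ValueError("the rule table does not cover clusters longer than four")


print(split_cluster("mpl"))   # e.g. the cluster in "ejemplo"
print(split_cluster("nstr"))  # e.g. the cluster in "construir"
```

Under these rules the cluster in "ejemplo" is cut as m-pl and the one in "construir" as ns-tr.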
- class TRUNAJOD.syllabizer.CharLine(word)
Auxiliary object to set char types on a word.
A word string is processed and converted into a char sequence consisting of consonants and vowels, which is used to apply the rules for syllabizing Spanish words. This is a helper class used by the Syllabizer class, and it is unlikely the user will need to explicitly instantiate an object of this class.
- static char_type(char: str) → str
Get char type (vowel, consonant, etc).
This method checks a char type based on syllabization rules. If the char is in STRONG_VOWELS this returns 'V'. If the char is in WEAK_VOWELS it returns 'v'. If the char is an 'x' or 's' it returns 'x' and 's' respectively. Otherwise it returns 'c', representing a consonant.
- Parameters
char (string) – Char whose type is requested
- Returns
Char type
- Return type
string
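The mapping described above can be written as a minimal standalone sketch. It is not the library's source: the contents of STRONG_VOWELS and WEAK_VOWELS are assumed here to be just the unaccented vowels listed at the top of this page, and accented vowels are not handled.

```python
# Assumption: only the unaccented vowels listed on this page.
STRONG_VOWELS = frozenset("aeo")
WEAK_VOWELS = frozenset("iu")


def char_type(char: str) -> str:
    """Standalone sketch of the classification described above."""
    if char in STRONG_VOWELS:
        return "V"
    if char in WEAK_VOWELS:
        return "v"
    if char in ("x", "s"):
        # 'x' and 's' keep their own type because some rules treat them specially.
        return char
    return "c"


# Type representation of a whole word, one type char per letter.
print("".join(char_type(c) for c in "texto"))  # cVxcV
```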
- find(finder: str) → int
Find string occurrence in the type representation.
- Parameters
finder (string) – String to be searched
- Returns
Position of occurrence of the finder
- Return type
int
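To illustrate what searching the type representation means, the sketch below builds the type string of a word with a lookup table and searches it with Python's built-in `str.find`. The names `type_line` and `find_in_types` are hypothetical, and returning -1 when nothing matches is the behaviour of `str.find`, assumed here rather than documented for `CharLine.find`.

```python
# Hypothetical lookup table: letters not listed are treated as consonants ('c').
_TYPES = {
    **dict.fromkeys("aeo", "V"),  # strong vowels
    **dict.fromkeys("iu", "v"),   # weak vowels
    "x": "x",
    "s": "s",
}


def type_line(word: str) -> str:
    """Return the type representation of a word, one type char per letter."""
    return "".join(_TYPES.get(ch, "c") for ch in word)


def find_in_types(word: str, finder: str) -> int:
    """Position of the first occurrence of finder in the type representation."""
    return type_line(word).find(finder)


print(type_line("perro"))            # cVccV
print(find_in_types("perro", "cc"))  # 2
print(find_in_types("casa", "cc"))   # -1
```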
- split(pos: int, where: int) → [CharLine, CharLine]
Split the object into two CharLine objects.
- class TRUNAJOD.syllabizer.Syllabizer
Syllabizer class to process syllables from a word.
It has methods that take a word and split it into syllables using different rules. This class is mainly used for counting syllables.
- static number_of_syllables(word: str) → int
Return number of syllables of a word.
- Parameters
word (string) – Word to be processed
- Returns
Syllable count for the word.
- Return type
int
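As a rough illustration of how the vowel rules determine a syllable count, the sketch below counts syllable nuclei: a run of vowels forms one syllable unless it contains a second strong vowel, which opens a new one. This is a simplified approximation and not the algorithm used by `number_of_syllables`; the function name `count_nuclei` is hypothetical, and accented vowels and 'y' are not handled.

```python
# Assumption: only the unaccented vowels listed on this page.
STRONG_VOWELS = "aeo"
WEAK_VOWELS = "iu"


def count_nuclei(word: str) -> int:
    """Approximate the syllable count of a Spanish word from its vowels only."""
    count = 0
    in_group = False    # True while scanning a run of adjacent vowels
    has_strong = False  # True once the current syllable holds a strong vowel
    for ch in word.lower():
        if ch in STRONG_VOWELS:
            # A strong vowel opens a syllable at the start of a vowel run,
            # or when the current syllable already has a strong vowel (V+ - V+).
            if not in_group or has_strong:
                count += 1
            in_group = True
            has_strong = True
        elif ch in WEAK_VOWELS:
            # A weak vowel only opens a syllable at the start of a vowel run.
            if not in_group:
                count += 1
                has_strong = False
            in_group = True
        else:
            # Any consonant ends the current vowel run.
            in_group = False
            has_strong = False
    return count


for w in ("casa", "poeta", "ciudad", "paranoia"):
    print(w, count_nuclei(w))
```

With these rules "poeta" is counted as three syllables (po-e-ta) and "ciudad" as two (ciu-dad).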
-
static