Syllabizer

Syllabizer module.

This syllabizer is for Spanish texts. It is based on the rules described at http://sramatic.tripod.com/silabas.html

It is also based on mabodo's implementation: https://github.com/mabodo/sibilizador

  • Strong vowels are a-e-o

  • Weak vowels are i-u

The following rules are applied:

Rule

Description

v

The smallest syllable is formed by one vowel.

V+ - V+

Two vowels are separated if both are strong vowels.

V-V+ and V-V-

Two vowels are not separated if one is strong and the other is weak, nor if both are weak.

CV

The most common syllable in Spanish is the one that has a consonant and a vowel.

C-C

Two adjacent consonants are usually separated.

CC/C = l,r

Two adjacent consonants are kept together if the second is an l or r.

CC/ = ch,ll,rr

Two consonants are kept together if they represent the sounds ch, ll, rr.

C-CC

If three consonants are joined, the first one is separated from the rest.

CC-C/CsC

When three consonants are joined, the first two are separated from the last one if the one in the middle is an s.

CC-CC

If four consonants are joined, they are halved.
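The consonant rules above can be read as a single decision: given the consonants found between two vowels, how many of them stay with the preceding syllable. The following is a minimal sketch of that reading, not the module's code. The name coda_length is hypothetical, and halving clusters longer than four consonants is an assumption, since the rules only mention four.

```python
DIGRAPHS = {"ch", "ll", "rr"}


def coda_length(cluster: str) -> int:
    """Return how many consonants of a cluster between two vowels
    close the preceding syllable (hypothetical helper)."""
    n = len(cluster)
    if n <= 1:
        return 0  # CV: a single consonant opens the next syllable
    if n == 2:
        # CC kept together: ch, ll, rr, or second consonant is l or r
        if cluster in DIGRAPHS or cluster[1] in "lr":
            return 0
        return 1  # C-C: otherwise the pair is separated
    if n == 3:
        # CC-C when the middle consonant is an s, otherwise C-CC
        return 2 if cluster[1] == "s" else 1
    # CC-CC: four consonants are halved (longer clusters: assumption)
    return n // 2


print(coda_length("rt"))    # 1, as in car-ta
print(coda_length("bl"))    # 0, as in ha-blar
print(coda_length("nst"))   # 2, as in ins-tan-te
print(coda_length("nstr"))  # 2, as in cons-truir
```

The sketch follows the rules literally, so any two-consonant cluster ending in l or r is kept together, including pairs that Spanish actually separates.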

class TRUNAJOD.syllabizer.CharLine(word)

Auxiliary object to set char types on a word.

A word string is processed and converted into a sequence of character types (consonants and vowels), which is used to apply the rules for syllabizing Spanish words. This is a helper class used by the Syllabizer class, and it is unlikely that the user will need to explicitly instantiate an object of this class.

static char_type(char: str) → str

Get char type (vowel, consonant, etc).

This method determines a character's type based on the syllabization rules. If the char is in STRONG_VOWELS it returns 'V'; if it is in WEAK_VOWELS it returns 'v'. If the char is 'x' or 's' it returns 'x' or 's' respectively. Otherwise it returns 'c', representing a consonant.

Parameters

char (string) – Character whose type is to be determined

Returns

Char type

Return type

string
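As an illustration of the mapping described for char_type, here is a minimal standalone sketch. The contents of STRONG_VOWELS and WEAK_VOWELS are assumed from the a-e-o and i-u lists at the top of this page (the real constants may also contain accented vowels), and type_line is a hypothetical helper that is not part of the module.

```python
# Assumed contents, taken from the vowel lists at the top of this page.
STRONG_VOWELS = {"a", "e", "o"}
WEAK_VOWELS = {"i", "u"}


def char_type(char: str) -> str:
    """Classify one lowercase character as described above."""
    if char in STRONG_VOWELS:
        return "V"
    if char in WEAK_VOWELS:
        return "v"
    if char in ("x", "s"):
        return char
    return "c"


def type_line(word: str) -> str:
    """Hypothetical helper: the type of every character in a word."""
    return "".join(char_type(ch) for ch in word.lower())


print(type_line("casa"))   # cVsV
print(type_line("extra"))  # VxccV
```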

find(finder: str) → int

Find string occurrence in the type representation.

Parameters

finder (string) – String to be searched

Returns

Position of occurrence of the finder

Return type

int

split(pos: int, where: int) → [~CharLine, ~CharLine]

Split the object into two CharLine objects.

Parameters
  • pos (int) – Start position of the split

  • where (int) – End position of the split

Returns

Tuple with the two CharLine objects produced by the split

Return type

Tuple (CharLine, CharLine)

split_by(finder: str, where: int) → [~CharLine, ~CharLine]

Split the CharLine at the occurrence of finder in the type representation.

Parameters
  • finder (string) – Type char string

  • where (int) – End position to look for.

Returns

The two CharLine objects resulting from the split at the match.

Return type

Tuple (CharLine, CharLine)
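To show how find, split and split_by might fit together, here is a minimal sketch built around a hypothetical MiniCharLine class. It is not the library's implementation. It assumes that split cuts the word at pos + where, that find returns -1 when there is no match, and that split_by returns the unchanged object with None in that case. The documentation above states none of these details.

```python
class MiniCharLine:
    """Hypothetical stand-in for CharLine (not the library's code)."""

    def __init__(self, word: str):
        self.word = word
        # One type character per letter: 'V', 'v', 'x', 's' or 'c'.
        self.type_line = "".join(self._char_type(ch) for ch in word.lower())

    @staticmethod
    def _char_type(char: str) -> str:
        if char in "aeo":
            return "V"
        if char in "iu":
            return "v"
        if char in "xs":
            return char
        return "c"

    def find(self, finder: str) -> int:
        # str.find returns -1 when there is no match (assumed convention).
        return self.type_line.find(finder)

    def split(self, pos: int, where: int):
        cut = pos + where  # assumed meaning of the two parameters
        return MiniCharLine(self.word[:cut]), MiniCharLine(self.word[cut:])

    def split_by(self, finder: str, where: int):
        pos = self.find(finder)
        if pos == -1:
            return self, None  # assumed result when nothing matches
        return self.split(pos, where)


# Rule C-C: two adjacent consonants are separated after the first one.
left, right = MiniCharLine("carta").split_by("cc", 1)
print(left.word, right.word)  # car ta
```

Under these assumptions, a rule becomes a pair made of a type pattern and an offset: the pattern locates the cluster and the offset says where inside it the syllable boundary falls.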

class TRUNAJOD.syllabizer.Syllabizer

Syllabizer class to process syllables from a word.

It has methods that take a word and split it into syllables using different rules. This class is mainly used for counting syllables.

static number_of_syllables(word: str) → int

Return the number of syllables of a word.

Parameters

word (string) – Word to be processed

Returns

Syllable count for the word.

Return type

int
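For counting alone, the vowel rules are enough, because every syllable has exactly one vowel nucleus. The following sketch is not the library's implementation. It counts a new nucleus whenever a vowel follows a non-vowel, or a strong vowel follows another strong vowel. The name count_syllables_sketch is hypothetical, and treating accented vowels as strong is an assumption of this sketch.

```python
STRONG = set("aeoáéóíú")  # accented vowels treated as strong: an assumption
WEAK = set("iuü")


def count_syllables_sketch(word: str) -> int:
    """Count vowel nuclei using the vowel rules only (hypothetical helper)."""
    count = 0
    prev = ""  # previous character, empty at the start of the word
    for ch in word.lower():
        if ch in STRONG or ch in WEAK:
            prev_is_vowel = prev in STRONG or prev in WEAK
            # Two adjacent strong vowels are separated (hiatus).
            hiatus = prev in STRONG and ch in STRONG
            if not prev_is_vowel or hiatus:
                count += 1
        prev = ch
    return count


print(count_syllables_sketch("casa"))    # 2
print(count_syllables_sketch("poeta"))   # 3
print(count_syllables_sketch("ciudad"))  # 2
```

The sketch ignores the consonant rules entirely, since they only decide where a boundary falls and not how many syllables there are.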

static split(chars: TRUNAJOD.syllabizer.CharLine) → [<class 'TRUNAJOD.syllabizer.CharLine'>]

Split CharLine into syllables.

Parameters

chars (CharLine) – Word to be syllabized

Returns

Syllables

Return type

List [CharLine]