abydos.stemmer package
The stemmer package collects stemmer classes for a number of languages, including:

- English stemmers:
    - Lovins' (Lovins)
    - Porter (Porter)
    - Porter2, i.e. Snowball English (Porter2)
    - UEA-Lite (UEALite)
    - Paice-Husk (PaiceHusk)
    - S-stemmer (SStemmer)
- German stemmers:
    - Caumanns' (Caumanns)
    - CLEF German (CLEFGerman)
    - CLEF German Plus (CLEFGermanPlus)
    - Snowball German (SnowballGerman)
- Swedish stemmers:
    - CLEF Swedish (CLEFSwedish)
    - Snowball Swedish (SnowballSwedish)
- Latin stemmer:
    - Schinke (Schinke)
- Danish stemmer:
    - Snowball Danish (SnowballDanish)
- Dutch stemmer:
    - Snowball Dutch (SnowballDutch)
- Norwegian stemmer:
    - Snowball Norwegian (SnowballNorwegian)
Each stemmer has a stem method, which takes a word and returns its stemmed form:
>>> from abydos.stemmer import Porter
>>> stmr = Porter()
>>> stmr.stem('democracy')
'democraci'
>>> stmr.stem('trusted')
'trust'
class abydos.stemmer.CLEFGerman
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer.
The CLEF German stemmer is defined at [Sav05].
New in version 0.3.6.
stem(word: str) → str
Return CLEF German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFGerman()
>>> stmr.stem('lesen')
'lese'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
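The CLEF German rules are short enough to sketch directly. The following is a minimal sketch, not the library's source, assuming the commonly published form of the algorithm from [Sav05]: lowercase the word, fold ä/ö/ü to a/o/u, then strip at most one ending (-nen, else -en/-se/-es/-er, else -e/-n/-r/-s) subject to minimum-length guards. The function name clef_german_stem and the exact length thresholds are assumptions.

```python
import unicodedata

# Assumed umlaut folding: ä -> a, ö -> o, ü -> u
_UMLAUTS = str.maketrans('äöü', 'aou')


def clef_german_stem(word: str) -> str:
    """Sketch of the CLEF German stemmer (assumed rules, see lead-in)."""
    word = unicodedata.normalize('NFC', word.lower()).translate(_UMLAUTS)
    last = len(word) - 1  # index of the final character
    if last > 3:
        # Longest ending first: -nen, only on longer words
        if last > 5 and word.endswith('nen'):
            return word[:-3]
        # Two-letter plural/case endings
        if last > 4 and word[-2:] in {'en', 'se', 'es', 'er'}:
            return word[:-2]
        # Single-letter endings
        if word[-1] in {'e', 'n', 'r', 's'}:
            return word[:-1]
    return word
```

With these rules, 'lesen', 'graues' and 'buchstabieren' come out as 'lese', 'grau' and 'buchstabier', matching the examples above.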
class abydos.stemmer.CLEFGermanPlus
Bases: abydos.stemmer._stemmer._Stemmer
CLEF German stemmer plus.
The CLEF German stemmer plus is defined at [Sav05].
New in version 0.3.6.
stem(word: str) → str
Return 'CLEF German stemmer plus' stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFGermanPlus()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
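Unlike the basic CLEF German stemmer, the "plus" variant removes endings in two passes, so a second ending can be stripped from an already shortened word. A minimal sketch, not the library's source, assuming the accent-folding table, the two-step ending rules, and the set of letters allowed before a removable -s/-st shown below; the names clef_german_plus_stem, _ACCENTS and _ST_ENDING are hypothetical.

```python
import unicodedata

# Assumed accent folding applied before stemming
_ACCENTS = str.maketrans('äàáâöòóôïìíîüùúû', 'aaaaooooiiiiuuuu')
# Assumed letters that may precede a removable -s or -st ending
_ST_ENDING = set('bdfghklmnt')


def clef_german_plus_stem(word: str) -> str:
    word = unicodedata.normalize('NFC', word.lower()).translate(_ACCENTS)
    # Step 1: strip one inflectional ending
    last = len(word) - 1
    if last > 4 and word.endswith('ern'):
        word = word[:-3]
    elif last > 3 and word[-2:] in {'em', 'en', 'er', 'es'}:
        word = word[:-2]
    elif last > 2 and (
        word[-1] == 'e' or (word[-1] == 's' and word[-2] in _ST_ENDING)
    ):
        word = word[:-1]
    # Step 2: strip one further ending from the shortened word
    last = len(word) - 1
    if last > 4 and word.endswith('est'):
        word = word[:-3]
    elif last > 3 and (
        word[-2:] in {'er', 'en'}
        or (word[-2:] == 'st' and word[-3] in _ST_ENDING)
    ):
        word = word[:-2]
    return word
```

The second pass is what turns 'buchstabieren' into 'buchstabi' (first -en, then -er), matching the example above.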
class abydos.stemmer.CLEFSwedish
Bases: abydos.stemmer._stemmer._Stemmer
CLEF Swedish stemmer.
The CLEF Swedish stemmer is defined at [Sav05].
New in version 0.3.6.
stem(word: str) → str
Return CLEF Swedish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = CLEFSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspensio'
>>> stmr.stem('visshet')
'viss'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
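The CLEF Swedish stemmer is a light suffix stripper: it drops a final -s, then removes the longest ending from a short list that leaves a long enough stem. A minimal sketch, not the library's source; the ending lists, the length guard, the lowercasing, and the name clef_swedish_stem are assumptions.

```python
# Assumed ending lists, grouped by length (longest tried first)
_ENDINGS = {
    5: {'elser', 'heten'},
    4: {'arne', 'erna', 'ande', 'else', 'aste', 'orna', 'aren'},
    3: {'are', 'ast', 'het'},
    2: {'ar', 'er', 'or', 'en', 'at', 'te', 'et'},
    1: {'a', 'e', 'n', 't'},
}


def clef_swedish_stem(word: str) -> str:
    word = word.lower()
    limit = len(word) - 2  # assumed length guard
    # Drop a final -s first on longer words
    if limit > 2 and word.endswith('s'):
        word = word[:-1]
        limit -= 1
    # Remove the longest listed ending that leaves a long enough stem
    for size in range(5, 0, -1):
        if limit > size and word[-size:] in _ENDINGS[size]:
            return word[:-size]
    return word
```

Under these assumed lists the sketch reproduces the three documented examples ('undervis', 'suspensio', 'viss').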
class abydos.stemmer.Caumanns
Bases: abydos.stemmer._stemmer._Stemmer
Caumanns stemmer.
Jörg Caumanns' stemmer is described in his article in [Cau99].
This implementation is based on the GermanStemFilter described at [Lan13].
New in version 0.3.6.
stem(word: str) → str
Return Caumanns German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Caumanns()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabier'
New in version 0.2.0.
Changed in version 0.3.6: Encapsulated in class
class abydos.stemmer.Lovins
Bases: abydos.stemmer._stemmer._Stemmer
Lovins stemmer.
The Lovins stemmer is described in Julie Beth Lovins's article [Lov68].
New in version 0.3.6.
Initialize the stemmer.
New in version 0.3.6.
stem(word: str) → str
Return Lovins stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Lovins()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'
New in version 0.2.0.
Changed in version 0.3.6: Encapsulated in class
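Lovins' algorithm works in two phases: it removes the longest ending found in a fixed list, provided that a stem of at least two letters remains (individual endings carry further conditions), and then applies recoding rules to the stem. A toy sketch of the first phase only: the five-entry ending list is hypothetical and stands in for Lovins' full table, and neither the per-ending conditions nor the recoding phase is modeled.

```python
# Hypothetical ending list; Lovins' real table is far larger and
# attaches a condition code to every ending.
TOY_ENDINGS = ('iveness', 'ness', 'ive', 'ing', 'ion')

MIN_STEM = 2  # Lovins requires a stem of at least two letters


def lovins_phase1(word: str, endings=TOY_ENDINGS) -> str:
    """Remove the longest listed ending that leaves >= MIN_STEM letters."""
    word = word.lower()
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)]
    return word
```

Trying endings longest-first is the essential design choice: 'elusiveness' loses 'iveness' rather than just 'ness', giving 'elus' as in the example above, while 'king' is left alone because removing 'ing' would leave a single letter.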
class abydos.stemmer.PaiceHusk
Bases: abydos.stemmer._stemmer._Stemmer
Paice-Husk stemmer.
Implementation of the Paice-Husk stemmer, also known as the Lancaster stemmer, developed by Chris Paice with the assistance of Gareth Husk.
This is based on the algorithm's description in [Pai90].
New in version 0.3.6.
stem(word: str) → str
Return Paice-Husk stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = PaiceHusk()
>>> stmr.stem('assumption')
'assum'
>>> stmr.stem('verifiable')
'ver'
>>> stmr.stem('fancies')
'fant'
>>> stmr.stem('fanciful')
'fancy'
>>> stmr.stem('torment')
'tor'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
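The Paice-Husk stemmer is driven by a rule table: rules are tried against the end of the word, a matching rule removes some letters and may append others, some rules apply only to words that have not been stemmed yet ("intact"), and each rule says whether stemming continues or stops. A candidate stem is accepted only if, when it starts with a vowel, at least two letters remain, or, when it starts with a consonant, at least three letters remain including a vowel or y. A minimal sketch of that control loop under those assumed conditions; the five-rule table is hypothetical and is not the Lancaster rule set, so its outputs differ from the examples above.

```python
VOWELS = set('aeiou')

# Hypothetical rule table (NOT the Lancaster rules):
# (ending, letters_to_remove, text_to_append, intact_only, continue_after)
TOY_RULES = [
    ('ies', 3, 'y', False, True),
    ('ful', 3, '', False, True),
    ('ing', 3, '', False, False),
    ('ed', 2, '', False, False),
    ('s', 1, '', True, False),
]


def acceptable(stem: str) -> bool:
    """Assumed acceptability conditions on a candidate stem."""
    if not stem:
        return False
    if stem[0] in VOWELS:
        return len(stem) >= 2
    return len(stem) >= 3 and any(c in VOWELS or c == 'y' for c in stem)


def paice_husk_sketch(word: str, rules=TOY_RULES) -> str:
    word = word.lower()
    intact = True
    while True:
        for ending, remove, append, intact_only, cont in rules:
            if not word.endswith(ending) or (intact_only and not intact):
                continue
            candidate = word[: len(word) - remove] + append
            if acceptable(candidate):
                word, intact = candidate, False
                break
        else:
            return word  # no rule fired: stop
        if not cont:
            return word
```

With this toy table, 'fancies' becomes 'fancy' and then stops because no rule matches, while 'bring' is left unchanged because 'br' fails the acceptability test.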
class abydos.stemmer.Porter(early_english: bool = False)
Bases: abydos.stemmer._stemmer._Stemmer
Porter stemmer.
The Porter stemmer is described in [Por80].
New in version 0.3.6.
Initialize Porter instance.
- Parameters
early_english (bool) -- Set to True to remove the Early English suffixes -est and -eth (the 2nd and 3rd person singular verbal agreement suffixes, respectively)
New in version 0.4.0.
stem(word: str) → str
Return Porter stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Porter()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'

>>> stmr = Porter(early_english=True)
>>> stmr.stem('eateth')
'eat'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
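Most of Porter's suffix rules in [Por80] are conditioned on the measure m of the remaining stem, obtained by writing it as [C](VC)^m[V], where C and V are runs of consonants and vowels and y counts as a vowel only when it follows a consonant. A minimal sketch of that computation, not the library's code, assuming lowercase input; the names is_consonant and measure are hypothetical.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: y is a consonant at the start or after a vowel."""
    ch = word[i]
    if ch in 'aeiou':
        return False
    if ch == 'y':
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    pattern = ''.join(
        'c' if is_consonant(stem, i) else 'v' for i in range(len(stem))
    )
    return pattern.count('vc')
```

Porter's own examples behave as expected: 'tree' and 'by' have m = 0, 'trouble' and 'ivy' have m = 1, and 'troubles' and 'orrery' have m = 2. A rule such as "(m > 0) EED -> EE" then fires only when measure() of the stem before the suffix is positive.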
class abydos.stemmer.Porter2(early_english: bool = False)
Bases: abydos.stemmer._snowball._Snowball
Porter2 (Snowball English) stemmer.
The Porter2 (Snowball English) stemmer is defined in [Por02].
New in version 0.3.6.
Initialize Porter2 instance.
- Parameters
early_english (bool) -- Set to True to remove the Early English suffixes -est and -eth (the 2nd and 3rd person singular verbal agreement suffixes, respectively)
New in version 0.4.0.
stem(word: str) → str
Return the Porter2 (Snowball English) stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Porter2()
>>> stmr.stem('reading')
'read'
>>> stmr.stem('suspension')
'suspens'
>>> stmr.stem('elusiveness')
'elus'

>>> stmr = Porter2(early_english=True)
>>> stmr.stem('eateth')
'eat'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
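Instead of Porter's measure, Porter2 restricts suffix removal to two regions: R1 is the part of the word after the first non-vowel that follows a vowel (or, for words beginning gener, commun or arsen, the part after that prefix), and R2 is the same construction applied within R1. A minimal sketch of the region computation, not the library's code; it treats y as a vowel throughout and skips Porter2's rule that marks some y's as consonants, and the names r1_r2 and _region_start are hypothetical.

```python
from typing import Tuple

VOWELS = set('aeiouy')
# Porter2 prefixes whose end marks the start of R1
R1_PREFIXES = ('gener', 'commun', 'arsen')


def _region_start(word: str, begin: int) -> int:
    """Index just after the first non-vowel that follows a vowel, from begin."""
    for i in range(begin + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word: str) -> Tuple[str, str]:
    word = word.lower()
    r1 = next((len(p) for p in R1_PREFIXES if word.startswith(p)), None)
    if r1 is None:
        r1 = _region_start(word, 0)
    r2 = _region_start(word, r1)  # R2 is searched for inside R1
    return word[r1:], word[r2:]
```

For example, 'beautiful' yields R1 = 'iful' and R2 = 'ul', while 'beau' has empty R1 and R2, so no region-restricted suffix can be removed from it.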
class abydos.stemmer.SStemmer
Bases: abydos.stemmer._stemmer._Stemmer
S-stemmer.
The S-stemmer is defined in [Har91].
New in version 0.3.6.
stem(word: str) → str
Return the S-stemmed form of a word.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SStemmer()
>>> stmr.stem('summaries')
'summary'
>>> stmr.stem('summary')
'summary'
>>> stmr.stem('towers')
'tower'
>>> stmr.stem('reading')
'reading'
>>> stmr.stem('census')
'census'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
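Harman's S-stemmer consists of three ordered rules, of which only the first applicable one fires: -ies not preceded by e or a becomes -y; -es not preceded by a, e or o loses its final s; and -s not preceded by u or s is dropped. A minimal sketch of these rules, assuming lowercase input; the name s_stem is hypothetical and this is not the library's source.

```python
def s_stem(word: str) -> str:
    # Rule 1: -ies -> -y, unless preceded by e or a
    if word.endswith('ies') and word[-4:-3] not in ('e', 'a'):
        return word[:-3] + 'y'
    # Rule 2: -es -> -e, unless preceded by a, e or o
    if word.endswith('es') and word[-3:-2] not in ('a', 'e', 'o'):
        return word[:-1]
    # Rule 3: drop a final -s, unless preceded by u or s
    if word.endswith('s') and word[-2:-1] not in ('u', 's'):
        return word[:-1]
    return word
```

Because only plural endings are touched, 'reading' and 'census' pass through unchanged, as in the examples above.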
class abydos.stemmer.Schinke
Bases: abydos.stemmer._stemmer._Stemmer
Schinke stemmer.
The Schinke stemmer is defined in [SGRW96].
New in version 0.3.6.
stem(word: str) → str
Return the stem of a word according to the Schinke stemmer.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = Schinke()
>>> stmr.stem('atque')
'atque,atque'
>>> stmr.stem('census')
'cens,censu'
>>> stmr.stem('virum')
'uir,uiru'
>>> stmr.stem('populusque')
'popul,populu'
>>> stmr.stem('senatus')
'senat,senatu'
New in version 0.3.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Now returns a str containing the noun stem, then the verb stem, comma-separated
stem_dict(word: str) → Dict[str, str]
Return the noun and verb stems of a word, according to the Schinke stemmer, as a dictionary.
- Parameters
word (str) -- The word to stem
- Returns
Word stems in a dictionary
- Return type
dict
Examples
>>> stmr = Schinke()
>>> stmr.stem_dict('atque')
{'n': 'atque', 'v': 'atque'}
>>> stmr.stem_dict('census')
{'n': 'cens', 'v': 'censu'}
>>> stmr.stem_dict('virum')
{'n': 'uir', 'v': 'uiru'}
>>> stmr.stem_dict('populusque')
{'n': 'popul', 'v': 'populu'}
>>> stmr.stem_dict('senatus')
{'n': 'senat', 'v': 'senatu'}
New in version 0.6.0.
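Since version 0.6.0, stem returns the noun and verb stems as one comma-separated string, while stem_dict returns them under the keys 'n' and 'v'. Code written against one form can convert from the other. A minimal sketch with hypothetical helper names, assuming that neither stem itself contains a comma.

```python
from typing import Dict


def schinke_str_to_dict(stems: str) -> Dict[str, str]:
    """Convert 'noun,verb' (the stem() form) to {'n': noun, 'v': verb}."""
    noun, verb = stems.split(',', 1)
    return {'n': noun, 'v': verb}


def schinke_dict_to_str(stems: Dict[str, str]) -> str:
    """Convert {'n': noun, 'v': verb} (the stem_dict() form) to 'noun,verb'."""
    return f"{stems['n']},{stems['v']}"
```

For instance, 'cens,censu' converts to {'n': 'cens', 'v': 'censu'}, matching the stem and stem_dict examples for 'census'.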
class abydos.stemmer.SnowballDanish
Bases: abydos.stemmer._snowball._Snowball
Snowball Danish stemmer.
The Snowball Danish stemmer is defined at: http://snowball.tartarus.org/algorithms/danish/stemmer.html
New in version 0.3.6.
stem(word: str) → str
Return Snowball Danish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballDanish()
>>> stmr.stem('underviser')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('sikkerhed')
'sikker'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
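Like the other Snowball stemmers, the Danish algorithm confines suffix removal to a region R1: the part of the word after the first non-vowel that follows a vowel. The Danish description additionally adjusts R1 so that at least three letters precede it. A minimal sketch of that region computation, assuming the vowel set a e i o u y æ å ø; the name danish_r1 is hypothetical and the suffix-removal steps themselves are not shown.

```python
DANISH_VOWELS = set('aeiouyæåø')


def danish_r1(word: str) -> str:
    word = word.lower()
    start = len(word)
    # R1 begins after the first non-vowel that follows a vowel
    for i in range(1, len(word)):
        if word[i] not in DANISH_VOWELS and word[i - 1] in DANISH_VOWELS:
            start = i + 1
            break
    # Danish adjustment: at least three letters must precede R1
    start = max(start, 3)
    return word[start:]
```

For 'underviser' the unadjusted R1 would start after 'un'; the adjustment moves it to 'erviser', which keeps short stems from being over-stemmed.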
class abydos.stemmer.SnowballDutch
Bases: abydos.stemmer._snowball._Snowball
Snowball Dutch stemmer.
The Snowball Dutch stemmer is defined at: http://snowball.tartarus.org/algorithms/dutch/stemmer.html
New in version 0.3.6.
stem(word: str) → str
Return Snowball Dutch stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballDutch()
>>> stmr.stem('lezen')
'lez'
>>> stmr.stem('opschorting')
'opschort'
>>> stmr.stem('ongrijpbaarheid')
'ongrijp'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
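Before removing suffixes, the Snowball Dutch algorithm normalizes the word: it removes umlauts and acute accents, then upper-cases an initial y, a y after a vowel, and an i between vowels so that later steps treat them as consonants. A minimal sketch of that preprocessing step, assuming the vowel set a e i o u y è; the name dutch_preprocess is hypothetical and the later suffix-removal steps are not shown.

```python
# Umlauts and acute accents removed before stemming
_UNACCENT = str.maketrans('äëïöüáéíóú', 'aeiouaeiou')
DUTCH_VOWELS = set('aeiouyè')


def dutch_preprocess(word: str) -> str:
    chars = list(word.lower().translate(_UNACCENT))
    for i, ch in enumerate(chars):
        # chars[i - 1] may already be upper-cased, i.e. a consonant now
        prev_vowel = i > 0 and chars[i - 1] in DUTCH_VOWELS
        if ch == 'y' and (i == 0 or prev_vowel):
            chars[i] = 'Y'  # treat this y as a consonant
        elif (ch == 'i' and prev_vowel and i + 1 < len(chars)
              and chars[i + 1] in DUTCH_VOWELS):
            chars[i] = 'I'  # i between vowels is a consonant
    return ''.join(chars)
```

For example, 'koeien' becomes 'koeIen', so the i is not counted as a vowel when the stemming regions are computed.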
class abydos.stemmer.SnowballGerman(alternate_vowels: bool = False)
Bases: abydos.stemmer._snowball._Snowball
Snowball German stemmer.
The Snowball German stemmer is defined at: http://snowball.tartarus.org/algorithms/german/stemmer.html
New in version 0.3.6.
Initialize SnowballGerman instance.
- Parameters
alternate_vowels (bool) -- Composes ae as ä, oe as ö, and ue as ü before running the algorithm
New in version 0.4.0.
stem(word: str) → str
Return Snowball German stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballGerman()
>>> stmr.stem('lesen')
'les'
>>> stmr.stem('graues')
'grau'
>>> stmr.stem('buchstabieren')
'buchstabi'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
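The alternate_vowels option targets text in which umlauts are spelled with two letters. A minimal sketch of that composition step, with a hypothetical helper name: it composes ae, oe and ue as ä, ö and ü, and additionally leaves ue after q unchanged (as in quer), following the Snowball German variant description. Whether SnowballGerman applies exactly this exception is an assumption.

```python
import re


def compose_umlauts(word: str) -> str:
    word = word.replace('ae', 'ä').replace('oe', 'ö')
    # Assumed exception from the Snowball variant: keep 'ue' after 'q'
    return re.sub(r'(?<!q)ue', 'ü', word)
```

Composing first lets the main algorithm see 'bücher' rather than 'buecher', so umlaut-spelling variants of a word reach the same stem.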
class abydos.stemmer.SnowballNorwegian
Bases: abydos.stemmer._snowball._Snowball
Snowball Norwegian stemmer.
The Snowball Norwegian stemmer is defined at: http://snowball.tartarus.org/algorithms/norwegian/stemmer.html
New in version 0.3.6.
stem(word: str) → str
Return Snowball Norwegian stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballNorwegian()
>>> stmr.stem('lese')
'les'
>>> stmr.stem('suspensjon')
'suspensjon'
>>> stmr.stem('sikkerhet')
'sikker'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
class abydos.stemmer.SnowballSwedish
Bases: abydos.stemmer._snowball._Snowball
Snowball Swedish stemmer.
The Snowball Swedish stemmer is defined at: http://snowball.tartarus.org/algorithms/swedish/stemmer.html
New in version 0.3.6.
stem(word: str) → str
Return Snowball Swedish stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = SnowballSwedish()
>>> stmr.stem('undervisa')
'undervis'
>>> stmr.stem('suspension')
'suspension'
>>> stmr.stem('visshet')
'viss'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
class abydos.stemmer.UEALite(max_word_length: int = 20, max_acro_length: int = 8, var: str = 'standard')
Bases: abydos.stemmer._stemmer._Stemmer
UEA-Lite stemmer.
The UEA-Lite stemmer is discussed in [JS05].
This is chiefly based on the Java implementation of the algorithm, with variants based on the Perl implementation and Jason Adams' Ruby port.
Java version: [Chu]
Perl version: [JS05]
Ruby version: [Ada17]
New in version 0.3.6.
Initialize UEALite instance.
- Parameters
max_word_length (int) -- The maximum word length allowed
max_acro_length (int) -- The maximum acronym length allowed
var (str) --
Variant rules to use:
- standard: to use the original (Java-version) rules
- Adams: to use Jason Adams' rules
- Perl: to use the original Perl rules
New in version 0.4.0.
stem(word: str) → str
Return UEA-Lite stem.
- Parameters
word (str) -- The word to stem
- Returns
Word stem
- Return type
str
Examples
>>> stmr = UEALite()
>>> stmr.stem('readings')
'read'
>>> stmr.stem('insulted')
'insult'
>>> stmr.stem('cussed')
'cuss'
>>> stmr.stem('fancies')
'fancy'
>>> stmr.stem('eroded')
'erode'
New in version 0.1.0.
Changed in version 0.3.6: Encapsulated in class
Changed in version 0.6.0: Now returns a str exclusively