public interface LanguageProfile
A language profile is built from a training text that should be fairly large and clean.
It contains the n-grams from the training text in the desired gram sizes (e.g. 2-grams and 3-grams),
with optional text filters applied for cleaning. Rarely occurring n-grams may also have been cut to
reduce noise and index size. Use a LanguageProfileBuilder to build a profile.
The profile may be created on the fly at runtime, or it may be loaded from a previously generated text file (see OldLangProfileConverter).
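The description above (count n-grams of chosen sizes, then cut rare ones) can be illustrated with a minimal sketch. This is not the library's LanguageProfileBuilder: the class `NgramCountSketch`, the method `countGrams`, and the `minFrequency` parameter are hypothetical names chosen for illustration, and no text filters are applied.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the library.
class NgramCountSketch {

    /**
     * Counts every n-gram of the requested lengths in the text,
     * then drops n-grams seen fewer than minFrequency times.
     */
    static Map<String, Integer> countGrams(String text, List<Integer> gramLengths, int minFrequency) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int n : gramLengths) {
            // Slide a window of width n over the text.
            for (int i = 0; i + n <= text.length(); i++) {
                counts.merge(text.substring(i, i + n), 1, Integer::sum);
            }
        }
        // Cut rarely occurring n-grams to reduce noise and size.
        counts.values().removeIf(count -> count < minFrequency);
        return counts;
    }

    public static void main(String[] args) {
        // 2-grams and 3-grams of "banana", keeping those seen at least twice.
        System.out.println(countGrams("banana", List.of(2, 3), 2)); // prints {an=2, na=2, ana=2}
    }
}
```

A real training text would be far larger than one word; the tiny input is used only so the counts can be checked by hand.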
Modifier and Type | Method and Description |
---|---|
int | getFrequency(String gram) |
@NotNull List<Integer> | getGramLengths(): Tells which n-gram lengths (the n in n-gram) are used in this profile. |
@NotNull LdLocale | getLocale() |
long | getMaxGramCount(int gramLength): Tells how often the n-gram with the highest number of occurrences in this profile occurred. |
long | getMinGramCount(int gramLength): Tells how often the n-gram with the lowest number of occurrences in this profile occurred. |
long | getNumGramOccurrences(int gramLength): Tells how often all n-grams of a certain length occurred, combined. |
int | getNumGrams(): Tells how many n-grams there are for all n-gram sizes combined. |
int | getNumGrams(int gramLength): Tells how many different n-grams there are for a certain n-gram size. |
@NotNull Iterable<Map.Entry<String,Integer>> | iterateGrams(): Iterates all n-gram strings with their frequency. |
@NotNull Iterable<Map.Entry<String,Integer>> | iterateGrams(int gramLength): Iterates all n-gram strings of length gramLength with their frequency. |
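To make the difference between the counting methods concrete, here is a minimal sketch that combines getFrequency and getNumGramOccurrences into a relative frequency. The nested `ProfileView` interface is a hypothetical stand-in that mirrors three method signatures from the table so the example compiles on its own; it is not the library type. Returning 0 for an unknown gram is an assumption of this sketch, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the library.
class ProfileStatsSketch {

    /** Stand-in mirroring three LanguageProfile method signatures. */
    interface ProfileView {
        int getFrequency(String gram);
        int getNumGrams(int gramLength);
        long getNumGramOccurrences(int gramLength);
    }

    /** In-memory view backed by a gram-to-count map. */
    static ProfileView of(Map<String, Integer> counts) {
        return new ProfileView() {
            public int getFrequency(String gram) {
                // Assumption: unknown grams report a frequency of 0.
                return counts.getOrDefault(gram, 0);
            }
            public int getNumGrams(int gramLength) {
                // Number of distinct grams of this length.
                return (int) counts.keySet().stream()
                        .filter(g -> g.length() == gramLength).count();
            }
            public long getNumGramOccurrences(int gramLength) {
                // Sum of the counts of all grams of this length.
                return counts.entrySet().stream()
                        .filter(e -> e.getKey().length() == gramLength)
                        .mapToLong(e -> e.getValue().longValue()).sum();
            }
        };
    }

    /** Share of a gram among all occurrences of grams with the same length. */
    static double relativeFrequency(ProfileView profile, String gram) {
        long total = profile.getNumGramOccurrences(gram.length());
        return total == 0 ? 0.0 : (double) profile.getFrequency(gram) / total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("th", 6);
        counts.put("he", 2);
        counts.put("the", 4);
        ProfileView profile = of(counts);
        System.out.println(profile.getNumGrams(2));            // prints 2
        System.out.println(profile.getNumGramOccurrences(2));  // prints 8
        System.out.println(relativeFrequency(profile, "th"));  // prints 0.75
    }
}
```

Here getNumGrams(2) counts distinct 2-grams, while getNumGramOccurrences(2) sums their counts, which is why the second number is larger.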
@NotNull LdLocale getLocale()

@NotNull List<Integer> getGramLengths()

int getFrequency(String gram)
gram - for example "a" or "foo".

int getNumGrams(int gramLength)
gramLength - 1-n

int getNumGrams()

long getNumGramOccurrences(int gramLength)
See getNumGrams(int).
gramLength - 1-n

long getMinGramCount(int gramLength)
gramLength - 1-n

long getMaxGramCount(int gramLength)
gramLength - 1-n

@NotNull Iterable<Map.Entry<String,Integer>> iterateGrams()
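As a rough sketch of how the entries returned by iterateGrams() relate to getMinGramCount and getMaxGramCount, the lowest and highest counts for one gram length can be derived by a single pass over the entries. The class `GramRangeSketch` and the method `minMax` are hypothetical, and returning {0, 0} when no gram has the requested length is an assumption of this sketch, not the library's documented behavior.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the library.
class GramRangeSketch {

    /** Returns {min, max} count among grams of the given length, or {0, 0} if there are none. */
    static long[] minMax(Iterable<Map.Entry<String, Integer>> grams, int gramLength) {
        long min = Long.MAX_VALUE;
        long max = 0;
        boolean found = false;
        for (Map.Entry<String, Integer> entry : grams) {
            if (entry.getKey().length() != gramLength) {
                continue; // skip grams of other lengths
            }
            long count = entry.getValue();
            found = true;
            min = Math.min(min, count);
            max = Math.max(max, count);
        }
        return found ? new long[] {min, max} : new long[] {0, 0};
    }

    public static void main(String[] args) {
        // Stand-in for the entries a profile's iterateGrams() would yield.
        List<Map.Entry<String, Integer>> grams =
                List.of(Map.entry("th", 6), Map.entry("he", 2), Map.entry("the", 4));
        long[] range = minMax(grams, 2);
        System.out.println(range[0] + ".." + range[1]); // prints 2..6
    }
}
```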
Copyright © 2022. All rights reserved.