Class NGramTokenizer
java.lang.Object
weka.core.tokenizers.Tokenizer
weka.core.tokenizers.CharacterDelimitedTokenizer
weka.core.tokenizers.NGramTokenizer
- All Implemented Interfaces:
- Serializable, Enumeration, OptionHandler, RevisionHandler
Splits a string into n-grams, with a configurable minimum and maximum gram size.
Valid options are:
-delimiters <value>  The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int>  The max size of the Ngram (default = 3).
-min <int>  The min size of the Ngram (default = 1).
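Below is a minimal usage sketch (not part of the original Javadoc); the sample text, sizes, and delimiter string are illustrative and simply mirror the options listed above:

import weka.core.tokenizers.NGramTokenizer;

public class NGramTokenizerExample {
    public static void main(String[] args) {
        NGramTokenizer tokenizer = new NGramTokenizer();
        // programmatic equivalents of the -min and -max options
        tokenizer.setNGramMinSize(1);
        tokenizer.setNGramMaxSize(3);
        // delimiters are handled by the CharacterDelimitedTokenizer superclass (-delimiters option)
        tokenizer.setDelimiters(" \r\n\t.,;:'\"()?!");
        // hand over the string, then drain the resulting n-grams
        tokenizer.tokenize("the quick brown fox");
        while (tokenizer.hasMoreElements()) {
            System.out.println(tokenizer.nextElement());
        }
    }
}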
- Version:
- $Revision: 1.4 $
- Author:
- Sebastian Germesin (sebastian.germesin@dfki.de), FracPete (fracpete at waikato dot ac dot nz)
Constructor Summary
NGramTokenizer()
Method Summary
int getNGramMaxSize() - Gets the max N of the NGram.
int getNGramMinSize() - Gets the min N of the NGram.
String[] getOptions() - Gets the current option settings for the OptionHandler.
String getRevision() - Returns the revision string.
String globalInfo() - Returns a string describing the tokenizer.
boolean hasMoreElements() - Returns true if there are more elements available.
Enumeration listOptions() - Returns an enumeration of all the available options.
static void main(String[] args) - Runs the tokenizer with the given options and strings to tokenize.
String nextElement() - Returns N-grams and also (N-1)-grams and ... and 1-grams.
String NGramMaxSizeTipText() - Returns the tip text for this property.
String NGramMinSizeTipText() - Returns the tip text for this property.
void setNGramMaxSize(int value) - Sets the max size of the Ngram.
void setNGramMinSize(int value) - Sets the min size of the Ngram.
void setOptions(String[] options) - Parses a given list of options.
void tokenize(String s) - Sets the string to tokenize.
Methods inherited from class weka.core.tokenizers.CharacterDelimitedTokenizer
delimitersTipText, getDelimiters, setDelimiters
Methods inherited from class weka.core.tokenizers.Tokenizer
runTokenizer, tokenize
Methods inherited from interface java.util.Enumeration
asIterator
-
Constructor Detail
-
NGramTokenizer
public NGramTokenizer()
-
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing the tokenizer.
- Specified by:
- globalInfo in class Tokenizer
- Returns:
- a description suitable for displaying in the explorer/experimenter gui
-
listOptions
Returns an enumeration of all the available options.
- Specified by:
- listOptions in interface OptionHandler
- Overrides:
- listOptions in class CharacterDelimitedTokenizer
- Returns:
- an enumeration of all available options
-
getOptions
public String[] getOptions()
Gets the current option settings for the OptionHandler.
- Specified by:
- getOptions in interface OptionHandler
- Overrides:
- getOptions in class CharacterDelimitedTokenizer
- Returns:
- the list of current option settings as an array of strings
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options. Valid options are:
-delimiters <value>  The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int>  The max size of the Ngram (default = 3).
-min <int>  The min size of the Ngram (default = 1).
- Specified by:
- setOptions in interface OptionHandler
- Overrides:
- setOptions in class CharacterDelimitedTokenizer
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
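As an illustration only (not taken from the original Javadoc), the same tokenizer could be configured through this option parser; the option values below are arbitrary:

import weka.core.tokenizers.NGramTokenizer;

public class SetOptionsExample {
    public static void main(String[] args) throws Exception {
        NGramTokenizer tokenizer = new NGramTokenizer();
        // setOptions declares 'throws Exception', hence the throws clause on main
        tokenizer.setOptions(new String[]{"-min", "2", "-max", "3"});
        // getOptions returns the current settings as an option array again
        System.out.println(String.join(" ", tokenizer.getOptions()));
    }
}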
-
getNGramMaxSize
public int getNGramMaxSize()
Gets the max N of the NGram.
- Returns:
- the size (N) of the NGram.
-
setNGramMaxSize
public void setNGramMaxSize(int value)
Sets the max size of the Ngram.
- Parameters:
- value - the size of the NGram.
-
NGramMaxSizeTipText
public String NGramMaxSizeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNGramMinSize
public void setNGramMinSize(int value)
Sets the min size of the Ngram.
- Parameters:
- value - the size of the NGram.
-
getNGramMinSize
public int getNGramMinSize()
Gets the min N of the NGram.
- Returns:
- the size (N) of the NGram.
-
NGramMinSizeTipText
public String NGramMinSizeTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
hasMoreElements
public boolean hasMoreElements()
Returns true if there are more elements available.
- Specified by:
- hasMoreElements in interface Enumeration
- Specified by:
- hasMoreElements in class Tokenizer
- Returns:
- true if there are more elements available
-
nextElement
Returns N-grams and also (N-1)-grams and ... and 1-grams.
- Specified by:
- nextElement in interface Enumeration
- Specified by:
- nextElement in class Tokenizer
- Returns:
- the next element
-
tokenize
public void tokenize(String s)
Sets the string to tokenize. Tokenization happens immediately.
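A short sketch of the tokenize/hasMoreElements/nextElement cycle (illustrative input and sizes, not from the original Javadoc); because the tokenizer is itself an Enumeration, it can also be drained with Collections.list:

import java.util.Collections;
import java.util.List;

import weka.core.tokenizers.NGramTokenizer;

public class TokenizeExample {
    public static void main(String[] args) {
        NGramTokenizer tokenizer = new NGramTokenizer();
        tokenizer.setNGramMinSize(1);
        tokenizer.setNGramMaxSize(2);
        // tokenize() prepares the n-grams immediately ...
        tokenizer.tokenize("machine learning with weka");
        // ... and the enumeration is consumed afterwards; the list holds the 2-grams and 1-grams
        List<?> grams = Collections.list(tokenizer);
        System.out.println(grams);
    }
}
-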
getRevision
public String getRevision()
Returns the revision string.
- Returns:
- the revision
-
main
public static void main(String[] args)
Runs the tokenizer with the given options and strings to tokenize. The tokens are printed to stdout.
- Parameters:
- args - the commandline options and strings to tokenize
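For illustration (the arguments here are made up), main can also be called programmatically with the same strings one would pass on the command line:

import weka.core.tokenizers.NGramTokenizer;

public class RunNGramTokenizer {
    public static void main(String[] args) {
        // roughly equivalent to running on the command line (with weka.jar on the classpath):
        //   java weka.core.tokenizers.NGramTokenizer -max 2 -min 1 "hello world example"
        // the resulting n-grams are printed to stdout
        NGramTokenizer.main(new String[]{"-max", "2", "-min", "1", "hello world example"});
    }
}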
-