public final class ICUTokenizer
extends org.apache.lucene.analysis.Tokenizer
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the ICUTokenizerConfig.
See Also:
ICUTokenizerConfig
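The two-stage approach described above (split at script boundaries first, then run a word BreakIterator inside each run) can be sketched with the JDK alone. This is only an illustration under stated assumptions: it uses `java.text.BreakIterator` and `Character.UnicodeScript` as stand-ins for the ICU classes, the class name `ScriptAwareSegmenter` and its methods are hypothetical, and it is not how ICUTokenizer is actually implemented.

```java
import java.lang.Character.UnicodeScript;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: JDK stand-ins for the ICU script and break-iterator machinery. */
public class ScriptAwareSegmenter {

    /** Splits text into maximal single-script runs; COMMON and INHERITED characters stay in the current run. */
    static List<String> scriptRuns(String text) {
        List<String> runs = new ArrayList<>();
        UnicodeScript current = null;
        int start = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            boolean neutral = script == UnicodeScript.COMMON || script == UnicodeScript.INHERITED;
            if (!neutral) {
                if (current != null && script != current) {
                    // Script changed: close the previous run at this position.
                    runs.add(text.substring(start, i));
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    /** Applies a word BreakIterator inside each script run and keeps pieces containing a letter or digit. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String run : scriptRuns(text)) {
            BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
            it.setText(run);
            int begin = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; begin = end, end = it.next()) {
                String piece = run.substring(begin, end);
                if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    out.add(piece);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin text followed by Cyrillic text, written with Unicode escapes.
        System.out.println(words("hello \u043c\u0438\u0440"));
    }
}
```

Because the script split happens first, adjacent Latin and Cyrillic letters with no space between them end up in separate runs and therefore in separate words.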
Constructor and Description |
---|
ICUTokenizer(java.io.Reader input): Construct a new ICUTokenizer that breaks text into words from the given Reader. |
ICUTokenizer(java.io.Reader input, ICUTokenizerConfig config): Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |
Modifier and Type | Method and Description |
---|---|
void | end() |
boolean | incrementToken() |
void | reset() |
void | reset(java.io.Reader input) |
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
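Constructor Detail
public ICUTokenizer(java.io.Reader input)
Construct a new ICUTokenizer that breaks text into words from the given
Reader. The default script-specific handling is used.
Parameters:
input - Reader containing text to tokenize.
See Also:
DefaultICUTokenizerConfig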
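public ICUTokenizer(java.io.Reader input, ICUTokenizerConfig config)
Construct a new ICUTokenizer that breaks text into words from the given
Reader, using a tailored BreakIterator configuration.
Parameters:
input - Reader containing text to tokenize.
config - Tailored BreakIterator configuration.
Method Detail
public boolean incrementToken() throws java.io.IOException
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException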
public void reset() throws java.io.IOException
Overrides:
reset in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException
public void reset(java.io.Reader input) throws java.io.IOException
Overrides:
reset in class org.apache.lucene.analysis.Tokenizer
Throws:
java.io.IOException
public void end() throws java.io.IOException
Overrides:
end in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException
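The methods listed above are normally called in a fixed order by the consumer: reset(), then incrementToken() until it returns false, then end(). The sketch below mimics that call order with a small JDK-only stand-in. `MiniWordTokenizer`, `term()`, `finalOffset()` and `collect()` are hypothetical names invented for this illustration; the class is not part of Lucene, and it assumes that reset(Reader) only swaps the input and that the consumer calls reset() afterwards.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in, not a Lucene class: mirrors the reset / incrementToken / end call order. */
public class MiniWordTokenizer {
    private Reader input;
    private String text = "";
    private final BreakIterator breaker = BreakIterator.getWordInstance(Locale.ROOT);
    private int tokenStart;
    private int tokenEnd;
    private int finalOffset;

    public MiniWordTokenizer(Reader input) {
        this.input = input;
    }

    /** Reads the whole input and rewinds to the first boundary. */
    public void reset() throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int n; (n = input.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        text = sb.toString();
        breaker.setText(text);
        tokenStart = tokenEnd = breaker.first();
        finalOffset = 0;
    }

    /** Swaps in a new Reader; the caller is expected to call reset() before consuming. */
    public void reset(Reader newInput) {
        this.input = newInput;
    }

    /** Advances to the next word-like token; returns false once the input is exhausted. */
    public boolean incrementToken() {
        while (true) {
            int next = breaker.next();
            if (next == BreakIterator.DONE) {
                return false;
            }
            tokenStart = tokenEnd;
            tokenEnd = next;
            // Skip whitespace and punctuation segments.
            if (term().codePoints().anyMatch(Character::isLetterOrDigit)) {
                return true;
            }
        }
    }

    /** Records the end-of-stream offset after the last token. */
    public void end() {
        finalOffset = text.length();
    }

    public String term() {
        return text.substring(tokenStart, tokenEnd);
    }

    public int finalOffset() {
        return finalOffset;
    }

    /** Consumer loop: reset, drain with incrementToken, then end. */
    static List<String> collect(MiniWordTokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        try {
            tokenizer.reset();
            while (tokenizer.incrementToken()) {
                out.add(tokenizer.term());
            }
            tokenizer.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        MiniWordTokenizer tokenizer = new MiniWordTokenizer(new StringReader("hello world"));
        System.out.println(collect(tokenizer));
        tokenizer.reset(new StringReader("second input")); // reuse the same instance
        System.out.println(collect(tokenizer));
    }
}
```

Reusing one instance through reset(Reader) avoids rebuilding the break iterator for every input, which is the usual motivation for a reusable tokenizer.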
Copyright © 2000-2024 Apache Software Foundation. All Rights Reserved.