org.apache.lucene.analysis.standard
public class StandardTokenizer extends Tokenizer implements StandardTokenizerConstants
This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
| Field Summary | |
|---|---|
| Token | jj_nt |
| Token | token |
| StandardTokenizerTokenManager | token_source |
| Constructor Summary | |
|---|---|
| StandardTokenizer(Reader reader) | Constructs a tokenizer for this Reader. |
| StandardTokenizer(CharStream stream) | |
| StandardTokenizer(StandardTokenizerTokenManager tm) | |
| Method Summary | |
|---|---|
| void | disable_tracing() |
| void | enable_tracing() |
| ParseException | generateParseException() |
| Token | getNextToken() |
| Token | getToken(int index) |
| Token | next() Returns the next token in the stream, or null at EOS. |
| void | ReInit(CharStream stream) |
| void | ReInit(StandardTokenizerTokenManager tm) |
The type of the token returned by next() is set to an element of tokenImage (defined in StandardTokenizerConstants).
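Because next() signals the end of the stream by returning null rather than throwing, callers typically drain a tokenizer with a loop that stops on null. The sketch below illustrates that loop shape only. `WhitespaceSketchTokenizer` and `NullAtEosDemo` are hypothetical stand-ins written for this example, not Lucene classes: the stand-in splits on whitespace and returns plain strings, whereas StandardTokenizer applies its own grammar and returns Token objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Lucene's StandardTokenizer: it splits on
// whitespace only and exists to illustrate the null-at-EOS contract.
class WhitespaceSketchTokenizer {
    private final Reader reader;

    WhitespaceSketchTokenizer(Reader reader) {
        this.reader = reader;
    }

    // Returns the next token text, or null at end of stream.
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    return sb.toString();
                }
                // Otherwise skip leading or repeated whitespace.
            } else {
                sb.append((char) c);
            }
        }
        // End of input: emit a pending token, or null if nothing is left.
        return sb.length() > 0 ? sb.toString() : null;
    }
}

class NullAtEosDemo {
    // Drains the tokenizer, stopping when next() returns null.
    static List<String> collect(String text) {
        WhitespaceSketchTokenizer tokenizer =
                new WhitespaceSketchTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(collect("Lucene in action")); // prints [Lucene, in, action]
    }
}
```

With the real class, the same loop would presumably start from `new StandardTokenizer(reader)` and test each returned Token against null.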