net.percederberg.grammatica.parser
Class Tokenizer

java.lang.Object
  |
  +--net.percederberg.grammatica.parser.Tokenizer

public class Tokenizer
extends java.lang.Object

A character stream tokenizer. This class groups the characters read from the stream together into tokens ("words"). The grouping is controlled by token patterns that contain either a fixed string to search for, or a regular expression. If the stream of characters doesn't match any of the token patterns, a parse exception is thrown.
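
The difference between the two kinds of token pattern can be illustrated with a small self-contained sketch. It does not use the Grammatica classes: PatternKindSketch, matchString and matchRegex are hypothetical names, and the code only assumes that a pattern is tested at the current position of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternKindSketch {

    /** Returns the length matched by a fixed string at the offset, or 0 if none. */
    public static int matchString(String literal, String input, int offset) {
        // A fixed string is compared character by character; no escaping is needed.
        return input.startsWith(literal, offset) ? literal.length() : 0;
    }

    /** Returns the length matched by a regular expression at the offset, or 0 if none. */
    public static int matchRegex(String regex, String input, int offset) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(offset, input.length());
        // lookingAt() anchors the match at the start of the region.
        return m.lookingAt() ? m.end() - offset : 0;
    }

    public static void main(String[] args) {
        String input = "1+22";
        System.out.println(matchString("+", input, 1));      // prints 1
        System.out.println(matchRegex("[0-9]+", input, 2));  // prints 2
        System.out.println(matchRegex("[0-9]+", input, 1));  // prints 0
    }
}
```

Note that "+" works as a fixed string here, while the same text would not be a valid regular expression without escaping.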

Version:
1.0
Author:
Per Cederberg

Constructor Summary
Tokenizer(java.io.Reader in)
          Creates a new tokenizer for the specified input stream.
 
Method Summary
 void addPattern(TokenPattern pattern)
          Adds a new token pattern to the tokenizer.
 int getCurrentColumn()
          Returns the current column number.
 int getCurrentLine()
          Returns the current line number.
 TokenPattern getPattern(int id)
          Returns the token pattern with the specified id.
 Token next()
          Finds the next token on the stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Tokenizer

public Tokenizer(java.io.Reader in)
Creates a new tokenizer for the specified input stream.

Parameters:
in - the input stream to read

Method Detail

getPattern

public TokenPattern getPattern(int id)
Returns the token pattern with the specified id.

Parameters:
id - the token pattern id
Returns:
the token pattern found, or null if not present

getCurrentLine

public int getCurrentLine()
Returns the current line number. This number will be the line number of the next token returned.

Returns:
the current line number

getCurrentColumn

public int getCurrentColumn()
Returns the current column number. This number will be the column number of the next token returned.

Returns:
the current column number
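
How the line and column numbers relate to the next token can be sketched as follows. This is an illustration only, not the Grammatica implementation: PositionSketch and consume are hypothetical names, and the sketch assumes 1-based numbering and that only '\n' ends a line.

```java
class PositionSketch {

    // Assumption: both counters start at 1.
    private int line = 1;
    private int column = 1;

    /** Advances the position past text that has already been consumed. */
    public void consume(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
    }

    /** Returns the line on which the next token would start. */
    public int getCurrentLine() {
        return line;
    }

    /** Returns the column at which the next token would start. */
    public int getCurrentColumn() {
        return column;
    }

    public static void main(String[] args) {
        PositionSketch position = new PositionSketch();
        position.consume("ab");
        position.consume("\n  ");
        System.out.println(position.getCurrentLine());    // prints 2
        System.out.println(position.getCurrentColumn());  // prints 3
    }
}
```

After everything before a token has been consumed, the two getters report where that token starts, which matches the "next token returned" wording above.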

addPattern

public void addPattern(TokenPattern pattern)
                throws ParserCreationException
Adds a new token pattern to the tokenizer. The pattern is added last in the list, so if two patterns match the same string, the one added earlier is chosen.

Parameters:
pattern - the pattern to add
Throws:
ParserCreationException - if the pattern couldn't be added to the tokenizer
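
The ordering rule can be illustrated with a minimal self-contained sketch. It does not use the Grammatica classes: PatternOrderSketch, Rule, add and choose are hypothetical names, and regular expressions stand in for token patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PatternOrderSketch {

    /** Hypothetical stand-in for a token pattern: a name plus a regex. */
    static final class Rule {
        final String name;
        final Pattern regex;

        public Rule(String name, String regex) {
            this.name = name;
            this.regex = Pattern.compile(regex);
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Appends a rule to the end of the list, in the manner of addPattern. */
    public void add(String name, String regex) {
        rules.add(new Rule(name, regex));
    }

    /** Returns the name of the earliest-added rule matching the whole text, or null. */
    public String choose(String text) {
        for (Rule rule : rules) {
            if (rule.regex.matcher(text).matches()) {
                return rule.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PatternOrderSketch sketch = new PatternOrderSketch();
        sketch.add("KEYWORD_IF", "if");
        sketch.add("IDENTIFIER", "[a-z]+");
        System.out.println(sketch.choose("if"));    // prints KEYWORD_IF
        System.out.println(sketch.choose("iffy"));  // prints IDENTIFIER
    }
}
```

Both rules match "if", and the one added first is chosen. In practice this means more specific patterns, such as keywords, would be added before more general ones.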

next

public Token next()
           throws ParseException
Finds the next token on the stream. This method returns null when end of file has been reached. It throws a parse exception if no token pattern matches the input stream, or if a token pattern with the error flag set matches. Any token matching a token pattern with the ignore flag set is silently skipped, and the next token is returned instead.

Returns:
the next token found, or null if end of file was encountered
Throws:
ParseException - if the input stream couldn't be read or parsed correctly
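
The behaviour described for next can be sketched with a small self-contained loop. This is not the Grammatica implementation: NextTokenSketch and Rule are hypothetical names, IllegalStateException stands in for ParseException, the input is a String rather than a Reader, and preferring the longest match is an assumption that the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NextTokenSketch {

    /** Hypothetical stand-in for a token pattern with ignore and error flags. */
    static final class Rule {
        final String name;
        final Pattern regex;
        final boolean ignore;
        final boolean error;

        public Rule(String name, String regex, boolean ignore, boolean error) {
            this.name = name;
            this.regex = Pattern.compile(regex);
            this.ignore = ignore;
            this.error = error;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final String input;
    private int pos = 0;

    public NextTokenSketch(String input) {
        this.input = input;
    }

    public void add(Rule rule) {
        rules.add(rule);
    }

    /** Returns "NAME:text" for the next token, or null at end of input. */
    public String next() {
        while (pos < input.length()) {
            Rule matched = null;
            int end = pos;
            for (Rule rule : rules) {
                Matcher m = rule.regex.matcher(input);
                m.region(pos, input.length());
                // Assumption: the longest match wins. Only a strictly longer
                // match replaces an earlier one, so ties keep the first rule.
                if (m.lookingAt() && m.end() > end) {
                    matched = rule;
                    end = m.end();
                }
            }
            if (matched == null) {
                throw new IllegalStateException("no pattern matches at offset " + pos);
            }
            String text = input.substring(pos, end);
            if (matched.error) {
                throw new IllegalStateException("error pattern " + matched.name + " matched: " + text);
            }
            pos = end;
            if (!matched.ignore) {
                return matched.name + ":" + text;
            }
            // Ignored token: loop again and look for the next one.
        }
        return null;
    }

    public static void main(String[] args) {
        NextTokenSketch tokenizer = new NextTokenSketch("x = 42");
        tokenizer.add(new Rule("ID", "[a-z]+", false, false));
        tokenizer.add(new Rule("EQ", "=", false, false));
        tokenizer.add(new Rule("NUM", "[0-9]+", false, false));
        tokenizer.add(new Rule("WS", "\\s+", true, false));
        String token;
        while ((token = tokenizer.next()) != null) {
            System.out.println(token);
        }
    }
}
```

The loop in main shows the usual calling pattern: call next repeatedly until it returns null, and treat an exception as a tokenizing failure. The whitespace rule has the ignore flag set, so its matches never reach the caller.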