pt.tumba.spell
Class TeXWordFinder

java.lang.Object
  extended by pt.tumba.spell.DefaultWordFinder
      extended by pt.tumba.spell.TeXWordFinder

public class TeXWordFinder
extends DefaultWordFinder

A word finder for TeX and LaTeX documents. It searches the text for sequences of letters and ignores commands and environments, including math environments.
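
The scanning behavior described above can be illustrated with a minimal stand-alone sketch. It is not the source of TeXWordFinder: the class name TeXWordSketch and the method words are hypothetical, and the rules are simplified assumptions (command names are skipped, the {name} argument of \begin and \end is skipped, $...$ and $$...$$ math is skipped, and % comments are skipped only when requested). Environment bodies are not skipped in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of pt.tumba.spell.
final class TeXWordSketch {

    // Returns the letter sequences in text, skipping simplified TeX markup.
    static List<String> words(String text, boolean ignoreComments) {
        List<String> out = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (c == '\\') {
                // Skip the command name that follows the backslash.
                int start = ++i;
                while (i < n && Character.isLetter(text.charAt(i))) i++;
                String name = text.substring(start, i);
                if (name.isEmpty()) {
                    if (i < n) i++; // control symbol such as \% or \$
                } else if ((name.equals("begin") || name.equals("end"))
                        && i < n && text.charAt(i) == '{') {
                    // Skip the environment name, e.g. {equation}.
                    int close = text.indexOf('}', i);
                    i = (close < 0) ? n : close + 1;
                }
            } else if (c == '%' && ignoreComments) {
                // A comment runs to the end of the line.
                int eol = text.indexOf('\n', i);
                i = (eol < 0) ? n : eol + 1;
            } else if (c == '$') {
                // Skip inline ($...$) or display ($$...$$) math.
                String delim = text.startsWith("$$", i) ? "$$" : "$";
                int close = text.indexOf(delim, i + delim.length());
                i = (close < 0) ? n : close + delim.length();
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < n && Character.isLetter(text.charAt(i))) i++;
                out.add(text.substring(start, i));
            } else {
                i++;
            }
        }
        return out;
    }
}
```

Under these assumptions, calling TeXWordSketch.words on the text `The \emph{quick} fox $x+y$ runs % hidden` followed by a line containing `fast`, with ignoreComments set to true, yields the words The, quick, fox, runs, fast.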

Author:
Bruno Martins
See Also:
DefaultWordFinder

Field Summary
private  boolean IGNORE_COMMENTS
          Boolean flag indicating if TeX comments should be ignored.
static int REG_EXPR
          Constant indicating that user-defined ignores are regular expressions.
private  int regexUserDefinedIgnores
          An integer specifying the type of expression to use.
static int STRING_EXPR
          Constant indicating that user-defined ignores are plain strings.
private  java.util.Set userDefinedIgnores
          A Set of user defined ignores.
 
Fields inherited from class pt.tumba.spell.DefaultWordFinder
currentSegmentPos, currentWord, currentWordPos, nextSegmentPos, nextWord, nextWordPos, sentenceIterator, solveHardCases, startsSentence, text
 
Constructor Summary
TeXWordFinder()
          Constructor for TeXWordFinder.
TeXWordFinder(java.lang.String inText)
          Constructor for TeXWordFinder.
 
Method Summary
 void addUserDefinedIgnores(java.util.Collection expressions, int regex)
          Imports a user-defined set of strings or regular expressions to ignore.
 java.lang.String currentSegment()
          Returns the current text segment from the input.
private  int ignoreUserDefined(int i)
          User defined ignore.
 java.lang.String next()
          This method scans the text from the end of the last word, and returns a String corresponding to the next word.
 void setIgnoreComments(boolean ignore)
          Allows one to indicate if TeX comments should be ignored.
 
Methods inherited from class pt.tumba.spell.DefaultWordFinder
current, getText, hasNext, ignore, ignore, ignore, ignore, isWordChar, isWordChar, lookAhead, nextSegment, replace, replaceBigram, replaceSegment, setText, splitSegments, splitWords, startsSentence, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

IGNORE_COMMENTS

private boolean IGNORE_COMMENTS
Boolean flag indicating if TeX comments should be ignored.


userDefinedIgnores

private java.util.Set userDefinedIgnores
A Set of user defined ignores.


regexUserDefinedIgnores

private int regexUserDefinedIgnores
An integer specifying the type of expression to use, e.g. REG_EXPR or STRING_EXPR.


STRING_EXPR

public static final int STRING_EXPR
Constant indicating that user-defined ignores are plain strings.

See Also:
Constant Field Values

REG_EXPR

public static final int REG_EXPR
Constant indicating that user-defined ignores are regular expressions.

See Also:
Constant Field Values
Constructor Detail

TeXWordFinder

public TeXWordFinder(java.lang.String inText)
Constructor for TeXWordFinder.

Parameters:
inText - A String with the input text to tokenize.

TeXWordFinder

public TeXWordFinder()
Constructor for TeXWordFinder.

Method Detail

currentSegment

public java.lang.String currentSegment()
Returns the current text segment from the input. A segment is defined as the character sequence between the current position and the next non-alphanumeric character, considering also white spaces.

Overrides:
currentSegment in class DefaultWordFinder
Returns:
A String with the current text segment.

next

public java.lang.String next()
Scans the text from the end of the last word and returns a String corresponding to the next word. If there are no more words to return, it returns null.

Overrides:
next in class DefaultWordFinder
Returns:
the next word.
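
The null-terminated contract suggests a simple calling pattern, sketched below with a stand-in class so that the example compiles without the library. NullTerminatedFinderSketch and drain are hypothetical names, and the stand-in splits on white space only; it does not reproduce TeXWordFinder's tokenization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in whose next() follows the same "null when exhausted" contract.
final class NullTerminatedFinderSketch {
    private final String[] tokens;
    private int pos = 0;

    NullTerminatedFinderSketch(String text) {
        String trimmed = text.trim();
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Returns the next token, or null when no tokens remain.
    String next() {
        return pos < tokens.length ? tokens[pos++] : null;
    }

    // Collects every remaining word by calling next() until it returns null.
    static List<String> drain(NullTerminatedFinderSketch finder) {
        List<String> words = new ArrayList<>();
        String w;
        while ((w = finder.next()) != null) {
            words.add(w);
        }
        return words;
    }
}
```

With the real class, the same loop shape would presumably apply to a TeXWordFinder instance; the inherited hasNext method listed in the method summary is an alternative way to test for remaining words.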

addUserDefinedIgnores

public void addUserDefinedIgnores(java.util.Collection expressions,
                                  int regex)
Imports a user-defined set of strings or regular expressions to ignore.

Parameters:
expressions - a collection of Objects whose toString() value is the expression; typically String objects.
regex - an integer specifying the type of expression to use, e.g. REG_EXPR or STRING_EXPR.
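
As a rough illustration of the difference between the two expression types, the sketch below checks whether any user-defined expression matches at a given position and returns the index just past the match. Everything in it is an assumption made for illustration: the class UserIgnoreSketch, the method skipIgnored, the numeric values of the two constants, and the convention of returning the unchanged index when nothing matches. The documentation above does not state how TeXWordFinder implements this.

```java
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of pt.tumba.spell.
final class UserIgnoreSketch {
    // Stand-in constants; the real values of STRING_EXPR and REG_EXPR are not shown here.
    static final int STRING_EXPR = 0;
    static final int REG_EXPR = 1;

    // Returns the index just past an ignored expression starting at i, or i if none matches.
    static int skipIgnored(String text, int i, Collection<?> expressions, int type) {
        for (Object o : expressions) {
            String expr = o.toString(); // each element is used via toString()
            if (type == REG_EXPR) {
                Matcher m = Pattern.compile(expr).matcher(text);
                m.region(i, text.length());
                // lookingAt() anchors the match at the start of the region.
                if (m.lookingAt() && m.end() > i) {
                    return m.end();
                }
            } else if (!expr.isEmpty() && text.startsWith(expr, i)) {
                return i + expr.length();
            }
        }
        return i;
    }
}
```

For example, a regular expression such as `\\cite\{[^}]*\}` would let a caller skip a whole citation command together with its argument, whereas a plain string only matches that exact text.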

ignoreUserDefined

private int ignoreUserDefined(int i)
User defined ignore.

Parameters:
i -
Returns:

setIgnoreComments

public void setIgnoreComments(boolean ignore)
Allows one to indicate if TeX comments should be ignored.

Parameters:
ignore - true if TeX comments should be ignored and false otherwise.