pt.tumba.spell
Class CommonMisspellings

java.lang.Object
  extended by pt.tumba.spell.CommonMisspellings

public final class CommonMisspellings
extends java.lang.Object

CommonMisspellings is a simple lookup table that maps misspelled words (bad terms) to their correct spellings (good terms).

Author:
Bruno Martins
See Also:
Map

Field Summary
private  java.util.Map commonMisspellingIndex
          The lookup table for the dictionary, storing badTerm<->goodTerm relations.
private  BloomFilter correctSpellings
          A BloomFilter containing correctly spelled words.
private  java.lang.String dictionaryFileName
          The path to the dictionary file.
private  java.lang.String dictionaryGoodFormsFileName
          The path to the dictionary file of correct spellings.
 
Constructor Summary
CommonMisspellings(java.lang.String dictionaryFileName)
          Creates a lookup table from the given dictionary file.
CommonMisspellings(java.lang.String dictionaryFileName, boolean compression)
          Creates a lookup table from the given dictionary file, which may be GZIP-compressed.
CommonMisspellings(java.lang.String dictionaryFileName, java.lang.String dictionaryGoodFormsFileName)
          Creates a lookup table from the given dictionary file and a dictionary of correct spellings.
CommonMisspellings(java.lang.String dictionaryFileName, java.lang.String dictionaryGoodFormsFileName, boolean compression)
          Creates a lookup table from the given dictionary file, which may be GZIP-compressed, and a dictionary of correct spellings.
 
Method Summary
 void cleanup()
          Clean up the lookup table.
 java.lang.String[] find(java.lang.String pTerm)
          Search the lookup table and return the correct spellings for a given misspelled word.
private  void index(java.io.Reader in)
          Indexes the badTerm : goodTerm rows read from a Reader.
private  void index(java.lang.String pBadTerm, java.lang.String pGoodTerm)
          Indexes a single pair of incorrect and correct spellings.
private  void indexCorrectSpellings(java.io.Reader in)
          Indexes the correctly spelled words read from a Reader, one word per line.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dictionaryFileName

private java.lang.String dictionaryFileName
The path to the dictionary file.


dictionaryGoodFormsFileName

private java.lang.String dictionaryGoodFormsFileName
The path to the dictionary file of correct spellings.


commonMisspellingIndex

private java.util.Map commonMisspellingIndex
The lookup table for the dictionary, storing badTerm<->goodTerm relations.


correctSpellings

private BloomFilter correctSpellings
A BloomFilter containing correctly spelled words.

Constructor Detail

CommonMisspellings

public CommonMisspellings(java.lang.String dictionaryFileName)
                   throws java.lang.Exception
Creates a lookup table from the given dictionary file.

Parameters:
dictionaryFileName - The path to the dictionary file.
Throws:
java.lang.Exception

CommonMisspellings

public CommonMisspellings(java.lang.String dictionaryFileName,
                          java.lang.String dictionaryGoodFormsFileName)
                   throws java.lang.Exception
Creates a lookup table from the given dictionary file and a dictionary of correct spellings.

Parameters:
dictionaryFileName - The path to the dictionary file.
dictionaryGoodFormsFileName - The path to the dictionary file of correct spellings.
Throws:
java.lang.Exception

CommonMisspellings

public CommonMisspellings(java.lang.String dictionaryFileName,
                          boolean compression)
                   throws java.lang.Exception
Creates a lookup table from the given dictionary file, which may be GZIP-compressed.

Parameters:
dictionaryFileName - The path to the dictionary file.
compression - true if the dictionary file is compressed with the GZIP algorithm; false if it is a plain text file.
Throws:
java.lang.Exception

CommonMisspellings

public CommonMisspellings(java.lang.String dictionaryFileName,
                          java.lang.String dictionaryGoodFormsFileName,
                          boolean compression)
                   throws java.lang.Exception
Creates a lookup table from the given dictionary file, which may be GZIP-compressed, and a dictionary of correct spellings.

Parameters:
dictionaryFileName - The path to the dictionary file.
dictionaryGoodFormsFileName - The path to the dictionary file of correct spellings.
compression - true if the dictionary file is compressed with the GZIP algorithm; false if it is a plain text file.
Throws:
java.lang.Exception
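
The effect of the compression flag can be illustrated with a minimal sketch of how a Reader might be opened over the dictionary file. The class DictionaryReaderSketch, its open method, and the UTF-8 charset are assumptions made for illustration; this page does not show how CommonMisspellings actually opens its files.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper, not part of pt.tumba.spell. */
final class DictionaryReaderSketch {

    /**
     * Opens a Reader over the named file. When compression is true the bytes
     * are decoded with GZIP first; otherwise the file is read as plain text.
     * UTF-8 is an assumption; the real class may use another charset.
     */
    static Reader open(String fileName, boolean compression) {
        try {
            InputStream raw = new FileInputStream(fileName);
            InputStream in = raw;
            if (compression) {
                try {
                    in = new GZIPInputStream(raw);
                } catch (IOException e) {
                    raw.close(); // do not leak the file handle on a bad GZIP header
                    throw e;
                }
            }
            return new InputStreamReader(in, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, passing true for a plain text file fails while the GZIP header is read, so the flag has to match the actual file format.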
Method Detail

index

private void index(java.io.Reader in)
            throws java.lang.Exception
Indexes the contents read from a Reader. Each row of input is expected to have the form badTerm : goodTerm; rows that start with # are ignored.

Parameters:
in - The Reader from where to read.
Throws:
java.lang.Exception
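
The input format described above can be illustrated with a minimal parsing sketch. The class MisspellingFileSketch and its parse method are hypothetical stand-ins, not the actual implementation. Skipping rows without a colon and allowing several good terms per bad term are also assumptions, the latter suggested by find returning an array.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in, not the actual pt.tumba.spell implementation. */
final class MisspellingFileSketch {

    /** Reads "badTerm : goodTerm" rows, ignoring rows that start with '#'. */
    static Map<String, List<String>> parse(Reader in) {
        Map<String, List<String>> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("#")) {
                    continue; // comment row
                }
                int sep = line.indexOf(':');
                if (sep < 0) {
                    continue; // assumption: rows without a separator are skipped
                }
                String bad = line.substring(0, sep).trim();
                String good = line.substring(sep + 1).trim();
                if (bad.isEmpty() || good.isEmpty()) {
                    continue; // assumption: incomplete rows are skipped
                }
                // A bad term may accumulate more than one correction.
                table.computeIfAbsent(bad, k -> new ArrayList<>()).add(good);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }
}
```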

indexCorrectSpellings

private void indexCorrectSpellings(java.io.Reader in)
                            throws java.lang.Exception
Indexes the correctly spelled words read from a Reader. The input is expected to contain a single word per line; rows that start with # are ignored.

Parameters:
in - The Reader from where to read.
Throws:
java.lang.Exception
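
A minimal sketch of reading the one-word-per-line format follows. The BloomFilter API is not documented on this page, so the sketch collects words into a java.util.Set instead; the class CorrectSpellingsSketch, its load method, and the skipping of blank rows are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in; a HashSet replaces the project's BloomFilter. */
final class CorrectSpellingsSketch {

    /** Reads one word per row, ignoring rows that start with '#'. */
    static Set<String> load(Reader in) {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("#")) {
                    continue; // comment row
                }
                String word = line.trim();
                if (!word.isEmpty()) { // assumption: blank rows are skipped
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

Unlike this exact set, a Bloom filter trades a small false-positive rate for a much smaller memory footprint.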

index

private void index(java.lang.String pBadTerm,
                   java.lang.String pGoodTerm)
Indexes a single pair of incorrect and correct spellings.

Parameters:
pBadTerm - The incorrect spelling for a term.
pGoodTerm - The correct spelling for a term.

find

public java.lang.String[] find(java.lang.String pTerm)
Search the lookup table and return the correct spellings for a given misspelled word.

Parameters:
pTerm - A misspelled word.
Returns:
An array with the list of correct spelling alternatives for the given word.
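
The lookup described above can be sketched against a plain java.util.Map. The class FindSketch is hypothetical, and returning an empty array for an unknown word is an assumption; this page does not state what find returns when the word is not in the table.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the lookup, not the actual implementation. */
final class FindSketch {

    /** Returns the correct spelling alternatives recorded for the given term. */
    static String[] find(Map<String, List<String>> table, String term) {
        List<String> good = table.get(term);
        if (good == null) {
            return new String[0]; // assumption: unknown words yield an empty array
        }
        return good.toArray(new String[0]);
    }
}
```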

cleanup

public void cleanup()
Clean up the lookup table.