java.lang.Object
  pt.tumba.spell.SpellChecker
public class SpellChecker
The main class of the spell checking package.
Field Summary | |
---|---|
private CommonMisspellings | commonErrors: A dictionary of common misspellings. |
private TernarySearchTrie | dictionary: The main dictionary for the spelling checker. |
private boolean | useBigrams: Whether to use bigrams for context-dependent spelling correction. |
Constructor Summary | |
---|---|
SpellChecker() | |
Method Summary | |
---|---|
java.lang.String | findMostSimilar(java.lang.String key): Takes a word and returns the most similar word from the dictionary, using Levenshtein distance, phonetic similarity, keyboard proximity and other heuristics to measure similarity. |
java.lang.String | findMostSimilar(java.lang.String key, boolean useFrequency): Takes a word and returns the most similar word from the dictionary, using Levenshtein distance, phonetic similarity, keyboard proximity and other heuristics to measure similarity. |
java.util.List | findMostSimilarList(java.lang.String key): Takes a word and returns a List of similar words from the dictionary, using Levenshtein distance to rank the words in the list. |
SpellChecker | getInstance(): Deprecated. TODO: Remove this method and check dependencies with other code. |
private static java.lang.String | heuristicsPortuguese(java.lang.String str): Phonetic heuristics for the Portuguese language: takes a Portuguese word as input and replaces letters and groups of letters that correspond to a specific "sound" with a canonical representation. |
void | initialize(java.lang.String path): Reads the dictionary into memory. |
void | initialize(java.lang.String path1, java.lang.String path2): Reads the dictionary into memory. |
void | initialize(java.lang.String path1, java.lang.String path2, java.lang.String path3): Reads the dictionary into memory. |
static void | main(java.lang.String[] args): Main method. |
java.lang.String | spellCheck(java.lang.String s): Checks the terms in a given String for spelling errors. |
java.lang.String | spellCheckQuery(java.lang.String s): Checks the terms in a search engine query for spelling errors, ignoring commands to the search system. |
java.lang.String | spellCheckTeX(java.lang.String s): Checks the terms in a TeX document for spelling errors. |
java.lang.String | spellCheckWord(java.lang.String word): Checks whether a word is correctly spelled, producing as output a string with the word plus SGML tags indicating whether it is correctly spelled. |
java.lang.String | spellCheckXML(java.lang.String s): Checks the terms in an XML document for spelling errors. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private TernarySearchTrie dictionary
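The dictionary field is typed as a TernarySearchTrie, a class this page does not document. As a rough illustration of the data structure only (not the pt.tumba implementation), the sketch below shows a minimal ternary search trie used for exact word lookup; the class name SimpleTernaryTrie and its add/contains methods are hypothetical.

```java
/** Minimal ternary search trie for exact-match word lookup (illustrative sketch, hypothetical API). */
class SimpleTernaryTrie {
    /** Each node stores one character and three children: lower, equal (next character), higher. */
    private static final class Node {
        final char ch;
        Node lo, eq, hi;
        boolean endOfWord;

        Node(char ch) {
            this.ch = ch;
        }
    }

    private Node root;

    /** Inserts a non-empty word into the trie. */
    void add(String word) {
        if (word == null || word.isEmpty()) {
            throw new IllegalArgumentException("word must be non-empty");
        }
        root = add(root, word, 0);
    }

    private Node add(Node node, String word, int i) {
        char c = word.charAt(i);
        if (node == null) {
            node = new Node(c);
        }
        if (c < node.ch) {
            node.lo = add(node.lo, word, i);
        } else if (c > node.ch) {
            node.hi = add(node.hi, word, i);
        } else if (i < word.length() - 1) {
            node.eq = add(node.eq, word, i + 1); // same character: move on to the next one
        } else {
            node.endOfWord = true; // last character of the word
        }
        return node;
    }

    /** Returns true only if this exact word was added (a stored prefix alone does not count). */
    boolean contains(String word) {
        if (word == null || word.isEmpty()) {
            return false;
        }
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = word.charAt(i);
            if (c < node.ch) {
                node = node.lo;
            } else if (c > node.ch) {
                node = node.hi;
            } else if (i < word.length() - 1) {
                node = node.eq;
                i++;
            } else {
                return node.endOfWord;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleTernaryTrie trie = new SimpleTernaryTrie();
        for (String w : new String[] {"casa", "caso", "gato"}) {
            trie.add(w);
        }
        System.out.println(trie.contains("caso")); // true
        System.out.println(trie.contains("cas"));  // false: only a prefix
    }
}
```

A ternary trie keeps the prefix sharing of an ordinary trie while storing only three child links per node, which is one common reason to prefer it for large word lists.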
private CommonMisspellings commonErrors
private boolean useBigrams
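The useBigrams flag enables context-dependent correction, but this page does not say how bigrams are scored. The sketch below assumes one plausible approach: count word pairs from a corpus and, among candidate corrections, prefer the one that most often follows the previous word. The class BigramContext and its methods are hypothetical, and Java 9 or later is assumed for List.of.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bigram model used to choose between candidate corrections. */
class BigramContext {
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Records one occurrence of the pair (previous, current), for example from a training corpus. */
    void addBigram(String previous, String current) {
        bigramCounts.merge(previous + " " + current, 1, Integer::sum);
    }

    /** How often the pair was seen; 0 if never. */
    int count(String previous, String current) {
        return bigramCounts.getOrDefault(previous + " " + current, 0);
    }

    /**
     * Picks, from a non-empty candidate list, the word that most often follows previous.
     * Ties keep the earlier candidate, so the input order acts as a fallback ranking.
     */
    String choose(String previous, List<String> candidates) {
        String best = candidates.get(0);
        int bestCount = count(previous, best);
        for (String c : candidates) {
            int n = count(previous, c);
            if (n > bestCount) {
                best = c;
                bestCount = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BigramContext ctx = new BigramContext();
        ctx.addBigram("the", "cat");
        ctx.addBigram("the", "cat");
        ctx.addBigram("the", "cart");
        System.out.println(ctx.choose("the", List.of("cart", "cat"))); // cat
    }
}
```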
Constructor Detail |
---|
public SpellChecker()
Method Detail |
---|
public SpellChecker getInstance()
Returns:
A SpellChecker.
private static java.lang.String heuristicsPortuguese(java.lang.String str)
Parameters:
str - A String with a Portuguese word.
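heuristicsPortuguese maps letters and letter groups that share a sound to a canonical form, so that words spelled differently but pronounced alike compare as similar. Its actual rule set is not documented here, so the sketch below uses a few illustrative rules of its own choosing (a silent leading "h", c-cedilla and "ss" as /s/, "ch" as "x"); the class PortuguesePhonetics and the method canonical are hypothetical.

```java
import java.util.Locale;

/** Hypothetical phonetic key for Portuguese words; the rules are illustrative, not the pt.tumba ones. */
class PortuguesePhonetics {
    static String canonical(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        // A leading "h" is silent ("hora" sounds like "ora").
        if (s.startsWith("h")) {
            s = s.substring(1);
        }
        // The c-cedilla (given as a Unicode escape) and "ss" both stand for /s/.
        s = s.replace("\u00e7", "s").replace("ss", "s");
        // "ch" and "x" often share a sound, as in "chave" and "xadrez".
        s = s.replace("ch", "x");
        return s;
    }

    public static void main(String[] args) {
        // The cedilla spelling and "cassa" collapse to the same key.
        System.out.println(canonical("ca\u00e7a")); // casa
        System.out.println(canonical("cassa"));     // casa
        System.out.println(canonical("Hora"));      // ora
    }
}
```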
public void initialize(java.lang.String path) throws java.lang.Exception
Parameters:
path - The file path leading to the dictionary.
Throws:
java.lang.Exception - if any problem occurred while reading the dictionary.
public void initialize(java.lang.String path1, java.lang.String path2) throws java.lang.Exception
Parameters:
path1 - The file path leading to the dictionary.
path2 - The file path leading to a dictionary of common misspellings.
Throws:
java.lang.Exception - if any problem occurred while reading the dictionary.
public void initialize(java.lang.String path1, java.lang.String path2, java.lang.String path3) throws java.lang.Exception
Parameters:
path1 - The file path leading to the dictionary.
path2 - The file path leading to a dictionary of common misspellings.
path3 - The file path leading to a dictionary of correct spellings.
Throws:
java.lang.Exception - if any problem occurred while reading the dictionary.
public java.lang.String findMostSimilar(java.lang.String key)
Parameters:
key - The word to check in the dictionary.
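findMostSimilar combines several similarity measures; the sketch below isolates only the Levenshtein part, returning the dictionary word with the smallest edit distance to the key. It is not the library's implementation: the class NearestWord, its methods, and the plain Collection used as a dictionary are hypothetical, and phonetic, keyboard-proximity and other heuristics are left out. Java 9 or later is assumed for List.of.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical nearest-word lookup by Levenshtein distance alone. */
class NearestWord {
    /** Dynamic-programming edit distance; insertions, deletions and substitutions each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest dictionary word, or null for an empty dictionary; ties keep the first word seen. */
    static String findMostSimilar(String key, Collection<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String word : dictionary) {
            int d = levenshtein(key, word);
            if (d < bestDistance) {
                best = word;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(findMostSimilar("housr", List.of("house", "mouse", "horse"))); // house
    }
}
```

A linear scan like this is only practical for small word lists; with a trie-backed dictionary, candidate generation would usually be pruned rather than comparing the key against every entry.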
public java.lang.String findMostSimilar(java.lang.String key, boolean useFrequency)
Parameters:
key - The word to check in the dictionary.
useFrequency - Whether to use the relative frequency method.
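The page does not describe the relative frequency method. One common reading, assumed here, is that when several dictionary words are equally close to the input, the one that occurs most often in a reference corpus wins. The class FrequencyTieBreak, its methods and the word counts below are hypothetical; Java 9 or later is assumed for List.of and Map.of.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical tie-break by relative frequency among candidates equally close to the input. */
class FrequencyTieBreak {
    /** Count of the word divided by the total count of all words (0 if the map is empty). */
    static double relativeFrequency(String word, Map<String, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total == 0 ? 0.0 : counts.getOrDefault(word, 0) / (double) total;
    }

    /** Returns the most frequent word of a non-empty candidate list; ties keep the earlier candidate. */
    static String pick(List<String> candidates, Map<String, Integer> counts) {
        String best = candidates.get(0);
        for (String c : candidates) {
            if (relativeFrequency(c, counts) > relativeFrequency(best, counts)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up corpus counts, for illustration only.
        Map<String, Integer> counts = Map.of("house", 120, "horse", 40, "mouse", 15);
        // "hovse" is one substitution away from both "horse" and "house".
        System.out.println(pick(List.of("horse", "house"), counts)); // house
    }
}
```

Because every candidate is divided by the same total, comparing relative frequencies ranks words exactly as raw counts would; the normalised value matters only when it is combined with another score, such as an edit distance.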
public java.util.List findMostSimilarList(java.lang.String key)
Takes a word and returns a List of similar words from the dictionary, using Levenshtein distance to rank the words in the list.
Parameters:
key - The word to check in the dictionary.
Returns:
A List of similar words from the dictionary.
public java.lang.String spellCheckQuery(java.lang.String s)
Parameters:
s - A String with a search engine query.
Returns:
A String with spelling errors identified.
See Also:
spellCheckWord(String)
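spellCheckQuery skips commands to the search system, but the page does not list which commands it recognises. The sketch below assumes two kinds, the boolean operators AND/OR/NOT and field:value terms such as site:example.org, and passes every other term to a word checker supplied by the caller. The class QueryChecker and its methods are hypothetical.

```java
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical query checker that leaves search commands untouched (the command syntax is an assumption). */
class QueryChecker {
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    /** Treats boolean operators and field:value terms as commands. */
    static boolean isCommand(String token) {
        return OPERATORS.contains(token) || token.contains(":");
    }

    /** Applies checkWord to each whitespace-separated term that is not a command. */
    static String spellCheckQuery(String query, UnaryOperator<String> checkWord) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty or blank query yields one empty token
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(isCommand(token) ? token : checkWord.apply(token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in checker that upper-cases words, so the skipped commands stand out.
        System.out.println(spellCheckQuery("cats AND dogs site:example.org", String::toUpperCase));
        // CATS AND DOGS site:example.org
    }
}
```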
public java.lang.String spellCheck(java.lang.String s)
Checks the terms in a given String for spelling errors.
Parameters:
s - A String.
Returns:
A String with spelling errors identified.
See Also:
spellCheckWord(String)
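spellCheck is documented as checking each term of a String, with spellCheckWord handling a single word. A minimal sketch of that split, assuming a "term" is a run of letters: every run is replaced by the checker's output and all other characters are copied unchanged. The class TextChecker is hypothetical, the checker is passed in by the caller, and Java 9 or later is assumed for Matcher.replaceAll(Function).

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical free-text checker: each run of letters goes to a word checker, everything else is kept. */
class TextChecker {
    // \p{L} matches letters in any script, so accented Portuguese words stay in one token.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static String spellCheck(String text, UnaryOperator<String> checkWord) {
        Matcher m = WORD.matcher(text);
        // quoteReplacement stops '$' or '\' in the checker's output from being read as group references.
        return m.replaceAll(r -> Matcher.quoteReplacement(checkWord.apply(r.group())));
    }

    public static void main(String[] args) {
        // Stand-in tagger; the real tag layout produced by spellCheckWord is not specified here.
        UnaryOperator<String> tag = w -> "<plain>" + w + "</plain>";
        System.out.println(spellCheck("Hello, world!", tag));
        // <plain>Hello</plain>, <plain>world</plain>!
    }
}
```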
public java.lang.String spellCheckTeX(java.lang.String s)
Parameters:
s - A String with the TeX document.
Returns:
A String with spelling errors identified.
See Also:
spellCheckWord(String)
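For TeX input, the markup itself should not be flagged as misspelled. The page does not say which TeX constructs spellCheckTeX skips; the sketch below assumes only control words (a backslash followed by letters, such as \section) are copied unchanged while other words are checked. Comments, math mode and command arguments are not handled. The class TeXChecker is hypothetical, and Java 9 or later is assumed.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical TeX checker: control words are copied, ordinary words are checked. */
class TeXChecker {
    // Group 1: a control word (backslash plus ASCII letters). Group 2: an ordinary word.
    private static final Pattern TOKEN = Pattern.compile("(\\\\[A-Za-z]+)|(\\p{L}+)");

    static String spellCheckTeX(String tex, UnaryOperator<String> checkWord) {
        Matcher m = TOKEN.matcher(tex);
        return m.replaceAll(r -> Matcher.quoteReplacement(
                r.group(1) != null ? r.group(1) : checkWord.apply(r.group(2))));
    }

    public static void main(String[] args) {
        System.out.println(spellCheckTeX("\\emph{Helo} world", w -> "[" + w + "]"));
        // \emph{[Helo]} [world]
    }
}
```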
public java.lang.String spellCheckXML(java.lang.String s)
Parameters:
s - A String with the XML document.
Returns:
A String with spelling errors identified.
See Also:
spellCheckWord(String)
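For XML input, the same idea applies: only text content should reach the word checker. A minimal sketch, assuming tags and entity references are copied unchanged and words between them are checked. The class XmlChecker is hypothetical, Java 9 or later is assumed, and a regular expression is only an approximation of XML.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical XML checker: markup is copied unchanged and only words in text content are checked. */
class XmlChecker {
    // Group 1: a tag such as <p> or </p>, or an entity such as &amp;. Group 2: a word of text content.
    private static final Pattern TOKEN = Pattern.compile("(<[^>]*>|&#?\\w+;)|(\\p{L}+)");

    static String spellCheckXML(String xml, UnaryOperator<String> checkWord) {
        Matcher m = TOKEN.matcher(xml);
        return m.replaceAll(r -> Matcher.quoteReplacement(
                r.group(1) != null ? r.group(1) : checkWord.apply(r.group(2))));
    }

    public static void main(String[] args) {
        String xml = "<p class=\"note\">Helo &amp; bye</p>";
        System.out.println(spellCheckXML(xml, w -> "[" + w + "]"));
        // <p class="note">[Helo] &amp; [bye]</p>
    }
}
```

The pattern does not cope with CDATA sections or comments that contain '>'; a more robust version would walk the text nodes produced by an XML parser.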
public java.lang.String spellCheckWord(java.lang.String word)
Checks whether a word is correctly spelled, producing as output a string with the word plus SGML tags indicating whether it is correctly spelled.
The possible SGML tags are:
<misspell> - The word was not found in the dictionary and no suggestion could be generated.
<plain> - The word is correctly spelled.
<suggestion> - The word was not found in the dictionary and a suggestion was generated.
Parameters:
word - The word to check.
Returns:
A String with the word provided as input (or an appropriate correction) surrounded by SGML tags indicating whether it is correctly spelled.
public static void main(java.lang.String[] args) throws java.lang.Exception
Main method.
Parameters:
args - The command line input, tokenized.
Throws:
java.lang.Exception
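The three-way tagging described for spellCheckWord can be sketched as a small decision: a dictionary word gets <plain>, an unknown word with a correction gets <suggestion> around the correction, and an unknown word without one gets <misspell>. The helper WordTagger.tagWord, the Set-based dictionary and the suggestion function are hypothetical, and the closing tags are an assumed layout, since the page names only the opening tags. Java 9 or later is assumed for Set.of.

```java
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper mirroring the three cases documented for spellCheckWord. */
class WordTagger {
    /**
     * @param dictionary set of correctly spelled words
     * @param suggest    returns a correction for an unknown word, or null when none can be generated
     */
    static String tagWord(String word, Set<String> dictionary, Function<String, String> suggest) {
        if (dictionary.contains(word)) {
            return "<plain>" + word + "</plain>"; // correctly spelled
        }
        String correction = suggest.apply(word);
        if (correction != null) {
            return "<suggestion>" + correction + "</suggestion>"; // unknown, correction found
        }
        return "<misspell>" + word + "</misspell>"; // unknown, no correction available
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("house", "cat");
        Function<String, String> suggest = w -> w.equals("hous") ? "house" : null;
        System.out.println(tagWord("cat", dict, suggest));  // <plain>cat</plain>
        System.out.println(tagWord("hous", dict, suggest)); // <suggestion>house</suggestion>
        System.out.println(tagWord("xyzq", dict, suggest)); // <misspell>xyzq</misspell>
    }
}
```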