What is JaSpell?

JaSpell is a Java spell-checking package. It is of particular interest to developers, since it provides a set of APIs (Application Programming Interfaces) for adding spell checking to any Java application. For end users, JaSpell offers little beyond checking text pasted into a command line.

At this time, it comes with English and Portuguese dictionaries, together with lists of acronyms and proper names that the system should ignore. Dictionaries are plain text files in which each line contains a word and its associated frequency.
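To make the dictionary format concrete, here is a minimal sketch of reading such a file. It is not JaSpell's own loader: the class name DictionaryFormatSketch, the whitespace separator between word and frequency, and the choice to skip malformed lines are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the JaSpell API.
class DictionaryFormatSketch {

    /**
     * Parses lines of the form "word frequency".
     * Assumption: the two fields are separated by whitespace.
     */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;            // ignore blank lines
            String[] parts = line.split("\\s+");
            if (parts.length < 2) continue;          // no frequency on this line
            try {
                frequencies.put(parts[0], Integer.parseInt(parts[1]));
            } catch (NumberFormatException e) {
                // frequency is not a number: skip the line
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("the 5000", "spelling 42", "checker 17");
        System.out.println(parse(sample)); // prints {the=5000, spelling=42, checker=17}
    }
}
```

For a real file, the lines could be obtained with java.nio.file.Files.readAllLines before being passed to parse.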

The software is used in the tumba! Portuguese Web search engine to support interactive spell checking of user queries.

People

JaSpell was developed at the XLDB group of the Department of Informatics of the Faculty of Sciences of the University of Lisbon in Portugal. It was created to support the research paper "Spelling Correction for Search Engine Queries".

JaSpell was written by Bruno Martins.

Research

JaSpell is a Java spell-checking package built on the ternary search tree data structure proposed by Jon Bentley and Bob Sedgewick in "Ternary Search Trees".

The ternary search tree (TST) provides a fast and flexible way to store the dictionary: it can find all keys with a given prefix, suffix, or infix, as well as keys that closely match a given pattern. The tree is easy to search for partial matches and supports near-match functions, which makes it possible to suggest alternatives for misspelled words.
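The following is a minimal sketch of a ternary search tree with exact lookup, prefix search, and a simple near-match search. It is not JaSpell's implementation: the class and method names are hypothetical, and the near-match search shown here only allows character substitutions in words of the same length (a Hamming-distance search), which is a simplification chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the JaSpell API.
class TernarySearchTreeSketch {
    private static final class Node {
        final char ch;
        boolean endOfWord;
        Node lo, eq, hi;   // smaller char, next char, larger char
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    void insert(String word) {
        if (!word.isEmpty()) root = insert(root, word, 0);
    }

    private Node insert(Node node, String word, int i) {
        char c = word.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = insert(node.lo, word, i);
        else if (c > node.ch) node.hi = insert(node.hi, word, i);
        else if (i < word.length() - 1) node.eq = insert(node.eq, word, i + 1);
        else node.endOfWord = true;
        return node;
    }

    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.endOfWord;
    }

    // Returns the node that matches the last character of key, or null.
    private Node find(String key) {
        if (key.isEmpty()) return null;
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = key.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i < key.length() - 1) { node = node.eq; i++; }
            else return node;
        }
        return null;
    }

    // All stored words that start with a non-empty prefix, in sorted order.
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = find(prefix);
        if (node == null) return out;
        if (node.endOfWord) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out);
        return out;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node == null) return;
        collect(node.lo, path, out);
        path.append(node.ch);
        if (node.endOfWord) out.add(path.toString());
        collect(node.eq, path, out);
        path.setLength(path.length() - 1);
        collect(node.hi, path, out);
    }

    // Stored words of the same length that differ in at most maxMismatches positions.
    List<String> nearMatches(String word, int maxMismatches) {
        List<String> out = new ArrayList<>();
        if (!word.isEmpty()) near(root, word, 0, maxMismatches, new StringBuilder(), out);
        return out;
    }

    private void near(Node node, String word, int i, int budget, StringBuilder path, List<String> out) {
        if (node == null || budget < 0) return;
        char c = word.charAt(i);
        if (budget > 0 || c < node.ch) near(node.lo, word, i, budget, path, out);
        int remaining = (c == node.ch) ? budget : budget - 1;
        if (remaining >= 0) {
            path.append(node.ch);
            if (i == word.length() - 1) { if (node.endOfWord) out.add(path.toString()); }
            else near(node.eq, word, i + 1, remaining, path, out);
            path.setLength(path.length() - 1);
        }
        if (budget > 0 || c > node.ch) near(node.hi, word, i, budget, path, out);
    }

    public static void main(String[] args) {
        TernarySearchTreeSketch tree = new TernarySearchTreeSketch();
        for (String w : new String[] {"spell", "spelling", "speed", "tree", "smell", "shell"}) tree.insert(w);
        System.out.println(tree.keysWithPrefix("spe"));   // prints [speed, spell, spelling]
        System.out.println(tree.nearMatches("spell", 1)); // prints [shell, smell, spell]
    }
}
```

A fuller spelling checker would also need to handle inserted and deleted characters; this sketch leaves those out.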

To rank possible corrections for a misspelled word, we propose using the word's frequency in a large corpus as a popularity ranking, together with other heuristics such as keyboard proximity or the phonetic keys produced by the Double Metaphone algorithm, described by Lawrence Philips in "The Double Metaphone Search Algorithm".
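As a rough illustration of the popularity ranking, the sketch below orders candidate corrections by corpus frequency, most frequent first. It covers only the frequency part of the ranking: keyboard proximity and Double Metaphone keys are deliberately left out, and the class name, the alphabetical tie-break, and the sample frequencies are assumptions rather than anything taken from JaSpell.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the JaSpell API.
class FrequencyRankingSketch {

    /**
     * Returns the candidates sorted by descending frequency.
     * Words missing from the frequency table count as 0; ties are broken
     * alphabetically (an assumption, to keep the order deterministic).
     */
    static List<String> rank(List<String> candidates, Map<String, Integer> frequencies) {
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort((a, b) -> {
            int byFrequency = Integer.compare(
                    frequencies.getOrDefault(b, 0),
                    frequencies.getOrDefault(a, 0));
            return byFrequency != 0 ? byFrequency : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Integer> frequencies = new HashMap<>();
        frequencies.put("spell", 300);   // made-up counts for the example
        frequencies.put("smell", 120);
        List<String> candidates = Arrays.asList("shell", "smell", "spell");
        System.out.println(rank(candidates, frequencies)); // prints [spell, smell, shell]
    }
}
```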

Availability

JaSpell is released under the BSD License, which basically states that you can do anything you like with it as long as you mention the authors and make it clear that the library is covered by the BSD License. It also exempts us from any liability, should this library eat your hard disc or kill your cat.

Source code, samples and detailed documentation are provided in the download. The Java API documentation is also available online.

The package is relatively simple to install and run. We encourage you to try it out and let us know of any problems you find. We would also be very happy to hear from people who are using this software package.