|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object pt.tumba.spell.TernarySearchTrie
public class TernarySearchTrie
Implementation of a Ternary Search Trie, a data structure for storing String
objects
that combines the compact size of a binary search tree with the speed of a digital search trie, and is
therefore ideal for practical use in sorting and searching data.
This data structure is faster than hashing for many typical search problems, and supports a broader range of useful problems and operations. Ternary searches are faster than hashing and more powerful, too.
The theory of ternary search trees was described at a symposium in 1997 (see "Fast Algorithms for Sorting and Searching Strings," by J.L. Bentley and R. Sedgewick, Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997). Algorithms in C, Third Edition, by Robert Sedgewick (Addison-Wesley, 1998) provides yet another view of ternary search trees.
Nested Class Summary | |
---|---|
protected class |
TernarySearchTrie.TSTNode
An inner class of Ternary Search Trie that represents a node in the trie. |
Field Summary | |
---|---|
private int |
defaultNumReturnValues
The default number of values returned by the matchAlmost method. |
private int |
matchAlmostDiff
the number of differences allowed in a call to the matchAlmostKey method. |
private TernarySearchTrie.TSTNode |
rootNode
The base node in the trie. |
Constructor Summary | |
---|---|
TernarySearchTrie()
Constructs an empty Ternary Search Trie. |
|
TernarySearchTrie(java.io.File file)
Constructs a Ternary Search Trie and loads data from a File into the Trie. |
|
TernarySearchTrie(java.io.File file,
boolean compression)
Constructs a Ternary Search Trie and loads data from a File into the Trie. |
Method Summary | |
---|---|
private static int |
compareCharsAlphabetically(int cCompare2,
int cRef)
Compares characters by alfabetical order. |
private void |
deleteNode(TernarySearchTrie.TSTNode nodeToDelete)
Deletes the node passed in as an argument. |
private TernarySearchTrie.TSTNode |
deleteNodeRecursion(TernarySearchTrie.TSTNode currentNode)
Recursivelly visits each node to be deleted. |
java.lang.Object |
get(java.lang.String key)
Retrieve the object indexed by a key. |
java.lang.Integer |
getAndIncrement(java.lang.String key)
Retrieve the Integer indexed by key, increment it by one unit
and store the new Integer . |
protected java.lang.String |
getKey(TernarySearchTrie.TSTNode node)
Returns the key that indexes the node argument. |
TernarySearchTrie.TSTNode |
getNode(java.lang.String key)
Returns the node indexed by key, or null if that node doesn't exist. |
protected TernarySearchTrie.TSTNode |
getNode(java.lang.String key2,
TernarySearchTrie.TSTNode startNode)
Returns the node indexed by key, or null if that node doesn't exist. |
protected TernarySearchTrie.TSTNode |
getOrCreateNode(java.lang.String key)
Returns the node indexed by key, creating that node if it doesn't exist, and creating any required intermediate nodes if they don't exist. |
java.util.List |
matchAlmost(java.lang.String key)
Returns a List of keys that almost match the argument key. |
protected java.util.List |
matchAlmost(java.lang.String key,
int numReturnValues)
Returns a List of keys that almost match the argument key. |
private java.util.List |
matchAlmostRecursion(TernarySearchTrie.TSTNode currentNode,
int charIndex,
int d,
java.lang.String matchAlmostKey,
int matchAlmostNumReturnValues,
java.util.List matchAlmostResult2,
boolean upTo)
Recursivelly vists the nodes in order to find the ones that almost match a given key. |
java.util.List |
matchPrefix(java.lang.String prefix)
Returns an alphabetical List of all keys in the trie that begin with a given prefix. |
java.util.List |
matchPrefix(java.lang.String prefix,
int numReturnValues)
Returns an alphabetical List of all keys in the trie that begin with a
given prefix. |
int |
numDataNodes()
Returns the number of nodes in the trie that have non-null data. |
protected int |
numDataNodes(TernarySearchTrie.TSTNode startingNode)
Returns the number of nodes in the subtrie below and including the starting node. |
int |
numNodes()
Returns the total number of nodes in the trie. |
protected int |
numNodes(TernarySearchTrie.TSTNode startingNode)
Returns the total number of nodes in the subtrie below and including the starting Node. |
void |
put(java.lang.String key,
java.lang.Object value)
Stores a value in the trie. |
private int |
recursiveNodeCalculator(TernarySearchTrie.TSTNode currentNode,
boolean checkData,
int numNodes2)
Recursivelly visists each node to calculate the number of nodes. |
void |
remove(java.lang.String key)
Removes the value indexed by key. |
void |
setMatchAlmostDiff(int diff)
Sets the number of characters by which words can differ from target word when calling the matchAlmost method. |
void |
setNumReturnValues(int num)
Sets the default maximum number of values returned from the matchPrefix and
matchAlmost methods. |
protected java.util.List |
sortKeys(TernarySearchTrie.TSTNode startNode,
int numReturnValues)
Returns keys sorted in alphabetical order. |
private java.util.List |
sortKeysRecursion(TernarySearchTrie.TSTNode currentNode,
int sortKeysNumReturnValues,
java.util.List sortKeysResult2)
Returns keys sorted in alphabetical order. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private int defaultNumReturnValues
matchAlmost
method.
private int matchAlmostDiff
matchAlmostKey
method.
private TernarySearchTrie.TSTNode rootNode
Constructor Detail |
---|
public TernarySearchTrie()
public TernarySearchTrie(java.io.File file) throws java.io.IOException
File
into the Trie.
The file is a normal text document, where each line is of the form
word : integer.
file
- The File
with the data to load into the Trie.
java.io.IOException
- A problem occured while reading the data.public TernarySearchTrie(java.io.File file, boolean compression) throws java.io.IOException
File
into the Trie.
The file is a normal text document, where each line is of the form " word : integer".
file
- The File
with the data to load into the Trie.compression
- If true, the file is compressed with the GZIP algorithm, and if false,
the file is a normal text document.
java.io.IOException
- A problem occured while reading the data.Method Detail |
---|
private static int compareCharsAlphabetically(int cCompare2, int cRef)
cCompare2
- The first char in the comparison.cRef
- The second char in the comparison.
private void deleteNode(TernarySearchTrie.TSTNode nodeToDelete)
nodeToDelete
- The node to delete.private TernarySearchTrie.TSTNode deleteNodeRecursion(TernarySearchTrie.TSTNode currentNode)
null
.
The TSTNode instance returned by this method will be next node to
be operated on by deleteNodeRecursion
(This emulates recursive
method call while avoiding the JVM overhead normally associated
with a recursive method.)
currentNode
- The node to delete.
public java.lang.Object get(java.lang.String key)
key
- A String
index.
public java.lang.Integer getAndIncrement(java.lang.String key)
Integer
indexed by key, increment it by one unit
and store the new Integer
.
key
- A String
index.
integer
retrieved from the Ternary Search Trie.protected java.lang.String getKey(TernarySearchTrie.TSTNode node)
node
- The node whose index is to be calculated.
String
that indexes the node argument.public TernarySearchTrie.TSTNode getNode(java.lang.String key)
null
if that node doesn't exist.
Search begins at root node.
key
- A String
that indexes the node that is returned.
TernarySearchTrie.TSTNode
.protected TernarySearchTrie.TSTNode getNode(java.lang.String key2, TernarySearchTrie.TSTNode startNode)
null
if that node doesn't exist.
The search begins at root node.
key2
- A String
that indexes the node that is returned.startNode
- The top node defining the subtrie to be searched.
TernarySearchTrie.TSTNode
.protected TernarySearchTrie.TSTNode getOrCreateNode(java.lang.String key) throws java.lang.NullPointerException, java.lang.IllegalArgumentException
key
- A String
that indexes the node that is returned.
TernarySearchTrie.TSTNode
.
java.lang.NullPointerException
- If the key is null
.
java.lang.IllegalArgumentException
- If the key is an empty String
.public java.util.List matchAlmost(java.lang.String key)
List
of keys that almost match the argument key. Keys returned
will have exactly diff characters that do not match the target key,
where diff is equal to the last value passed in as an argument to the
setMatchAlmostDiff
method.
If the matchAlmost
method is called before the setMatchAlmostDiff
method has
been called for the first time, then diff = 0.
key
- The target key.
List
with the results.protected java.util.List matchAlmost(java.lang.String key, int numReturnValues)
List
of keys that almost match the argument key. Keys returned
will have exactly diff characters that do not match the target key,
where diff is equal to the last value passed in as an argument to the
setMatchAlmostDiff
method.
If the matchAlmost
method is called before the setMatchAlmostDiff
method has
been called for the first time, then diff = 0.
key
- The target key.numReturnValues
- The maximum number of values returned by this method.
List
with the resultsprivate java.util.List matchAlmostRecursion(TernarySearchTrie.TSTNode currentNode, int charIndex, int d, java.lang.String matchAlmostKey, int matchAlmostNumReturnValues, java.util.List matchAlmostResult2, boolean upTo)
currentNode
- The current node.charIndex
- The current char.d
- The number of differences so far.matchAlmostNumReturnValues
- The maximum number of values in the result List
.matchAlmostResult2
- The results so far.upTo
- If true all keys having up to and including matchAlmostDiff
mismatched letters will be included in the result (including
a key that is exactly the same as the target string) otherwise
keys will be included in the result only if they have exactly
matchAlmostDiff number of mismatched letters.matchAlmostKey
- The key being searched.
List
with the results.public java.util.List matchPrefix(java.lang.String prefix)
List
of all keys in the trie that begin with a given prefix.
Only keys for nodes having non-null data are included in the List
.
prefix
- Each key returned from this method will begin with the characters in prefix.
List
with the results.public java.util.List matchPrefix(java.lang.String prefix, int numReturnValues)
List
of all keys in the trie that begin with a
given prefix. Only keys for nodes having non-null data are included in the List
.
prefix
- Each key returned from this method will begin with the characters in prefix.numReturnValues
- The maximum number of values returned from this method.
List
with the resultspublic int numDataNodes()
protected int numDataNodes(TernarySearchTrie.TSTNode startingNode)
startingNode
- The top node of the subtrie. the node that defines the subtrie.
public int numNodes()
protected int numNodes(TernarySearchTrie.TSTNode startingNode)
startingNode
- The top node of the subtrie. The node that defines the subtrie.
public void put(java.lang.String key, java.lang.Object value)
key
- A String
that indexes the object to be stored.value
- The object to be stored in the Trie.private int recursiveNodeCalculator(TernarySearchTrie.TSTNode currentNode, boolean checkData, int numNodes2)
currentNode
- The current node.checkData
- If true we check the data to be different of null
.numNodes2
- The number of nodes so far.
public void remove(java.lang.String key)
key
- A string
that indexes the object to be removed from the Trie.public void setMatchAlmostDiff(int diff)
matchAlmost
method.
Arguments less than 0 will set the char difference to 0, and arguments greater than 3 will set the char difference to 3.
diff
- The number of characters by which words can differ from target word.public void setNumReturnValues(int num)
matchPrefix
and
matchAlmost
methods.
The value should be set this to -1 to get an unlimited number of return values. note that the methods mentioned above provide overloaded versions that allow you to specify the maximum number of return values, in which case this value is temporarily overridden.
num
- The number of values that will be returned when calling the
methods above.protected java.util.List sortKeys(TernarySearchTrie.TSTNode startNode, int numReturnValues)
The number of keys returned is limited to numReturnValues. To get a list that isn't limited in size, set numReturnValues to -1.
startNode
- The top node defining the subtrie to be searched.numReturnValues
- The maximum number of values returned from this method.
List
with the results.private java.util.List sortKeysRecursion(TernarySearchTrie.TSTNode currentNode, int sortKeysNumReturnValues, java.util.List sortKeysResult2)
Sorted keys will be appended to the end of the resulting List
. The result may be
empty when this method is invoked, but may not be null
.
currentNode
- The current node.sortKeysNumReturnValues
- The maximum number of values in the result.sortKeysResult2
- The results so far.
List
with the results.
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |