I ran a simple test to compare the Radix Tree implementation I found on the web with the Trie implementation that I coded from my last post along with a binary search. I loaded all of these structures with over 50,000 random strings, and then performed 5000 prefix searches on each. Below are the results:
5000 prefix searches took the following amount of time…
Binary Search: 6 ms (1.2 microseconds per search)
Trie Search: 38 ms (7.6 microseconds per search)
Radix Trie Search: 59 ms (11.8 microseconds per search)
Overall, they’re all pretty decent solutions. But as far as speed goes, it looks like the binary search algorithm is the winner here, with my humble Trie structure coming in second, and the Radix Trie implementation coming in last. This just goes to show you that sometimes the simplest implementations are the best.
The code that I ran my test with is shown below.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;
import ds.tree.RadixTree;
import ds.tree.RadixTreeImpl;
public class Main {
Set<String> valueSet = new TreeSet<String>();
Trie<String> trie = new TrieImpl<String>();
RadixTree<String> radixTree = new RadixTreeImpl<String>();
/**
* Load up the trie with lots of random strings
*/
public void loadTrie() {
// Load the Trie and a Set
for (int i = 0; i < 51000; i++) {
String val = getRandomString();
valueSet.add(val);
trie.add(val, val);
}
// Load the RadixTree
String[] strArry = new String[valueSet.size()];
valueSet.toArray(strArry);
for (int i = 0; i < strArry.length; i++) {
radixTree.insert(strArry[i], strArry[i]);
}
}
/**
* Test a prefix search
*/
public void testSearch() {
String[] strArry = new String[valueSet.size()];
valueSet.toArray(strArry);
// Test Trie Access
long start = System.currentTimeMillis();
for(int i = 0; i < 5000; i++) {
trie.search(strArry[i].substring(0, 3));
}
long end = System.currentTimeMillis();
long duration = end - start;
System.out.println("Trie Access = " + duration);
// Test RadixTree Access
start = System.currentTimeMillis();
for(int i = 0; i < 5000; i++) {
radixTree.searchPrefix(strArry[i].substring(0, 3), 10000);
}
end = System.currentTimeMillis();
duration = end - start;
System.out.println("RadixTrie Access = " + duration);
// Test Binary Search Access
start = System.currentTimeMillis();
for(int i = 0; i < 5000; i++) {
List<String> list = new ArrayList<String>();
String prefix = strArry[i].substring(0, 3);
int pos = Arrays.binarySearch(strArry, prefix);
while (pos >= 0 && pos < strArry.length) {
if (strArry[pos].startsWith(prefix)) {
list.add(strArry[pos]);
} else {
break;
}
pos++;
}
}
end = System.currentTimeMillis();
duration = end - start;
System.out.println("Binary Search Access = " + duration);
}
/**
* Returns a random string with length between
* 5 and 10.
*/
private String getRandomString() {
StringBuilder sb = new StringBuilder();
int len = getRandomNumber(5, 10);
for (int i = 0; i < len; i++) {
sb.append(getRandomChar());
}
return sb.toString();
}
/**
* Returns a random number between start and end
*/
private int getRandomNumber(int start, int end) {
//get the range, casting to long to avoid overflow problems
long range = (long)end - (long)start + 1;
// compute a fraction of the range, 0 <= frac < range
long fraction = (long)(range * new Random().nextDouble());
return (int)(fraction + start);
}
/**
* Returns a random letter of the alphabet.
*/
private char getRandomChar() {
String alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int pos = (int)(Math.random()*26);
return alphabet.charAt(pos);
}
public static void main(String[] args) {
Main main = new Main();
main.loadTrie();
main.testSearch();
}
}


#1 by Steve on September 7, 2009 - 5:51 am
Your binary search code above has a bug.
You need to change:
int pos = Arrays.binarySearch(strArry, prefix);
to
int pos = Arrays.binarySearch(strArry, prefix);
if( pos < 0 ) { pos = -pos – 1; }
Else you never match any results.
However, your Trie code is very inefficient, which is the reason why it's slower than bsearch. A simple change to the getValuesFromNode function makes your implementation 10x faster:
private void getValuesFromNode(TrieNode currNode, List valueList)
{
if (currNode.isTerminal())
{
valueList.add(currNode.getNodeValue());
}
Collection<TrieNode> children = currNode.getChildNodes();
if( children != null )
{
for( TrieNode childNode : children)
{
getValuesFromNode( childNode, valueList);
}
}
}
And in TreeNode add a new variable Collection<TreeNode> childNodes, and change setChildren to:
public void setChildren(Map<Character, TrieNode> children)
{
this.children = children;
childNodes = children.values();
}
#2 by Steve on September 7, 2009 - 6:03 am
Actually there’s a bug in my code too
Instead of changing setChildren, add a method
Collection getChildNodes()
{
return children.values();
}
#3 by stevedaskam on September 7, 2009 - 10:13 pm
Thanks, Steve!! I had a feeling that my implementation could be improved upon. The Trie structure should be faster than the Binary Search. I will apply your changes to the code, and run the tests again.
I will post the results when complete.
Thanks for taking the time to look at this.
Steve
#4 by stevedaskam on September 9, 2009 - 8:24 pm
OK, I applied your changes, and it improved the Trie implementation a lot. I also had to change the addNode( ) method in TrieImpl so that the test data in Main.java would get added to the new collection.
Next, I ran the two test classes (TrieTest.java to make sure that everything still worked correctly after my changes, and Main.java to do the performance tests). The results for 5000 prefix searches are below:
Trie 28 ms
Binary Search 8 ms
So, it dropped the lookup time for the Trie by about 10 ms, but it still does not beat the Binary Search.
One side note: I noticed initially that these changes did cause the Trie implementation to appear faster than the Binary Search, but then I discovered that there was no data loaded into the Trie, and thus I had to update the addNode( ) method to add the data to the new collection variable.
Thanks for your input. I will continue to look for ways to improve the implementation.
#5 by Steve on September 10, 2009 - 9:40 am
Glat to hear it. I would have thought some other ways to eek some more speed would be:
1) change the TrieNode datastructure from a HashMap to a straight array (whereby you lookup the character getNumericValue as the index) – this should cut lookup speed slightly, but you lose flexibility as you’d probably restrict yourself to just ascii chars in the array.
2) initialise the ArrayList to some reasonable default size (this would improve all the speeds, so relatively everything would stay the same).
#6 by Pierre on October 8, 2010 - 5:28 am
Hi Steve,
I’m really interested in something really fast.
Can you please post your latest src code and findings between binary and trie search?
BTW. Do you know how grammar corrections suggestion per sentence level work and a full stop has been entered?
Thnx,
Pierre
#7 by stevedaskam on October 8, 2010 - 7:24 am
Hi Pierre,
It’s been a good while since I looked at the code, and while it did improve some, I never did find a way to make it faster than the binary search which ran about 6 to 8 ms. It’s hard to beat the binary search algorithm!!
Anyway, I am still interested in exploring other solutions. I will post my somewhat improved code, assuming I can find it on my machine somewhere
Thanks,
Steve
#8 by David on April 16, 2011 - 3:00 am
Hi Steve,
This was very helpful. Did you ever manage to update your Trie code files with the suggestions pointed out in the above comments? Please let me know. Thanks.
#9 by Jim Wright (@shipwreckman) on April 5, 2012 - 11:14 pm
This is very interesting and I appreciate your thoroughness with following up on suggestions. Also I realise this is old.
I don’t know the answer but I think it’s important to remember that Tries may be faster for insert operations, than a sorted arraylist. As I say, I don’t know, but it could matter.