Prefix Trees and understanding Autocomplete

Shashwat Prajjwal
5 min readMay 14, 2020

Autocompletes have been around for so long that we don’t even notice them anymore. From the URL bar on your browser, to the search bar on your favorite apps to autocomplete on your phone keyboard, some form of autocomplete is always there to facilitate your typing.

Now, if this somehow doesn’t seem familiar to you, let me rekindle your memory about autocomplete. The idea behind autocomplete is that you enter a part of a word and the underlying software predicts what you are trying to type. Sounds pretty simple right?

It indeed is pretty simple and uses a data structure called Trees. For those interested in the technical definition of trees, ‘a tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the “children”), with the constraints that no reference is duplicated, and none points to the root.’

YEAH, BUT WHAT EXACTLY ARE TREES?

So, a tree is something very similar to family trees people learn about in middle school. It starts out with maybe your grandparents and has your parents and their siblings on the second level. Next level consists of you, your siblings and all your cousins.

Trees in computers are very similar. You start with one item at the top that can have none to many children. One of the most popular trees are Binary Trees where each item has at most 2 children. Some other requirements of Binary trees are that every item on the left side of an item is less than it and every item on the right side is greater. It also doesn’t have any duplicates.

Now just like how family trees have people at every level, Trees in computer science have Nodes. Nodes are individual items that have other nodes as children. Each Node can access it’s children who can in turn access their children. This is how we traverse a tree.

Okay, but what does this have to do with autocomplete?

Well, there are various kinds of trees that computer scientists can use. The one we are gonna look at is called Prefix Tree or Trie. Tries are very interesting trees where you get a character at each level.

Over here, we see that N and S are first children nodes. If we traverse down from N, we can find two paths at E. Either path will give us a complete word [NEED, NESTED]. What’s interesting is that if we were autocompleting for NES, we’d instantly know that out of the 4 words saved in this tree[NEED, NESTED, SEED, SPEED], the word we were looking for was NESTED. What’s even more interesting is that we did not have to go through every word and check if it had NES as it’s first three characters. We simply traversed the tree through characters N, E and S and then checked what were the possible paths we could take to the end of the tree.

This might not seem like a big advantage as one can argue that it isn’t too hard to look at all 4 words and tell exactly what word we would want to get auto completed however things get very lengthy as the number of words increase. According to wikipedia, English language has over 400k words, which means everytime we want to look for a word, we would have to look through every single word to figure out exactly which one we want to autocomplete to.

However, if we are to use a Prefix tree, the maximum number of steps we would need to find a word would be the length of the largest word in the language which is pneumonoultramicroscopicsilicovolcanoconiosis being 45 characters long. This is however a worst case scenario that isn’t going to come up very often. Most of the commonly used english words are upto 10 characters long making autocomplete using tree only require upto 10 steps compared to 400,000 that would be required if we were looking through every possible word. This is a performance boost of 40000%.

Okay, but what do I do with this information?

After learning about Prefix trees, I made a simple React component that utilizes Prefix trees to autocomplete into an input HTML tag. This component also saves up any options a user pick in the browser so that chosen options get recommended more frequently.

recommendation with apple showing towards the end
apple being selected a couple of times
apple gets recommended first

The code for this component can be found on my Github.

Sources:

Tree — Computer ScienceWiki:
https://computersciencewiki.org/index.php/Tree
Binary Tree Data Structure — GeeksforGeeks: https://www.geeksforgeeks.org/binary-tree-data-structure/
Node(Computer Science) — Wikipedia: https://en.wikipedia.org/wiki/Node_(computer_science)
Tree Traversal- Wikipedia:
https://en.wikipedia.org/wiki/Tree_traversal
Trie — Wikipedia
https://en.wikipedia.org/wiki/Trie
List of Dictionary by number of word — Wikipedia
https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words
What is the longest english word — Lexico
https://www.lexico.com/explore/what-is-the-longest-english-word

--

--