Suffix trees are much faster when the text is fixed and known first while the patterns vary.
Time Complexity: O(m) for single time processing the text, then only O(n) for each new pattern.
In the rest of the post, I am going to describe what a suffix tree is, how to construct and query it.
Prefixes & Suffixes
For a string S:
- Prefix of S: substring of S beginning at the first position of S
- Suffix of S: substring that ends at its last position.
P is a substring of S if P is a prefix of some suffix of S. Suffix Trees exploit this property of strings to solve the substring pattern matching problem.
A suffix tree ST for an m-character string S is a rooted directed tree with exactly m
leaves numbered 1 to m. Each internal node, other than the root, has at least two children and each edge is labeled with a nonempty substring of S.
The key feature of the suffix tree is that for any leaf i, the concatenation of the edge-labels on the path from the root to the leaf i exactly spells out the suffix of S that starts at position i.
If one suffix Sj of S matches a prefix of another suffix Si of S, then the path for Sj would not end at a leaf.
S = xabxa
S1 = xabxa and S4 = xa. In this case, the suffix S4 is also a prefix of suffix S1.
How to avoid this problem?
- Assume that the last character of S appears nowhere else in S.
- Add a new character $ not in the alphabet to the end of S.
Suffix Tree for S= xabxa$
Let us first build a suffix tree the naive way.
Here's an excellent implementation of the suffix tree described above:http://en.literateprograms.org/Suffix_tree_(Java) and
In the next post, I will discuss Ukkonen’s linear-time construction that takes time O(m) to build a suffix tree.