Class SuffixTree

  • All Implemented Interfaces:
    java.io.Serializable

    public class SuffixTree
    extends java.lang.Object
    implements java.io.Serializable
    Suffix tree implementation. The interface is a bit strange, as it needed to be as space-efficient as possible. More work could be done on the space issue.

    A suffix tree is an efficient method for encoding the frequencies of motifs in a sequence. They are sometimes used to quickly screen for similar sequences. For instance, all motifs of length up to 2 in the sequence AAGT could be encoded as:

     root(4)
     |
     A(2)--------G(1)-----T(1)
     |           |
     A(1)--G(1)  T(1)
     

    A possible method of comparing SuffixTrees is provided as a kernel function as org.biojava.stats.svm.tools.SuffixTreeKernel.

    Author:
    Matthew Pocock, Thomas Down (documentation and other updates)
    See Also:
    Serialized Form
    • Constructor Detail

      • SuffixTree

        public SuffixTree​(FiniteAlphabet alphabet)
        Construct a new SuffixTree to contain motifs over the specified alphabet.
        Parameters:
        alphabet - The alphabet of this SuffixTree (must be finite).
    • Method Detail

      • getAlphabet

        public FiniteAlphabet getAlphabet()
        Return the Alphabet containing all Symbols which might be found in this SuffixTree.
      • getRoot

        public SuffixTree.SuffixNode getRoot()
        Return the node object which is the root of this suffix tree. This represents the set of all motifs found in this tree.
      • addSymbols

        public void addSymbols​(SymbolList sList,
                               int window)
                        throws IllegalSymbolException
        Add a count for all motifs with length of up to window to this tree.
        Parameters:
        sList - a SymbolList whose motifs should be added to the tree.
        window - The maximum motif length to count.
        Throws:
        IllegalSymbolException
      • incCounts

        protected void incCounts​(int i,
                                 int c)
      • maxLength

        public int maxLength()
        Return the length of the longest motif represented in this SuffixTree
      • frequency

        public int frequency​(int length)
        Return the number of motifs of a given length encoded in this SuffixTree.