Class SequenceUtil


  • public final class SequenceUtil
    extends java.lang.Object
    Utility class for operations on sequences
    Since:
    3.0.2
    Version:
    1.0
    Author:
    Peter Troshin
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.util.regex.Pattern AA
      Valid Amino acids
      static java.util.regex.Pattern AMBIGUOUS_AA
      Same as the AA pattern but with two additional letters: X and U
      static java.util.regex.Pattern AMBIGUOUS_NUCLEOTIDE
      Ambiguous nucleotide
      static java.util.regex.Pattern DIGIT
      A digit
      static java.util.regex.Pattern NON_AA
      Inversion of the AA pattern
      static java.util.regex.Pattern NON_NUCLEOTIDE
      Non nucleotide
      static java.util.regex.Pattern NONWORD
      Non word
      static java.util.regex.Pattern NUCLEOTIDE
      Nucleotides a, t, g, c, u
      static java.util.regex.Pattern WHITE_SPACE
      A whitespace character: [\t\n\x0B\f\r]
    • Method Summary

      Modifier and Type Method Description
      static java.lang.String cleanSequence​(java.lang.String sequence)
      Removes all whitespace chars in the sequence string
      static java.lang.String deepCleanSequence​(java.lang.String sequence)
      Removes all special characters and digits as well as whitespace chars from the sequence
      static boolean isAmbiguosProtein​(java.lang.String sequence)
      Checks whether the sequence conforms to an ambiguous protein sequence
      static boolean isNonAmbNucleotideSequence​(java.lang.String sequence)
      Ambiguous DNA chars: AGTCRYMKSWHBVDN. This set differs from the protein alphabet in only one (!) character: B
      static boolean isNucleotideSequence​(FastaSequence s)  
      static boolean isProteinSequence​(java.lang.String sequence)  
      static java.util.List<FastaSequence> readFasta​(java.io.InputStream inStream)
      Reads fasta sequences from inStream into the list of FastaSequence objects
      static void writeFasta​(java.io.OutputStream os, java.util.List<FastaSequence> sequences)
      Writes the FastaSequence objects to the output stream; each sequence takes one line only
      static void writeFasta​(java.io.OutputStream outstream, java.util.List<FastaSequence> sequences, int width)
      Writes the list of FastaSequence objects to the outstream, formatting each sequence so that it contains at most width chars on each line
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • WHITE_SPACE

        public static final java.util.regex.Pattern WHITE_SPACE
        A whitespace character: [\t\n\x0B\f\r]
      • DIGIT

        public static final java.util.regex.Pattern DIGIT
        A digit
      • NONWORD

        public static final java.util.regex.Pattern NONWORD
        Non word
      • AA

        public static final java.util.regex.Pattern AA
        Valid Amino acids
      • NON_AA

        public static final java.util.regex.Pattern NON_AA
        Inversion of the AA pattern
      • AMBIGUOUS_AA

        public static final java.util.regex.Pattern AMBIGUOUS_AA
        Same as the AA pattern but with two additional letters: X and U
      • NUCLEOTIDE

        public static final java.util.regex.Pattern NUCLEOTIDE
        Nucleotides a, t, g, c, u
      • AMBIGUOUS_NUCLEOTIDE

        public static final java.util.regex.Pattern AMBIGUOUS_NUCLEOTIDE
        Ambiguous nucleotide
      • NON_NUCLEOTIDE

        public static final java.util.regex.Pattern NON_NUCLEOTIDE
        Non nucleotide
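The field descriptions above do not show the regular expressions themselves. As a minimal sketch of what "inversion of the AA pattern" could mean, assuming AA covers the 20 standard one-letter amino acid codes (the class name InversionSketch and both regexes are hypothetical stand-ins, not the library's definitions):

```java
import java.util.regex.Pattern;

public class InversionSketch {
    // Assumption: AA is the 20 standard one-letter amino acid codes.
    static final Pattern AA =
            Pattern.compile("[ACDEFGHIKLMNPQRSTVWY]+", Pattern.CASE_INSENSITIVE);
    // The inversion: any run of characters outside that set.
    static final Pattern NON_AA =
            Pattern.compile("[^ACDEFGHIKLMNPQRSTVWY]+", Pattern.CASE_INSENSITIVE);

    // True if the whole string consists of amino acid letters.
    static boolean allAminoAcids(String s) {
        return AA.matcher(s).matches();
    }

    // True if at least one character outside the amino acid set occurs.
    static boolean containsNonAminoAcid(String s) {
        return NON_AA.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(allAminoAcids("MKVLA"));         // true
        System.out.println(containsNonAminoAcid("MKV1LA")); // true
    }
}
```

For non-empty input the two checks are complementary: a positive pattern is tested with matches() over the whole string, while its inversion only needs find() to locate one offending character.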
    • Method Detail

      • isNucleotideSequence

        public static boolean isNucleotideSequence​(FastaSequence s)
        Returns:
        true if the sequence contains only the letters a, c, t, g, u
      • isNonAmbNucleotideSequence

        public static boolean isNonAmbNucleotideSequence​(java.lang.String sequence)
        Ambiguous DNA chars: AGTCRYMKSWHBVDN. This set differs from the protein alphabet in only one (!) character: B
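The remark about the B character can be checked directly. The following sketch compares the ambiguous DNA characters listed above with the 20 standard one-letter amino acid codes; the amino acid alphabet used here is an assumption about what "protein" means in that remark, and the class name AlphabetOverlap is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class AlphabetOverlap {
    // Ambiguous DNA characters as listed in the method description.
    static final String AMBIGUOUS_DNA = "AGTCRYMKSWHBVDN";
    // Assumption: the 20 standard one-letter amino acid codes.
    static final String AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";

    // Returns the DNA characters that are not amino acid letters.
    static List<Character> dnaOnlyChars() {
        List<Character> result = new ArrayList<>();
        for (char c : AMBIGUOUS_DNA.toCharArray()) {
            if (AMINO_ACIDS.indexOf(c) < 0) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dnaOnlyChars()); // prints [B]
    }
}
```

Because every other ambiguous DNA character is also a valid amino acid letter, a sequence written with ambiguity codes cannot in general be told apart from a protein by its alphabet alone.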
      • cleanSequence

        public static java.lang.String cleanSequence​(java.lang.String sequence)
        Removes all whitespace chars in the sequence string
        Parameters:
        sequence -
        Returns:
        cleaned up sequence
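A minimal sketch of whitespace removal as described above, assuming "whitespace" means the regex class \s. The class CleanSketch and its clean method are hypothetical and only illustrate the idea; they are not the library's implementation:

```java
import java.util.regex.Pattern;

public class CleanSketch {
    // Assumption: whitespace is whatever the regex class \s matches.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s");

    // Returns the sequence with every whitespace character removed.
    static String clean(String sequence) {
        return WHITE_SPACE.matcher(sequence).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("ACG T\n\tGGA ")); // prints ACGTGGA
    }
}
```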
      • deepCleanSequence

        public static java.lang.String deepCleanSequence​(java.lang.String sequence)
        Removes all special characters and digits as well as whitespace chars from the sequence
        Parameters:
        sequence -
        Returns:
        cleaned up sequence
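A deeper clean could be sketched as three successive removals, one per pattern. This assumes "special characters" corresponds to the regex class \W; the class DeepCleanSketch and its patterns are hypothetical stand-ins rather than the library's code:

```java
import java.util.regex.Pattern;

public class DeepCleanSketch {
    // Assumed stand-ins for the WHITE_SPACE, DIGIT and NONWORD fields.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s");
    private static final Pattern DIGIT = Pattern.compile("\\d");
    private static final Pattern NONWORD = Pattern.compile("\\W");

    // Strips whitespace, then digits, then non-word characters.
    // Note: \W does not match the underscore, so "_" is kept.
    static String deepClean(String sequence) {
        String s = WHITE_SPACE.matcher(sequence).replaceAll("");
        s = DIGIT.matcher(s).replaceAll("");
        s = NONWORD.matcher(s).replaceAll("");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(deepClean("1 MKV-LA* 60\n")); // prints MKVLA
    }
}
```

This kind of cleaning is useful for input pasted from formats that interleave position numbers and gap or stop symbols with the residues.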
      • isProteinSequence

        public static boolean isProteinSequence​(java.lang.String sequence)
        Parameters:
        sequence -
        Returns:
        true if the sequence is a protein sequence, false otherwise
      • isAmbiguosProtein

        public static boolean isAmbiguosProtein​(java.lang.String sequence)
        Checks whether the sequence conforms to an ambiguous protein sequence
        Parameters:
        sequence -
        Returns:
        true only if the sequence is an ambiguous protein sequence; false otherwise, e.g. if the sequence is a non-ambiguous protein or DNA
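The return contract above suggests an ordering of checks: rule out DNA, rule out non-ambiguous protein, then test the wider alphabet. The sketch below follows that ordering. The alphabets (20 standard amino acids, plus X and U for the ambiguous case) and the class ProteinCheckSketch are assumptions for illustration, and the real method may be implemented differently:

```java
import java.util.regex.Pattern;

public class ProteinCheckSketch {
    // Assumed alphabets; the library's actual patterns may differ.
    static final Pattern NUCLEOTIDE =
            Pattern.compile("[ACGTU]+", Pattern.CASE_INSENSITIVE);
    static final Pattern AA =
            Pattern.compile("[ACDEFGHIKLMNPQRSTVWY]+", Pattern.CASE_INSENSITIVE);
    static final Pattern AMBIGUOUS_AA =
            Pattern.compile("[ACDEFGHIKLMNPQRSTVWYXU]+", Pattern.CASE_INSENSITIVE);

    static boolean isNucleotide(String s) {
        return NUCLEOTIDE.matcher(s).matches();
    }

    // Protein: only amino acid letters, and not plain DNA/RNA.
    static boolean isProtein(String s) {
        return !isNucleotide(s) && AA.matcher(s).matches();
    }

    // Ambiguous protein: needs X or U to be valid, so it is neither
    // a nucleotide sequence nor a non-ambiguous protein.
    static boolean isAmbiguousProtein(String s) {
        return !isNucleotide(s) && !isProtein(s)
                && AMBIGUOUS_AA.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isProtein("MKVLAAGIV"));          // true
        System.out.println(isAmbiguousProtein("MKVXAAGIV")); // true
        System.out.println(isAmbiguousProtein("MKVLAAGIV")); // false
        System.out.println(isAmbiguousProtein("ACGTACGT"));  // false
    }
}
```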
      • writeFasta

        public static void writeFasta​(java.io.OutputStream outstream,
                                      java.util.List<FastaSequence> sequences,
                                      int width)
                               throws java.io.IOException
        Writes the list of FastaSequence objects to the outstream, formatting each sequence so that it contains at most width chars on each line
        Parameters:
        outstream -
        sequences -
        width - the maximum number of characters to write on one line
        Throws:
        java.io.IOException
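The line-wrapping behaviour can be sketched without the library. The example below uses a plain Map of id to sequence as a hypothetical stand-in for List<FastaSequence>, and assumes a ">id" header line and "\n" line endings; the class FastaWrapSketch and its methods are illustrative only:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FastaWrapSketch {
    // Writes each entry as ">id" followed by the sequence split into
    // lines of at most `width` characters.
    static void writeWrapped(OutputStream out, Map<String, String> records,
                             int width) throws IOException {
        if (width < 1) {
            throw new IllegalArgumentException("width must be positive");
        }
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (Map.Entry<String, String> e : records.entrySet()) {
            w.write(">" + e.getKey() + "\n");
            String seq = e.getValue();
            for (int i = 0; i < seq.length(); i += width) {
                w.write(seq, i, Math.min(width, seq.length() - i));
                w.write("\n");
            }
        }
        w.flush(); // flush, but leave closing the stream to the caller
    }

    // Convenience wrapper for the demo: renders the output to a String.
    static String render(Map<String, String> records, int width) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeWrapped(out, records, width);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> records = new LinkedHashMap<>();
        records.put("seq1", "ACGTACGTAC");
        // Prints ">seq1", then ACGT, ACGT, AC on separate lines.
        System.out.print(render(records, 4));
    }
}
```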
      • readFasta

        public static java.util.List<FastaSequence> readFasta​(java.io.InputStream inStream)
                                                       throws java.io.IOException
        Reads fasta sequences from inStream into the list of FastaSequence objects
        Parameters:
        inStream - the stream to read from
        Returns:
        list of FastaSequence objects
        Throws:
        java.io.IOException
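For orientation, reading FASTA text generally means starting a new record at each ">" line and concatenating the lines that follow. The sketch below does that with a Map of id to sequence as a hypothetical stand-in for List<FastaSequence>; the class FastaReadSketch, the treatment of blank lines, and the handling of duplicate ids (the later record wins) are assumptions, not documented library behaviour:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FastaReadSketch {
    // Parses FASTA text into an insertion-ordered map of id -> sequence.
    static Map<String, String> read(InputStream in) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        String id = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (id != null) {
                    records.put(id, seq.toString()); // finish previous record
                }
                id = line.substring(1).trim();
                seq.setLength(0);
            } else if (id != null) {
                seq.append(line); // join wrapped sequence lines
            }
        }
        if (id != null) {
            records.put(id, seq.toString()); // finish the last record
        }
        return records;
    }

    // Convenience wrapper for the demo: parses from a String.
    static Map<String, String> readString(String text) {
        try {
            return read(new ByteArrayInputStream(
                    text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints {a=ACGTAC, b desc=MKV}
        System.out.println(readString(">a\nACGT\nAC\n>b desc\nMKV\n"));
    }
}
```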
      • writeFasta

        public static void writeFasta​(java.io.OutputStream os,
                                      java.util.List<FastaSequence> sequences)
                               throws java.io.IOException
        Writes the FastaSequence objects to the output stream; each sequence takes one line only
        Parameters:
        os -
        sequences -
        Throws:
        java.io.IOException