Class ReadNameParser

  • All Implemented Interfaces:
    Serializable
    Direct Known Subclasses:
    OpticalDuplicateFinder

    public class ReadNameParser
    extends Object
    implements Serializable
    Provides access to the physical location information of a cluster. All values should default to -1 if unavailable. ReadGroup and Tile should only allow positive (non-zero) integers; x and y coordinates may be negative.
    See Also:
    Serialized Form
    • Field Detail

      • DEFAULT_READ_NAME_REGEX

        public static final String DEFAULT_READ_NAME_REGEX
        The read name regular expression (regex) is used to extract three pieces of information from the read name: tile, x location, and y location. Any read name regex should parse the read name to produce these and only these values. An example regex is: (?:.*:)?([0-9]+)[^:]*:([0-9]+)[^:]*:([0-9]+)[^:]*$ which assumes that fields in the read name are delimited by ':' and that the last three fields correspond to the tile, x and y locations, ignoring any trailing non-digit characters. The default regex is optimized for fast parsing (see getLastThreeFields(String, char, int[])): it searches for the last three fields, ignoring any trailing non-digit characters and assuming the delimiter ':'. This should correctly handle read names with 5 or 7 fields whose last three fields are tile/x/y, as is the case for the majority of read names produced by Illumina technology.
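        As an illustration of the example regex quoted above, the following minimal sketch applies it with java.util.regex to a made-up Illumina-style read name. The class name ReadNameRegexDemo, the parse helper, and the sample read name are assumptions for illustration only; they are not part of this class, and this is not the optimized path used for the default regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the documented API.
class ReadNameRegexDemo {
    // The example regex quoted in the DEFAULT_READ_NAME_REGEX description.
    static final Pattern EXAMPLE =
            Pattern.compile("(?:.*:)?([0-9]+)[^:]*:([0-9]+)[^:]*:([0-9]+)[^:]*$");

    /** Returns {tile, x, y}, or null if the read name does not match. */
    static int[] parse(String readName) {
        Matcher m = EXAMPLE.matcher(readName);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)), // tile
            Integer.parseInt(m.group(2)), // x
            Integer.parseInt(m.group(3))  // y
        };
    }

    public static void main(String[] args) {
        // Seven ':'-delimited fields; the trailing "/1" is ignored by [^:]*$.
        int[] loc = parse("M01:42:FC:1:1101:2345:6789/1");
        // prints: tile=1101 x=2345 y=6789
        System.out.printf("tile=%d x=%d y=%d%n", loc[0], loc[1], loc[2]);
    }
}
```

        Because the optional prefix group is greedy and must end on ':', the three capture groups always bind to the last three fields, whether the name has 5 or 7 of them.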
      • readNameRegex

        protected final String readNameRegex
    • Constructor Detail

      • ReadNameParser

        public ReadNameParser()
        Creates a read name parser using the default read name regex and optical duplicate distance. See DEFAULT_READ_NAME_REGEX for an explanation of how the read name is parsed.
      • ReadNameParser

        public ReadNameParser​(String readNameRegex)
        Creates a read name parser using the given read name regex. See DEFAULT_READ_NAME_REGEX for an explanation of how to format the regular expression (regex) string.
        Parameters:
        readNameRegex - the regular expression string used to parse read names, or null to never parse location information.
      • ReadNameParser

        public ReadNameParser​(String readNameRegex,
                              htsjdk.samtools.util.Log log)
        Creates a read name parser using the given read name regex. See DEFAULT_READ_NAME_REGEX for an explanation of how to format the regular expression (regex) string.
        Parameters:
        readNameRegex - the regular expression string used to parse read names, or null to never parse location information.
        log - the log to which to write messages.
    • Method Detail

      • addLocationInformation

        public boolean addLocationInformation​(String readName,
                                              PhysicalLocation loc)
        Extracts tile/x/y from the read name and adds them to the given PhysicalLocation so that they can be used later to determine optical duplication.
        Parameters:
        readName - the name of the read/cluster
        loc - the object to add tile/x/y to
        Returns:
        true if the read name contained the information in parsable form, false otherwise
      • getLastThreeFields

        public static int getLastThreeFields​(String readName,
                                             char delim,
                                             int[] tokens)
                                      throws NumberFormatException
        Given a string, splits the string by the delimiter and parses the last three fields as integers. Parsing a field considers only the sequence of digits up to the first non-digit character. The three values are stored in the passed-in array.
        Throws:
        NumberFormatException - if any of the tokens that should contain numbers do not start with parsable numbers
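        The behavior described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class LastThreeFieldsSketch and its methods are hypothetical, the result is returned as a new array rather than written into a caller-supplied one, and throwing on fewer than three fields is an assumption made for this sketch.

```java
import java.util.Arrays;

// Hypothetical sketch; names and error handling are assumptions.
class LastThreeFieldsSketch {
    /** Parses the leading digits (with an optional '-') of a single field. */
    static int leadingInt(String field) {
        int n = field.startsWith("-") ? 1 : 0;
        while (n < field.length() && field.charAt(n) >= '0' && field.charAt(n) <= '9') {
            n++;
        }
        // Integer.parseInt throws NumberFormatException for "" and "-".
        return Integer.parseInt(field.substring(0, n));
    }

    /** Walks backwards over the delimiters and returns the last three fields. */
    static int[] lastThreeFields(String readName, char delim) {
        int[] out = new int[3];
        int end = readName.length(); // exclusive end of the current field
        for (int slot = 2; slot >= 0; slot--) {
            int delimPos = readName.lastIndexOf(delim, end - 1);
            if (delimPos < 0 && slot > 0) {
                throw new NumberFormatException("Fewer than three fields in: " + readName);
            }
            out[slot] = leadingInt(readName.substring(delimPos + 1, end));
            end = delimPos;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [1101, 2345, 6789]
        System.out.println(Arrays.toString(
                lastThreeFields("M01:42:FC:1:1101:2345:6789#ACGT", ':')));
    }
}
```

        Scanning from the end of the string means the leading fields are never split, so the cost does not depend on whether the name has 5 or 7 fields.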
      • rapidParseInt

        public static int rapidParseInt​(String input)
                                 throws NumberFormatException
        Very specialized method to rapidly parse a sequence of digits from a String up until the first non-digit character.
        Throws:
        NumberFormatException - if the String does not start with an optional - followed by at least one digit
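        To make the contract above concrete, here is a minimal sketch of a prefix parser that accumulates digits in place without creating substrings. It is an assumption about how such a method could be written, not the actual source: RapidParseSketch and parseLeadingDigits are hypothetical names, and overflow is deliberately not checked.

```java
// Hypothetical sketch; not the documented method's source.
class RapidParseSketch {
    static int parseLeadingDigits(String input) {
        final int len = input.length();
        int i = 0;
        boolean negative = len > 0 && input.charAt(0) == '-';
        if (negative) {
            i = 1; // skip the optional sign
        }
        int value = 0;
        int digits = 0;
        while (i < len) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') {
                break; // stop at the first non-digit character
            }
            value = value * 10 + (c - '0'); // no overflow check in this sketch
            digits++;
            i++;
        }
        if (digits == 0) {
            throw new NumberFormatException("No leading digits in: " + input);
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        System.out.println(parseLeadingDigits("2345#ACGT")); // prints 2345
        System.out.println(parseLeadingDigits("-17/1"));     // prints -17
    }
}
```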