Class QualityEncodingDetector


  • public class QualityEncodingDetector
    extends Object
    Utility for determining the type of quality encoding/format (see FastqQualityFormat) used in a SAM/BAM or FASTQ file.

    To use this class, invoke one of the static detect() methods with a SamReader or FastqReader, as appropriate. The caller is responsible for closing the readers.

    • Field Detail

      • DEFAULT_MAX_RECORDS_TO_ITERATE

        public static final long DEFAULT_MAX_RECORDS_TO_ITERATE
        The default maximum number of records over which the detector will iterate before making a determination.
        See Also:
        Constant Field Values
    • Constructor Detail

      • QualityEncodingDetector

        public QualityEncodingDetector()
    • Method Detail

      • add

        public long add​(long maxRecords,
                        FastqReader... readers)
        Adds records from the provided readers to the detector.
        Returns:
        The number of records read
      • add

        public long add​(long maxRecords,
                        SamReader reader)
        Adds the provided reader's records to the detector.
        Returns:
        The number of records read
      • add

        public long add​(long maxRecords,
                        CloseableIterator<SAMRecord> iterator,
                        boolean useOriginalQualities)
        Adds the provided iterator's records (optionally using the original qualities) to the detector.
        Returns:
        The number of records read
      • add

        public void add​(FastqRecord fastqRecord)
        Adds the provided record's qualities to the detector.
      • add

        public void add​(SAMRecord samRecord,
                        boolean useOriginalQualities)
        Adds the provided record's qualities to the detector.
      • add

        public void add​(SAMRecord samRecord)
      • isDeterminationAmbiguous

        public boolean isDeterminationAmbiguous()
        Tests whether or not the detector can make a determination without guessing (i.e., if all but one quality format can be excluded using established exclusion conventions).
        Returns:
        True if more than one format is possible after exclusions; false otherwise
      • generateCandidateQualities

        public EnumSet<FastqQualityFormat> generateCandidateQualities​(boolean checkExpected)
        Processes collected quality data and applies rules to determine which quality formats are possible.

        Specifically, for each format's known range of possible values (its "quality scheme"), a format is excluded if any observed value falls outside that range. Additionally, a format is excluded if at least one quality is expected to fall within a given range of values but none does. For example, for Phred, we expect to eventually see a value below 58. If no such value is ever seen, Phred is excluded as a possible format, unless the checkExpected flag is set to false, in which case Phred remains a possible quality format.
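
        The two exclusion rules described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the library's implementation: the class CandidateSketch, the Scheme enum, and the ASCII ranges (33 to 126 for Phred, 59 to 126 for Solexa, 64 to 126 for Illumina) are hypothetical and follow common FASTQ conventions. Only the "below 58" expectation for Phred comes from the description above.

```java
import java.util.EnumSet;

final class CandidateSketch {
    /** Hypothetical quality schemes; the ASCII ranges are assumed, not taken from the library. */
    enum Scheme {
        PHRED(33, 126, 58),     // expect at least one value below 58
        SOLEXA(59, 126, -1),    // no expected-range rule in this sketch
        ILLUMINA(64, 126, -1);  // no expected-range rule in this sketch

        final int min;
        final int max;
        final int expectBelow;  // -1 disables the expectation check

        Scheme(int min, int max, int expectBelow) {
            this.min = min;
            this.max = max;
            this.expectBelow = expectBelow;
        }
    }

    /** Returns the schemes that remain possible for the observed ASCII quality values. */
    static EnumSet<Scheme> candidates(int[] observed, boolean checkExpected) {
        EnumSet<Scheme> possible = EnumSet.allOf(Scheme.class);
        for (Scheme s : Scheme.values()) {
            // Trivially satisfied when the scheme has no expectation rule.
            boolean sawExpected = s.expectBelow < 0;
            for (int q : observed) {
                if (q < s.min || q > s.max) {
                    possible.remove(s); // rule 1: a value outside the scheme's range
                }
                if (q < s.expectBelow) {
                    sawExpected = true;
                }
            }
            if (checkExpected && !sawExpected) {
                possible.remove(s);     // rule 2: an expected value was never observed
            }
        }
        return possible;
    }
}
```

        In this sketch, the observed values {70, 80} leave both SOLEXA and ILLUMINA as candidates when checkExpected is true, which corresponds to the case where more than one format is still possible after exclusions. With checkExpected set to false, PHRED also remains.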

      • detect

        public static FastqQualityFormat detect​(long maxRecords,
                                                FastqReader... readers)
        Reads through the records in the provided FASTQ readers and uses their quality scores to determine the quality format used in the FASTQ.
        Parameters:
        readers - The FASTQ readers from which qualities are to be read; at least one must be provided
        maxRecords - The maximum number of records to read from the readers before making a determination (the determination is a guess, so more records is better)
        Returns:
        The determined quality format
      • detect

        public static FastqQualityFormat detect​(long maxRecords,
                                                CloseableIterator<SAMRecord> iterator,
                                                boolean useOriginalQualities)
        Reads through the SAM records supplied by the provided iterator and uses their quality scores to determine the quality format used in the SAM.
        Parameters:
        iterator - The iterator from which SAM records are to be read
        maxRecords - The maximum number of records to read from the iterator before making a determination (the determination is a guess, so more records is better)
        useOriginalQualities - Whether to use the original qualities (if available) rather than the current ones
        Returns:
        The determined quality format
      • detect

        public static FastqQualityFormat detect​(SamReader reader,
                                                FastqQualityFormat expectedQualityFormat)
        Reads through the records in the provided SAM reader and uses their quality scores to sanity-check the expected quality format passed in. If the expected quality format is consistent with the observed scores, it is returned; otherwise, a SAMException is thrown.
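
        The accept-or-throw behavior described above can be sketched as follows. This is an illustration only, not the library's code: the class ExpectedFormatCheckSketch and its Format enum are hypothetical stand-ins for the detector and FastqQualityFormat, IllegalStateException stands in for SAMException, and the candidate set is assumed to have been computed from the observed qualities beforehand.

```java
import java.util.Set;

final class ExpectedFormatCheckSketch {
    /** Hypothetical stand-in for FastqQualityFormat. */
    enum Format { STANDARD, SOLEXA, ILLUMINA }

    /**
     * Returns the expected format if the observed data does not rule it out.
     *
     * @param candidates formats still possible given the observed qualities (assumed precomputed)
     * @param expected   the format the caller believes the data uses
     */
    static Format checkExpected(Set<Format> candidates, Format expected) {
        if (candidates.contains(expected)) {
            return expected; // the expectation survives the sanity check
        }
        // Stand-in for the SAMException mentioned in the description.
        throw new IllegalStateException(
                "Observed qualities rule out expected format " + expected
                        + "; possible formats: " + candidates);
    }
}
```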
      • generateBestGuess

        public FastqQualityFormat generateBestGuess​(QualityEncodingDetector.FileContext context,
                                                    FastqQualityFormat expectedQuality)
        Makes a best guess at the quality format. If an expected quality is passed in, the observed values are sanity-checked (ignoring the expected range) and, if they are deemed acceptable, the expected quality is returned. Otherwise, a set of heuristics is used to make the best guess.