Class CollectAlignmentSummaryMetrics


  • @DocumentedFeature
    public class CollectAlignmentSummaryMetrics
    extends SinglePassSamProgram
    A command-line tool that reads a SAM or BAM file and produces standard alignment metrics applicable to any alignment. Metrics include, but are not limited to:
    • Total number of reads (total, period, no exclusions)
    • Total number of PF reads (PF == does not fail vendor check flag)
    • Number of PF noise reads (does not fail vendor check and has noise attr set)
    • Total aligned PF reads (any PF read that has a sequence and position)
    • High quality aligned PF reads (high quality == mapping quality >= 20)
    • High quality aligned PF bases (actual aligned bases, calculate off alignment blocks)
    • High quality aligned PF Q20 bases (subset of above where base quality >= 20)
    • Median mismatches in HQ aligned PF reads (how many aligned bases != ref on average)
    • Reads aligned in pairs (vs. reads aligned with mate unaligned/not present)
    • Read length (how to handle mixed lengths?)
    • Bad Cycles - how many machine cycles yielded combined no-call and mismatch rates of >= 80%
    • Strand balance - reads mapped to positive strand / total mapped reads
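    The counting rules in the list above can be illustrated with a minimal sketch. This is not Picard's implementation: the class `AlignmentTallySketch`, its fields, and the boolean/int parameters are hypothetical stand-ins for the flags and mapping quality carried by a SAM record. The sketch also assumes that strand balance is computed over PF aligned reads only, which the list does not state explicitly.

```java
/** Hypothetical tally of a few of the listed metrics; not the Picard implementation. */
final class AlignmentTallySketch {
    /** "High quality" threshold taken from the list: mapping quality >= 20. */
    static final int HQ_MAPQ = 20;

    long totalReads;
    long pfReads;
    long pfAligned;
    long pfHqAligned;
    long positiveStrand;

    /** Feeds one read, described by plain values instead of a real SAM record. */
    void add(boolean failsVendorCheck, boolean mapped, int mapq, boolean negativeStrand) {
        totalReads++;                      // total, period, no exclusions
        if (failsVendorCheck) return;      // PF == does not fail vendor check
        pfReads++;
        if (!mapped) return;
        pfAligned++;
        if (!negativeStrand) positiveStrand++;
        if (mapq >= HQ_MAPQ) pfHqAligned++;
    }

    /** Positive-strand reads / mapped reads (assumption: PF aligned reads only). */
    double strandBalance() {
        return pfAligned == 0 ? 0.0 : (double) positiveStrand / pfAligned;
    }

    public static void main(String[] args) {
        AlignmentTallySketch tally = new AlignmentTallySketch();
        tally.add(false, true, 30, false);  // PF, aligned, high quality, positive strand
        tally.add(false, true, 10, true);   // PF, aligned, low quality, negative strand
        tally.add(true, true, 60, false);   // fails vendor check: counted only in totalReads
        System.out.println("PF reads: " + tally.pfReads);
        System.out.println("Strand balance: " + tally.strandBalance());
    }
}
```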
    Metrics are written for the first read of a pair, the second read, and combined for the pair. Chimeras are identified if any of the following criteria are met:
    • the insert size is larger than MAX_INSERT_SIZE
    • the ends of a pair map to different contigs
    • the paired-end orientation is different from the expected orientation
    • the read contains an SA tag (chimeric alignment)
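    The four chimera criteria can be read as a simple logical OR. The sketch below shows that reading under stated assumptions: `ChimeraSketch`, its `Orientation` enum (a stand-in for `htsjdk.samtools.SamPairUtil.PairOrientation`), and the flattened parameter list are hypothetical, and comparing the absolute value of the insert size is an assumption, since the text does not say how negative insert sizes are treated.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical illustration of the chimera criteria; not the Picard implementation. */
final class ChimeraSketch {
    /** Stand-in for htsjdk's SamPairUtil.PairOrientation. */
    enum Orientation { FR, RF, TANDEM }

    static boolean isChimeric(int insertSize, int maxInsertSize,
                              String contig, String mateContig,
                              Orientation orientation, Set<Orientation> expected,
                              boolean hasSaTag) {
        // Criterion 1: insert size larger than MAX_INSERT_SIZE (absolute value is an assumption).
        if (Math.abs(insertSize) > maxInsertSize) return true;
        // Criterion 2: the ends of the pair map to different contigs.
        if (!contig.equals(mateContig)) return true;
        // Criterion 3: orientation is not one of the expected orientations.
        if (!expected.contains(orientation)) return true;
        // Criterion 4: the read carries an SA tag.
        return hasSaTag;
    }

    public static void main(String[] args) {
        Set<Orientation> expected = EnumSet.of(Orientation.FR);
        // The threshold is passed explicitly here; it plays the role of MAX_INSERT_SIZE.
        System.out.println(isChimeric(350, 100000, "chr1", "chr1", Orientation.FR, expected, false));
        System.out.println(isChimeric(350, 100000, "chr1", "chr7", Orientation.FR, expected, false));
    }
}
```

    Any single criterion is sufficient, so the order of the checks does not affect the result.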
    • Field Detail

      • HISTOGRAM_FILE

        @Argument(shortName="H",
                  doc="If Provided, file to write read-length chart pdf.",
                  optional=true)
        public File HISTOGRAM_FILE
      • MAX_INSERT_SIZE

        @Argument(doc="Paired-end reads above this insert size will be considered chimeric along with inter-chromosomal pairs.")
        public int MAX_INSERT_SIZE
      • EXPECTED_PAIR_ORIENTATIONS

        @Argument(doc="Paired-end reads that do not have this expected orientation will be considered chimeric.")
        public Set<htsjdk.samtools.SamPairUtil.PairOrientation> EXPECTED_PAIR_ORIENTATIONS
      • ADAPTER_SEQUENCE

        @Argument(doc="List of adapter sequences to use when processing the alignment metrics.")
        public List<String> ADAPTER_SEQUENCE
      • METRIC_ACCUMULATION_LEVEL

        @Argument(shortName="LEVEL",
                  doc="The level(s) at which to accumulate metrics.")
        public Set<MetricAccumulationLevel> METRIC_ACCUMULATION_LEVEL
      • IS_BISULFITE_SEQUENCED

        @Argument(shortName="BS",
                  doc="Whether the SAM or BAM file consists of bisulfite sequenced reads.")
        public boolean IS_BISULFITE_SEQUENCED
      • COLLECT_ALIGNMENT_INFORMATION

        @Argument(doc="A flag to disable the collection of actual alignment information. If false, tool will only count READS, PF_READS, and NOISE_READS. (For backwards compatibility).")
        public boolean COLLECT_ALIGNMENT_INFORMATION
    • Constructor Detail

      • CollectAlignmentSummaryMetrics

        public CollectAlignmentSummaryMetrics()
    • Method Detail

      • customCommandLineValidation

        protected String[] customCommandLineValidation()
        Description copied from class: CommandLineProgram
        Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by command-line parser can be validated.
        Overrides:
        customCommandLineValidation in class CommandLineProgram
        Returns:
        null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
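        The return contract (null when valid, an array of messages otherwise) can be sketched as follows. The class `ValidationSketch` and both validation rules are hypothetical and chosen only to show the shape of the return value; the checks that this tool actually performs are not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the null-or-errors contract; the rules are invented. */
final class ValidationSketch {
    static String[] validate(int maxInsertSize, boolean hasExpectedOrientations) {
        List<String> errors = new ArrayList<>();
        // Invented rule 1: the insert-size threshold must be positive.
        if (maxInsertSize <= 0) {
            errors.add("MAX_INSERT_SIZE must be positive");
        }
        // Invented rule 2: at least one expected orientation must be given.
        if (!hasExpectedOrientations) {
            errors.add("EXPECTED_PAIR_ORIENTATIONS must not be empty");
        }
        // The contract: null signals a valid command line, not an empty array.
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = validate(0, true);
        System.out.println(result == null ? "valid" : String.join("; ", result));
    }
}
```

        Callers of such a method need to test for null before iterating over the result.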
      • setup

        protected void setup(htsjdk.samtools.SAMFileHeader header,
                             File samFile)
        Description copied from class: SinglePassSamProgram
        Should be implemented by subclasses to do one-time initialization work.
        Specified by:
        setup in class SinglePassSamProgram
      • acceptRead

        protected void acceptRead(htsjdk.samtools.SAMRecord rec,
                                  htsjdk.samtools.reference.ReferenceSequence ref)
        Description copied from class: SinglePassSamProgram
        Should be implemented by subclasses to accept SAMRecords one at a time. If the read has a reference sequence and a reference sequence file was supplied to the program it will be passed as 'ref'. Otherwise 'ref' may be null.
        Specified by:
        acceptRead in class SinglePassSamProgram
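        The single-pass pattern described by setup and acceptRead (one-time initialization, then one callback per record, with a reference that may be null) can be sketched without htsjdk. Everything in this block is a hypothetical stand-in: `SinglePassSketch`, `Program`, and `MismatchCounter` are invented names, and reads and references are modeled as plain strings rather than `SAMRecord` and `ReferenceSequence` objects.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the setup/acceptRead lifecycle; not the Picard classes. */
final class SinglePassSketch {
    abstract static class Program {
        abstract void setup(String header);
        abstract void acceptRead(String readBases, String refBases);

        /** Calls setup once, then acceptRead for every record in order. */
        final void run(String header, List<String[]> records) {
            setup(header);
            for (String[] record : records) {
                acceptRead(record[0], record[1]);
            }
        }
    }

    static final class MismatchCounter extends Program {
        boolean initialized;
        long reads;
        long mismatches;

        @Override
        void setup(String header) {
            initialized = true; // one-time initialization work
        }

        @Override
        void acceptRead(String readBases, String refBases) {
            reads++;
            if (refBases == null) return; // no reference supplied: skip the comparison
            int n = Math.min(readBases.length(), refBases.length());
            for (int i = 0; i < n; i++) {
                if (readBases.charAt(i) != refBases.charAt(i)) mismatches++;
            }
        }
    }

    public static void main(String[] args) {
        MismatchCounter counter = new MismatchCounter();
        counter.run("@HD\tVN:1.6", Arrays.asList(
                new String[] {"ACGT", "ACGA"},
                new String[] {"AAAA", null}));
        System.out.println(counter.reads + " reads, " + counter.mismatches + " mismatches");
    }
}
```

        The null check in acceptRead mirrors the documented case in which 'ref' may be null.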