Package picard.sam
Class MergeBamAlignment
- java.lang.Object
-
- picard.cmdline.CommandLineProgram
-
- picard.sam.MergeBamAlignment
-
@DocumentedFeature public class MergeBamAlignment extends CommandLineProgram
Summary
A command-line tool for merging BAM/SAM alignment info from a third-party aligner with the data in an unmapped BAM file, producing a third BAM file that has alignment data (from the aligner) and all the remaining data from the unmapped BAM. Quick note: this is not a tool for taking multiple SAM files and creating a bigger file by merging them. For that use case, see MergeSamFiles.
Details
Many alignment tools (still!) require fastq format input. The unmapped BAM may contain useful information that will be lost in the conversion to fastq (meta-data like sample alias, library, barcodes, etc., and read-level tags). This tool takes an unaligned BAM with meta-data, and the aligned BAM produced by calling SamToFastq and then passing the result to an aligner/mapper. It produces a new SAM file that includes all aligned and unaligned reads and also carries forward additional read attributes from the unmapped BAM (attributes that are otherwise lost in the process of converting to fastq). The resulting file will be valid for use by Picard and GATK tools. The output may be coordinate-sorted, in which case the tags NM, MD, and UQ will be calculated and populated, or query-name sorted, in which case these tags will not be calculated or populated.
Usage example:
java -jar picard.jar MergeBamAlignment \
      ALIGNED=aligned.bam \
      UNMAPPED=unmapped.bam \
      O=merge_alignments.bam \
      R=reference_sequence.fasta
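When the tool is launched from Java code rather than from a shell, the same arguments can be assembled as a list. This is a minimal sketch, not part of Picard: the class MergeBamAlignmentCommand and its build method are hypothetical names, and picard.jar is assumed to be in the working directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Picard class): builds the usage example as an argument list.
final class MergeBamAlignmentCommand {

    static List<String> build(String aligned, String unmapped, String output, String reference) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("picard.jar");            // assumed location of the Picard jar
        cmd.add("MergeBamAlignment");
        cmd.add("ALIGNED=" + aligned);    // aligned reads from the aligner
        cmd.add("UNMAPPED=" + unmapped);  // original unmapped BAM, queryname-sorted
        cmd.add("O=" + output);           // merged output file
        cmd.add("R=" + reference);        // reference sequence
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("aligned.bam", "unmapped.bam",
                "merge_alignments.bam", "reference_sequence.fasta");
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd).inheritIO().start() to run the tool as a child process.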
Caveats
This tool has been in development for a while, and many arguments have been added to it over the years. You may be particularly interested in the following (partial) list:
- CLIP_ADAPTERS -- Whether to (soft-)clip the ends of the reads that are identified as belonging to adapters
- IS_BISULFITE_SEQUENCE -- Whether the sequencing originated from bisulfite sequencing, in which case NM will be calculated differently
- ALIGNER_PROPER_PAIR_FLAGS -- Whether to keep the aligner's "Proper pair" flag. Set to false if the aligner cannot be trusted to set this flag; the tool will then set it based on orientation and distance between pairs.
- ADD_MATE_CIGAR -- Whether to use this opportunity to add the MC tag to each read.
- UNMAP_CONTAMINANT_READS (and MIN_UNCLIPPED_BASES) -- Whether to identify extremely short alignments (with clipping on both sides) as cross-species contamination and unmap the reads.
-
-
Field Summary
-
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
-
Constructor Summary
Constructor    Description
MergeBamAlignment()
-
Method Summary
Modifier and Type    Method    Description
protected String[]    customCommandLineValidation()    Put any custom command-line validation in an override of this method.
protected int    doWork()    Do the work after command line has been parsed.
protected boolean    requiresReference()
-
Methods inherited from class picard.cmdline.CommandLineProgram
getCommandLine, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, setDefaultHeaders, useLegacyParser
-
-
-
-
Field Detail
-
pgTagArgumentCollection
@ArgumentCollection public final PGTagArgumentCollection pgTagArgumentCollection
-
UNMAPPED_BAM
@Argument(shortName="UNMAPPED", doc="Original SAM or BAM file of unmapped reads, which must be in queryname order.") public File UNMAPPED_BAM
-
ALIGNED_BAM
@Argument(shortName="ALIGNED", doc="SAM or BAM file(s) with alignment data.", mutex={"READ1_ALIGNED_BAM","READ2_ALIGNED_BAM"}, optional=true) public List<File> ALIGNED_BAM
-
READ1_ALIGNED_BAM
@Argument(shortName="R1_ALIGNED", doc="SAM or BAM file(s) with alignment data from the first read of a pair.", mutex="ALIGNED_BAM", optional=true) public List<File> READ1_ALIGNED_BAM
-
READ2_ALIGNED_BAM
@Argument(shortName="R2_ALIGNED", doc="SAM or BAM file(s) with alignment data from the second read of a pair.", mutex="ALIGNED_BAM", optional=true) public List<File> READ2_ALIGNED_BAM
-
OUTPUT
@Argument(shortName="O", doc="Merged SAM or BAM file to write to.") public File OUTPUT
-
PROGRAM_RECORD_ID
@Argument(shortName="PG", doc="The program group ID of the aligner (if not supplied by the aligned file).", optional=true) public String PROGRAM_RECORD_ID
-
PROGRAM_GROUP_VERSION
@Argument(shortName="PG_VERSION", doc="The version of the program group (if not supplied by the aligned file).", optional=true) public String PROGRAM_GROUP_VERSION
-
PROGRAM_GROUP_COMMAND_LINE
@Argument(shortName="PG_COMMAND", doc="The command line of the program group (if not supplied by the aligned file).", optional=true) public String PROGRAM_GROUP_COMMAND_LINE
-
PROGRAM_GROUP_NAME
@Argument(shortName="PG_NAME", doc="The name of the program group (if not supplied by the aligned file).", optional=true) public String PROGRAM_GROUP_NAME
-
PAIRED_RUN
@Deprecated @Argument(doc="DEPRECATED. This argument is ignored and will be removed.", shortName="PE", optional=true) public Boolean PAIRED_RUN
Deprecated.
-
JUMP_SIZE
@Deprecated @Argument(doc="The expected jump size (required if this is a jumping library). Deprecated. Use EXPECTED_ORIENTATIONS instead", shortName="JUMP", mutex="EXPECTED_ORIENTATIONS", optional=true) public Integer JUMP_SIZE
Deprecated.
-
CLIP_ADAPTERS
@Argument(doc="Whether to clip adapters where identified.") public boolean CLIP_ADAPTERS
-
IS_BISULFITE_SEQUENCE
@Argument(doc="Whether the lane is bisulfite sequence (used when calculating the NM tag).") public boolean IS_BISULFITE_SEQUENCE
-
ALIGNED_READS_ONLY
@Argument(doc="Whether to output only aligned reads. ") public boolean ALIGNED_READS_ONLY
-
MAX_INSERTIONS_OR_DELETIONS
@Argument(doc="The maximum number of insertions or deletions permitted for an alignment to be included. Alignments with more than this many insertions or deletions will be ignored. Set to -1 to allow any number of insertions or deletions.", shortName="MAX_GAPS") public int MAX_INSERTIONS_OR_DELETIONS
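The filter described above can be sketched on a CIGAR string. This is a minimal illustration under stated assumptions, not Picard's implementation: it counts each I or D CIGAR element as one gap (rather than counting inserted or deleted bases), and the class GapFilterSketch and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not a Picard class) of a MAX_INSERTIONS_OR_DELETIONS style filter.
final class GapFilterSketch {

    // One CIGAR element: a length followed by an operator.
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    // Assumption: every I or D element counts as a single gap, whatever its length.
    static int countGaps(String cigar) {
        int gaps = 0;
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'I' || op == 'D') {
                gaps++;
            }
        }
        return gaps;
    }

    // Returns true when the alignment should be ignored; -1 disables the filter.
    static boolean ignoreAlignment(String cigar, int maxGaps) {
        if (maxGaps == -1) {
            return false;
        }
        return countGaps(cigar) > maxGaps;
    }

    public static void main(String[] args) {
        System.out.println(ignoreAlignment("10M2I5M1D20M", 1));
        System.out.println(ignoreAlignment("10M2I5M1D20M", -1));
    }
}
```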
-
ATTRIBUTES_TO_RETAIN
@Argument(doc="Reserved alignment attributes (tags starting with X, Y, or Z) that should be brought over from the alignment data when merging.", optional=true) public List<String> ATTRIBUTES_TO_RETAIN
-
ATTRIBUTES_TO_REMOVE
@Argument(doc="Attributes from the alignment record that should be removed when merging. This overrides ATTRIBUTES_TO_RETAIN if they share common tags.", optional=true) public List<String> ATTRIBUTES_TO_REMOVE
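The interaction between ATTRIBUTES_TO_RETAIN and ATTRIBUTES_TO_REMOVE can be sketched as a per-tag decision. This is a minimal illustration based only on the descriptions above: it treats a tag as reserved when it starts with X, Y, or Z (the real tool may treat further tags as reserved), and the class AttributeSelectionSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch (not a Picard class) of the retain/remove precedence.
final class AttributeSelectionSketch {

    // Assumption: "reserved" means the tag starts with X, Y, or Z, as the argument doc states.
    static boolean isReserved(String tag) {
        return !tag.isEmpty() && "XYZ".indexOf(tag.charAt(0)) >= 0;
    }

    // A tag is copied from the alignment record when it is not reserved or is explicitly
    // retained, and it is not listed for removal. Removal always wins over retention.
    static boolean shouldCopy(String tag, Set<String> retain, Set<String> remove) {
        boolean eligible = !isReserved(tag) || retain.contains(tag);
        return eligible && !remove.contains(tag);
    }

    public static void main(String[] args) {
        Set<String> xs = Collections.singleton("XS");
        Set<String> none = Collections.emptySet();
        System.out.println(shouldCopy("XS", none, none)); // reserved, not retained
        System.out.println(shouldCopy("XS", xs, none));   // reserved, retained
        System.out.println(shouldCopy("XS", xs, xs));     // retained but also removed
    }
}
```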
-
ATTRIBUTES_TO_REVERSE
@Argument(shortName="RV", doc="Attributes on negative strand reads that need to be reversed.", optional=true) public Set<String> ATTRIBUTES_TO_REVERSE
-
ATTRIBUTES_TO_REVERSE_COMPLEMENT
@Argument(shortName="RC", doc="Attributes on negative strand reads that need to be reverse complemented.", optional=true) public Set<String> ATTRIBUTES_TO_REVERSE_COMPLEMENT
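For reads aligned to the negative strand, a tag value stored in read order has to be flipped to match the reported bases. The sketch below shows the two operations on plain strings only; it is an illustration, not Picard's code, and the class ReverseComplementSketch is a hypothetical name. Symbols other than A, C, G, and T (for example N) are left unchanged here by assumption.

```java
// Hypothetical sketch (not a Picard class) of reversing and reverse-complementing tag values.
final class ReverseComplementSketch {

    // Reverse, e.g. for a per-base quality string.
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    // Reverse complement, e.g. for a tag that holds bases.
    static String reverseComplement(String bases) {
        StringBuilder sb = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            sb.append(complement(bases.charAt(i)));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            case 'a': return 't';
            case 'c': return 'g';
            case 'g': return 'c';
            case 't': return 'a';
            default:  return base; // N and any other symbol stay as they are
        }
    }

    public static void main(String[] args) {
        System.out.println(reverse("IIJK"));
        System.out.println(reverseComplement("ACGTN"));
    }
}
```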
-
READ1_TRIM
@Argument(shortName="R1_TRIM", doc="The number of bases trimmed from the beginning of read 1 prior to alignment") public int READ1_TRIM
-
READ2_TRIM
@Argument(shortName="R2_TRIM", doc="The number of bases trimmed from the beginning of read 2 prior to alignment") public int READ2_TRIM
-
EXPECTED_ORIENTATIONS
@Argument(shortName="ORIENTATIONS", doc="The expected orientation of proper read pairs. Replaces JUMP_SIZE", mutex="JUMP_SIZE", optional=true) public List<htsjdk.samtools.SamPairUtil.PairOrientation> EXPECTED_ORIENTATIONS
-
ALIGNER_PROPER_PAIR_FLAGS
@Argument(doc="Use the aligner\'s idea of what a proper pair is rather than computing in this program.") public boolean ALIGNER_PROPER_PAIR_FLAGS
-
SORT_ORDER
@Argument(shortName="SO", doc="The order in which the merged reads should be output.") public htsjdk.samtools.SAMFileHeader.SortOrder SORT_ORDER
-
PRIMARY_ALIGNMENT_STRATEGY
@Argument(doc="Strategy for selecting primary alignment when the aligner has provided more than one alignment for a pair or fragment, and none are marked as primary, more than one is marked as primary, or the primary alignment is filtered out for some reason. For all strategies, ties are resolved arbitrarily.") public picard.sam.MergeBamAlignment.PrimaryAlignmentStrategy PRIMARY_ALIGNMENT_STRATEGY
-
CLIP_OVERLAPPING_READS
@Argument(doc="For paired reads, soft clip the 3\' end of each read if necessary so that it does not extend past the 5\' end of its mate.") public boolean CLIP_OVERLAPPING_READS
-
INCLUDE_SECONDARY_ALIGNMENTS
@Argument(doc="If false, do not write secondary alignments to output.") public boolean INCLUDE_SECONDARY_ALIGNMENTS
-
ADD_MATE_CIGAR
@Argument(shortName="MC", optional=true, doc="Adds the mate CIGAR tag (MC) if true, does not if false.") public Boolean ADD_MATE_CIGAR
-
UNMAP_CONTAMINANT_READS
@Argument(shortName="UNMAP_CONTAM", optional=true, doc="Detect reads originating from foreign organisms (e.g. bacterial DNA in a non-bacterial sample), and unmap + label those reads accordingly.") public boolean UNMAP_CONTAMINANT_READS
-
MIN_UNCLIPPED_BASES
@Argument(doc="If UNMAP_CONTAMINANT_READS is set, require this many unclipped bases or else the read will be marked as contaminant.") public int MIN_UNCLIPPED_BASES
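The contaminant test behind UNMAP_CONTAMINANT_READS and MIN_UNCLIPPED_BASES can be sketched on a CIGAR string: a read is suspect when it is soft-clipped on both sides and too few bases remain unclipped. This is a minimal illustration, not Picard's implementation. It assumes that unclipped bases are those under the M, I, =, and X operators, and the class ContaminantSketch and the threshold of 32 used in main are example choices only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not a Picard class) of a short-alignment contaminant check.
final class ContaminantSketch {

    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    static boolean looksLikeContaminant(String cigar, int minUnclippedBases) {
        int unclippedBases = 0;
        int softClipBlocks = 0;
        char previousOp = 0;
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            int length = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'S') {
                // Consecutive S elements count as one clipped block.
                if (previousOp != 'S') {
                    softClipBlocks++;
                }
            } else if (op == 'M' || op == 'I' || op == '=' || op == 'X') {
                // Assumption: these operators cover the read bases that are not clipped.
                unclippedBases += length;
            }
            previousOp = op;
        }
        // Two soft-clip blocks means clipping on both sides of the alignment.
        return softClipBlocks >= 2 && unclippedBases < minUnclippedBases;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeContaminant("40S20M40S", 32));
        System.out.println(looksLikeContaminant("10S80M10S", 32));
    }
}
```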
-
MATCHING_DICTIONARY_TAGS
@Argument(doc="List of Sequence Records tags that must be equal (if present) in the reference dictionary and in the aligned file. Mismatching tags will cause an error if in this list, and a warning otherwise.") public List<String> MATCHING_DICTIONARY_TAGS
-
UNMAPPED_READ_STRATEGY
@Argument(doc="How to deal with alignment information in reads that are being unmapped (e.g. due to cross-species contamination.) Currently ignored unless UNMAP_CONTAMINANT_READS = true", optional=true) public AbstractAlignmentMerger.UnmappingReadStrategy UNMAPPED_READ_STRATEGY
-
-
Method Detail
-
requiresReference
protected boolean requiresReference()
- Overrides: requiresReference in class CommandLineProgram
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. A RuntimeException may be thrown by this method and is reported appropriately.
- Specified by: doWork in class CommandLineProgram
- Returns: program exit status.
-
customCommandLineValidation
protected String[] customCommandLineValidation()
Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by command-line parser can be validated.
- Overrides: customCommandLineValidation in class CommandLineProgram
- Returns: null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
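The return contract (null when valid, otherwise an array of messages) can be sketched with a standalone method. The rules checked here are hypothetical and only loosely modeled on the mutex declarations of ALIGNED_BAM, READ1_ALIGNED_BAM, and READ2_ALIGNED_BAM; they are not the checks this class actually performs, and the class ValidationContractSketch is an invented name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not a Picard class) of the customCommandLineValidation() contract.
final class ValidationContractSketch {

    private static boolean present(List<String> values) {
        return values != null && !values.isEmpty();
    }

    // Returns null when the inputs are consistent, otherwise one message per problem found.
    static String[] validate(List<String> aligned, List<String> read1Aligned, List<String> read2Aligned) {
        boolean hasAligned = present(aligned);
        boolean hasR1 = present(read1Aligned);
        boolean hasR2 = present(read2Aligned);

        List<String> errors = new ArrayList<>();
        if (hasAligned && (hasR1 || hasR2)) {
            errors.add("ALIGNED_BAM cannot be combined with READ1_ALIGNED_BAM or READ2_ALIGNED_BAM.");
        }
        if (hasR1 != hasR2) {
            errors.add("READ1_ALIGNED_BAM and READ2_ALIGNED_BAM must be supplied together.");
        }
        if (!hasAligned && !hasR1 && !hasR2) {
            errors.add("No alignment input was supplied.");
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] problems = validate(null, null, null);
        System.out.println(problems == null ? "valid" : String.join("; ", problems));
    }
}
```

Collecting every message before returning lets the caller report all problems in one pass instead of stopping at the first.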
-
-