Package htsjdk.samtools.cram.build
Class ContainerFactory
java.lang.Object
    htsjdk.samtools.cram.build.ContainerFactory
public final class ContainerFactory extends Object
Aggregates SAMRecord objects into one or more Containers, each composed of one or more Slices, based on a set of rules implemented by this class in combination with the parameter values provided via a CRAMEncodingStrategy object. The general call pattern is to pass records in one at a time, and process Containers as they are returned:

    long containerOffset = initialOffset; // after writing header, etc
    ContainerFactory containerFactory = new ContainerFactory(...);
    // retrieve input records and obtain/emit Containers as they are produced by the factory...
    while (inputSAM.hasNext()) {
        Container container = containerFactory.getNextContainer(inputSAM.next(), containerOffset);
        if (container != null) {
            containerOffset = writeContainer(container...);
        }
    }
    // if there is a final Container, retrieve and emit it
    Container finalContainer = containerFactory.getFinalContainer(containerOffset);
    if (finalContainer != null) {
        containers.add(finalContainer);
    }

Multiple slices are only aggregated into a single container if the requested number of slices per container is greater than 1, *and* all of the slices are SINGLE_REFERENCE and have the same (mapped) reference context. MULTI_REFERENCE slices are never aggregated with other slices into a single container, no matter how many slices per container are requested, since it can be very inefficient to do so (the spec requires that if any slice in a container is multiple-reference, all slices in the container must also be MULTI_REFERENCE).

For coordinate sorted inputs, a MULTI_REFERENCE slice is only created when there are not enough reads mapped to a single reference sequence to reach the MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD. This usually only happens near the end of the reads mapped to a given sequence. When that happens, the remaining reads mapped to the previous sequence, plus some subsequent records, are accumulated into a small MULTI_REFERENCE slice until MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD is hit, and the resulting MULTI_REFERENCE slice is emitted into its own container.
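The coordinate-sorted slice rule described above can be illustrated with a small, self-contained sketch. This is not the htsjdk implementation: the class name, method names, the `EMIT` marker, and the parameters `readsPerSlice` and `minSingleRefThreshold` are all hypothetical, and unmapped records get no special handling here. It only shows how a too-small single-reference slice could turn into a MULTI_REFERENCE slice that is closed once the threshold is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and structure are assumptions, not htsjdk code.
final class SliceContextSketch {
    static final int EMIT = -3;      // hypothetical marker: close the current slice
    static final int MULTI_REF = -2; // hypothetical marker: multiple-reference slice

    // Decide the slice's reference context after seeing the next record's reference ID.
    static int updatedContext(int current, int nextRef, int recordCount,
                              int readsPerSlice, int minSingleRefThreshold) {
        if (recordCount == 0) {
            return nextRef; // an empty slice adopts the next record's reference
        }
        if (current == MULTI_REF) {
            // keep accumulating a multi-ref slice only until the threshold is hit
            return recordCount < minSingleRefThreshold ? MULTI_REF : EMIT;
        }
        if (current == nextRef) {
            // same reference: fill the slice up to the requested size
            return recordCount < readsPerSlice ? current : EMIT;
        }
        // reference changes: a slice below the threshold becomes multi-ref
        return recordCount < minSingleRefThreshold ? MULTI_REF : EMIT;
    }

    // Walk coordinate-sorted reference IDs and label the slices that would result.
    static List<String> slices(int[] refs, int readsPerSlice, int minSingleRefThreshold) {
        List<String> out = new ArrayList<>();
        int context = EMIT;
        int count = 0;
        for (int ref : refs) {
            int updated = updatedContext(context, ref, count, readsPerSlice, minSingleRefThreshold);
            if (updated == EMIT) {
                out.add(label(context, count));
                context = ref; // the record that triggered the emit starts a new slice
                count = 0;
            } else {
                context = updated;
            }
            count++;
        }
        if (count > 0) {
            out.add(label(context, count)); // final, possibly partial, slice
        }
        return out;
    }

    private static String label(int context, int count) {
        return context == MULTI_REF ? "MULTI x" + count : "SINGLE(" + context + ") x" + count;
    }

    public static void main(String[] args) {
        int[] refs = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1};
        System.out.println(slices(refs, 4, 3));
    }
}
```

In this sketch, the two leftover reads on reference 0 are too few to form a single-reference slice, so they are combined with the first read on reference 1 into one small multi-reference slice.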
-
-
Constructor Summary
Constructor: ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource)
-
Method Summary
Container getFinalContainer(long containerByteOffset)
    Obtain a Container from any remaining accumulated SAMRecords, if any.
Container getNextContainer(SAMRecord samRecord, long containerByteOffset)
boolean shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries)
    Determine if a Container should be emitted based on the current reference context and the reference context for the next record to be processed, and the encoding strategy parameters.
-
-
-
Constructor Detail
-
ContainerFactory
public ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource)
Parameters:
samFileHeader - the SAMFileHeader (used to determine sort order and resolve read groups)
encodingStrategy - the CRAMEncodingStrategy parameters to use
referenceSource - the CRAMReferenceSource to use for containers created by this factory
-
-
Method Detail
-
getNextContainer
public final Container getNextContainer(SAMRecord samRecord, long containerByteOffset)
-
getFinalContainer
public Container getFinalContainer(long containerByteOffset)
Obtain a Container from any remaining accumulated SAMRecords, if any.
-
shouldEmitContainer
public boolean shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries)
Determine if a Container should be emitted based on the current reference context and the reference context for the next record to be processed, and the encoding strategy parameters. A container is emitted if:
- the requested number of slices per container has been reached, or
- a multi-reference slice has been accumulated (a multi-ref slice is always emitted into its own container as soon as it is generated, since we don't want to confer multi-ref-ness on the next slice, which might otherwise be single-ref), or
- we haven't reached the requested number of slices, but we're changing reference contexts. We don't want to create a MULTI_REF container out of two or more SINGLE_REF slices with different contexts, since by the spec we'd be forced to call that container MULTI_REF, and thus the slices would have to be multi-ref. So a single-ref container is emitted instead.
Parameters:
currentReferenceContextID -
nextRecordIndex -
numberOfSliceEntries -
Returns:
true if a Container should be emitted, otherwise false
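The three emit conditions listed for shouldEmitContainer can be sketched as a standalone predicate. This is a hedged illustration, not the method's actual body: the class name, the parameter list (`slicesAccumulated`, `slicesPerContainer`, `lastSliceContext`, `nextContext`) and the `MULTI_REF` constant are assumptions introduced for the example and differ from the documented signature.

```java
// Illustrative sketch only; not the htsjdk implementation of shouldEmitContainer.
final class EmitDecisionSketch {
    static final int MULTI_REF = -2; // hypothetical marker for a multiple-reference context

    static boolean shouldEmit(int slicesAccumulated, int slicesPerContainer,
                              int lastSliceContext, int nextContext) {
        if (slicesAccumulated == 0) {
            return false; // nothing accumulated yet, so there is nothing to emit
        }
        // 1. the requested number of slices per container has been reached
        if (slicesAccumulated >= slicesPerContainer) {
            return true;
        }
        // 2. a multi-reference slice always goes into its own container
        if (lastSliceContext == MULTI_REF) {
            return true;
        }
        // 3. the reference context is changing: emit a single-ref container now
        //    rather than mixing single-ref slices with different contexts
        return nextContext != lastSliceContext;
    }

    public static void main(String[] args) {
        System.out.println(shouldEmit(1, 4, 5, 5)); // same context, container not yet full
        System.out.println(shouldEmit(1, 4, 5, 6)); // context change
    }
}
```

Checking the multi-reference case before the context change keeps the second and third conditions separate, mirroring the order of the list above.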
-
-