Class ExtractIlluminaBarcodes


  • @DocumentedFeature
    public class ExtractIlluminaBarcodes
    extends CommandLineProgram
    Determines the barcode for each read in an Illumina lane. For each tile, a file of the form s_<lane>_<tile>_barcode.txt is written to the basecalls directory. The file contains one line for each read in the tile, aligned with the regular basecall output, with the following tab-separated columns:
    - read subsequence at the barcode position
    - Y or N, indicating whether there was a barcode match
    - matched barcode sequence (empty if the read did not match one of the barcodes)
    If there is no match but the read is close to the threshold for calling a match, the barcode that would have been matched is output in lower case.
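The three columns described above can be read back with a few lines of Java. The following is a minimal sketch, not part of Picard: the class name BarcodeLineSketch and its methods are hypothetical, and it assumes only the three documented columns (any additional columns in a real file are ignored).

```java
import java.util.Locale;

// Hypothetical helper, not part of Picard: parses one line of a per-tile barcode file.
final class BarcodeLineSketch {
    final String readBarcode;   // read subsequence at the barcode position
    final boolean matched;      // true for "Y", false for "N"
    final String barcode;       // matched barcode; may be empty or lower case

    private BarcodeLineSketch(String readBarcode, boolean matched, String barcode) {
        this.readBarcode = readBarcode;
        this.matched = matched;
        this.barcode = barcode;
    }

    static BarcodeLineSketch parse(String line) {
        // A limit of -1 keeps a trailing empty column (the "no match" case).
        String[] fields = line.split("\t", -1);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 tab-separated columns: " + line);
        }
        if (!fields[1].equals("Y") && !fields[1].equals("N")) {
            throw new IllegalArgumentException("Second column must be Y or N: " + line);
        }
        return new BarcodeLineSketch(fields[0], fields[1].equals("Y"), fields[2]);
    }

    // A near miss is reported as "N" together with a lower-case barcode.
    boolean isNearMiss() {
        return !matched && !barcode.isEmpty()
                && barcode.equals(barcode.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        BarcodeLineSketch line = parse("ACGA\tN\tacgt");
        System.out.println(line.readBarcode + " nearMiss=" + line.isNearMiss());
    }
}
```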
    • Field Detail

      • BARCODE_SEQUENCE_COLUMN

        public static final String BARCODE_SEQUENCE_COLUMN
        Column header for the first barcode sequence (preferred).
        See Also:
        Constant Field Values
      • BARCODE_SEQUENCE_1_COLUMN

        public static final String BARCODE_SEQUENCE_1_COLUMN
        Column header for the first barcode sequence.
        See Also:
        Constant Field Values
      • BARCODE_NAME_COLUMN

        public static final String BARCODE_NAME_COLUMN
        Column header for the barcode name.
        See Also:
        Constant Field Values
      • LIBRARY_NAME_COLUMN

        public static final String LIBRARY_NAME_COLUMN
        Column header for the library name.
        See Also:
        Constant Field Values
      • BASECALLS_DIR

        @Argument(doc="The Illumina basecalls directory. ",
                  shortName="B")
        public File BASECALLS_DIR
      • OUTPUT_DIR

        @Argument(doc="Where to write _barcode.txt files.  By default, these are written to BASECALLS_DIR.",
                  optional=true)
        public File OUTPUT_DIR
      • LANE

        @Argument(doc="Lane number. ",
                  shortName="L")
        public Integer LANE
      • READ_STRUCTURE

        @Argument(doc="A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the  data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Sample Barcode, M for molecular barcode, T for Template, and S for skip).  E.g. If the input data consists of 80 base clusters and we provide a read structure of \"28T8M8B8S28T\" then the sequence may be split up into four reads:\n* read one with 28 cycles (bases) of template\n* read two with 8 cycles (bases) of molecular barcode (ex. unique molecular barcode)\n* read three with 8 cycles (bases) of sample barcode\n* 8 cycles (bases) skipped.\n* read four with 28 cycles (bases) of template\nThe skipped cycles would NOT be included in an output SAM/BAM file or in read groups therein.",
                  shortName="RS")
        public String READ_STRUCTURE
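To illustrate the READ_STRUCTURE format, the sketch below splits a string such as "28T8M8B8S28T" into its integer/character pairs. It is an illustration only and does not reproduce Picard's own read structure parsing; the class ReadStructureSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Picard: splits a read structure into segments.
final class ReadStructureSketch {
    static final class Segment {
        final int cycles;  // number of cycles (bases)
        final char type;   // B = sample barcode, M = molecular barcode, T = template, S = skip

        Segment(int cycles, char type) {
            this.cycles = cycles;
            this.type = type;
        }
    }

    private static final Pattern SEGMENT = Pattern.compile("(\\d+)([BMTS])");

    static List<Segment> parse(String readStructure) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(readStructure);
        int end = 0;
        while (m.find()) {
            // Each pair must start exactly where the previous one ended.
            if (m.start() != end) {
                throw new IllegalArgumentException("Unexpected text at offset " + end);
            }
            segments.add(new Segment(Integer.parseInt(m.group(1)), m.group(2).charAt(0)));
            end = m.end();
        }
        if (segments.isEmpty() || end != readStructure.length()) {
            throw new IllegalArgumentException("Invalid read structure: " + readStructure);
        }
        return segments;
    }

    static int totalCycles(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            total += s.cycles;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = parse("28T8M8B8S28T");
        System.out.println(segments.size() + " segments, " + totalCycles(segments) + " cycles");
    }
}
```

For the example in the argument description, the five pairs sum to 28 + 8 + 8 + 8 + 28 = 80 cycles, which matches the 80-base clusters.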
      • BARCODE

        @Argument(doc="Barcode sequence.  These must be unique, and all the same length.  This cannot be used with reads that have more than one barcode; use BARCODE_FILE in that case. ",
                  mutex="BARCODE_FILE")
        public List<String> BARCODE
      • BARCODE_FILE

        @Argument(doc="Tab-delimited file of barcode sequences, barcode name and, optionally, library name.  Barcodes must be unique and all the same length.  Column headers must be \'barcode_sequence\' (or \'barcode_sequence_1\'), \'barcode_sequence_2\' (optional), \'barcode_name\', and \'library_name\'.",
                  mutex="BARCODE")
        public File BARCODE_FILE
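The header rules for BARCODE_FILE can be sketched as a lookup that prefers 'barcode_sequence' and falls back to 'barcode_sequence_1'. This is a hypothetical helper based only on the column names quoted in the argument description; it is not how Picard reads the file, and the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Picard: locates the first barcode column in a header line.
final class BarcodeFileHeaderSketch {
    static int firstBarcodeColumn(String headerLine) {
        List<String> columns = Arrays.asList(headerLine.split("\t", -1));
        // Prefer 'barcode_sequence'; accept 'barcode_sequence_1' as the alternative header.
        int index = columns.indexOf("barcode_sequence");
        if (index < 0) {
            index = columns.indexOf("barcode_sequence_1");
        }
        if (index < 0) {
            throw new IllegalArgumentException("No barcode sequence column in header: " + headerLine);
        }
        return index;
    }

    public static void main(String[] args) {
        String header = "barcode_sequence_1\tbarcode_sequence_2\tbarcode_name\tlibrary_name";
        System.out.println("First barcode column index: " + firstBarcodeColumn(header));
    }
}
```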
      • METRICS_FILE

        @Argument(doc="Per-barcode and per-lane metrics written to this file.",
                  shortName="M")
        public File METRICS_FILE
      • MAX_MISMATCHES

        @Argument(doc="Maximum mismatches for a barcode to be considered a match.")
        public int MAX_MISMATCHES
      • MIN_MISMATCH_DELTA

        @Argument(doc="Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match.")
        public int MIN_MISMATCH_DELTA
      • MAX_NO_CALLS

        @Argument(doc="Maximum allowable number of no-calls in a barcode read before it is considered unmatchable.")
        public int MAX_NO_CALLS
      • MINIMUM_BASE_QUALITY

        @Argument(shortName="Q",
                  doc="Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match.")
        public int MINIMUM_BASE_QUALITY
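The way MAX_MISMATCHES, MIN_MISMATCH_DELTA, MAX_NO_CALLS and MINIMUM_BASE_QUALITY might combine into a single match decision is sketched below. This is an assumption-laden illustration, not Picard's implementation: the class BarcodeMatchSketch is hypothetical, no-calls are assumed to be 'N' or '.' and to be counted separately from mismatches, a single-barcode list is assumed to always satisfy the delta check, and the read is assumed to be at least as long as each barcode.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Picard's implementation of barcode matching.
final class BarcodeMatchSketch {
    static boolean isNoCall(char base) {
        return base == 'N' || base == '.';  // assumption: these two symbols mark no-calls
    }

    static int countNoCalls(String read) {
        int noCalls = 0;
        for (int i = 0; i < read.length(); i++) {
            if (isNoCall(read.charAt(i))) noCalls++;
        }
        return noCalls;
    }

    static int countMismatches(String read, int[] quals, String barcode, int minBaseQuality) {
        int mismatches = 0;
        for (int i = 0; i < barcode.length(); i++) {
            char base = read.charAt(i);
            if (isNoCall(base)) continue;  // assumption: no-calls are limited by MAX_NO_CALLS only
            // A low-quality base counts as a mismatch even if it equals the barcode base.
            if (base != barcode.charAt(i) || quals[i] < minBaseQuality) mismatches++;
        }
        return mismatches;
    }

    // Returns the matched barcode, or null if no barcode satisfies all thresholds.
    static String match(String read, int[] quals, List<String> barcodes, int maxMismatches,
                        int minMismatchDelta, int maxNoCalls, int minBaseQuality) {
        if (countNoCalls(read) > maxNoCalls) return null;
        String best = null;
        int bestMismatches = Integer.MAX_VALUE;
        int secondMismatches = Integer.MAX_VALUE;
        for (String barcode : barcodes) {
            int mm = countMismatches(read, quals, barcode, minBaseQuality);
            if (mm < bestMismatches) {
                secondMismatches = bestMismatches;
                bestMismatches = mm;
                best = barcode;
            } else if (mm < secondMismatches) {
                secondMismatches = mm;
            }
        }
        boolean ok = best != null
                && bestMismatches <= maxMismatches
                && (long) secondMismatches - bestMismatches >= minMismatchDelta;
        return ok ? best : null;
    }

    public static void main(String[] args) {
        List<String> barcodes = Arrays.asList("ACGT", "TTTT");
        int[] quals = {30, 30, 30, 30};
        System.out.println(match("ACGA", quals, barcodes, 1, 1, 2, 0));
    }
}
```

The delta check is what rejects ambiguous reads: if two barcodes are equally close to the read, the difference between best and second best is 0 and no match is called.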
      • MINIMUM_QUALITY

        @Argument(doc="The minimum quality (after transforming 0s to 1s) expected from reads.  If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina\'s spec describes as the minimum, but in practice lower values have been observed.")
        public int MINIMUM_QUALITY
      • COMPRESS_OUTPUTS

        @Argument(shortName="GZIP",
                  doc="Compress output s_l_t_barcode.txt files using gzip and append a .gz extension to the file names.")
        public boolean COMPRESS_OUTPUTS
      • NUM_PROCESSORS

        @Argument(doc="Run this many PerTileBarcodeExtractors in parallel.  If NUM_PROCESSORS = 0, the number of cores used is set automatically to the number of cores available on the machine. If NUM_PROCESSORS < 0, the number of cores used is the number available on the machine less the absolute value of NUM_PROCESSORS.")
        public int NUM_PROCESSORS
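The three cases for NUM_PROCESSORS (positive, zero, negative) can be expressed as a small function. This is a sketch under stated assumptions rather than the tool's code: the class and method names are hypothetical, and clamping the result to at least one thread is an added safeguard that the argument description does not mention.

```java
// Hypothetical sketch, not part of Picard: resolves NUM_PROCESSORS to a thread count.
final class NumProcessorsSketch {
    static int resolve(int numProcessors, int availableCores) {
        int resolved;
        if (numProcessors == 0) {
            resolved = availableCores;                  // use every available core
        } else if (numProcessors < 0) {
            resolved = availableCores + numProcessors;  // leave |numProcessors| cores free
        } else {
            resolved = numProcessors;                   // use exactly the requested number
        }
        return Math.max(1, resolved);                   // assumption: never fewer than one
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("NUM_PROCESSORS=-1 resolves to " + resolve(-1, cores) + " of " + cores);
    }
}
```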
    • Constructor Detail

      • ExtractIlluminaBarcodes

        public ExtractIlluminaBarcodes()