Defines the "vg minimizer" subcommand, which builds the minimizer index.
For each window of w successive kmers and their reverse complements, the index contains the lexicographically smallest kmer. Kmers containing characters other than A, C, G, and T are not indexed.
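As an illustration of the definition above, here is a minimal sketch for a plain string rather than a graph. It is not the vg implementation. The function names are hypothetical, and two behaviours are assumptions: ties go to the leftmost offset, and invalid kmers are skipped within a window rather than invalidating it.

```python
# Hypothetical helpers for illustration only; not part of the vg API.
VALID = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer):
    # Complement each base, then reverse the string.
    return kmer.translate(COMPLEMENT)[::-1]

def window_minimizers(seq, k, w):
    """Return sorted (kmer, offset, is_reverse) minimizers of a linear sequence."""
    # One candidate per offset: the smaller of the kmer and its reverse
    # complement, or None if the kmer has characters other than A, C, G, T.
    candidates = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:
            rc = reverse_complement(kmer)
            if rc < kmer:
                candidates.append((rc, i, True))
            else:
                candidates.append((kmer, i, False))
        else:
            candidates.append(None)

    # For each window of w successive offsets, keep the smallest valid candidate.
    # Assumption: ties are broken by the smaller offset (tuple comparison).
    result = set()
    for start in range(len(candidates) - w + 1):
        window = [c for c in candidates[start:start + w] if c is not None]
        if window:
            result.add(min(window))
    return sorted(result, key=lambda m: m[1])
```

Adjacent windows often select the same kmer occurrence, so the result is collected in a set and each minimizer is reported once.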
The index contains either all minimizers or only haplotype-consistent minimizers. Indexing all minimizers from complex graph regions can take a long time (e.g. tens of hours vs. 5-10 minutes for 1000GP), because many windows have the same minimizer. Because the total number of minimizers is manageable (e.g. 1.5x more for 1000GP), it should be possible to develop a better algorithm for finding the minimizers.
A quick idea for indexing the entire graph:
- For each node v, extract the subgraph for the windows starting in v.
- Extract all k'-mers from the subgraph and use them to determine where the minimizers can start.