support.bitvector module
An implementation of an object that acts like a collection of on/off bits.
Base classes
- class whoosh.idsets.DocIdSet
Base class for a set of positive integers, implementing a subset of the built-in set type's interface with extra docid-related methods.
This is a superclass for alternatives to the built-in set that are more memory-efficient and specialized for storing sorted lists of positive integers, though they will inevitably be slower than set for most operations because they are written in pure Python.
- after(i)
Returns the next integer in the set after i, or None if there is none.
- before(i)
Returns the previous integer in the set before i, or None if there is none.
- first()
Returns the first (lowest) integer in the set.
- invert_update(size)
Updates the set in place so that it contains the numbers in the range [0, size) that are not currently in this set.
- last()
Returns the last (highest) integer in the set.
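The query methods above can be illustrated with a plain sorted Python list. This is a minimal sketch, not Whoosh code: the class name SortedListIdSet is hypothetical, and it only mirrors the documented behavior of after(), before(), first(), last() and invert_update() using the standard bisect module.

```python
from bisect import bisect_left, bisect_right


class SortedListIdSet:
    """Hypothetical illustration of the DocIdSet query methods (not Whoosh code)."""

    def __init__(self, source=()):
        # Keep the integers deduplicated and in ascending order.
        self._ids = sorted(set(source))

    def __iter__(self):
        return iter(self._ids)

    def first(self):
        # Lowest integer in the set (assumes a non-empty set).
        return self._ids[0]

    def last(self):
        # Highest integer in the set (assumes a non-empty set).
        return self._ids[-1]

    def after(self, i):
        # Next integer strictly greater than i, or None.
        pos = bisect_right(self._ids, i)
        return self._ids[pos] if pos < len(self._ids) else None

    def before(self, i):
        # Previous integer strictly less than i, or None.
        pos = bisect_left(self._ids, i)
        return self._ids[pos - 1] if pos > 0 else None

    def invert_update(self, size):
        # Replace the contents with every number in [0, size) not currently in the set.
        current = set(self._ids)
        self._ids = [n for n in range(size) if n not in current]
```

Because the list stays sorted, after() and before() are binary searches (O(log n)); invert_update() simply rebuilds the contents from the complement within [0, size).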
- class whoosh.idsets.BaseBitSet
Implementation classes
- class whoosh.idsets.BitSet(source=None, size=0)
A DocIdSet backed by an array of bits. This can also be useful as a bit array (e.g. for a Bloom filter). It is much more memory efficient than a large built-in set of integers, but wastes memory for sparse sets.
- Parameters
size – the maximum size of the bit array.
source – an iterable of positive integers to add to this set.
bits – an array of unsigned bytes (“B”) to use as the underlying bit array. This is used by some of the object’s methods.
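To show why a bit array is compact for dense sets but wasteful for sparse ones, here is a minimal sketch that packs one bit per possible value into a Python bytearray. The class name TinyBitSet is hypothetical and its internals are an assumption for illustration; it is not Whoosh's BitSet. Integer n is stored in byte n >> 3 under the mask 1 << (n & 7).

```python
class TinyBitSet:
    """Hypothetical bit-array set of non-negative integers (illustration only)."""

    def __init__(self, source=(), size=0):
        # Reserve enough bytes for `size` bits; the array grows on demand.
        self.bits = bytearray((size + 7) // 8)
        for n in source:
            self.add(n)

    def add(self, n):
        byte, mask = n >> 3, 1 << (n & 7)
        if byte >= len(self.bits):
            # Grow the array with zero bytes so that index `byte` exists.
            self.bits.extend(bytes(byte + 1 - len(self.bits)))
        self.bits[byte] |= mask

    def discard(self, n):
        byte = n >> 3
        if byte < len(self.bits):
            # Clear the bit; "& 0xFF" keeps the result a valid byte value.
            self.bits[byte] &= ~(1 << (n & 7)) & 0xFF

    def __contains__(self, n):
        byte = n >> 3
        return byte < len(self.bits) and bool(self.bits[byte] & (1 << (n & 7)))

    def __iter__(self):
        # Yield set members in ascending order by scanning every bit.
        for byte_index, value in enumerate(self.bits):
            if value:
                for bit in range(8):
                    if value & (1 << bit):
                        yield (byte_index << 3) + bit
```

Five members below 16 fit in two bytes, but a set containing only 1_000_000 needs 125,001 bytes because every lower bit is allocated, which is the sparse-set waste the description mentions.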
- class whoosh.idsets.OnDiskBitSet(dbfile, basepos, bytecount)
A DocIdSet backed by an array of bits on disk.
>>> from whoosh.filedb.filestore import RamStorage
>>> from whoosh.idsets import BitSet, OnDiskBitSet
>>> st = RamStorage()
>>> f = st.create_file("test.bin")
>>> bs = BitSet([1, 10, 15, 7, 2])
>>> bytecount = bs.to_disk(f)
>>> f.close()
>>> # ...
>>> f = st.open_file("test.bin")
>>> odbs = OnDiskBitSet(f, 0, bytecount)
>>> list(odbs)
[1, 2, 7, 10, 15]
- Parameters
dbfile – a StructFile object to read from.
basepos – the base position of the bytes in the given file.
bytecount – the number of bytes to use for the bit array.
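The point of an on-disk bit set is that a membership test only needs to read the single byte at basepos + (n >> 3) rather than loading the whole array. The following is a rough sketch under stated assumptions: io.BytesIO stands in for Whoosh's StructFile, and the functions write_bits, on_disk_contains and on_disk_iter are hypothetical helpers, not part of whoosh.idsets.

```python
import io


def write_bits(f, numbers):
    """Hypothetical writer: pack `numbers` into a bit array, write it, return the byte count."""
    numbers = list(numbers)
    bits = bytearray((max(numbers) >> 3) + 1 if numbers else 0)
    for n in numbers:
        bits[n >> 3] |= 1 << (n & 7)
    f.write(bits)
    return len(bits)


def on_disk_contains(f, basepos, bytecount, n):
    """Test membership by seeking to and reading only the byte that holds bit n."""
    byte = n >> 3
    if byte >= bytecount:
        return False
    f.seek(basepos + byte)
    return bool(f.read(1)[0] & (1 << (n & 7)))


def on_disk_iter(f, basepos, bytecount):
    """Yield members in ascending order by scanning the stored bytes."""
    f.seek(basepos)
    data = f.read(bytecount)
    for i, value in enumerate(data):
        for bit in range(8):
            if value & (1 << bit):
                yield (i << 3) + bit


# Usage: write a few header bytes first, so the bit array starts at a non-zero basepos.
f = io.BytesIO()
f.write(b"HDR")
basepos = f.tell()
bytecount = write_bits(f, [1, 10, 15, 7, 2])
members = list(on_disk_iter(f, basepos, bytecount))
```

Storing other data before the bit array is why a base position is needed alongside the byte count: both together locate the array inside a larger file.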
- class whoosh.idsets.SortedIntSet(source=None, typecode='I')
A DocIdSet backed by a sorted array of integers.
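A sorted array stores only the members themselves, so it suits sparse sets where a bit array would waste space. This is a minimal sketch assuming Python's array module with typecode 'I' (C unsigned int, at least 2 bytes and typically 4) plus bisect for ordered insertion and lookup. SortedArraySet is a hypothetical name; this is not the Whoosh implementation.

```python
from array import array
from bisect import bisect_left, insort


class SortedArraySet:
    """Hypothetical sorted-array set of non-negative integers (illustration only)."""

    def __init__(self, source=(), typecode="I"):
        # Compact array of C unsigned ints, sorted ascending, without duplicates.
        self.data = array(typecode, sorted(set(source)))

    def add(self, n):
        # Insert n in order unless it is already present (O(n) due to shifting).
        if n not in self:
            insort(self.data, n)

    def __contains__(self, n):
        # Binary search: O(log n) membership test.
        pos = bisect_left(self.data, n)
        return pos < len(self.data) and self.data[pos] == n

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)
```

Lookups are fast, but each insertion shifts later elements, so this layout works best for sets that are built once and then mostly queried.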
- class whoosh.idsets.MultiIdSet(idsets, offsets)
Wraps multiple serial sub-DocIdSet objects (each covering the ID range that follows the previous one) and presents them as a single aggregated, read-only set.
- Parameters
idsets – a list of DocIdSet objects.
offsets – a list of offsets corresponding to the DocIdSet objects in idsets.
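One way serial sub-sets and their offsets could combine is sketched below, assuming each sub-set holds local IDs and offsets[k] is added to every ID in idsets[k] to form a global ID. This interpretation and the class name OffsetUnion are assumptions for illustration; plain Python sets stand in for DocIdSet objects.

```python
from bisect import bisect_right


class OffsetUnion:
    """Hypothetical read-only view over serial sub-sets (illustration only)."""

    def __init__(self, idsets, offsets):
        # One offset per sub-set; offsets are assumed ascending with non-overlapping ranges.
        if len(idsets) != len(offsets):
            raise ValueError("idsets and offsets must have the same length")
        self.idsets = idsets
        self.offsets = offsets

    def __iter__(self):
        # Yield global IDs: each local ID shifted by its sub-set's offset.
        for idset, offset in zip(self.idsets, self.offsets):
            for local_id in sorted(idset):
                yield local_id + offset

    def __len__(self):
        return sum(len(idset) for idset in self.idsets)

    def __contains__(self, n):
        # Find the last sub-set whose offset is <= n, then test the local ID there.
        k = bisect_right(self.offsets, n) - 1
        return k >= 0 and (n - self.offsets[k]) in self.idsets[k]
```

The view exposes no mutating methods, matching the read-only description; membership needs one binary search over the offsets plus one lookup in the matching sub-set.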