public class BackingStoreHashtable
extends java.lang.Object
A BackingStoreHashtable is a utility class which stores a set of rows in an in-memory hash table, or overflows the hash table to a temporary on-disk structure.
All rows must contain the same number of columns, and the column at position N of all the rows must have the same format id. If the BackingStoreHashtable needs to be overflowed to disk, then an arbitrary row will be chosen and used as a template for creating the underlying overflow container.
The hash table will be built logically as follows (actual implementation may differ). The important points are that the hash value is the standard java hash value on row[key_column_numbers[0]], if key_column_numbers.length is 1, or on the combination of row[key_column_numbers[0]], row[key_column_numbers[1]], ... if key_column_numbers.length > 1, and that duplicate detection is done by the standard java duplicate detection provided by java.util.Hashtable.
```java
import java.util.Hashtable;
import java.util.Vector;

hash_table = new Hashtable();

Object row;             // a DataValueDescriptor[] or a LocatedRow
Object duplicate_value;
boolean needsToClone = rowSource.needsToClone();

while ((row = rowSource.getNextRowFromRowSource()) != null)
{
    if (needsToClone)
        row = clone_row_from_row(row);

    Object key = KeyHasher.buildHashKey(row, key_column_numbers);

    if ((duplicate_value = hash_table.put(key, row)) != null)
    {
        Vector row_vec;

        // inserted a duplicate
        if (duplicate_value instanceof Vector)
        {
            row_vec = (Vector) duplicate_value;
        }
        else
        {
            // allocate a vector to hold the duplicates
            row_vec = new Vector(2);

            // insert the original row into the vector
            row_vec.addElement(duplicate_value);

            // put the vector as the data rather than the row
            hash_table.put(key, row_vec);
        }

        // insert the new row into the vector
        row_vec.addElement(row);
    }
}
```
What actually goes into the hash table is a little complicated, because a row may be either an array of column values (i.e., DataValueDescriptor[]) or a LocatedRow (i.e., a structure holding the columns plus a RowLocation). In addition, the value stored under a key may be either a single row or, when multiple rows hash to the same key, a bucket (List) of rows. To sum up, a value in a hash table which has not spilled to disk is either a DataValueDescriptor[], a LocatedRow, or a List of either of those.
If rows spill to disk, then they just become arrays of columns. In this case, a LocatedRow becomes a DataValueDescriptor[], where the last cell contains the RowLocation.
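The flattening and its inverse can be sketched as follows. This is a simplified model, not Derby's code: plain Object[] columns stand in for DataValueDescriptor[], a plain Object stands in for RowLocation, and LocatedRow here is a hypothetical holder class.

```java
import java.util.Arrays;

// Sketch of the disk-row flattening described above. "LocatedRow" and the
// untyped cells are simplified stand-ins for Derby's real classes.
public class DiskRowSketch {
    public static class LocatedRow {
        public final Object[] columns;
        public final Object rowLocation;
        public LocatedRow(Object[] columns, Object rowLocation) {
            this.columns = columns;
            this.rowLocation = rowLocation;
        }
    }

    // LocatedRow -> disk row: append the RowLocation as the last cell.
    public static Object[] makeDiskRow(LocatedRow row) {
        Object[] disk = Arrays.copyOf(row.columns, row.columns.length + 1);
        disk[disk.length - 1] = row.rowLocation;
        return disk;
    }

    // Inverse: split the last cell back out as the RowLocation.
    public static LocatedRow makeInMemoryRow(Object[] diskRow) {
        return new LocatedRow(Arrays.copyOf(diskRow, diskRow.length - 1),
                              diskRow[diskRow.length - 1]);
    }
}
```

The round trip is lossless because the RowLocation occupies exactly one extra trailing cell.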
| Modifier and Type | Class | Description |
|---|---|---|
| private class | BackingStoreHashtable.BackingStoreHashtableEnumeration | |
| private static class | BackingStoreHashtable.RowList | List of DataValueDescriptor[] instances that represent rows. |
| Modifier and Type | Field | Description |
|---|---|---|
| private static int | ARRAY_LIST_SIZE | The estimated number of bytes used by ArrayList(0). |
| private java.util.Properties | auxillary_runtimestats | |
| private DiskHashtable | diskHashtable | |
| private java.util.HashMap<java.lang.Object,java.lang.Object> | hash_table | |
| private long | inmemory_rowcnt | |
| private boolean | keepAfterCommit | |
| private int[] | key_column_numbers | |
| private long | max_inmemory_rowcnt | |
| private long | max_inmemory_size | |
| private boolean | remove_duplicates | |
| private RowSource | row_source | |
| private boolean | skipNullKeyColumns | |
| private TransactionController | tc | Fields of the class |
| Modifier | Constructor | Description |
|---|---|---|
| private | BackingStoreHashtable() | Constructors for This class: |
| | BackingStoreHashtable(TransactionController tc, RowSource row_source, int[] key_column_numbers, boolean remove_duplicates, long estimated_rowcnt, long max_inmemory_rowcnt, int initialCapacity, float loadFactor, boolean skipNullKeyColumns, boolean keepAfterCommit) | Create the BackingStoreHashtable from a row source. |
| Modifier and Type | Method | Description |
|---|---|---|
| private void | add_row_to_hash_table(DataValueDescriptor[] columnValues, RowLocation rowLocation, boolean needsToClone) | Do the work to add one row to the hash table. |
| private static DataValueDescriptor[] | cloneRow(DataValueDescriptor[] old_row) | Return a cloned copy of the row. |
| void | close() | Close the BackingStoreHashtable. |
| private void | doSpaceAccounting(java.lang.Object hashValue, boolean firstDuplicate) | |
| java.util.Enumeration<java.lang.Object> | elements() | Return an Enumeration that can be used to scan the entire table. |
| java.lang.Object | get(java.lang.Object key) | Get data associated with given key. |
| void | getAllRuntimeStats(java.util.Properties prop) | Return runtime stats to caller by adding them to prop. |
| private long | getEstimatedMemUsage(java.lang.Object hashValue) | Take a value which will go into the hash table and return an estimate of how much memory that value will consume. |
| private DataValueDescriptor[] | getNextRowFromRowSource() | Call method to either get next row or next row with non-null key columns. |
| boolean | includeRowLocations() | Return true if we should include RowLocations with the rows stored in this hash table. |
| private DataValueDescriptor[] | makeDiskRow(java.lang.Object raw) | Make a full set of columns from an object which is either already an array of columns or otherwise a LocatedRow. |
| private DataValueDescriptor[] | makeDiskRow(DataValueDescriptor[] columnValues, RowLocation rowLocation) | Construct a full set of columns, which may need to end with the row location. The full set of columns is what's stored on disk when we spill to disk. |
| private java.lang.Object | makeInMemoryRow(DataValueDescriptor[] diskRow) | Make an in-memory row from an on-disk row. |
| private java.util.List | makeInMemoryRows(java.util.List diskRows) | Turn a list of disk rows into a list of in-memory rows. |
| boolean | putRow(boolean needsToClone, DataValueDescriptor[] row, RowLocation rowLocation) | Put a row into the hash table. |
| java.lang.Object | remove(java.lang.Object key) | Remove a row from the hash table. |
| void | setAuxillaryRuntimeStats(java.util.Properties prop) | Set the auxillary runtime stats. |
| (package private) static DataValueDescriptor[] | shallowCloneRow(DataValueDescriptor[] old_row) | Return a shallow cloned row. |
| int | size() | Return number of unique rows in the hash table. |
| private boolean | spillToDisk(DataValueDescriptor[] columnValues, RowLocation rowLocation) | Determine whether a new row should be spilled to disk and, if so, do it. |
private TransactionController tc
private java.util.HashMap<java.lang.Object,java.lang.Object> hash_table
private int[] key_column_numbers
private boolean remove_duplicates
private boolean skipNullKeyColumns
private java.util.Properties auxillary_runtimestats
private RowSource row_source
private long max_inmemory_rowcnt
private long inmemory_rowcnt
private long max_inmemory_size
private boolean keepAfterCommit
private static final int ARRAY_LIST_SIZE
private DiskHashtable diskHashtable
private BackingStoreHashtable()
public BackingStoreHashtable(TransactionController tc, RowSource row_source, int[] key_column_numbers, boolean remove_duplicates, long estimated_rowcnt, long max_inmemory_rowcnt, int initialCapacity, float loadFactor, boolean skipNullKeyColumns, boolean keepAfterCommit) throws StandardException
This routine drains the RowSource. The performance characteristics depend on the number of rows inserted and the parameters to the constructor. RowLocations are supported iff row_source is null. RowLocations from a non-null row_source can be added later if there is a use-case that stresses this behavior.
If the number of rows is <= "max_inmemory_rowcnt", then the rows are inserted into a java.util.HashMap. In this case no TransactionController is necessary; a "null" tc is valid.
If the number of rows is > "max_inmemory_rowcnt", then the rows will all be placed in some sort of Access temporary file on disk. This case requires a valid TransactionController.
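The two cases can be sketched as a toy drain loop. Everything here is a hypothetical stand-in: SpillPolicy, its fields, and the "disk" list only model the policy of overflowing past max_inmemory_rowcnt, and duplicate handling is elided.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Toy model of the constructor's behavior: rows go into an in-memory map
// until max_inmemory_rowcnt is exceeded, then overflow to a stand-in "disk"
// structure. None of these names are Derby's real API.
public class SpillPolicy {
    public final long maxInMemoryRowCount;
    public final HashMap<Object, Object[]> inMemory = new HashMap<>();
    public final List<Object[]> disk = new ArrayList<>(); // stand-in for DiskHashtable

    public SpillPolicy(long maxInMemoryRowCount) {
        this.maxInMemoryRowCount = maxInMemoryRowCount; // -1 means no maximum
    }

    public void put(Object key, Object[] row) {
        if (maxInMemoryRowCount < 0 || inMemory.size() < maxInMemoryRowCount) {
            inMemory.put(key, row); // below the cap: stay in memory
        } else {
            disk.add(row);          // cap reached: overflow to "disk"
        }
    }
}
```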
Parameters:
tc - An open TransactionController to be used if the hash table needs to overflow to disk.
row_source - RowSource to read rows from.
key_column_numbers - The column numbers of the columns in the scan result row to be the key to the HashMap. "0" is the first column in the scan result row (which may be different than the first column in the table of the scan).
remove_duplicates - Should the HashMap automatically remove duplicates, or should it create a list of the duplicates?
estimated_rowcnt - The estimated number of rows in the hash table. Pass in -1 if there is no estimate.
max_inmemory_rowcnt - The maximum number of rows to insert into the in-memory hash table before overflowing to disk. Pass in -1 if there is no maximum.
initialCapacity - If not "-1", used to initialize the java HashMap.
loadFactor - If not "-1", used to initialize the java HashMap.
skipNullKeyColumns - Skip rows with a null key column, if true.
keepAfterCommit - If true the hash table is kept after a commit; if false the hash table is dropped on the next commit.
Throws:
StandardException - Standard exception policy.

public boolean includeRowLocations()
private DataValueDescriptor[] getNextRowFromRowSource() throws StandardException
Throws:
StandardException - Standard exception policy.

private static DataValueDescriptor[] cloneRow(DataValueDescriptor[] old_row) throws StandardException
Throws:
StandardException - Standard exception policy.

static DataValueDescriptor[] shallowCloneRow(DataValueDescriptor[] old_row) throws StandardException
Throws:
StandardException - Standard exception policy.

private void add_row_to_hash_table(DataValueDescriptor[] columnValues, RowLocation rowLocation, boolean needsToClone) throws StandardException
Parameters:
columnValues - Row to add to the hash table.
rowLocation - Location of row in conglomerate; could be null.
needsToClone - If the row needs to be cloned.
Throws:
StandardException - Standard exception policy.

private void doSpaceAccounting(java.lang.Object hashValue, boolean firstDuplicate)
private boolean spillToDisk(DataValueDescriptor[] columnValues, RowLocation rowLocation) throws StandardException
Parameters:
columnValues - Actual columns from source row.
rowLocation - Optional row location.
Throws:
StandardException - Standard exception policy.

private DataValueDescriptor[] makeDiskRow(java.lang.Object raw)
Make a full set of columns from an object which is either already an array of columns or otherwise a LocatedRow. The full set of columns is what's stored on disk when we spill to disk. This is the inverse of makeInMemoryRow().
private java.util.List makeInMemoryRows(java.util.List diskRows)
Turn a list of disk rows into a list of in-memory rows. The on disk rows are always of type DataValueDescriptor[]. But the in-memory rows could be of type LocatedRow.
private java.lang.Object makeInMemoryRow(DataValueDescriptor[] diskRow)
Make an in-memory row from an on-disk row. This is the inverse of makeDiskRow().
private DataValueDescriptor[] makeDiskRow(DataValueDescriptor[] columnValues, RowLocation rowLocation)
Construct a full set of columns, which may need to end with the row location. The full set of columns is what's stored on disk when we spill to disk.
private long getEstimatedMemUsage(java.lang.Object hashValue)
Parameters:
hashValue - The object for which we want to know the memory usage.

public void close() throws StandardException
Perform any necessary cleanup after finishing with the hashtable. Will deallocate/dereference objects as necessary. If the table has gone to disk this will drop any on disk files used to support the hash table.
Throws:
StandardException - Standard exception policy.

public java.util.Enumeration<java.lang.Object> elements() throws StandardException
Return an Enumeration that can be used to scan the entire table. The row representation differs depending on whether the row includes a RowLocation. If includeRowLocations() == true, each object in the Enumeration is either a LocatedRow or (for duplicates) a List of LocatedRows. Otherwise, each object is either a DataValueDescriptor[] or (for duplicates) a List of DataValueDescriptor[] instances.
RESOLVE - is it worth it to support this routine when we have a disk overflow hash table?
StandardException
- Standard exception policy.

public java.lang.Object get(java.lang.Object key) throws StandardException
Get data associated with given key.
There are 2 different types of objects returned from this routine.
In both cases, the key value is either the object stored in row[key_column_numbers[0]], if key_column_numbers.length is 1, otherwise it is a KeyHasher containing the objects stored in row[key_column_numbers[0, 1, ...]]. For every qualifying unique row value an entry is placed into the hash table.
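That key construction can be sketched as follows. buildHashKey here is a simplified stand-in for Derby's KeyHasher.buildHashKey, using List.of as the composite key only because it has the value-based equals()/hashCode() contract the hash table relies on.

```java
import java.util.List;

// Sketch of the key construction described above: one key column means the
// column value itself is the key; several key columns are combined into a
// composite key with value-based equals/hashCode (a stand-in for KeyHasher).
public class KeySketch {
    public static Object buildHashKey(Object[] row, int[] keyColumnNumbers) {
        if (keyColumnNumbers.length == 1) {
            return row[keyColumnNumbers[0]]; // the column value is the key
        }
        Object[] parts = new Object[keyColumnNumbers.length];
        for (int i = 0; i < keyColumnNumbers.length; i++) {
            parts[i] = row[keyColumnNumbers[i]];
        }
        return List.of(parts); // composite key; equal columns => equal keys
    }
}
```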
For row values with duplicates, the value of the data is a list of rows.
The situation is a little more complicated because the row representation differs depending on whether the row includes a RowLocation. If includeRowLocations() == true, the row is a LocatedRow; otherwise the row is an array of DataValueDescriptors. Putting all of this together: if the rows contain RowLocations, this method returns either a single LocatedRow or (for duplicates) a List of LocatedRows; if they do not, it returns either a single DataValueDescriptor[] or (for duplicates) a List of DataValueDescriptor[] instances.
The caller will have to call "instanceof" on the data value object if duplicates are expected, to determine if the data value of the hash table entry is a row or is a list of rows.
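A minimal sketch of that instanceof dispatch, with a plain HashMap and untyped rows standing in for the real table and row types:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Caller-side dispatch described above: the value under a key is either a
// single row or a List of duplicate rows, so the caller branches on instanceof.
public class GetDispatch {
    @SuppressWarnings("unchecked")
    public static List<Object> rowsFor(HashMap<Object, Object> table, Object key) {
        List<Object> result = new ArrayList<>();
        Object data = table.get(key);
        if (data == null) {
            return result;                      // no row under this key
        }
        if (data instanceof List) {
            result.addAll((List<Object>) data); // duplicate bucket
        } else {
            result.add(data);                   // single row
        }
        return result;
    }
}
```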
See the javadoc for elements() for more information on the objects returned by this method.
The BackingStoreHashtable "owns" the objects returned from the get() routine. They remain valid until the next access to the BackingStoreHashtable. If the client needs to keep references to these objects, it should clone copies of the objects. A valid BackingStoreHashtable can place all rows into a disk based conglomerate, declare a row buffer and then reuse that row buffer for every get() call.
Parameters:
key - The key to hash on.
Throws:
StandardException - Standard exception policy.

public void getAllRuntimeStats(java.util.Properties prop) throws StandardException
Parameters:
prop - The set of properties to append to.
Throws:
StandardException - Standard exception policy.

public java.lang.Object remove(java.lang.Object key) throws StandardException
A remove of a duplicate key removes the entire duplicate list.
Parameters:
key - The key of the row to remove.
Throws:
StandardException - Standard exception policy.

public void setAuxillaryRuntimeStats(java.util.Properties prop) throws StandardException
getRuntimeStats() will return both the auxiliary stats and any BackingStoreHashtable-specific stats. Note that each call to setAuxillaryRuntimeStats() overwrites the Property set that was set previously.
Parameters:
prop - The set of properties to append from.
Throws:
StandardException - Standard exception policy.

public boolean putRow(boolean needsToClone, DataValueDescriptor[] row, RowLocation rowLocation) throws StandardException
The in-memory hash table will need to keep a reference to the row after the put call has returned. If "needsToClone" is true, the hash table makes a copy of the row and stores the copy; if "needsToClone" is false, it keeps a reference to the row passed in and no copy is made.
If this routine returns false, no reference is kept to the rejected duplicate row (thus allowing the caller to reuse the object).
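The cloning and duplicate-rejection contract can be sketched as follows, under the assumption that remove_duplicates is in effect (a duplicate key is rejected and the caller may reuse its row object). The class and its members are hypothetical stand-ins, not Derby's API.

```java
import java.util.HashMap;

// Sketch of putRow's contract: with needsToClone the table stores a copy of
// the row; otherwise it keeps the caller's reference. A duplicate key is
// rejected (returns false) and the original row is kept.
public class PutRowSketch {
    private final HashMap<Object, Object[]> table = new HashMap<>();

    public boolean putRow(boolean needsToClone, Object key, Object[] row) {
        Object[] stored = needsToClone ? row.clone() : row; // copy iff asked to
        Object[] previous = table.put(key, stored);
        if (previous != null) {
            table.put(key, previous); // duplicate: restore the original row
            return false;             // no reference kept to the rejected row
        }
        return true;
    }

    public Object[] get(Object key) {
        return table.get(key);
    }
}
```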
Parameters:
needsToClone - Does this routine have to make a copy of the row, in order to keep a reference to it after return?
row - The row to insert into the table.
rowLocation - Location of row in conglomerate; could be null.
Throws:
StandardException - Standard exception policy.

public int size() throws StandardException
Throws:
StandardException - Standard exception policy.

Apache Derby V10.14 Internals - Copyright © 2004,2018 The Apache Software Foundation. All Rights Reserved.