Class ArrayDigest


  • public class ArrayDigest
    extends AbstractTDigest
    Array based implementation of a TDigest.

    This implementation is essentially a one-level b-tree in which nodes are collected into pages typically with 32 values per page. Commonly, an ArrayDigest contains 500-3000 centroids. With 32 values per page, we have about 32 values per page and about 30 pages which seems to give a nice balance for speed. Sizes from 4 to 100 are plausible, however.

    • Constructor Summary

      Constructors 
      Constructor Description
      ArrayDigest​(int pageSize, double compression)  
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void add​(double x, int w)
      Adds a sample to a histogram.
      java.util.Iterator<com.tdunning.math.stats.ArrayDigest.Index> allAfter​(double x)  
      java.util.Iterator<com.tdunning.math.stats.ArrayDigest.Index> allBefore​(double x)
      Returns an iterator which will give each element <= to x in non-increasing order.
      void asBytes​(java.nio.ByteBuffer buf)
      Outputs a histogram as bytes using a particularly cheesy encoding.
      void asSmallBytes​(java.nio.ByteBuffer buf)
      Serialize this TDigest into a byte buffer.
      int byteSize()
      Returns an upper bound on the number bytes that will be required to represent this histogram.
      double cdf​(double x)
      Returns the fraction of all points added which are <= x.
      com.tdunning.math.stats.ArrayDigest.Index ceiling​(double x)  
      int centroidCount()
      The number of centroids currently in the TDigest.
      java.lang.Iterable<? extends Centroid> centroids()
      An iterable that lets you go through the centroids in ascending order by mean.
      void compress()
      Re-examines a t-digest to determine whether some centroids are redundant.
      void compress​(GroupTree other)  
      double compression()
      Returns the current compression factor.
      int count​(com.tdunning.math.stats.ArrayDigest.Index index)  
      com.tdunning.math.stats.ArrayDigest.Index floor​(double x)
      Returns a cursor pointing to the first element <= x.
      static ArrayDigest fromBytes​(java.nio.ByteBuffer buf)
      Reads a histogram from a byte buffer
      long headSum​(com.tdunning.math.stats.ArrayDigest.Index limit)  
      com.tdunning.math.stats.ArrayDigest.Index increment​(com.tdunning.math.stats.ArrayDigest.Index x, int delta)  
      double mean​(com.tdunning.math.stats.ArrayDigest.Index index)  
      double quantile​(double q)
      Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
      long size()
      Returns the number of points that have been added to this TDigest.
      int smallByteSize()
      Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ArrayDigest

        public ArrayDigest​(int pageSize,
                           double compression)
    • Method Detail

      • add

        public void add​(double x,
                        int w)
        Description copied from class: TDigest
        Adds a sample to a histogram.
        Specified by:
        add in class TDigest
        Parameters:
        x - The value to add.
        w - The weight of this point.
      • headSum

        public long headSum​(com.tdunning.math.stats.ArrayDigest.Index limit)
      • mean

        public double mean​(com.tdunning.math.stats.ArrayDigest.Index index)
      • count

        public int count​(com.tdunning.math.stats.ArrayDigest.Index index)
      • compress

        public void compress()
        Description copied from class: TDigest
        Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space.

        The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression.

        This is a destructive operation that is not thread-safe.

        Specified by:
        compress in class TDigest
      • size

        public long size()
        Description copied from class: TDigest
        Returns the number of points that have been added to this TDigest.
        Specified by:
        size in class TDigest
        Returns:
        The sum of the weights on all centroids.
      • cdf

        public double cdf​(double x)
        Description copied from class: TDigest
        Returns the fraction of all points added which are <= x.
        Specified by:
        cdf in class TDigest
      • quantile

        public double quantile​(double q)
        Description copied from class: TDigest
        Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
        Specified by:
        quantile in class TDigest
        Parameters:
        q - The desired fraction
        Returns:
        The value x such that cdf(x) == q
      • centroidCount

        public int centroidCount()
        Description copied from class: TDigest
        The number of centroids currently in the TDigest.
        Specified by:
        centroidCount in class TDigest
        Returns:
        The number of centroids
      • centroids

        public java.lang.Iterable<? extends Centroid> centroids()
        Description copied from class: TDigest
        An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
        Specified by:
        centroids in class TDigest
        Returns:
        The centroids in the form of an Iterable.
      • allAfter

        public java.util.Iterator<com.tdunning.math.stats.ArrayDigest.Index> allAfter​(double x)
      • floor

        public com.tdunning.math.stats.ArrayDigest.Index floor​(double x)
        Returns a cursor pointing to the first element <= x. Exposed only for testing.
        Parameters:
        x - The value used to find the cursor.
        Returns:
        The cursor.
      • ceiling

        public com.tdunning.math.stats.ArrayDigest.Index ceiling​(double x)
      • allBefore

        public java.util.Iterator<com.tdunning.math.stats.ArrayDigest.Index> allBefore​(double x)
        Returns an iterator which will give each element <= to x in non-increasing order.
        Parameters:
        x - The upper bound of all returned elements
        Returns:
        An iterator that returns elements in non-increasing order.
      • increment

        public com.tdunning.math.stats.ArrayDigest.Index increment​(com.tdunning.math.stats.ArrayDigest.Index x,
                                                                   int delta)
      • compression

        public double compression()
        Description copied from class: TDigest
        Returns the current compression factor.
        Specified by:
        compression in class TDigest
        Returns:
        The compression factor originally used to set up the TDigest.
      • byteSize

        public int byteSize()
        Returns an upper bound on the number bytes that will be required to represent this histogram.
        Specified by:
        byteSize in class TDigest
        Returns:
        The number of bytes required.
      • smallByteSize

        public int smallByteSize()
        Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
        Specified by:
        smallByteSize in class TDigest
        Returns:
        The number of bytes required.
      • asBytes

        public void asBytes​(java.nio.ByteBuffer buf)
        Outputs a histogram as bytes using a particularly cheesy encoding.
        Specified by:
        asBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
      • asSmallBytes

        public void asSmallBytes​(java.nio.ByteBuffer buf)
        Description copied from class: TDigest
        Serialize this TDigest into a byte buffer. Some simple compression is used such as using variable byte representation to store the centroid weights and using delta-encoding on the centroid means so that floats can be reasonably used to store the centroid means.
        Specified by:
        asSmallBytes in class TDigest
        Parameters:
        buf - The byte buffer into which the TDigest should be serialized.
      • fromBytes

        public static ArrayDigest fromBytes​(java.nio.ByteBuffer buf)
        Reads a histogram from a byte buffer
        Returns:
        The new histogram structure