Class BIRCHCluster

All Implemented Interfaces:
Serializable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class BIRCHCluster extends ClusterGenerator implements TechnicalInformationHandler
Cluster data generator designed for the BIRCH System

Dataset is generated with instances in K clusters.
Instances are 2-d data points.
Each cluster is characterized by the number of data points in it, its radius and its center. The location of the cluster centers is determined by the pattern parameter. Three patterns are currently supported: grid, sine and random.
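To make the cluster description above concrete, here is a minimal sketch of how the grid pattern could lay out clusters in 2-d. It is not the Weka implementation: the class `GridClusterSketch`, the method `gridClusters`, and the spacing rule (distance multiplier times the mean of the minimum and maximum radius) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a grid layout; not taken from the Weka sources. */
final class GridClusterSketch {

    /** One cluster: number of points, radius, and 2-d center. */
    static final class Cluster {
        final int numPoints;
        final double radius;
        final double[] center;

        Cluster(int numPoints, double radius, double[] center) {
            this.numPoints = numPoints;
            this.radius = radius;
            this.center = center;
        }
    }

    /**
     * Places k clusters on a square grid, filling rows from left to right.
     * Assumption: grid spacing = distMult * (minRadius + maxRadius) / 2.
     */
    static List<Cluster> gridClusters(int k, int minInst, int maxInst,
                                      double minRadius, double maxRadius,
                                      double distMult, long seed) {
        Random random = new Random(seed);
        int gridSize = (int) Math.ceil(Math.sqrt(k)); // cells per side
        double spacing = distMult * (minRadius + maxRadius) / 2.0;
        List<Cluster> clusters = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            // Size and radius are drawn uniformly from the configured ranges.
            int numPoints = minInst + random.nextInt(maxInst - minInst + 1);
            double radius = minRadius + random.nextDouble() * (maxRadius - minRadius);
            double[] center = {
                (i % gridSize + 1) * spacing,
                (i / gridSize + 1) * spacing
            };
            clusters.add(new Cluster(numPoints, radius, center));
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Uses the documented defaults: k = 4, N = 1..50, R = 0.1..sqrt(2), M = 4.0, seed 1.
        for (Cluster c : gridClusters(4, 1, 50, 0.1, Math.sqrt(2.0), 4.0, 1L)) {
            System.out.printf("n=%d r=%.3f center=(%.3f, %.3f)%n",
                    c.numPoints, c.radius, c.center[0], c.center[1]);
        }
    }
}
```

Under this assumed spacing rule, a larger distance multiplier pushes the centers further apart relative to the cluster radii, so clusters overlap less.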

For more information refer to:

Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: ACM SIGMOD International Conference on Management of Data, 103-114, 1996.

BibTeX:

 @inproceedings{Zhang1996,
    author = {Tian Zhang and Raghu Ramakrishnan and Miron Livny},
    booktitle = {ACM SIGMOD International Conference on Management of Data},
    pages = {103-114},
    publisher = {ACM Press},
    title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
    year = {1996}
 }
 

Valid options are:

 -h
  Prints this help.
 -o <file>
  The name of the output file, otherwise the generated data is
  printed to stdout.
 -r <name>
  The name of the relation.
 -d
  Whether to print debug information.
 -S
  The seed for random function (default 1)
 -a <num>
  The number of attributes (default 10).
 -c
  Class Flag, if set, the cluster is listed in extra attribute.
 -b <range>
  The indices for boolean attributes.
 -m <range>
  The indices for nominal attributes.
 -k <num>
  The number of clusters (default 4)
 -G
  Set pattern to grid (default is random).
  This flag cannot be used at the same time as flag I.
  The pattern is random, if neither flag G nor flag I is set.
 -I
  Set pattern to sine (default is random).
  This flag cannot be used at the same time as flag G.
  The pattern is random, if neither flag G nor flag I is set.
 -N <num>..<num>
  The range of number of instances per cluster (default 1..50).
  Lower number must be between 0 and 2500,
  upper number must be between 50 and 2500.
 -R <num>..<num>
  The range of radius per cluster (default 0.1..1.4142135623730951).
  Lower number must be between 0 and SQRT(2), 
  upper number must be between SQRT(2) and SQRT(32).
 -M <num>
  The distance multiplier (default 4.0).
 -C <num>
  The number of cycles (default 4).
 -O
  Flag for input order is ORDERED. If the flag is not set, the
  input order is RANDOMIZED. RANDOMIZED is currently not
  implemented, therefore the input order is always ORDERED.
 -P <num>
  The noise rate in percent (default 0.0).
  Can be between 0% and 30%. (Remark: The original 
  algorithm only allows noise up to 10%.)
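The range options `-N` and `-R` take a value of the form `<num>..<num>` with documented bounds on each end. The following is a minimal sketch of parsing and checking such a value. The class `RangeOptionSketch` and its methods are hypothetical helpers, not Weka's own option parser, and the bounds are assumed to be inclusive.

```java
/** Hypothetical helper for "<num>..<num>" options; not Weka's parser. */
final class RangeOptionSketch {

    /** Splits "<num>..<num>" at the first ".." and returns {lower, upper}. */
    static double[] parseRange(String value) {
        int sep = value.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected <num>..<num>, got: " + value);
        }
        double lower = Double.parseDouble(value.substring(0, sep));
        double upper = Double.parseDouble(value.substring(sep + 2));
        return new double[] {lower, upper};
    }

    /** Checks both ends against their documented bounds (assumed inclusive). */
    static void checkBounds(String flag, double[] range,
                            double lowMin, double lowMax,
                            double upMin, double upMax) {
        if (range[0] < lowMin || range[0] > lowMax) {
            throw new IllegalArgumentException(flag + ": lower number out of bounds: " + range[0]);
        }
        if (range[1] < upMin || range[1] > upMax) {
            throw new IllegalArgumentException(flag + ": upper number out of bounds: " + range[1]);
        }
    }

    /** -N: lower in [0, 2500], upper in [50, 2500]. */
    static double[] parseInstanceRange(String value) {
        double[] range = parseRange(value);
        checkBounds("-N", range, 0, 2500, 50, 2500);
        return range;
    }

    /** -R: lower in [0, SQRT(2)], upper in [SQRT(2), SQRT(32)]. */
    static double[] parseRadiusRange(String value) {
        double[] range = parseRange(value);
        checkBounds("-R", range, 0, Math.sqrt(2.0), Math.sqrt(2.0), Math.sqrt(32.0));
        return range;
    }

    public static void main(String[] args) {
        double[] n = parseInstanceRange("1..50");
        double[] r = parseRadiusRange("0.1..1.4142135623730951");
        System.out.println("N = " + n[0] + ".." + n[1]);
        System.out.println("R = " + r[0] + ".." + r[1]);
    }
}
```

Both documented defaults pass these checks, while a value such as `-N 1..10` would be rejected because its upper number is below 50.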
Version:
$Revision: 1.8 $
Author:
Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
  • Field Detail

    • GRID

      public static final int GRID
      Constant set for choice of pattern. (option G)
    • SINE

      public static final int SINE
      Constant set for choice of pattern. (option I)
    • RANDOM

      public static final int RANDOM
      Constant set for choice of pattern. (default)
    • TAGS_PATTERN

      public static final Tag[] TAGS_PATTERN
      the pattern tags
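The relationship between the `-G` and `-I` flags and the three patterns can be sketched as follows. This is an illustration only: the class `PatternFlagSketch` and its `Pattern` enum are hypothetical stand-ins for the integer constants above, and rejecting the combination of both flags with an `IllegalArgumentException` is an assumption about how the documented restriction might be enforced.

```java
/** Hypothetical sketch of pattern selection from flags; not Weka code. */
final class PatternFlagSketch {

    /** Stand-in for the GRID, SINE and RANDOM constants. */
    enum Pattern { GRID, SINE, RANDOM }

    /** Returns the pattern implied by the option array. */
    static Pattern resolve(String[] options) {
        boolean grid = false;
        boolean sine = false;
        for (String option : options) {
            if (option.equals("-G")) {
                grid = true;
            } else if (option.equals("-I")) {
                sine = true;
            }
        }
        if (grid && sine) {
            // The two flags are documented as mutually exclusive.
            throw new IllegalArgumentException("Flags G and I cannot be used together.");
        }
        if (grid) {
            return Pattern.GRID;
        }
        if (sine) {
            return Pattern.SINE;
        }
        return Pattern.RANDOM; // neither flag set
    }

    public static void main(String[] args) {
        System.out.println(resolve(new String[] {"-k", "4", "-G"}));
        System.out.println(resolve(new String[] {"-k", "4"}));
    }
}
```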
    • ORDERED

      public static final int ORDERED
      Constant set for input order (option O)
    • RANDOMIZED

      public static final int RANDOMIZED
      Constant set for input order (default)
    • TAGS_INPUTORDER

      public static final Tag[] TAGS_INPUTORDER
      the input order tags
  • Constructor Detail

    • BIRCHCluster

      public BIRCHCluster()
      initializes the generator with default values
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing this data generator.
      Returns:
      a description of the data generator suitable for displaying in the Explorer/Experimenter GUI
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class ClusterGenerator
      Returns:
      an enumeration of all the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a list of options for this object.

      The valid options are those listed in the class description above.
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class ClusterGenerator
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the data generator BIRCHCluster.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class ClusterGenerator
      Returns:
      an array of strings suitable for passing to setOptions
      See Also:
      • DataGenerator.removeBlacklist(String[])
    • setNumClusters

      public void setNumClusters(int numClusters)
      Sets the number of clusters the dataset should have.
      Parameters:
      numClusters - the new number of clusters
    • getNumClusters

      public int getNumClusters()
      Gets the number of clusters the dataset should have.
      Returns:
      the number of clusters the dataset should have
    • numClustersTipText

      public String numClustersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getMinInstNum

      public int getMinInstNum()
      Gets the lower boundary for instances per cluster.
      Returns:
      the lower boundary for instances per cluster
    • setMinInstNum

      public void setMinInstNum(int newMinInstNum)
      Sets the lower boundary for instances per cluster.
      Parameters:
      newMinInstNum - new lower boundary for instances per cluster
    • minInstNumTipText

      public String minInstNumTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getMaxInstNum

      public int getMaxInstNum()
      Gets the upper boundary for instances per cluster.
      Returns:
      the upper boundary for instances per cluster
    • setMaxInstNum

      public void setMaxInstNum(int newMaxInstNum)
      Sets the upper boundary for instances per cluster.
      Parameters:
      newMaxInstNum - new upper boundary for instances per cluster
    • maxInstNumTipText

      public String maxInstNumTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getMinRadius

      public double getMinRadius()
      Gets the lower boundary for the radiuses of the clusters.
      Returns:
      the lower boundary for the radiuses of the clusters
    • setMinRadius

      public void setMinRadius(double newMinRadius)
      Sets the lower boundary for the radiuses of the clusters.
      Parameters:
      newMinRadius - new lower boundary for the radiuses of the clusters
    • minRadiusTipText

      public String minRadiusTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getMaxRadius

      public double getMaxRadius()
      Gets the upper boundary for the radiuses of the clusters.
      Returns:
      the upper boundary for the radiuses of the clusters
    • setMaxRadius

      public void setMaxRadius(double newMaxRadius)
      Sets the upper boundary for the radiuses of the clusters.
      Parameters:
      newMaxRadius - new upper boundary for the radiuses of the clusters
    • maxRadiusTipText

      public String maxRadiusTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getPattern

      public SelectedTag getPattern()
      Gets the pattern type.
      Returns:
      the current pattern type
    • setPattern

      public void setPattern(SelectedTag value)
      Sets the pattern type.
      Parameters:
      value - new pattern type
    • patternTipText

      public String patternTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getDistMult

      public double getDistMult()
      Gets the distance multiplier.
      Returns:
      the distance multiplier
    • setDistMult

      public void setDistMult(double newDistMult)
      Sets the distance multiplier.
      Parameters:
      newDistMult - new distance multiplier
    • distMultTipText

      public String distMultTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getNumCycles

      public int getNumCycles()
      Gets the number of cycles.
      Returns:
      the number of cycles
    • setNumCycles

      public void setNumCycles(int newNumCycles)
      Sets the number of cycles.
      Parameters:
      newNumCycles - new number of cycles
    • numCyclesTipText

      public String numCyclesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getInputOrder

      public SelectedTag getInputOrder()
      Gets the input order.
      Returns:
      the current input order
    • setInputOrder

      public void setInputOrder(SelectedTag value)
      Sets the input order.
      Parameters:
      value - new input order
    • inputOrderTipText

      public String inputOrderTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getOrderedFlag

      public boolean getOrderedFlag()
      Gets the ordered flag (option O).
      Returns:
      true if ordered flag is set
    • getNoiseRate

      public double getNoiseRate()
      Gets the percentage of noise set.
      Returns:
      the percentage of noise set
    • setNoiseRate

      public void setNoiseRate(double newNoiseRate)
      Sets the percentage of noise set.
      Parameters:
      newNoiseRate - new percentage of noise
    • noiseRateTipText

      public String noiseRateTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getSingleModeFlag

      public boolean getSingleModeFlag()
      Gets the single mode flag.
      Specified by:
      getSingleModeFlag in class DataGenerator
      Returns:
      true if the method generateExample can be used
    • defineDataFormat

      public Instances defineDataFormat() throws Exception
      Initializes the format for the dataset produced.
      Overrides:
      defineDataFormat in class DataGenerator
      Returns:
      the output data format
      Throws:
      Exception - if the data format could not be defined
      See Also:
      • DataGenerator.defaultRelationName()
    • generateExample

      public Instance generateExample() throws Exception
      Generate an example of the dataset.
      Specified by:
      generateExample in class DataGenerator
      Returns:
      the instance generated
      Throws:
      Exception - if the format is not defined or generating
      examples one by one is not possible, because voting is chosen
    • generateExamples

      public Instances generateExamples() throws Exception
      Generate all examples of the dataset.
      Specified by:
      generateExamples in class DataGenerator
      Returns:
      the instances generated
      Throws:
      Exception - if the format is not defined
    • generateExamples

      public Instances generateExamples(Random random, Instances format) throws Exception
      Generate all examples of the dataset.
      Parameters:
      random - the random number generator to use
      format - the dataset format
      Returns:
      the instances generated
      Throws:
      Exception - if the format is not defined
    • generateFinished

      public String generateFinished() throws Exception
      Compiles documentation about the data generation after the generation process.
      Specified by:
      generateFinished in class DataGenerator
      Returns:
      string with additional information about the generated dataset
      Throws:
      Exception - if no input structure has been defined
    • generateStart

      public String generateStart()
      Compiles documentation about the data generation before the generation process.
      Specified by:
      generateStart in class DataGenerator
      Returns:
      string with additional information
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for testing this class.
      Parameters:
      args - should contain arguments for the data producer: