Package weka.clusterers

Class XMeans

All Implemented Interfaces:
Serializable, Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class XMeans extends RandomizableClusterer implements TechnicalInformationHandler
Cluster data using the X-means algorithm.

X-Means is K-Means extended by an Improve-Structure part. In this part of the algorithm, an attempt is made to split each center within its region. The decision between the children of each center and the center itself is made by comparing the BIC values of the two structures.
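
The split decision can be illustrated with a minimal, self-contained sketch. It is not Weka's implementation: it assumes one-dimensional data, a spherical Gaussian model with a pooled variance, more points than centers, and the BIC form from the cited paper (log-likelihood minus half the number of free parameters times the log of the number of points). The names `BicSplitSketch`, `nearest` and `bic` are hypothetical.

```java
/** Sketch: decide whether to keep one center or its two children by comparing BIC scores. */
final class BicSplitSketch {

    /** Index of the center nearest to x (1-D, absolute distance; ties go to the lower index). */
    static int nearest(double x, double[] centers) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (Math.abs(x - centers[k]) < Math.abs(x - centers[best])) {
                best = k;
            }
        }
        return best;
    }

    /**
     * BIC of a spherical-Gaussian mixture with the given centers; higher is better.
     * Assumes data.length > centers.length and a non-zero within-cluster spread.
     */
    static double bic(double[] data, double[] centers) {
        int r = data.length;      // number of points
        int k = centers.length;   // number of centers
        int dims = 1;             // this sketch handles 1-D data only
        int[] counts = new int[k];
        double sse = 0.0;         // sum of squared distances to the assigned center
        for (double x : data) {
            int c = nearest(x, centers);
            counts[c]++;
            sse += (x - centers[c]) * (x - centers[c]);
        }
        double variance = sse / (r - k);  // pooled variance estimate
        // Log-likelihood: mixing weights + Gaussian normalisation + exponent term.
        double logLik = -r * dims / 2.0 * Math.log(2 * Math.PI * variance) - (r - k) / 2.0;
        for (int count : counts) {
            if (count > 0) {
                logLik += count * Math.log((double) count / r);
            }
        }
        // Free parameters: k-1 mixing weights, k*dims center coordinates, 1 variance.
        int freeParams = (k - 1) + k * dims + 1;
        return logLik - freeParams / 2.0 * Math.log(r);
    }

    public static void main(String[] args) {
        double[] data = {1.0, 1.2, 0.8, 1.1, 0.9, 9.0, 9.2, 8.8, 9.1, 8.9};
        double parent = bic(data, new double[] {5.0});
        double children = bic(data, new double[] {1.0, 9.0});
        System.out.println(children > parent ? "keep the children" : "keep the parent");
    }
}
```

In this sketch the children replace the parent only when their two-center model scores a higher BIC. The extra parameters of the larger model are penalised, so a single tight group is not split.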

For more information see:

Dan Pelleg, Andrew W. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727-734, 2000.

BibTeX:

 @inproceedings{Pelleg2000,
    author = {Dan Pelleg and Andrew W. Moore},
    booktitle = {Seventeenth International Conference on Machine Learning},
    pages = {727-734},
    publisher = {Morgan Kaufmann},
    title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
    year = {2000}
 }
 

Valid options are:

 -I <num>
  maximum number of overall iterations
  (default 1).
 -M <num>
  maximum number of iterations in the kMeans loop in
  the Improve-Parameter part 
  (default 1000).
 -J <num>
  maximum number of iterations in the kMeans loop
  for the split centroids in the Improve-Structure part
  (default 1000).
 -L <num>
  minimum number of clusters
  (default 2).
 -H <num>
  maximum number of clusters
  (default 4).
 -B <value>
  distance value for binary attributes
  (default 1.0).
 -use-kdtree
  Uses the KDTree internally
  (default no).
 -K <KDTree class specification>
  Full class name of KDTree class to use, followed
  by scheme options.
  eg: "weka.core.neighboursearch.kdtrees.KDTree -P"
  (default no KDTree class used).
 -C <value>
  cutoff factor, takes the given percentage of the split
  centroids if none of the children win
  (default 0.0).
 -D <distance function class specification>
  Full class name of Distance function class to use, followed
  by scheme options.
  (default weka.core.EuclideanDistance).
 -N <file name>
  file to read starting centers from (ARFF format).
 -O <file name>
  file to write centers to (ARFF format).
 -U <int>
  The debug level.
  (default 0)
 -Y <file name>
  The debug vectors file.
 -S <num>
  Random number seed.
  (default 10)
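
As a rough illustration of how these flags fit together, the sketch below assembles an option array for the `-L`, `-H`, `-I` and `-S` options listed above. The helper class `XMeansOptionsSketch` and its `build` method are hypothetical and not part of Weka. The range checks are an assumption about sensible values, apart from the iteration check, which mirrors the documented behaviour of `setMaxIterations`.

```java
/** Sketch: assemble an XMeans option array from typed values (hypothetical helper). */
final class XMeansOptionsSketch {

    /**
     * Builds the -L, -H, -I and -S options.
     * Assumption: callers want 1 <= minClusters <= maxClusters checked up front.
     */
    static String[] build(int minClusters, int maxClusters, int maxIterations, int seed) {
        if (minClusters < 1 || maxClusters < minClusters) {
            throw new IllegalArgumentException("need 1 <= minClusters <= maxClusters");
        }
        if (maxIterations < 1) {
            throw new IllegalArgumentException("maxIterations must be at least 1");
        }
        return new String[] {
            "-L", Integer.toString(minClusters),
            "-H", Integer.toString(maxClusters),
            "-I", Integer.toString(maxIterations),
            "-S", Integer.toString(seed)
        };
    }

    public static void main(String[] args) {
        String[] options = build(2, 8, 5, 42);
        System.out.println(String.join(" ", options));
        // With Weka on the classpath, the array could then be handed to the clusterer:
        //   new weka.clusterers.XMeans().setOptions(options);
    }
}
```

Each flag and its value are separate array elements, which is the form `setOptions(String[])` expects.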
Version:
$Revision: 9986 $
Author:
Gabi Schmidberger (gabi@cs.waikato.ac.nz), Mark Hall (mhall@cs.waikato.ac.nz), Malcolm Ware (mfw4@cs.waikato.ac.nz)
  • Field Detail

    • R_LOW

      public static int R_LOW
      Index in ranges for LOW.
    • R_HIGH

      public static int R_HIGH
      Index in ranges for HIGH.
    • R_WIDTH

      public static int R_WIDTH
      Index in ranges for WIDTH.
    • D_PRINTCENTERS

      public static int D_PRINTCENTERS
      print the centers.
    • D_FOLLOWSPLIT

      public static int D_FOLLOWSPLIT
      follows the splitting of the centers.
    • D_CONVCHCLOSER

      public static int D_CONVCHCLOSER
      have a closer look at converge children.
    • D_RANDOMVECTOR

      public static int D_RANDOMVECTOR
      check on random vectors.
    • D_KDTREE

      public static int D_KDTREE
      check on kdtree.
    • D_ITERCOUNT

      public static int D_ITERCOUNT
      follow iterations.
    • D_METH_MISUSE

      public static int D_METH_MISUSE
      functions were maybe misused.
    • D_CURR

      public static int D_CURR
      for current debug.
    • D_GENERAL

      public static int D_GENERAL
      general debugging.
    • m_CurrDebugFlag

      public boolean m_CurrDebugFlag
      Flag: I'm debugging.
  • Constructor Detail

    • XMeans

      public XMeans()
      The default constructor.
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing this clusterer.
      Returns:
      a description of the clusterer suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the clusterer.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Clusterer
      Overrides:
      getCapabilities in class AbstractClusterer
      Returns:
      the capabilities of this clusterer
    • buildClusterer

      public void buildClusterer(Instances data) throws Exception
      Generates the X-Means clusterer.
      Specified by:
      buildClusterer in interface Clusterer
      Specified by:
      buildClusterer in class AbstractClusterer
      Parameters:
      data - set of instances serving as training data
      Throws:
      Exception - if the clusterer has not been generated successfully
    • checkForNominalAttributes

      public boolean checkForNominalAttributes(Instances data)
      Checks for nominal attributes in the dataset. Class attribute is ignored.
      Parameters:
      data - the data to check
      Returns:
      false if no nominal attributes are present
    • clusterInstance

      public int clusterInstance(Instance instance) throws Exception
      Assigns a given instance to a cluster.
      Specified by:
      clusterInstance in interface Clusterer
      Overrides:
      clusterInstance in class AbstractClusterer
      Parameters:
      instance - the instance to be assigned to a cluster
      Returns:
      the index of the assigned cluster as an integer
      Throws:
      Exception - if the instance could not be assigned to a cluster successfully
    • numberOfClusters

      public int numberOfClusters()
      Returns the number of clusters.
      Specified by:
      numberOfClusters in interface Clusterer
      Specified by:
      numberOfClusters in class AbstractClusterer
      Returns:
      the number of clusters generated for a training dataset.
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableClusterer
      Returns:
      an enumeration of all the available options
    • minNumClustersTipText

      public String minNumClustersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setMinNumClusters

      public void setMinNumClusters(int n)
      Sets the minimum number of clusters to generate.
      Parameters:
      n - the minimum number of clusters to generate
    • getMinNumClusters

      public int getMinNumClusters()
      Gets the minimum number of clusters to generate.
      Returns:
      the minimum number of clusters to generate
    • maxNumClustersTipText

      public String maxNumClustersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setMaxNumClusters

      public void setMaxNumClusters(int n)
      Sets the maximum number of clusters to generate.
      Parameters:
      n - the maximum number of clusters to generate
    • getMaxNumClusters

      public int getMaxNumClusters()
      Gets the maximum number of clusters to generate.
      Returns:
      the maximum number of clusters to generate
    • maxIterationsTipText

      public String maxIterationsTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setMaxIterations

      public void setMaxIterations(int i) throws Exception
      Sets the maximum number of iterations to perform.
      Parameters:
      i - the number of iterations
      Throws:
      Exception - if i is less than 1
    • getMaxIterations

      public int getMaxIterations()
      Gets the maximum number of iterations.
      Returns:
      the number of iterations
    • maxKMeansTipText

      public String maxKMeansTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setMaxKMeans

      public void setMaxKMeans(int i)
      Sets the maximum number of iterations to perform in KMeans.
      Parameters:
      i - the number of iterations
    • getMaxKMeans

      public int getMaxKMeans()
      Gets the maximum number of iterations in KMeans.
      Returns:
      the number of iterations
    • maxKMeansForChildrenTipText

      public String maxKMeansForChildrenTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setMaxKMeansForChildren

      public void setMaxKMeansForChildren(int i)
      Sets the maximum number of KMeans iterations performed on the child centers.
      Parameters:
      i - the number of iterations
    • getMaxKMeansForChildren

      public int getMaxKMeansForChildren()
      Gets the maximum number of KMeans iterations performed on the child centers.
      Returns:
      the number of iterations
    • cutOffFactorTipText

      public String cutOffFactorTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property
    • setCutOffFactor

      public void setCutOffFactor(double i)
      Sets a new cutoff factor.
      Parameters:
      i - the new cutoff factor
    • getCutOffFactor

      public double getCutOffFactor()
      Gets the cutoff factor.
      Returns:
      the cutoff factor
    • binValueTipText

      public String binValueTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getBinValue

      public double getBinValue()
      Gets the distance value between true and false of binary attributes, and between "same" and "different" of nominal attributes.
      Returns:
      the distance value
    • setBinValue

      public void setBinValue(double value)
      Sets the distance value between true and false of binary attributes, and between "same" and "different" of nominal attributes.
      Parameters:
      value - the distance
    • distanceFTipText

      public String distanceFTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDistanceF

      public void setDistanceF(DistanceFunction distanceF)
      Sets the distance function.
      Parameters:
      distanceF - the distance function with all options set
    • getDistanceF

      public DistanceFunction getDistanceF()
      Gets the distance function.
      Returns:
      the distance function
    • debugVectorsFileTipText

      public String debugVectorsFileTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDebugVectorsFile

      public void setDebugVectorsFile(File value)
      Sets the file that has the random vectors stored. Only used for debugging purposes.
      Parameters:
      value - the file to read the random vectors from
    • getDebugVectorsFile

      public File getDebugVectorsFile()
      Gets the file name for a file that has the random vectors stored. Only used for debugging purposes.
      Returns:
      the file to read the vectors from
    • initDebugVectorsInput

      public void initDebugVectorsInput() throws Exception
      Initialises the debug vector input.
      Throws:
      Exception - if there is an error opening the debug input file.
    • getNextDebugVectorsInstance

      public Instance getNextDebugVectorsInstance(Instances model) throws Exception
      Reads an instance from the debug vectors file.
      Parameters:
      model - the data model for the instance.
      Returns:
      the next debug vector.
      Throws:
      Exception - if there is no debug vector in m_DebugVectors.
    • inputCenterFileTipText

      public String inputCenterFileTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setInputCenterFile

      public void setInputCenterFile(File value)
      Sets the file to read the list of centers from.
      Parameters:
      value - the file to read centers from
    • getInputCenterFile

      public File getInputCenterFile()
      Gets the file to read the list of centers from.
      Returns:
      the file to read the centers from
    • outputCenterFileTipText

      public String outputCenterFileTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setOutputCenterFile

      public void setOutputCenterFile(File value)
      Sets the file to write the list of centers to.
      Parameters:
      value - file to write centers to
    • getOutputCenterFile

      public File getOutputCenterFile()
      Gets the file to write the list of centers to.
      Returns:
      the file to write the centers to
    • KDTreeTipText

      public String KDTreeTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setKDTree

      public void setKDTree(KDTree k)
      Sets the KDTree class.
      Parameters:
      k - a KDTree object with all options set
    • getKDTree

      public KDTree getKDTree()
      Gets the KDTree class.
      Returns:
      the configured KDTree
    • useKDTreeTipText

      public String useKDTreeTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setUseKDTree

      public void setUseKDTree(boolean value)
      Sets whether to use the KDTree or not.
      Parameters:
      value - if true the KDTree is used
    • getUseKDTree

      public boolean getUseKDTree()
      Gets whether the KDTree is used or not.
      Returns:
      true if KDTrees are used
    • debugLevelTipText

      public String debugLevelTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDebugLevel

      public void setDebugLevel(int d)
      Sets the debug level. A debug level of 0 means no output.
      Parameters:
      d - the debug level
    • getDebugLevel

      public int getDebugLevel()
      Gets the debug level.
      Returns:
      the debug level
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -I <num>
        maximum number of overall iterations
        (default 1).
       -M <num>
        maximum number of iterations in the kMeans loop in
        the Improve-Parameter part 
        (default 1000).
       -J <num>
        maximum number of iterations in the kMeans loop
        for the split centroids in the Improve-Structure part
        (default 1000).
       -L <num>
        minimum number of clusters
        (default 2).
       -H <num>
        maximum number of clusters
        (default 4).
       -B <value>
        distance value for binary attributes
        (default 1.0).
       -use-kdtree
        Uses the KDTree internally
        (default no).
       -K <KDTree class specification>
        Full class name of KDTree class to use, followed
        by scheme options.
        eg: "weka.core.neighboursearch.kdtrees.KDTree -P"
        (default no KDTree class used).
       -C <value>
        cutoff factor, takes the given percentage of the split
        centroids if none of the children win
        (default 0.0).
       -D <distance function class specification>
        Full class name of Distance function class to use, followed
        by scheme options.
        (default weka.core.EuclideanDistance).
       -N <file name>
        file to read starting centers from (ARFF format).
       -O <file name>
        file to write centers to (ARFF format).
       -U <int>
        The debug level.
        (default 0)
       -Y <file name>
        The debug vectors file.
       -S <num>
        Random number seed.
        (default 10)
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableClusterer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of XMeans.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableClusterer
      Returns:
      an array of strings suitable for passing to setOptions
    • toString

      public String toString()
      Returns a string describing this clusterer.
      Overrides:
      toString in class Object
      Returns:
      a description of the clusterer as a string
    • getClusterCenters

      public Instances getClusterCenters()
      Returns the centers of the clusters as an Instances object.
      Returns:
      the cluster centers.
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClusterer
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain options