Package weka.clusterers

Class SimpleKMeans

All Implemented Interfaces:
Serializable, Cloneable, Clusterer, NumberOfClustersRequestable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler

public class SimpleKMeans extends RandomizableClusterer implements NumberOfClustersRequestable, WeightedInstancesHandler
Clusters data using the k-means algorithm.

Valid options are:

 -N <num>
  Number of clusters.
  (default 2).
 
 -V
  Display std. deviations for centroids.
 
 -M
  Don't replace missing values with mean/mode.
 
 -S <num>
  Random number seed.
  (default 10)
 
 -A <classname and options>
  Distance function to be used for instance comparison
  (default weka.core.EuclideanDistance)
 
 -I <num>
  Maximum number of iterations.
 
 -O 
  Preserve order of instances.
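
To make the roles of -N, -S and -I concrete, here is a minimal k-means sketch in plain Java. It is not Weka's implementation: the class name KMeansSketch and all of its methods are hypothetical, it handles only dense numeric rows, it uses unnormalized squared Euclidean distance, and it picks the initial centroids as randomly chosen distinct rows.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative k-means on dense numeric rows; hypothetical, not Weka code. */
final class KMeansSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the point (ties go to the lower index). */
    static int nearest(double[][] centroids, double[] point) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(centroids[c], point)
                    < squaredDistance(centroids[best], point)) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs k-means and returns the cluster index of every row.
     * numClusters, seed and maxIterations play the roles of -N, -S and -I.
     */
    static int[] cluster(double[][] data, int numClusters, long seed, int maxIterations) {
        if (numClusters < 1 || numClusters > data.length || maxIterations < 1) {
            throw new IllegalArgumentException("invalid numClusters or maxIterations");
        }
        Random random = new Random(seed);

        // Choose numClusters distinct rows as initial centroids
        // (partial Fisher-Yates shuffle of the row indices).
        int[] order = new int[data.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        double[][] centroids = new double[numClusters][];
        for (int c = 0; c < numClusters; c++) {
            int j = c + random.nextInt(order.length - c);
            int tmp = order[c];
            order[c] = order[j];
            order[j] = tmp;
            centroids[c] = data[order[c]].clone();
        }

        int[] assignments = new int[data.length];
        Arrays.fill(assignments, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move every row to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {
                int c = nearest(centroids, data[i]);
                if (c != assignments[i]) {
                    assignments[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged before reaching the iteration limit
            }
            // Update step: each centroid becomes the mean of its rows.
            double[][] sums = new double[numClusters][data[0].length];
            int[] counts = new int[numClusters];
            for (int i = 0; i < data.length; i++) {
                counts[assignments[i]]++;
                for (int d = 0; d < data[i].length; d++) {
                    sums[assignments[i]][d] += data[i][d];
                }
            }
            for (int c = 0; c < numClusters; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < sums[c].length; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignments;
    }
}
```

A call such as KMeansSketch.cluster(rows, 2, 10L, 100) uses the documented defaults for -N and -S; the iteration limit of 100 is an arbitrary choice for the sketch.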
 
Version:
$Revision: 10537 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
  • Constructor Detail

    • SimpleKMeans

      public SimpleKMeans()
      The default constructor.
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing this clusterer
      Returns:
      a description of the clusterer suitable for displaying in the explorer/experimenter gui
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the clusterer.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Clusterer
      Overrides:
      getCapabilities in class AbstractClusterer
      Returns:
      the capabilities of this clusterer
    • buildClusterer

      public void buildClusterer(Instances data) throws Exception
      Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
      Specified by:
      buildClusterer in interface Clusterer
      Specified by:
      buildClusterer in class AbstractClusterer
      Parameters:
      data - set of instances serving as training data
      Throws:
      Exception - if the clusterer has not been generated successfully
    • clusterInstance

      public int clusterInstance(Instance instance) throws Exception
      Assigns a given instance to a cluster.
      Specified by:
      clusterInstance in interface Clusterer
      Overrides:
      clusterInstance in class AbstractClusterer
      Parameters:
      instance - the instance to be assigned to a cluster
      Returns:
      the index of the assigned cluster as an integer
      Throws:
      Exception - if the instance could not be assigned to a cluster
    • numberOfClusters

      public int numberOfClusters() throws Exception
      Returns the number of clusters.
      Specified by:
      numberOfClusters in interface Clusterer
      Specified by:
      numberOfClusters in class AbstractClusterer
      Returns:
      the number of clusters generated for a training dataset.
      Throws:
      Exception - if the number of clusters could not be returned successfully
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableClusterer
      Returns:
      an enumeration of all the available options.
    • numClustersTipText

      public String numClustersTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumClusters

      public void setNumClusters(int n) throws Exception
      Sets the number of clusters to generate.
      Specified by:
      setNumClusters in interface NumberOfClustersRequestable
      Parameters:
      n - the number of clusters to generate
      Throws:
      Exception - if the number of clusters is negative
    • getNumClusters

      public int getNumClusters()
      Gets the number of clusters to generate.
      Returns:
      the number of clusters to generate
    • maxIterationsTipText

      public String maxIterationsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMaxIterations

      public void setMaxIterations(int n) throws Exception
      Sets the maximum number of iterations to be executed.
      Parameters:
      n - the maximum number of iterations
      Throws:
      Exception - if the maximum number of iterations is smaller than 1
    • getMaxIterations

      public int getMaxIterations()
      Gets the maximum number of iterations to be executed.
      Returns:
      the maximum number of iterations
    • displayStdDevsTipText

      public String displayStdDevsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDisplayStdDevs

      public void setDisplayStdDevs(boolean stdD)
      Sets whether standard deviations and nominal counts should be displayed in the clustering output
      Parameters:
      stdD - true if std. devs and counts should be displayed
    • getDisplayStdDevs

      public boolean getDisplayStdDevs()
      Gets whether standard deviations and nominal counts should be displayed in the clustering output
      Returns:
      true if std. devs and counts should be displayed
    • dontReplaceMissingValuesTipText

      public String dontReplaceMissingValuesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDontReplaceMissingValues

      public void setDontReplaceMissingValues(boolean r)
      Sets whether replacement of missing values with the mean/mode is to be skipped
      Parameters:
      r - true if missing values are NOT to be replaced
    • getDontReplaceMissingValues

      public boolean getDontReplaceMissingValues()
      Gets whether replacement of missing values with the mean/mode is to be skipped
      Returns:
      true if missing values are NOT to be replaced
    • distanceFunctionTipText

      public String distanceFunctionTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getDistanceFunction

      public DistanceFunction getDistanceFunction()
      Returns the distance function currently in use.
      Returns:
      the distance function
    • setDistanceFunction

      public void setDistanceFunction(DistanceFunction df) throws Exception
      Sets the distance function to use for instance comparison.
      Parameters:
      df - the new distance function to use
      Throws:
      Exception - if instances cannot be processed
    • preserveInstancesOrderTipText

      public String preserveInstancesOrderTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setPreserveInstancesOrder

      public void setPreserveInstancesOrder(boolean r)
      Sets whether the order of instances must be preserved
      Parameters:
      r - true if the order of instances must be preserved
    • getPreserveInstancesOrder

      public boolean getPreserveInstancesOrder()
      Gets whether the order of instances must be preserved
      Returns:
      true if the order of instances must be preserved
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -N <num>
        Number of clusters.
        (default 2).
       
       -V
        Display std. deviations for centroids.
       
       -M
        Don't replace missing values with mean/mode.
       
       -S <num>
        Random number seed.
        (default 10)
       
       -A <classname and options>
        Distance function to be used for instance comparison
        (default weka.core.EuclideanDistance)
       
       -I <num>
        Maximum number of iterations.
       
       -O
        Preserve order of instances.
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableClusterer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
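
Because setOptions takes an array of strings, each flag and each value is a separate element, and the "classname and options" argument of -A travels as a single element. The sketch below shows one way a command line could be split into such an array. The class OptionTokensSketch and its tokenize method are hypothetical stand-ins written for illustration: they handle only double quotes (no escape sequences) and are not part of Weka, whose own utilities should be preferred in real code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a command line into option tokens. */
final class OptionTokensSketch {

    /** Splits on whitespace, keeping double-quoted parts together. */
    static String[] tokenize(String commandLine) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char ch : commandLine.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes; // quotes delimit a token but are dropped
                hasToken = true;      // an empty "" still counts as a token
            } else if (Character.isWhitespace(ch) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(ch);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "-N 3 -S 42 -A \"weka.core.ManhattanDistance -R first-last\" -O";
        for (String token : tokenize(line)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The quoted distance-function specification stays in one array element, which is the shape the -A option expects.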
    • getOptions

      public String[] getOptions()
      Gets the current settings of SimpleKMeans
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableClusterer
      Returns:
      an array of strings suitable for passing to setOptions()
    • toString

      public String toString()
      Returns a string describing this clusterer.
      Overrides:
      toString in class Object
      Returns:
      a description of the clusterer as a string
    • getClusterCentroids

      public Instances getClusterCentroids()
      Gets the cluster centroids
      Returns:
      the cluster centroids
    • getClusterStandardDevs

      public Instances getClusterStandardDevs()
      Gets the standard deviations of the numeric attributes in each cluster
      Returns:
      the standard deviations of the numeric attributes in each cluster
    • getClusterNominalCounts

      public int[][][] getClusterNominalCounts()
      Returns for each cluster the frequency counts for the values of each nominal attribute
      Returns:
      the counts
    • getSquaredError

      public double getSquaredError()
      Gets the squared error for all clusters
      Returns:
      the squared error
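
As an illustration of what a squared error over all clusters usually means for k-means, the sketch below sums the squared Euclidean distance from every row to the centroid of its assigned cluster. This is an assumption about the general technique, not a description of Weka's code: the class SquaredErrorSketch is hypothetical, and the value reported by getSquaredError() can differ, for example when the configured distance function normalizes attributes.

```java
/** Hypothetical helper: within-cluster sum of squared errors. */
final class SquaredErrorSketch {

    /**
     * Sums, over all rows, the squared Euclidean distance between the row
     * and the centroid of the cluster it is assigned to.
     */
    static double squaredError(double[][] data, double[][] centroids, int[] assignments) {
        double total = 0.0;
        for (int i = 0; i < data.length; i++) {
            double[] centroid = centroids[assignments[i]];
            for (int d = 0; d < data[i].length; d++) {
                double diff = data[i][d] - centroid[d];
                total += diff * diff;
            }
        }
        return total;
    }
}
```

A lower total means the rows sit closer to their centroids, which is why this quantity is commonly used to compare runs with different seeds.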
    • getClusterSizes

      public int[] getClusterSizes()
      Gets the number of instances in each cluster
      Returns:
      The number of instances in each cluster
    • getAssignments

      public int[] getAssignments() throws Exception
      Gets the assignments for each instance
      Returns:
      array of indexes of the centroid assigned to each instance
      Throws:
      Exception - if the order of instances wasn't preserved or no assignments were made
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClusterer
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain the following arguments:

      -t training file [-N number of clusters]