Package weka.clusterers

Class EM

All Implemented Interfaces:
Serializable, Cloneable, Clusterer, DensityBasedClusterer, NumberOfClustersRequestable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler

Simple EM (expectation maximisation) class.

EM assigns each instance a probability distribution that indicates the probability of it belonging to each of the clusters. EM can decide how many clusters to create by cross-validation, or you may specify a priori how many clusters to generate.

The cross-validation performed to determine the number of clusters proceeds in the following steps:
1. The number of clusters is set to 1.
2. The training set is split randomly into 10 folds.
3. EM is performed 10 times using the 10 folds in the usual cross-validation way.
4. The log-likelihood is averaged over all 10 results.
5. If the log-likelihood has increased, the number of clusters is increased by 1 and the program continues at step 2.

The number of folds is fixed at 10, as long as the training set contains at least 10 instances. If it contains fewer, the number of folds is set equal to the number of instances.
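
The selection procedure and the fold rule described above can be sketched in plain Java. This is a minimal illustration, not Weka's source: the class name, the `cvLogLikelihood` callback (standing in for "fit EM on each fold and average the log-likelihood") and the `maxClusters` safety bound are all hypothetical. The sketch also assumes that, once the log-likelihood stops increasing, the previous cluster count is kept.

```java
import java.util.function.IntToDoubleFunction;

/** Sketch of the cluster-count selection loop described above (not Weka's source). */
final class ClusterCountSelection {

    /** Fold rule from the text: 10 folds, or one fold per instance when there are fewer than 10. */
    static int numFolds(int numInstances) {
        return Math.min(10, numInstances);
    }

    /**
     * Increases the cluster count from 1 while the cross-validated log-likelihood keeps improving.
     * cvLogLikelihood is a hypothetical callback that returns the fold-averaged
     * log-likelihood for a given number of clusters; maxClusters is an assumed upper bound.
     */
    static int selectNumClusters(IntToDoubleFunction cvLogLikelihood, int maxClusters) {
        int best = 1;
        double bestScore = cvLogLikelihood.applyAsDouble(1);
        for (int k = 2; k <= maxClusters; k++) {
            double score = cvLogLikelihood.applyAsDouble(k);
            if (score <= bestScore) {
                break; // no improvement: keep the previous cluster count
            }
            best = k;
            bestScore = score;
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy score that peaks at k = 3; a real run would fit EM on each fold instead.
        IntToDoubleFunction toy = k -> -Math.abs(k - 3);
        System.out.println(selectNumClusters(toy, 20));
        System.out.println(numFolds(7));
    }
}
```

Because the loop stops at the first non-improving count, it can settle on a local optimum of the score; that is a property of the stepwise procedure as described, not of this sketch alone.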

Valid options are:

 -N <num>
  number of clusters. If omitted or -1 specified, then 
  cross validation is used to select the number of clusters.
 -I <num>
  max iterations.
  (default 100)
 -V
  verbose.
 -M <num>
  minimum allowable standard deviation for normal density
  computation
  (default 1e-6)
 -O
  Display model in old format (good when there are many clusters)
 
 -S <num>
  Random number seed.
  (default 100)
Version:
$Revision: 9988 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
  • Constructor Detail

    • EM

      public EM()
      Constructor.
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing this clusterer.
      Returns:
      a description of the clusterer suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableDensityBasedClusterer
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -N <num>
        number of clusters. If omitted or -1 specified, then 
        cross validation is used to select the number of clusters.
       -I <num>
        max iterations.
        (default 100)
       -V
        verbose.
       -M <num>
        minimum allowable standard deviation for normal density
        computation
        (default 1e-6)
       -O
        Display model in old format (good when there are many clusters)
       
       -S <num>
        Random number seed.
        (default 100)
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableDensityBasedClusterer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
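
To make the option format concrete, the following sketch shows how an option array of the kind accepted by setOptions might be interpreted, using the flags and defaults listed above. It is an illustration only and not Weka's parser: the class `EmOptionsSketch`, the `parse` method and the map representation are hypothetical, and a missing -N is represented here as -1 (selection by cross-validation).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for EM-style option arrays; not Weka's own option parser. */
final class EmOptionsSketch {

    /** Returns the effective settings, starting from the defaults documented above. */
    static Map<String, String> parse(String[] options) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("N", "-1");    // -1: choose the number of clusters by cross-validation
        settings.put("I", "100");   // max iterations
        settings.put("M", "1e-6");  // minimum allowable standard deviation
        settings.put("S", "100");   // random number seed
        settings.put("V", "false"); // verbose
        settings.put("O", "false"); // old display format

        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            switch (flag) {
                case "-V":
                case "-O":
                    // Flags without a value simply switch a behaviour on.
                    settings.put(flag.substring(1), "true");
                    break;
                case "-N":
                case "-I":
                case "-M":
                case "-S":
                    // Flags with a value consume the next array element.
                    if (i + 1 >= options.length) {
                        throw new IllegalArgumentException("Missing value for " + flag);
                    }
                    settings.put(flag.substring(1), options[++i]);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"-N", "3", "-I", "50", "-V"}));
    }
}
```

Note that each flag and its value are separate array elements, so "-N", "3" is two strings rather than one.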
    • displayModelInOldFormatTipText

      public String displayModelInOldFormatTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDisplayModelInOldFormat

      public void setDisplayModelInOldFormat(boolean d)
      Set whether to display model output in the old, original format.
      Parameters:
      d - true if model output is to be shown in the old format
    • getDisplayModelInOldFormat

      public boolean getDisplayModelInOldFormat()
      Get whether to display model output in the old, original format.
      Returns:
      true if model output is to be shown in the old format
    • minStdDevTipText

      public String minStdDevTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMinStdDev

      public void setMinStdDev(double m)
      Set the minimum value for standard deviation when calculating normal density. Increasing this value can help prevent arithmetic overflow resulting from multiplying large densities (arising from small standard deviations) when there are many singleton or near-singleton values.
      Parameters:
      m - minimum value for standard deviation
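
The effect of a standard deviation floor on the normal density can be illustrated with a small stand-alone sketch. It is not Weka's implementation: the class and method names are hypothetical, and the clamp via `Math.max` is an assumption about how such a floor would typically be applied. The log density is computed directly, which is the usual way to avoid multiplying many large densities.

```java
/** Hypothetical illustration of a standard deviation floor in a normal density. */
final class MinStdDevSketch {

    /** Log of the normal density at x, with the standard deviation clamped to at least minStdDev. */
    static double logNormalDensity(double x, double mean, double stdDev, double minStdDev) {
        double sd = Math.max(stdDev, minStdDev); // assumed clamping rule
        double z = (x - mean) / sd;
        return -0.5 * z * z - Math.log(sd) - 0.5 * Math.log(2.0 * Math.PI);
    }

    public static void main(String[] args) {
        // A singleton value has an estimated standard deviation of 0, so the floor decides the density.
        double tinyFloor = logNormalDensity(0.0, 0.0, 0.0, 1e-6);
        double largerFloor = logNormalDensity(0.0, 0.0, 0.0, 1e-2);

        // Multiplying 60 such densities equals exponentiating 60 times the log density.
        System.out.println(Math.exp(60 * tinyFloor));   // overflows the double range
        System.out.println(Math.exp(60 * largerFloor)); // stays within the double range
    }
}
```

With a floor of 1e-6 the density at the mean is roughly 4e5 per attribute, so a product over a few dozen such attributes exceeds the range of a double.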
    • setMinStdDevPerAtt

      public void setMinStdDevPerAtt(double[] m)
    • getMinStdDev

      public double getMinStdDev()
      Get the minimum allowable standard deviation.
      Returns:
      the minimum allowable standard deviation
    • numClustersTipText

      public String numClustersTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumClusters

      public void setNumClusters(int n) throws Exception
      Set the number of clusters (-1 to select by CV).
      Specified by:
      setNumClusters in interface NumberOfClustersRequestable
      Parameters:
      n - the number of clusters
      Throws:
      Exception - if n is 0
    • getNumClusters

      public int getNumClusters()
      Get the number of clusters.
      Returns:
      the number of clusters.
    • maxIterationsTipText

      public String maxIterationsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMaxIterations

      public void setMaxIterations(int i) throws Exception
      Set the maximum number of iterations to perform.
      Parameters:
      i - the number of iterations
      Throws:
      Exception - if i is less than 1
    • getMaxIterations

      public int getMaxIterations()
      Get the maximum number of iterations.
      Returns:
      the maximum number of iterations
    • debugTipText

      public String debugTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDebug

      public void setDebug(boolean v)
      Set debug mode (verbose output).
      Parameters:
      v - true for verbose output
    • getDebug

      public boolean getDebug()
      Get debug mode.
      Returns:
      true if debug mode is set
    • getOptions

      public String[] getOptions()
      Gets the current settings of EM.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableDensityBasedClusterer
      Returns:
      an array of strings suitable for passing to setOptions()
    • getClusterModelsNumericAtts

      public double[][][] getClusterModelsNumericAtts()
      Return the normal distributions for the cluster models.
      Returns:
      a double[][][] value
    • getClusterPriors

      public double[] getClusterPriors()
      Return the priors for the clusters.
      Returns:
      a double[] value
    • toString

      public String toString()
      Outputs the generated clusters into a string.
      Overrides:
      toString in class Object
      Returns:
      the clusterer in string representation
    • numberOfClusters

      public int numberOfClusters() throws Exception
      Returns the number of clusters.
      Specified by:
      numberOfClusters in interface Clusterer
      Specified by:
      numberOfClusters in class AbstractClusterer
      Returns:
      the number of clusters generated for a training dataset.
      Throws:
      Exception - if number of clusters could not be returned successfully
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the clusterer (i.e., the ones of SimpleKMeans).
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Clusterer
      Overrides:
      getCapabilities in class AbstractClusterer
      Returns:
      the capabilities of this clusterer
    • buildClusterer

      public void buildClusterer(Instances data) throws Exception
      Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
      Specified by:
      buildClusterer in interface Clusterer
      Specified by:
      buildClusterer in class AbstractClusterer
      Parameters:
      data - set of instances serving as training data
      Throws:
      Exception - if the clusterer has not been generated successfully
    • clusterPriors

      public double[] clusterPriors()
      Returns the cluster priors.
      Specified by:
      clusterPriors in interface DensityBasedClusterer
      Specified by:
      clusterPriors in class AbstractDensityBasedClusterer
      Returns:
      the cluster priors
    • logDensityPerClusterForInstance

      public double[] logDensityPerClusterForInstance(Instance inst) throws Exception
      Computes the log of the conditional density (per cluster) for a given instance.
      Specified by:
      logDensityPerClusterForInstance in interface DensityBasedClusterer
      Specified by:
      logDensityPerClusterForInstance in class AbstractDensityBasedClusterer
      Parameters:
      inst - the instance to compute the density for
      Returns:
      an array containing the estimated log densities, one per cluster
      Throws:
      Exception - if the density could not be computed successfully
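
Cluster priors and per-cluster log densities are the two ingredients of the membership distribution that EM assigns to an instance. The sketch below combines them with Bayes' rule in log space. It is a stand-alone illustration that takes plain arrays as input rather than calling Weka, and the class and method names are hypothetical; it does not claim to reproduce how Weka performs this step internally.

```java
/** Hypothetical helper: turns priors and per-cluster log densities into membership probabilities. */
final class MembershipSketch {

    /** Returns P(cluster | instance) for each cluster, computed stably in log space. */
    static double[] membership(double[] priors, double[] logDensities) {
        if (priors.length != logDensities.length) {
            throw new IllegalArgumentException("Arrays must have one entry per cluster");
        }
        int k = priors.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            // log P(cluster) + log p(instance | cluster)
            logJoint[c] = Math.log(priors[c]) + logDensities[c];
            max = Math.max(max, logJoint[c]);
        }
        double[] probs = new double[k];
        double sum = 0.0;
        for (int c = 0; c < k; c++) {
            // Subtracting the maximum keeps exp() from underflowing to zero for every cluster.
            probs[c] = Math.exp(logJoint[c] - max);
            sum += probs[c];
        }
        for (int c = 0; c < k; c++) {
            probs[c] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};
        // Very negative log densities would underflow if exponentiated directly.
        double[] logDensities = {-1000.0, -1000.0 + Math.log(3.0)};
        double[] probs = membership(priors, logDensities);
        System.out.println(probs[0] + " " + probs[1]);
    }
}
```

The sketch does not handle the degenerate case in which every cluster has zero prior or zero density; a caller would need to decide what distribution to return then.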
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClusterer
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain the following arguments:

      -t training file [-T test file] [-N number of clusters] [-S random seed]