Package weka.clusterers
Class EM
- All Implemented Interfaces:
Serializable, Cloneable, Clusterer, DensityBasedClusterer, NumberOfClustersRequestable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler
public class EM
extends RandomizableDensityBasedClusterer
implements NumberOfClustersRequestable, WeightedInstancesHandler
Simple EM (expectation maximisation) class.
EM assigns a probability distribution to each instance, indicating the probability of it belonging to each of the clusters. EM can decide how many clusters to create by cross-validation, or you may specify a priori how many clusters to generate.
The cross-validation performed to determine the number of clusters is done in the following steps:
1. The number of clusters is set to 1.
2. The training set is split randomly into 10 folds.
3. EM is performed 10 times using the 10 folds in the usual cross-validation way.
4. The log-likelihood is averaged over all 10 results.
5. If the log-likelihood has increased, the number of clusters is increased by 1 and the program continues at step 2.
The number of folds is fixed at 10 as long as the training set contains at least 10 instances. If it contains fewer, the number of folds is set equal to the number of instances. Valid options are:
-N <num> number of clusters. If omitted or -1 specified, then cross validation is used to select the number of clusters.
-I <num> max iterations. (default 100)
-V verbose.
-M <num> minimum allowable standard deviation for normal density computation (default 1e-6)
-O Display model in old format (good when there are many clusters)
-S <num> Random number seed. (default 100)
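The cross-validation search described in the class comment can be sketched in plain Java. This is a minimal illustration, not Weka's code: `ClusterCountSearch`, the `cvLogLikelihood` callback (assumed to run EM with k clusters over the folds and return the averaged log-likelihood), and the `maxClusters` safety cap are all hypothetical names introduced here.

```java
import java.util.function.IntToDoubleFunction;

/** Sketch of the cross-validation search for the number of clusters. */
final class ClusterCountSearch {

    /** Folds are fixed at 10 unless the training set has fewer than 10 instances. */
    static int numFolds(int numInstances) {
        return Math.min(10, numInstances);
    }

    /**
     * Grows the cluster count from 1 while the averaged log-likelihood improves.
     * cvLogLikelihood is a hypothetical callback: given k, it returns the
     * log-likelihood averaged over the folds. maxClusters is an assumed cap.
     */
    static int selectNumClusters(IntToDoubleFunction cvLogLikelihood, int maxClusters) {
        int best = 1;
        double bestScore = cvLogLikelihood.applyAsDouble(1);
        for (int k = 2; k <= maxClusters; k++) {
            double score = cvLogLikelihood.applyAsDouble(k);
            if (score <= bestScore) {
                break; // no improvement: keep the previous count
            }
            best = k;
            bestScore = score;
        }
        return best;
    }
}
```

Passing a callback keeps the search loop separate from how each fold is actually scored, so the stopping rule can be read on its own.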
- Version:
- $Revision: 9988 $
- Author:
- Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
-
Constructor Summary
Constructor  Description
EM()  Constructor.
-
Method Summary
Modifier and Type  Method  Description
void  buildClusterer(Instances data)  Generates a clusterer.
double[]  clusterPriors()  Returns the cluster priors.
String  debugTipText()  Returns the tip text for this property.
String  displayModelInOldFormatTipText()  Returns the tip text for this property.
Capabilities  getCapabilities()  Returns default capabilities of the clusterer (i.e., the ones of SimpleKMeans).
double[][][]  getClusterModelsNumericAtts()  Return the normal distributions for the cluster models.
double[]  getClusterPriors()  Return the priors for the clusters.
boolean  getDebug()  Get debug mode.
boolean  getDisplayModelInOldFormat()  Get whether to display model output in the old, original format.
int  getMaxIterations()  Get the maximum number of iterations.
double  getMinStdDev()  Get the minimum allowable standard deviation.
int  getNumClusters()  Get the number of clusters.
String[]  getOptions()  Gets the current settings of EM.
String  getRevision()  Returns the revision string.
String  globalInfo()  Returns a string describing this clusterer.
Enumeration  listOptions()  Returns an enumeration describing the available options.
double[]  logDensityPerClusterForInstance(Instance inst)  Computes the log of the conditional density (per cluster) for a given instance.
static void  main(String[] argv)  Main method for testing this class.
String  maxIterationsTipText()  Returns the tip text for this property.
String  minStdDevTipText()  Returns the tip text for this property.
int  numberOfClusters()  Returns the number of clusters.
String  numClustersTipText()  Returns the tip text for this property.
void  setDebug(boolean v)  Set debug mode - verbose output.
void  setDisplayModelInOldFormat(boolean d)  Set whether to display model output in the old, original format.
void  setMaxIterations(int i)  Set the maximum number of iterations to perform.
void  setMinStdDev(double m)  Set the minimum value for standard deviation when calculating normal density.
void  setMinStdDevPerAtt(double[] m)
void  setNumClusters(int n)  Set the number of clusters (-1 to select by CV).
void  setOptions(String[] options)  Parses a given list of options.
String  toString()  Outputs the generated clusters into a string.
Methods inherited from class weka.clusterers.RandomizableDensityBasedClusterer
getSeed, seedTipText, setSeed
Methods inherited from class weka.clusterers.AbstractDensityBasedClusterer
distributionForInstance, logDensityForInstance, logJointDensitiesForInstance, makeCopies
Methods inherited from class weka.clusterers.AbstractClusterer
clusterInstance, forName, makeCopies, makeCopy
Methods inherited from interface weka.clusterers.Clusterer
clusterInstance
-
Constructor Detail
-
EM
public EM()
Constructor.
-
Method Detail
-
globalInfo
public String globalInfo()
Returns a string describing this clusterer.
- Returns:
- a description of the clusterer suitable for displaying in the explorer/experimenter GUI
-
listOptions
public Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface OptionHandler
- Overrides:
- listOptions in class RandomizableDensityBasedClusterer
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options. Valid options are:
-N <num> number of clusters. If omitted or -1 specified, then cross validation is used to select the number of clusters.
-I <num> max iterations. (default 100)
-V verbose.
-M <num> minimum allowable standard deviation for normal density computation (default 1e-6)
-O Display model in old format (good when there are many clusters)
-S <num> Random number seed. (default 100)
- Specified by:
- setOptions in interface OptionHandler
- Overrides:
- setOptions in class RandomizableDensityBasedClusterer
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
displayModelInOldFormatTipText
public String displayModelInOldFormatTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDisplayModelInOldFormat
public void setDisplayModelInOldFormat(boolean d)
Set whether to display model output in the old, original format.
- Parameters:
- d - true if model output is to be shown in the old format
-
getDisplayModelInOldFormat
public boolean getDisplayModelInOldFormat()
Get whether to display model output in the old, original format.
- Returns:
- true if model output is to be shown in the old format
-
minStdDevTipText
public String minStdDevTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setMinStdDev
public void setMinStdDev(double m)
Set the minimum value for standard deviation when calculating normal density. Reducing this value can help prevent arithmetic overflow resulting from multiplying large densities (arising from small standard deviations) when there are many singleton or near singleton values.
- Parameters:
- m - minimum value for standard deviation
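To illustrate how a floor on the standard deviation bounds a normal density, here is a small self-contained sketch. It is an assumption about the general technique, not a copy of Weka's internals; `FlooredNormal` and its method are hypothetical names.

```java
/** Sketch: a normal log-density whose standard deviation is floored at minStdDev. */
final class FlooredNormal {

    /** Returns log N(x; mean, max(stdDev, minStdDev)^2). */
    static double logDensity(double x, double mean, double stdDev, double minStdDev) {
        double sd = Math.max(stdDev, minStdDev); // apply the floor
        double z = (x - mean) / sd;
        return -0.5 * z * z - Math.log(sd) - 0.5 * Math.log(2.0 * Math.PI);
    }
}
```

With a standard deviation of 0 (a singleton value) the unfloored density at the mean would be infinite; with the floor in place the result stays finite, and its peak is set by the floor rather than by the data.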
-
setMinStdDevPerAtt
public void setMinStdDevPerAtt(double[] m)
-
getMinStdDev
public double getMinStdDev()
Get the minimum allowable standard deviation.
- Returns:
- the minimum allowable standard deviation
-
numClustersTipText
public String numClustersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setNumClusters
public void setNumClusters(int n) throws Exception
Set the number of clusters (-1 to select by CV).
- Specified by:
- setNumClusters in interface NumberOfClustersRequestable
- Parameters:
- n - the number of clusters
- Throws:
- Exception - if n is 0
-
getNumClusters
public int getNumClusters()
Get the number of clusters.
- Returns:
- the number of clusters.
-
maxIterationsTipText
public String maxIterationsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setMaxIterations
public void setMaxIterations(int i) throws Exception
Set the maximum number of iterations to perform.
- Parameters:
- i - the number of iterations
- Throws:
- Exception - if i is less than 1
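To make the role of the iteration cap concrete, the following is a minimal sketch of an EM loop for a one-dimensional, two-component Gaussian mixture. It is not Weka's implementation: `TinyEM`, its initialisation from the data extremes, the 1e-6 convergence threshold, and the use of `IllegalArgumentException` are assumptions chosen for illustration.

```java
/** Minimal 1-D, two-component Gaussian mixture fitted by EM (illustrative only). */
final class TinyEM {
    final double[] mean = new double[2];
    final double[] sd = new double[2];
    final double[] prior = {0.5, 0.5};

    static double density(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    /** Runs at most maxIterations E/M steps; returns the number actually performed. */
    int fit(double[] x, int maxIterations, double minStdDev) {
        if (maxIterations < 1) {
            throw new IllegalArgumentException("maxIterations must be >= 1");
        }
        // Crude initialisation: start the two means at the data extremes.
        double lo = x[0], hi = x[0];
        for (double v : x) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
        mean[0] = lo;
        mean[1] = hi;
        sd[0] = sd[1] = Math.max((hi - lo) / 2.0, minStdDev);

        double prevLogLik = Double.NEGATIVE_INFINITY;
        for (int it = 1; it <= maxIterations; it++) {
            // E-step: responsibility of each component for each point.
            double[][] r = new double[x.length][2];
            double logLik = 0.0;
            for (int i = 0; i < x.length; i++) {
                double p0 = prior[0] * density(x[i], mean[0], sd[0]);
                double p1 = prior[1] * density(x[i], mean[1], sd[1]);
                r[i][0] = p0 / (p0 + p1);
                r[i][1] = p1 / (p0 + p1);
                logLik += Math.log(p0 + p1);
            }
            // M-step: re-estimate priors, means and (floored) standard deviations.
            for (int c = 0; c < 2; c++) {
                double w = 0.0, sum = 0.0, sq = 0.0;
                for (int i = 0; i < x.length; i++) { w += r[i][c]; sum += r[i][c] * x[i]; }
                mean[c] = sum / w;
                for (int i = 0; i < x.length; i++) {
                    double d = x[i] - mean[c];
                    sq += r[i][c] * d * d;
                }
                sd[c] = Math.max(Math.sqrt(sq / w), minStdDev);
                prior[c] = w / x.length;
            }
            if (logLik - prevLogLik < 1e-6) {
                return it; // converged before reaching the cap
            }
            prevLogLik = logLik;
        }
        return maxIterations;
    }
}
```

A call such as `new TinyEM().fit(data, 100, 1e-6)` mirrors the documented defaults for `-I` and `-M`; the loop stops early once the log-likelihood no longer improves.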
-
getMaxIterations
public int getMaxIterations()
Get the maximum number of iterations.
- Returns:
- the number of iterations
-
debugTipText
public String debugTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDebug
public void setDebug(boolean v)
Set debug mode - verbose output.
- Parameters:
- v - true for verbose output
-
getDebug
public boolean getDebug()
Get debug mode.
- Returns:
- true if debug mode is set
-
getOptions
public String[] getOptions()
Gets the current settings of EM.
- Specified by:
- getOptions in interface OptionHandler
- Overrides:
- getOptions in class RandomizableDensityBasedClusterer
- Returns:
- an array of strings suitable for passing to setOptions()
-
getClusterModelsNumericAtts
public double[][][] getClusterModelsNumericAtts()
Return the normal distributions for the cluster models.
- Returns:
- a double[][][] value
-
getClusterPriors
public double[] getClusterPriors()
Return the priors for the clusters.
- Returns:
- a double[] value
-
toString
public String toString()
Outputs the generated clusters into a string.
-
numberOfClusters
public int numberOfClusters() throws Exception
Returns the number of clusters.
- Specified by:
- numberOfClusters in interface Clusterer
- Specified by:
- numberOfClusters in class AbstractClusterer
- Returns:
- the number of clusters generated for a training dataset.
- Throws:
- Exception - if number of clusters could not be returned successfully
-
getCapabilities
public Capabilities getCapabilities()
Returns default capabilities of the clusterer (i.e., the ones of SimpleKMeans).
- Specified by:
- getCapabilities in interface CapabilitiesHandler
- Specified by:
- getCapabilities in interface Clusterer
- Overrides:
- getCapabilities in class AbstractClusterer
- Returns:
- the capabilities of this clusterer
-
buildClusterer
public void buildClusterer(Instances data) throws Exception
Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
- Specified by:
- buildClusterer in interface Clusterer
- Specified by:
- buildClusterer in class AbstractClusterer
- Parameters:
- data - set of instances serving as training data
- Throws:
- Exception - if the clusterer has not been generated successfully
-
clusterPriors
public double[] clusterPriors()
Returns the cluster priors.
- Specified by:
- clusterPriors in interface DensityBasedClusterer
- Specified by:
- clusterPriors in class AbstractDensityBasedClusterer
- Returns:
- the cluster priors
-
logDensityPerClusterForInstance
public double[] logDensityPerClusterForInstance(Instance inst) throws Exception
Computes the log of the conditional density (per cluster) for a given instance.
- Specified by:
- logDensityPerClusterForInstance in interface DensityBasedClusterer
- Specified by:
- logDensityPerClusterForInstance in class AbstractDensityBasedClusterer
- Parameters:
- inst - the instance to compute the density for
- Returns:
- an array containing the estimated densities
- Throws:
- Exception - if the density could not be computed successfully
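As an illustration of how cluster priors and per-cluster log densities can be combined into the per-instance probability distribution mentioned in the class description, here is a sketch of Bayes' rule computed in log space. It assumes plain arrays as input and is not taken from Weka's source; `ClusterPosterior` is a hypothetical helper.

```java
/** Sketch: cluster membership probabilities from priors and per-cluster log densities. */
final class ClusterPosterior {

    /** Returns P(cluster | instance), proportional to prior * density, normalised to sum to 1. */
    static double[] posterior(double[] priors, double[] logDensities) {
        int k = priors.length;
        double[] logJoint = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            logJoint[i] = Math.log(priors[i]) + logDensities[i];
            max = Math.max(max, logJoint[i]);
        }
        double[] probs = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            probs[i] = Math.exp(logJoint[i] - max); // shift by the max to avoid underflow
            sum += probs[i];
        }
        for (int i = 0; i < k; i++) {
            probs[i] /= sum;
        }
        return probs;
    }
}
```

Subtracting the largest log joint density before exponentiating keeps the computation stable even when every log density is far below the range that `Math.exp` can represent.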
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
- getRevision in interface RevisionHandler
- Overrides:
- getRevision in class AbstractClusterer
- Returns:
- the revision
-
main
public static void main(String[] argv)
Main method for testing this class.
- Parameters:
- argv - should contain the following arguments: -t training file [-T test file] [-N number of clusters] [-S random seed]
-