Package weka.datagenerators.clusterers
Class SubspaceCluster
- java.lang.Object
  - weka.datagenerators.DataGenerator
    - weka.datagenerators.ClusterGenerator
      - weka.datagenerators.clusterers.SubspaceCluster
- All Implemented Interfaces:
java.io.Serializable, OptionHandler, Randomizable, RevisionHandler
public class SubspaceCluster extends ClusterGenerator
A data generator that produces data points in hyperrectangular subspace clusters.
Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Generates randomly distributed instances in the cluster.
-U <range> Generates uniformly distributed instances in the cluster.
-G <range> Generates Gaussian distributed instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range for the number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
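Because each -C value is itself a set of SubspaceClusterDefinition options, it has to reach the generator as a single argument. The sketch below builds such an options array in plain Java and renders the equivalent shell command line. The relation name, attribute ranges, -D values and -N ranges are illustrative only, the class and method names are hypothetical, and passing -C once per cluster is an assumption based on the constructor note that clusters must be specified explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; only the option letters come from the documentation.
final class SubspaceClusterArgs {

    // Builds an options array for SubspaceCluster. Each cluster definition is
    // kept as ONE array element, which is what shell quoting achieves.
    static String[] buildOptions() {
        List<String> opts = new ArrayList<>();
        opts.add("-r"); opts.add("subspace-demo"); // relation name (illustrative)
        opts.add("-a"); opts.add("2");             // two attributes
        opts.add("-S"); opts.add("1");             // random seed
        opts.add("-c");                            // list the cluster in an extra attribute
        // Uniformly distributed cluster on attribute 1, values in [0, 10],
        // 10 to 50 instances (illustrative values).
        opts.add("-C"); opts.add("-U 1 -D 0,10 -N 10..50");
        // Gaussian cluster on attribute 2 with mean 5 and stddev 1.
        // Repeating -C once per cluster is an assumption.
        opts.add("-C"); opts.add("-G 2 -D 5,1 -N 20..30");
        return opts.toArray(new String[0]);
    }

    // Renders the array as a shell command, quoting elements that contain
    // spaces so the shell passes each one through as a single argument.
    static String toCommandLine(String[] opts) {
        StringBuilder sb = new StringBuilder("java weka.datagenerators.clusterers.SubspaceCluster");
        for (String o : opts) {
            sb.append(' ').append(o.contains(" ") ? "\"" + o + "\"" : o);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCommandLine(buildOptions()));
    }
}
```

The array is the form one would hand to setOptions or main; the printed command line shows where the quotes go when the same options are typed into a shell.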
- Version:
- $Revision: 1.5 $
- Author:
- Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
- See Also:
- Serialized Form
-
Field Summary
Modifier and Type | Field | Description
static int | CONTINUOUS | cluster subtype: continuous
static int | GAUSSIAN | cluster type: Gaussian
static int | INTEGER | cluster subtype: integer
static Tag[] | TAGS_CLUSTERSUBTYPE | the tags for the cluster subtypes
static Tag[] | TAGS_CLUSTERTYPE | the tags for the cluster types
static int | TOTAL_UNIFORM | cluster type: total uniform
static int | UNIFORM_RANDOM | cluster type: uniform/random
-
Constructor Summary
Constructor | Description
SubspaceCluster() | Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly.
-
Method Summary
Modifier and Type | Method | Description
java.lang.String | clusterDefinitionsTipText() | Returns the tip text for this property.
Instances | defineDataFormat() | Initializes the format for the dataset produced.
Instance | generateExample() | Generates an example of the dataset.
Instances | generateExamples() | Generates all examples of the dataset.
java.lang.String | generateFinished() | Compiles documentation about the data generation after the generation process.
java.lang.String | generateStart() | Compiles documentation about the data generation before the generation process.
ClusterDefinition[] | getClusterDefinitions() | Returns the currently set clusters.
double | getNoiseRate() | Gets the percentage of noise set.
int[] | getNumValues() | Returns an array that stores the number of values for a nominal attribute.
java.lang.String[] | getOptions() | Gets the current settings of the data generator.
java.lang.String | getRevision() | Returns the revision string.
boolean | getSingleModeFlag() | Gets the single mode flag.
java.lang.String | globalInfo() | Returns a string describing this data generator.
boolean | isBoolean(int index) | Returns true if the attribute is boolean.
boolean | isNominal(int index) | Returns true if the attribute is nominal.
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options.
static void | main(java.lang.String[] args) | Main method for testing this class.
java.lang.String | noiseRateTipText() | Returns the tip text for this property.
java.lang.String | numAttributesTipText() | Returns the tip text for this property.
void | setClusterDefinitions(ClusterDefinition[] value) | Sets the clusters to use.
void | setNoiseRate(double newNoiseRate) | Sets the percentage of noise.
void | setNumAttributes(int numAttributes) | Sets the number of attributes the dataset should have.
void | setOptions(java.lang.String[] options) | Parses a list of options for this object.
-
Methods inherited from class weka.datagenerators.ClusterGenerator
booleanColsTipText, classFlagTipText, getBooleanCols, getClassFlag, getNominalCols, getNumAttributes, nominalColsTipText, setBooleanCols, setBooleanIndices, setClassFlag, setNominalCols, setNominalIndices
-
Methods inherited from class weka.datagenerators.DataGenerator
debugTipText, defaultOutput, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed
-
Field Detail
-
UNIFORM_RANDOM
public static final int UNIFORM_RANDOM
cluster type: uniform/random
- See Also:
- Constant Field Values
-
TOTAL_UNIFORM
public static final int TOTAL_UNIFORM
cluster type: total uniform
- See Also:
- Constant Field Values
-
GAUSSIAN
public static final int GAUSSIAN
cluster type: Gaussian
- See Also:
- Constant Field Values
-
TAGS_CLUSTERTYPE
public static final Tag[] TAGS_CLUSTERTYPE
the tags for the cluster types
-
CONTINUOUS
public static final int CONTINUOUS
cluster subtype: continuous
- See Also:
- Constant Field Values
-
INTEGER
public static final int INTEGER
cluster subtype: integer
- See Also:
- Constant Field Values
-
TAGS_CLUSTERSUBTYPE
public static final Tag[] TAGS_CLUSTERSUBTYPE
the tags for the cluster subtypes
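The type and subtype constants correspond to the per-cluster options of SubspaceClusterDefinition: -A, -U and -G select the distribution, and -I switches from continuous to integer values. The sketch below expresses that correspondence with Java enums. The flag-to-constant mapping is inferred from the option and field descriptions rather than taken from the Weka source, and the enums only mirror the constant names, not their int values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the real fields are static ints on SubspaceCluster.
final class ClusterTypeSketch {

    // Mirrors the names of the cluster type and subtype constants.
    enum ClusterType { UNIFORM_RANDOM, TOTAL_UNIFORM, GAUSSIAN }
    enum ClusterSubtype { CONTINUOUS, INTEGER }

    // Maps the distribution flag of a cluster definition to a cluster type
    // (mapping inferred from the option descriptions).
    static ClusterType typeFor(String flag) {
        switch (flag) {
            case "-A": return ClusterType.UNIFORM_RANDOM; // randomly distributed
            case "-U": return ClusterType.TOTAL_UNIFORM;  // uniformly distributed
            case "-G": return ClusterType.GAUSSIAN;       // Gaussian distributed
            default: throw new IllegalArgumentException("unknown cluster type flag: " + flag);
        }
    }

    // -I selects integer values; without it values are continuous (the default).
    static ClusterSubtype subtypeFor(List<String> definition) {
        return definition.contains("-I") ? ClusterSubtype.INTEGER : ClusterSubtype.CONTINUOUS;
    }

    public static void main(String[] args) {
        List<String> def = Arrays.asList("-G", "2", "-D", "5,1", "-I");
        System.out.println(typeFor(def.get(0)) + " / " + subtypeFor(def));
    }
}
```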
-
-
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this data generator.
- Returns:
- a description of the data generator suitable for displaying in the explorer/experimenter gui
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class ClusterGenerator
- Returns:
- an enumeration of all the available options
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a list of options for this object.
Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Generates randomly distributed instances in the cluster.
-U <range> Generates uniformly distributed instances in the cluster.
-G <range> Generates Gaussian distributed instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range for the number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class ClusterGenerator
- Parameters:
options - the list of options as an array of strings
- Throws:
java.lang.Exception - if an option is not supported
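Two of the SubspaceClusterDefinition options carry compound values: -N takes a <num>..<num> range and -D a <num>,<num> pair whose meaning depends on the cluster type. A minimal parsing sketch follows. The helper names and the validation rules (non-negative bounds, minimum not above maximum) are assumptions of the sketch; Weka's own option parsing may accept or reject different inputs.

```java
// Hypothetical parsing helpers for the -N and -D values of a cluster definition.
final class ClusterDefinitionParsing {

    // Parses "-N <num>..<num>", e.g. "10..50", into {min, max}.
    static int[] parseInstanceRange(String spec) {
        String[] parts = spec.split("\\.\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>..<num>: " + spec);
        }
        int min = Integer.parseInt(parts[0].trim());
        int max = Integer.parseInt(parts[1].trim());
        // Assumed validation: non-negative and not reversed.
        if (min < 0 || min > max) {
            throw new IllegalArgumentException("invalid instance range: " + spec);
        }
        return new int[] {min, max};
    }

    // Parses "-D <num>,<num>": min/max for -A and -U, mean/stddev for -G.
    static double[] parsePair(String spec) {
        String[] parts = spec.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>,<num>: " + spec);
        }
        return new double[] {Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim())};
    }

    public static void main(String[] args) {
        int[] n = parseInstanceRange("1..50"); // the documented default
        double[] d = parsePair("5,1");         // e.g. mean 5, stddev 1 for -G
        System.out.println(n[0] + ".." + n[1] + " / " + d[0] + "," + d[1]);
    }
}
```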
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the data generator.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class ClusterGenerator
- Returns:
- an array of strings suitable for passing to setOptions
- See Also:
DataGenerator.removeBlacklist(String[])
-
setNumAttributes
public void setNumAttributes(int numAttributes)
Sets the number of attributes the dataset should have.
- Overrides:
setNumAttributes in class ClusterGenerator
- Parameters:
numAttributes - the new number of attributes
-
numAttributesTipText
public java.lang.String numAttributesTipText()
Returns the tip text for this property.
- Overrides:
numAttributesTipText in class ClusterGenerator
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNoiseRate
public double getNoiseRate()
Gets the percentage of noise set.
- Returns:
- the percentage of noise set
-
setNoiseRate
public void setNoiseRate(double newNoiseRate)
Sets the percentage of noise (between 0% and 30%, as for the -P option).
- Parameters:
newNoiseRate - the new percentage of noise
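The -P description bounds the noise rate to between 0% and 30% and expresses it as a percentage rather than a fraction. A sketch of a guard that enforces those documented bounds and converts the percentage to a fraction is below; whether setNoiseRate itself rejects out-of-range values is not stated in this documentation, so the exception is an assumption of the sketch, and the class name is hypothetical.

```java
// Hypothetical guard for the documented -P range (0% to 30%).
final class NoiseRateCheck {

    static final double MIN_NOISE_PERCENT = 0.0;
    static final double MAX_NOISE_PERCENT = 30.0;

    // Rejects NaN and values outside [0, 30]; the real setter may behave differently.
    static double checkedNoiseRate(double percent) {
        if (Double.isNaN(percent) || percent < MIN_NOISE_PERCENT || percent > MAX_NOISE_PERCENT) {
            throw new IllegalArgumentException("noise rate must be in [0, 30] percent: " + percent);
        }
        return percent;
    }

    // Converts a checked percentage into a fraction, e.g. 10.0 becomes 0.1.
    static double asFraction(double percent) {
        return checkedNoiseRate(percent) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(asFraction(10.0));
    }
}
```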
-
noiseRateTipText
public java.lang.String noiseRateTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getClusterDefinitions
public ClusterDefinition[] getClusterDefinitions()
Returns the currently set clusters.
- Returns:
- the currently set clusters
-
setClusterDefinitions
public void setClusterDefinitions(ClusterDefinition[] value) throws java.lang.Exception
Sets the clusters to use.
- Parameters:
value - the clusters to use
- Throws:
java.lang.Exception - if the clusters are not of the correct class
-
clusterDefinitionsTipText
public java.lang.String clusterDefinitionsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getSingleModeFlag
public boolean getSingleModeFlag()
Gets the single mode flag.
- Specified by:
getSingleModeFlag in class DataGenerator
- Returns:
- true if the method generateExample can be used
-
defineDataFormat
public Instances defineDataFormat() throws java.lang.Exception
Initializes the format for the dataset produced.
- Overrides:
defineDataFormat in class DataGenerator
- Returns:
- the output data format
- Throws:
java.lang.Exception - if the data format could not be defined
- See Also:
DataGenerator.defaultRelationName()
-
isBoolean
public boolean isBoolean(int index)
Returns true if the attribute is boolean.
- Parameters:
index - the index of the attribute
- Returns:
- true if the attribute is boolean
-
isNominal
public boolean isNominal(int index)
Returns true if the attribute is nominal.
- Parameters:
index - the index of the attribute
- Returns:
- true if the attribute is nominal
-
getNumValues
public int[] getNumValues()
Returns an array that stores the number of values for a nominal attribute.
- Returns:
- the array that stores the number of values for a nominal attribute
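isBoolean, isNominal and getNumValues report per-attribute settings that originate in the -b and -m index ranges. The sketch below turns such a range into a per-attribute mask, assuming Weka's usual range syntax (1-based indices, comma-separated items, a-b spans, and the words first and last). parseRange is a hypothetical helper, not the class's internal code, and its masks are 0-based, which may differ from the index convention of isBoolean and isNominal.

```java
import java.util.Arrays;

// Hypothetical sketch of turning a -b or -m range into an attribute mask.
final class AttributeRangeSketch {

    // Converts a 1-based range such as "1-3,5" or "first-last" into a 0-based
    // mask of length numAttributes (true = attribute is in the range).
    static boolean[] parseRange(String spec, int numAttributes) {
        boolean[] mask = new boolean[numAttributes];
        if (spec == null || spec.trim().isEmpty()) {
            return mask; // empty range: no attribute selected
        }
        for (String token : spec.split(",")) {
            String t = token.trim();
            int dash = t.indexOf('-');
            int from = toIndex(dash < 0 ? t : t.substring(0, dash), numAttributes);
            int to = dash < 0 ? from : toIndex(t.substring(dash + 1), numAttributes);
            if (from > to) {
                throw new IllegalArgumentException("descending span: " + t);
            }
            for (int i = from; i <= to; i++) {
                mask[i] = true;
            }
        }
        return mask;
    }

    // Maps "first", "last" or a 1-based number to a 0-based index.
    private static int toIndex(String s, int numAttributes) {
        String t = s.trim();
        int oneBased = t.equals("first") ? 1
                     : t.equals("last") ? numAttributes
                     : Integer.parseInt(t);
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("index out of range: " + t);
        }
        return oneBased - 1;
    }

    public static void main(String[] args) {
        boolean[] isBoolean = parseRange("1-2", 5);    // like -b 1-2
        boolean[] isNominal = parseRange("4,last", 5); // like -m 4,last
        System.out.println(Arrays.toString(isBoolean));
        System.out.println(Arrays.toString(isNominal));
    }
}
```

With five attributes this prints [true, true, false, false, false] for the boolean mask and [false, false, false, true, true] for the nominal one.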
-
generateExample
public Instance generateExample() throws java.lang.Exception
Generates an example of the dataset.
- Specified by:
generateExample in class DataGenerator
- Returns:
- the instance generated
- Throws:
java.lang.Exception - if the format is not defined, or if generating examples one by one is not possible because voting is chosen
-
generateExamples
public Instances generateExamples() throws java.lang.Exception
Generates all examples of the dataset.
- Specified by:
generateExamples in class DataGenerator
- Returns:
- the instances generated
- Throws:
java.lang.Exception - if the format is not defined
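Conceptually, each cluster draws its instances from a hyperrectangle (uniform) or a bell curve (Gaussian) over the attributes in its range, with the instance count taken from the -N range. The sketch below illustrates that idea with java.util.Random; it is not the Weka implementation. It assumes the cluster size is drawn uniformly from [min, max], covers only uniform-random and Gaussian draws (the difference between UNIFORM_RANDOM and TOTAL_UNIFORM is not described here), and generates values only for the cluster's own attributes. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch of drawing points for one hyperrectangular subspace
// cluster; NOT the Weka implementation.
final class SubspaceClusterSketch {

    // Draws one instance on the cluster's own attributes (dims of them).
    // Uniform: p1/p2 are min/max. Gaussian: p1/p2 are mean/stddev.
    static double[] drawPoint(Random rnd, int dims, boolean gaussian,
                              double p1, double p2, boolean integer) {
        double[] x = new double[dims];
        for (int i = 0; i < dims; i++) {
            double v = gaussian
                    ? p1 + p2 * rnd.nextGaussian()        // N(mean, stddev^2)
                    : p1 + (p2 - p1) * rnd.nextDouble();  // uniform in [min, max)
            x[i] = integer ? Math.round(v) : v;           // -I: round to integers
        }
        return x;
    }

    // Assumption: the cluster size is drawn uniformly from the -N range [minN, maxN].
    static int drawSize(Random rnd, int minN, int maxN) {
        return minN + rnd.nextInt(maxN - minN + 1);
    }

    // Draws a complete cluster of drawSize(...) instances.
    static double[][] drawCluster(Random rnd, int dims, boolean gaussian, double p1,
                                  double p2, boolean integer, int minN, int maxN) {
        int n = drawSize(rnd, minN, maxN);
        double[][] points = new double[n][];
        for (int i = 0; i < n; i++) {
            points[i] = drawPoint(rnd, dims, gaussian, p1, p2, integer);
        }
        return points;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1); // fixed seed, in the spirit of -S
        double[][] box = drawCluster(rnd, 2, false, 0.0, 10.0, false, 10, 50);
        System.out.println(box.length + " points; first = " + Arrays.toString(box[0]));
    }
}
```

Seeding the Random instance makes repeated runs reproducible, which mirrors the purpose of the -S option.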
-
generateFinished
public java.lang.String generateFinished() throws java.lang.Exception
Compiles documentation about the data generation after the generation process.
- Specified by:
generateFinished in class DataGenerator
- Returns:
- a string with additional information about the generated dataset
- Throws:
java.lang.Exception - if no input structure has been defined
-
generateStart
public java.lang.String generateStart()
Compiles documentation about the data generation before the generation process.
- Specified by:
generateStart in class DataGenerator
- Returns:
- a string with additional information
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Returns:
- the revision
-
main
public static void main(java.lang.String[] args)
Main method for testing this class.
- Parameters:
args - should contain arguments for the data producer