Class RaceSearch
java.lang.Object
weka.attributeSelection.ASSearch
weka.attributeSelection.RaceSearch
- All Implemented Interfaces:
Serializable, RankedOutputSearch, OptionHandler, RevisionHandler, TechnicalInformationHandler
public class RaceSearch
extends ASSearch
implements RankedOutputSearch, OptionHandler, TechnicalInformationHandler
Races the cross validation error of competing attribute subsets. Use in conjunction with a ClassifierSubsetEval. RaceSearch has four modes:
Forward selection races all single attribute additions to a base set (initially no attributes), selects the winner to become the new base set and then iterates until there is no improvement over the base set.
Backward elimination is similar but the initial base set has all attributes included and races all single attribute deletions.
Schemata search is a bit different. In each iteration a series of races is run in parallel. Each race in a set determines whether a particular attribute should be included or not, i.e. the race is between the attribute being "in" or "out". The other attributes for this race are included or excluded randomly at each point in the evaluation. As soon as one race has a clear winner (i.e. it has been decided whether a particular attribute should be in or not), the next set of races begins, using the result of the winning race from the previous iteration as the new base set.
Rank race first ranks the attributes using an attribute evaluator and then races the ranking. The race includes no attributes, the top ranked attribute, the top two attributes, the top three attributes, etc.
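The forward selection mode described above can be sketched in a few lines of plain Java. This is a simplified illustration, not the RaceSearch implementation: it compares single error estimates instead of racing cross-validation errors statistically, and the `error` function is a hypothetical stand-in for the error a ClassifierSubsetEval would report for a subset.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleFunction;

class ForwardSelectionSketch {

    /**
     * Greedy forward selection: try every single-attribute addition to the
     * base set, adopt the best one as the new base set, and stop as soon as
     * no addition improves on the base set.
     */
    static Set<Integer> forwardSelect(int numAttributes,
                                      ToDoubleFunction<Set<Integer>> error) {
        Set<Integer> base = new TreeSet<>();          // initially no attributes
        double baseError = error.applyAsDouble(base);
        while (true) {
            Set<Integer> best = null;
            double bestError = baseError;
            for (int a = 0; a < numAttributes; a++) {
                if (base.contains(a)) {
                    continue;
                }
                Set<Integer> candidate = new TreeSet<>(base);
                candidate.add(a);
                double e = error.applyAsDouble(candidate);
                if (e < bestError) {                  // strictly better than anything so far
                    bestError = e;
                    best = candidate;
                }
            }
            if (best == null) {
                return base;                          // no improvement over the base set
            }
            base = best;
            baseError = bestError;
        }
    }

    public static void main(String[] args) {
        // Hypothetical toy error: attributes 0 and 2 help, any other attribute hurts slightly.
        ToDoubleFunction<Set<Integer>> toyError = s -> {
            double e = 0.5;
            if (s.contains(0)) e -= 0.2;
            if (s.contains(2)) e -= 0.1;
            e += 0.01 * s.stream().filter(i -> i != 0 && i != 2).count();
            return e;
        };
        System.out.println(forwardSelect(4, toyError)); // prints [0, 2]
    }
}
```

Backward elimination follows the same loop with the base set initialised to all attributes and candidates formed by removing one attribute at a time.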
It is also possible to generate a ranked list of attributes through the forward racing process. If generateRanking is set to true then a complete forward race will be run; that is, racing continues until all attributes have been selected. The order in which they are added determines a complete ranking of all the attributes.
Racing uses paired and unpaired t-tests on cross-validation errors of competing subsets. When there is a significant difference between the means of the errors of two competing subsets then the poorer of the two can be eliminated from the race. Similarly, if there is no significant difference between the mean errors of two competing subsets and they are within some threshold of each other, then one can be eliminated from the race.
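The elimination rule in the paragraph above can be illustrated with a paired t statistic over per-fold errors. The sketch below is an assumption-laden illustration rather than Weka's code: it covers only the paired case, and `criticalT` is a hypothetical parameter that the caller would look up from a t-distribution table for the chosen significance level and n - 1 degrees of freedom.

```java
class PairedRaceTestSketch {

    enum Decision { ELIMINATE_A, ELIMINATE_B, ELIMINATE_EITHER, KEEP_BOTH }

    /** Mean of the per-fold differences a[i] - b[i]. */
    static double meanDifference(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] - b[i];
        }
        return sum / a.length;
    }

    /** Paired t statistic over matching folds. */
    static double pairedT(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equally long arrays with at least 2 folds");
        }
        int n = a.length;
        double mean = meanDifference(a, b);
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            ss += dev * dev;
        }
        double stdErr = Math.sqrt(ss / (n - 1)) / Math.sqrt(n);
        if (stdErr == 0.0) {
            // All differences are identical, so there is no variation to test against.
            return mean == 0.0 ? 0.0 : Math.copySign(Double.POSITIVE_INFINITY, mean);
        }
        return mean / stdErr;
    }

    /**
     * Significant difference: drop the subset with the higher mean error.
     * No significant difference but means within the threshold: either may be dropped.
     * Otherwise both stay in the race.
     */
    static Decision compare(double[] errA, double[] errB, double criticalT, double threshold) {
        double t = pairedT(errA, errB);
        if (Math.abs(t) > criticalT) {
            return t > 0 ? Decision.ELIMINATE_A : Decision.ELIMINATE_B;
        }
        if (Math.abs(meanDifference(errA, errB)) <= threshold) {
            return Decision.ELIMINATE_EITHER;
        }
        return Decision.KEEP_BOTH;
    }

    public static void main(String[] args) {
        // Hypothetical per-fold error rates for two competing subsets.
        double[] errA = {0.30, 0.32, 0.31, 0.29, 0.33};
        double[] errB = {0.20, 0.21, 0.19, 0.22, 0.20};
        // 2.776 is the two-sided 5% critical value for 4 degrees of freedom.
        System.out.println(compare(errA, errB, 2.776, 0.001));
    }
}
```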
For more information see:
Andrew W. Moore, Mary S. Lee: Efficient Algorithms for Minimizing Cross Validation Error. In: Eleventh International Conference on Machine Learning, 190-198, 1994. BibTeX:
@inproceedings{Moore1994,
  author = {Andrew W. Moore and Mary S. Lee},
  booktitle = {Eleventh International Conference on Machine Learning},
  pages = {190-198},
  publisher = {Morgan Kaufmann},
  title = {Efficient Algorithms for Minimizing Cross Validation Error},
  year = {1994}
}
Valid options are:
-R <0 = forward | 1 = backward race | 2 = schemata | 3 = rank> Type of race to perform. (default = 0).
-L <significance> Significance level for comparisons (default = 0.001 (forward/backward/rank) / 0.01 (schemata)).
-T <threshold> Threshold for error comparison. (default = 0.001).
-A <attribute evaluator> Attribute ranker to use if doing a rank search. Place any evaluator options LAST on the command line following a "--". e.g. -A weka.attributeSelection.GainRatioAttributeEval ... -- -M. (default = GainRatioAttributeEval)
-F <0 = 10 fold | 1 = leave-one-out> Folds for cross validation (default = 0 (1 if schemata race))
-Q Generate a ranked list of attributes. Forces the search to be forward and races until all attributes have been selected, thus producing a ranking.
-N <num to select> Specify number of attributes to retain from the ranking. Overrides -T. Use in conjunction with -Q
-J <threshold> Specify a threshold by which attributes may be discarded from the ranking. Use in conjunction with -Q
-Z Verbose output for monitoring the search.
Options specific to evaluator weka.attributeSelection.GainRatioAttributeEval:
-M treat missing values as a separate value.
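As an illustration of the ordering rule for -A, the following sketch assembles an option array for a rank race with the evaluator options placed last after "--". The helper `rankRaceOptions` is hypothetical and only builds strings; an array of this shape is what would be handed to setOptions(String[] options).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RaceSearchOptionsSketch {

    /**
     * Builds options for a rank race (-R 3) with the given attribute evaluator.
     * Evaluator-specific options go LAST, after a "--" separator.
     */
    static String[] rankRaceOptions(String evaluatorClass, String... evaluatorOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-R");
        opts.add("3");
        opts.add("-A");
        opts.add(evaluatorClass);
        if (evaluatorOptions.length > 0) {
            opts.add("--");
            opts.addAll(Arrays.asList(evaluatorOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = rankRaceOptions("weka.attributeSelection.GainRatioAttributeEval", "-M");
        // prints: -R 3 -A weka.attributeSelection.GainRatioAttributeEval -- -M
        System.out.println(String.join(" ", options));
    }
}
```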
- Version:
- $Revision: 1.26 $
- Author:
- Mark Hall (mhall@cs.waikato.ac.nz)
- See Also:
-
Field Summary
Fields: TAGS_SELECTION, XVALTAGS_SELECTION
Constructor Summary
Constructors: RaceSearch()
Method Summary
Modifier and type, method: description
String attributeEvaluatorTipText(): Returns the tip text for this property
String debugTipText(): Returns the tip text for this property
String foldsTypeTipText(): Returns the tip text for this property
String generateRankingTipText(): Returns the tip text for this property
ASEvaluation getAttributeEvaluator(): Get the attribute evaluator used to generate the ranking.
int getCalculatedNumToSelect(): Gets the calculated number of attributes to retain.
boolean getDebug(): Get whether output is to be verbose
SelectedTag getFoldsType(): Get the xfold type
boolean getGenerateRanking(): Gets whether ranking has been requested.
int getNumToSelect(): Gets the number of attributes to be retained.
String[] getOptions(): Gets the current settings of RaceSearch.
SelectedTag getRaceType(): Get the race type
String getRevision(): Returns the revision string.
double getSelectionThreshold(): Returns the threshold so that the AttributeSelection module can discard attributes from the ranking.
double getSignificanceLevel(): Get the significance level
TechnicalInformation getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
double getThreshold(): Get the threshold
String globalInfo(): Returns a string describing this search method
Enumeration listOptions(): Returns an enumeration describing the available options.
String numToSelectTipText(): Returns the tip text for this property
String raceTypeTipText(): Returns the tip text for this property
double[][] rankedAttributes(): Returns an X by 2 list of attribute indexes and corresponding evaluations from best (highest) to worst.
int[] search(ASEvaluation ASEval, Instances data): Searches the attribute subset space by racing cross validation errors of competing subsets
String selectionThresholdTipText(): Returns the tip text for this property
void setAttributeEvaluator(ASEvaluation newEvaluator): Set the attribute evaluator to use for generating the ranking.
void setDebug(boolean d): Set whether verbose output should be generated.
void setFoldsType(SelectedTag d): Set the xfold type
void setGenerateRanking(boolean doRank): Records whether the user has requested a ranked list of attributes.
void setNumToSelect(int n): Specify the number of attributes to select from the ranked list (if generating a ranking).
void setOptions(String[] options): Parses a given list of options.
void setRaceType(SelectedTag d): Set the race type
void setSelectionThreshold(double threshold): Set the threshold by which the AttributeSelection module can discard attributes.
void setSignificanceLevel(double sig): Sets the significance level to use
void setThreshold(double t): Sets the threshold for comparisons
String significanceLevelTipText(): Returns the tip text for this property
String thresholdTipText(): Returns the tip text for this property
String toString(): Returns a string representation
Methods inherited from class weka.attributeSelection.ASSearch
forName, makeCopies
-
Field Details
-
TAGS_SELECTION
-
XVALTAGS_SELECTION
-
-
Constructor Details
-
RaceSearch
public RaceSearch()
-
-
Method Details
-
globalInfo
Returns a string describing this search method
- Returns:
- a description of the search method suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation
in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
raceTypeTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRaceType
Set the race type
- Parameters:
d
- the type of race
-
getRaceType
Get the race type
- Returns:
- the type of race
-
significanceLevelTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSignificanceLevel
public void setSignificanceLevel(double sig)
Sets the significance level to use
- Parameters:
sig
- the significance level
-
getSignificanceLevel
public double getSignificanceLevel()
Get the significance level
- Returns:
- the current significance level
-
thresholdTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setThreshold
public void setThreshold(double t)
Sets the threshold for comparisons
- Specified by:
setThreshold
in interface RankedOutputSearch
- Parameters:
t
- the threshold to use
-
getThreshold
public double getThreshold()
Get the threshold
- Specified by:
getThreshold
in interface RankedOutputSearch
- Returns:
- the current threshold
-
foldsTypeTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setFoldsType
Set the xfold type
- Parameters:
d
- the type of xval
-
getFoldsType
Get the xfold type
- Returns:
- the type of xval
-
debugTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDebug
public void setDebug(boolean d)
Set whether verbose output should be generated.
- Parameters:
d
- true if output is to be verbose.
-
getDebug
public boolean getDebug()
Get whether output is to be verbose
- Returns:
- true if output will be verbose
-
attributeEvaluatorTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setAttributeEvaluator
Set the attribute evaluator to use for generating the ranking.
- Parameters:
newEvaluator
- the attribute evaluator to use.
-
getAttributeEvaluator
Get the attribute evaluator used to generate the ranking.
- Returns:
- the evaluator used to generate the ranking.
-
generateRankingTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setGenerateRanking
public void setGenerateRanking(boolean doRank)
Records whether the user has requested a ranked list of attributes.
- Specified by:
setGenerateRanking
in interface RankedOutputSearch
- Parameters:
doRank
- true if ranking is requested
-
getGenerateRanking
public boolean getGenerateRanking()
Gets whether ranking has been requested. This is used by the AttributeSelection module to determine if rankedAttributes() should be called.
- Specified by:
getGenerateRanking
in interface RankedOutputSearch
- Returns:
- true if ranking has been requested.
-
numToSelectTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNumToSelect
public void setNumToSelect(int n)
Specify the number of attributes to select from the ranked list (if generating a ranking). -1 indicates that all attributes are to be retained.
- Specified by:
setNumToSelect
in interface RankedOutputSearch
- Parameters:
n
- the number of attributes to retain
-
getNumToSelect
public int getNumToSelect()
Gets the number of attributes to be retained.
- Specified by:
getNumToSelect
in interface RankedOutputSearch
- Returns:
- the number of attributes to retain
-
getCalculatedNumToSelect
public int getCalculatedNumToSelect()
Gets the calculated number of attributes to retain, i.e. the number that will actually be retained. This is the same as getNumToSelect if the user specifies a number which is not less than zero. Otherwise it should be the number of attributes in the (potentially transformed) data.
- Specified by:
getCalculatedNumToSelect
in interface RankedOutputSearch
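The rule stated for getCalculatedNumToSelect can be written out as a one-line function. This is a sketch of the documented behaviour only; the class and parameter names are hypothetical and the actual field handling inside RaceSearch may differ.

```java
class NumToSelectSketch {

    /**
     * A non-negative request is used as-is; a negative request (such as -1)
     * falls back to the number of attributes in the data.
     */
    static int calculatedNumToSelect(int requested, int numAttributesInData) {
        return requested >= 0 ? requested : numAttributesInData;
    }

    public static void main(String[] args) {
        System.out.println(calculatedNumToSelect(3, 10));  // prints 3
        System.out.println(calculatedNumToSelect(-1, 10)); // prints 10
    }
}
```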
-
selectionThresholdTipText
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSelectionThreshold
public void setSelectionThreshold(double threshold)
Set the threshold by which the AttributeSelection module can discard attributes.
- Parameters:
threshold
- the threshold.
-
getSelectionThreshold
public double getSelectionThreshold()
Returns the threshold so that the AttributeSelection module can discard attributes from the ranking.
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions
in interface OptionHandler
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:
-R <0 = forward | 1 = backward race | 2 = schemata | 3 = rank> Type of race to perform. (default = 0).
-L <significance> Significance level for comparisons (default = 0.001 (forward/backward/rank) / 0.01 (schemata)).
-T <threshold> Threshold for error comparison. (default = 0.001).
-A <attribute evaluator> Attribute ranker to use if doing a rank search. Place any evaluator options LAST on the command line following a "--". e.g. -A weka.attributeSelection.GainRatioAttributeEval ... -- -M. (default = GainRatioAttributeEval)
-F <0 = 10 fold | 1 = leave-one-out> Folds for cross validation (default = 0 (1 if schemata race))
-Q Generate a ranked list of attributes. Forces the search to be forward and races until all attributes have been selected, thus producing a ranking.
-N <num to select> Specify number of attributes to retain from the ranking. Overrides -T. Use in conjunction with -Q
-J <threshold> Specify a threshold by which attributes may be discarded from the ranking. Use in conjunction with -Q
-Z Verbose output for monitoring the search.
Options specific to evaluator weka.attributeSelection.GainRatioAttributeEval:
-M treat missing values as a separate value.
- Specified by:
setOptions
in interface OptionHandler
- Parameters:
options
- the list of options as an array of strings
- Throws:
Exception
- if an option is not supported
-
getOptions
Gets the current settings of RaceSearch.
- Specified by:
getOptions
in interface OptionHandler
- Returns:
- an array of strings suitable for passing to setOptions()
-
search
Searches the attribute subset space by racing cross validation errors of competing subsets
-
rankedAttributes
Description copied from interface: RankedOutputSearch
Returns an X by 2 list of attribute indexes and corresponding evaluations from best (highest) to worst.
- Specified by:
rankedAttributes
in interface RankedOutputSearch
- Returns:
- the ranked list of attribute indexes in an array of ints
- Throws:
Exception
- if the ranking can't be produced
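To show how an X by 2 ranking of this kind might be consumed, the sketch below pulls the attribute indexes out of the first n rows. It assumes each row holds the attribute index in column 0 and its evaluation in column 1, already sorted from best to worst; the `topIndexes` helper and the sample values are hypothetical.

```java
import java.util.Arrays;

class RankedAttributesSketch {

    /** Returns the attribute indexes from the first n rows of an X-by-2 ranking. */
    static int[] topIndexes(double[][] ranked, int n) {
        int count = Math.min(n, ranked.length);
        int[] indexes = new int[count];
        for (int i = 0; i < count; i++) {
            indexes[i] = (int) ranked[i][0]; // column 0: attribute index
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Hypothetical ranking: {attribute index, evaluation}, best first.
        double[][] ranked = {
            {2, 0.91},
            {0, 0.85},
            {1, 0.40}
        };
        System.out.println(Arrays.toString(topIndexes(ranked, 2))); // prints [2, 0]
    }
}
```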
-
toString
Returns a string representation
-
getRevision
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Overrides:
getRevision
in class ASSearch
- Returns:
- the revision
-