Class RaceSearch

java.lang.Object
weka.attributeSelection.ASSearch
weka.attributeSelection.RaceSearch
All Implemented Interfaces:
Serializable, RankedOutputSearch, OptionHandler, RevisionHandler, TechnicalInformationHandler

public class RaceSearch extends ASSearch implements RankedOutputSearch, OptionHandler, TechnicalInformationHandler
Races the cross validation error of competing attribute subsets. Use in conjunction with a ClassifierSubsetEval. RaceSearch has four modes:

Forward selection races all single attribute additions to a base set (initially no attributes), selects the winner to become the new base set, and then iterates until there is no improvement over the base set.

Backward elimination is similar but the initial base set has all attributes included and races all single attribute deletions.

Schemata search is a bit different. In each iteration a series of races is run in parallel. Each race in a set determines whether a particular attribute should be included or not, i.e. the race is between the attribute being "in" or "out". The other attributes for this race are included or excluded randomly at each point in the evaluation. As soon as one race has a clear winner (i.e. it has been decided whether a particular attribute should be in or not), the next set of races begins, using the result of the winning race from the previous iteration as the new base set.

Rank race first ranks the attributes using an attribute evaluator and then races the ranking. The race includes no attributes, the top ranked attribute, the top two attributes, the top three attributes, etc.
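
The candidate subsets of a rank race can be illustrated with a small, self-contained sketch. It is not the Weka implementation; the class and method names (RankRaceCandidates, prefixes) are hypothetical, and the ranking is assumed to be given as attribute indexes ordered from best to worst.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RankRaceCandidates {

    /**
     * Builds the subsets that compete in a rank race: the empty set,
     * the top ranked attribute, the top two, and so on up to all attributes.
     * (Hypothetical helper for illustration only.)
     */
    static List<int[]> prefixes(int[] ranking) {
        List<int[]> candidates = new ArrayList<>();
        for (int k = 0; k <= ranking.length; k++) {
            // copyOf with length k keeps the k best ranked attribute indexes
            candidates.add(Arrays.copyOf(ranking, k));
        }
        return candidates;
    }
}
```

For a ranking of n attributes this yields n + 1 competing subsets, each one extending the previous by the next ranked attribute.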

It is also possible to generate a ranked list of attributes through the forward racing process. If generateRanking is set to true then a complete forward race will be run; that is, racing continues until all attributes have been selected. The order in which they are added determines a complete ranking of all the attributes.
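
The difference between an ordinary forward search and a complete forward search that yields a ranking can be sketched as follows. This is a simplified illustration, not the Weka code: the statistical race is replaced by a direct comparison of a caller-supplied error function, and the names ForwardSelectionSketch and forwardSelect are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleFunction;

class ForwardSelectionSketch {

    /**
     * Greedy forward selection over attribute indexes 0..numAttributes-1.
     * Returns the attributes in the order they were added.
     * Assumption: 'error' stands in for the cross-validation error of a subset.
     */
    static List<Integer> forwardSelect(int numAttributes,
                                       ToDoubleFunction<Set<Integer>> error,
                                       boolean generateRanking) {
        Set<Integer> base = new LinkedHashSet<>();   // keeps insertion order
        double baseError = error.applyAsDouble(base);
        while (base.size() < numAttributes) {
            int bestAttr = -1;
            double bestError = Double.POSITIVE_INFINITY;
            // Try every single attribute addition to the current base set.
            for (int a = 0; a < numAttributes; a++) {
                if (base.contains(a)) {
                    continue;
                }
                Set<Integer> candidate = new LinkedHashSet<>(base);
                candidate.add(a);
                double e = error.applyAsDouble(candidate);
                if (bestAttr == -1 || e < bestError) {
                    bestError = e;
                    bestAttr = a;
                }
            }
            // Ordinary search stops when the winner does not improve on the base set;
            // a ranking run keeps going until every attribute has been added.
            if (!generateRanking && bestError >= baseError) {
                break;
            }
            base.add(bestAttr);
            baseError = bestError;
        }
        return new ArrayList<>(base);
    }
}
```

With generateRanking set to false the returned list is the selected subset; with it set to true the list always contains every attribute, and its order is the ranking.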

Racing uses paired and unpaired t-tests on cross-validation errors of competing subsets. When there is a significant difference between the means of the errors of two competing subsets then the poorer of the two can be eliminated from the race. Similarly, if there is no significant difference between the mean errors of two competing subsets and they are within some threshold of each other, then one can be eliminated from the race.
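
The elimination rule described above can be sketched for the paired case. This is an illustrative sketch only, not the Weka implementation: the names PairedRaceSketch, pairedT and compare are hypothetical, and the critical t value for the chosen significance level is assumed to be supplied by the caller rather than computed from the t distribution.

```java
class PairedRaceSketch {

    enum Decision { ELIMINATE_A, ELIMINATE_B, ELIMINATE_EITHER, KEEP_RACING }

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Paired t statistic for the per-fold differences a[i] - b[i]. */
    static double pairedT(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples of size >= 2");
        }
        int n = a.length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++) {
            d[i] = a[i] - b[i];
        }
        double m = mean(d);
        double ss = 0.0;
        for (double v : d) {
            ss += (v - m) * (v - m);
        }
        double sd = Math.sqrt(ss / (n - 1));       // sample standard deviation
        return m / (sd / Math.sqrt(n));            // may be infinite or NaN if sd == 0
    }

    /**
     * Compares the per-fold errors of subsets A and B.
     * criticalT: two-sided critical value for the chosen significance level (assumed given).
     * threshold: how close two mean errors must be to count as equivalent.
     */
    static Decision compare(double[] errA, double[] errB, double criticalT, double threshold) {
        double t = pairedT(errA, errB);
        if (Math.abs(t) > criticalT) {
            // Significant difference: drop the subset with the larger mean error.
            return t > 0 ? Decision.ELIMINATE_A : Decision.ELIMINATE_B;
        }
        if (Math.abs(mean(errA) - mean(errB)) < threshold) {
            // No significant difference and nearly equal errors: one of the two is redundant.
            return Decision.ELIMINATE_EITHER;
        }
        return Decision.KEEP_RACING;
    }
}
```

Passing the critical value in keeps the sketch free of any statistics library; a real implementation would derive it from the significance level and the degrees of freedom.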

For more information see:

Andrew W. Moore, Mary S. Lee: Efficient Algorithms for Minimizing Cross Validation Error. In: Eleventh International Conference on Machine Learning, 190-198, 1994.

BibTeX:

 @inproceedings{Moore1994,
    author = {Andrew W. Moore and Mary S. Lee},
    booktitle = {Eleventh International Conference on Machine Learning},
    pages = {190-198},
    publisher = {Morgan Kaufmann},
    title = {Efficient Algorithms for Minimizing Cross Validation Error},
    year = {1994}
 }
 

Valid options are:

 -R <0 = forward | 1 = backward race | 2 = schemata | 3 = rank>
  Type of race to perform.
  (default = 0).
 -L <significance>
  Significance level for comparisons
  (default = 0.001(forward/backward/rank)/0.01(schemata)).
 -T <threshold>
  Threshold for error comparison.
  (default = 0.001).
 -A <attribute evaluator>
  Attribute ranker to use if doing a 
  rank search. Place any
  evaluator options LAST on 
  the command line following a "--".
  eg. -A weka.attributeSelection.GainRatioAttributeEval ... -- -M.
  (default = GainRatioAttributeEval)
 -F <0 = 10 fold | 1 = leave-one-out>
  Folds for cross validation
  (default = 0 (1 if schemata race))
 -Q
  Generate a ranked list of attributes.
  Forces the search to be forward
  and races until all attributes have
  been selected, thus producing a ranking.
 -N <num to select>
  Specify number of attributes to retain from 
  the ranking. Overrides -T. Use in conjunction with -Q
 -J <threshold>
  Specify a threshold by which attributes
  may be discarded from the ranking.
  Use in conjunction with -Q
 -Z
  Verbose output for monitoring the search.
 
 Options specific to evaluator weka.attributeSelection.GainRatioAttributeEval:
 
 -M
  treat missing values as a separate value.
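
The convention that evaluator options come last, after a "--", can be illustrated with a small sketch of how such an option array divides into two parts. This is not Weka's own option parser; the class and method names (OptionSplitSketch, splitAtDoubleDash) are hypothetical.

```java
import java.util.Arrays;

class OptionSplitSketch {

    /**
     * Splits an option array at the first "--".
     * Element 0 holds the options for the search itself,
     * element 1 the options passed on to the attribute evaluator.
     */
    static String[][] splitAtDoubleDash(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if ("--".equals(options[i])) {
                return new String[][] {
                    Arrays.copyOfRange(options, 0, i),
                    Arrays.copyOfRange(options, i + 1, options.length)
                };
            }
        }
        // No separator: everything belongs to the search.
        return new String[][] { options.clone(), new String[0] };
    }
}
```

For the array {"-R", "3", "-A", "weka.attributeSelection.GainRatioAttributeEval", "--", "-M"}, the first part carries the race type and the evaluator class name, and the second part carries only "-M".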
Version:
$Revision: 1.26 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
  • Field Detail

    • TAGS_SELECTION

      public static final Tag[] TAGS_SELECTION
    • XVALTAGS_SELECTION

      public static final Tag[] XVALTAGS_SELECTION
  • Constructor Detail

    • RaceSearch

      public RaceSearch()
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing this search method
      Returns:
      a description of the search method suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • raceTypeTipText

      public String raceTypeTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setRaceType

      public void setRaceType(SelectedTag d)
      Set the race type
      Parameters:
      d - the type of race
    • getRaceType

      public SelectedTag getRaceType()
      Get the race type
      Returns:
      the type of race
    • significanceLevelTipText

      public String significanceLevelTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSignificanceLevel

      public void setSignificanceLevel(double sig)
      Sets the significance level to use
      Parameters:
      sig - the significance level
    • getSignificanceLevel

      public double getSignificanceLevel()
      Get the significance level
      Returns:
      the current significance level
    • thresholdTipText

      public String thresholdTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setThreshold

      public void setThreshold(double t)
      Sets the threshold for comparisons
      Specified by:
      setThreshold in interface RankedOutputSearch
      Parameters:
      t - the threshold to use
    • getThreshold

      public double getThreshold()
      Get the threshold
      Specified by:
      getThreshold in interface RankedOutputSearch
      Returns:
      the current threshold
    • foldsTypeTipText

      public String foldsTypeTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setFoldsType

      public void setFoldsType(SelectedTag d)
      Set the xfold type
      Parameters:
      d - the type of xval
    • getFoldsType

      public SelectedTag getFoldsType()
      Get the xfold type
      Returns:
      the type of xval
    • debugTipText

      public String debugTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDebug

      public void setDebug(boolean d)
      Set whether verbose output should be generated.
      Parameters:
      d - true if output is to be verbose.
    • getDebug

      public boolean getDebug()
      Get whether output is to be verbose
      Returns:
      true if output will be verbose
    • attributeEvaluatorTipText

      public String attributeEvaluatorTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setAttributeEvaluator

      public void setAttributeEvaluator(ASEvaluation newEvaluator)
      Set the attribute evaluator to use for generating the ranking.
      Parameters:
      newEvaluator - the attribute evaluator to use.
    • getAttributeEvaluator

      public ASEvaluation getAttributeEvaluator()
      Get the attribute evaluator used to generate the ranking.
      Returns:
      the evaluator used to generate the ranking.
    • generateRankingTipText

      public String generateRankingTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setGenerateRanking

      public void setGenerateRanking(boolean doRank)
      Records whether the user has requested a ranked list of attributes.
      Specified by:
      setGenerateRanking in interface RankedOutputSearch
      Parameters:
      doRank - true if ranking is requested
    • getGenerateRanking

      public boolean getGenerateRanking()
      Gets whether ranking has been requested. This is used by the AttributeSelection module to determine if rankedAttributes() should be called.
      Specified by:
      getGenerateRanking in interface RankedOutputSearch
      Returns:
      true if ranking has been requested.
    • numToSelectTipText

      public String numToSelectTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumToSelect

      public void setNumToSelect(int n)
      Specify the number of attributes to select from the ranked list (if generating a ranking). -1 indicates that all attributes are to be retained.
      Specified by:
      setNumToSelect in interface RankedOutputSearch
      Parameters:
      n - the number of attributes to retain
    • getNumToSelect

      public int getNumToSelect()
      Gets the number of attributes to be retained.
      Specified by:
      getNumToSelect in interface RankedOutputSearch
      Returns:
      the number of attributes to retain
    • getCalculatedNumToSelect

      public int getCalculatedNumToSelect()
      Gets the calculated number of attributes to retain, i.e. the number actually retained. This equals getNumToSelect if the user specified a number that is not less than zero; otherwise it should be the number of attributes in the (potentially transformed) data.
      Specified by:
      getCalculatedNumToSelect in interface RankedOutputSearch
    • selectionThresholdTipText

      public String selectionThresholdTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSelectionThreshold

      public void setSelectionThreshold(double threshold)
      Set the threshold by which the AttributeSelection module can discard attributes.
      Parameters:
      threshold - the threshold.
    • getSelectionThreshold

      public double getSelectionThreshold()
      Returns the threshold so that the AttributeSelection module can discard attributes from the ranking.
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -R <0 = forward | 1 = backward race | 2 = schemata | 3 = rank>
        Type of race to perform.
        (default = 0).
       -L <significance>
        Significance level for comparisons
        (default = 0.001(forward/backward/rank)/0.01(schemata)).
       -T <threshold>
        Threshold for error comparison.
        (default = 0.001).
       -A <attribute evaluator>
        Attribute ranker to use if doing a 
        rank search. Place any
        evaluator options LAST on 
        the command line following a "--".
        eg. -A weka.attributeSelection.GainRatioAttributeEval ... -- -M.
        (default = GainRatioAttributeEval)
       -F <0 = 10 fold | 1 = leave-one-out>
        Folds for cross validation
        (default = 0 (1 if schemata race))
       -Q
        Generate a ranked list of attributes.
        Forces the search to be forward
        and races until all attributes have
        been selected, thus producing a ranking.
       -N <num to select>
        Specify number of attributes to retain from 
        the ranking. Overrides -T. Use in conjunction with -Q
       -J <threshold>
        Specify a threshold by which attributes
        may be discarded from the ranking.
        Use in conjunction with -Q
       -Z
        Verbose output for monitoring the search.
       
       Options specific to evaluator weka.attributeSelection.GainRatioAttributeEval:
       
       -M
        treat missing values as a separate value.
      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of RaceSearch.
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      an array of strings suitable for passing to setOptions()
    • search

      public int[] search(ASEvaluation ASEval, Instances data) throws Exception
      Searches the attribute subset space by racing cross validation errors of competing subsets
      Specified by:
      search in class ASSearch
      Parameters:
      ASEval - the attribute evaluator to guide the search
      data - the training instances.
      Returns:
      an array (not necessarily ordered) of selected attribute indexes
      Throws:
      Exception - if the search can't be completed
    • rankedAttributes

      public double[][] rankedAttributes() throws Exception
      Description copied from interface: RankedOutputSearch
      Returns an X by 2 list of attribute indexes and corresponding evaluations from best (highest) to worst.
      Specified by:
      rankedAttributes in interface RankedOutputSearch
      Returns:
      the ranked attributes as an X by 2 array of attribute indexes and their evaluations
      Throws:
      Exception - if the ranking can't be produced
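
How a caller might trim the X by 2 ranking using a number to select or a selection threshold can be sketched as below. This is an assumption-laden illustration, not the behaviour of the AttributeSelection module: the names RankingFilterSketch and retain are hypothetical, and the sketch assumes that a non-negative count takes precedence and that otherwise attributes are kept while their evaluation exceeds the threshold.

```java
class RankingFilterSketch {

    /**
     * Picks attribute indexes from a ranking laid out as ranked[i] = {attributeIndex, evaluation},
     * ordered from best to worst.
     * Assumption: numToSelect >= 0 wins over the threshold; a negative value means
     * "keep every attribute whose evaluation is above the threshold".
     */
    static int[] retain(double[][] ranked, int numToSelect, double threshold) {
        int keep;
        if (numToSelect >= 0) {
            keep = Math.min(numToSelect, ranked.length);
        } else {
            keep = 0;
            // The ranking is sorted, so stop at the first evaluation that fails the threshold.
            while (keep < ranked.length && ranked[keep][1] > threshold) {
                keep++;
            }
        }
        int[] selected = new int[keep];
        for (int i = 0; i < keep; i++) {
            selected[i] = (int) ranked[i][0];   // column 0 holds the attribute index
        }
        return selected;
    }
}
```

Passing a negative count together with a very low threshold keeps the whole ranking, which matches the documented meaning of -1 for the number to select.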
    • toString

      public String toString()
      Returns a string representation
      Overrides:
      toString in class Object
      Returns:
      a string representation
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class ASSearch
      Returns:
      the revision