Class GridSearch

All Implemented Interfaces:
Serializable, Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, Summarizable

Performs a grid search over parameter pairs for a classifier (Y-axis; the default is LinearRegression with the "Ridge" parameter) and the PLSFilter (X-axis, "# of Components"), and uses the best pair found for the actual prediction.

The initial grid is evaluated with 2-fold cross-validation (CV) to score each parameter pair under the selected evaluation measure (e.g., accuracy). The best point in the grid is then taken and a 10-fold CV is performed on its adjacent parameter pairs. If a better pair is found, it becomes the new center and another 10-fold CV is performed (a form of hill climbing). This process is repeated until no better pair is found or the best pair lies on the border of the grid.
If the best pair lies on the border, GridSearch can automatically extend the grid and continue the search. See the properties 'gridIsExtendable' (option '-extend-grid') and 'maxGridExtensions' (option '-max-grid-extensions <num>').
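The two-phase search described above can be pictured with a small toy sketch. It is not the Weka implementation: GridHillClimbSketch and its methods are hypothetical names, the two scoring functions merely stand in for the 2-fold and 10-fold CV estimates, higher scores are assumed to be better, and grid extension is not modeled.

```java
import java.util.function.DoubleBinaryOperator;

/** Toy illustration of a coarse grid pass followed by hill climbing. */
final class GridHillClimbSketch {

    /** Returns {xIndex, yIndex} of the best grid point found. */
    static int[] search(double[] xs, double[] ys,
                        DoubleBinaryOperator coarse, DoubleBinaryOperator fine) {
        // Phase 1: score every grid point with the cheap estimate
        // (stands in for the 2-fold CV pass over the initial grid).
        int bx = 0, by = 0;
        double best = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < xs.length; i++) {
            for (int j = 0; j < ys.length; j++) {
                double s = coarse.applyAsDouble(xs[i], ys[j]);
                if (s > best) { best = s; bx = i; by = j; }
            }
        }
        // Phase 2: score the center and its neighbors with the costlier
        // estimate (stands in for 10-fold CV); move while that improves.
        best = fine.applyAsDouble(xs[bx], ys[by]);
        boolean moved = true;
        while (moved && !onBorder(bx, by, xs.length, ys.length)) {
            moved = false;
            int cx = bx, cy = by; // current center, fixed for this round
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    if (dx == 0 && dy == 0) continue;
                    double s = fine.applyAsDouble(xs[cx + dx], ys[cy + dy]);
                    if (s > best) { best = s; bx = cx + dx; by = cy + dy; moved = true; }
                }
            }
        }
        return new int[] {bx, by};
    }

    /** True if the point lies on the outer edge of the grid. */
    static boolean onBorder(int i, int j, int nx, int ny) {
        return i == 0 || j == 0 || i == nx - 1 || j == ny - 1;
    }

    public static void main(String[] args) {
        double[] axis = new double[11];
        for (int i = 0; i < axis.length; i++) axis[i] = i;
        int[] best = search(axis, axis,
            (x, y) -> -((x - 5) * (x - 5) + (y - 5) * (y - 5)),
            (x, y) -> -((x - 7) * (x - 7) + (y - 3) * (y - 3)));
        System.out.println("best x=" + axis[best[0]] + ", y=" + axis[best[1]]);
    }
}
```

In the main method the coarse estimate peaks at (5, 5) while the finer one peaks at (7, 3), so the search starts at the coarse optimum and climbs to (7, 3) in two moves. A search that reaches the border simply stops here, which is the point where the real classifier would consider extending the grid.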

GridSearch can handle doubles, integers (values are just cast to int) and booleans (0 is false, otherwise true). float, char and long are supported as well.
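A minimal sketch of the type handling described above, assuming each grid value starts out as a double. The int and boolean rules follow the text; treating float, long and char as plain casts is an assumption, and GridValueCastSketch is a hypothetical name, not part of Weka.

```java
/** Sketch: converting a numeric grid value to the type of the target property. */
final class GridValueCastSketch {

    static Object convert(double value, Class<?> type) {
        if (type == double.class || type == Double.class) return value;
        if (type == int.class || type == Integer.class) return (int) value;   // just cast to int
        if (type == boolean.class || type == Boolean.class) return value != 0; // 0 is false, otherwise true
        // Assumption: the remaining supported types are handled as plain casts.
        if (type == float.class || type == Float.class) return (float) value;
        if (type == long.class || type == Long.class) return (long) value;
        if (type == char.class || type == Character.class) return (char) value;
        throw new IllegalArgumentException("Unsupported property type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(convert(3.9, int.class));      // the cast truncates toward zero
        System.out.println(convert(0.0, boolean.class));
        System.out.println(convert(-2.5, boolean.class));
    }
}
```

Because an int property is filled by a cast rather than by rounding, a step size below 1 on an integer parameter can evaluate the same setting more than once.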

The best filter/classifier setup can be accessed after the buildClassifier call via the getBestFilter/getBestClassifier methods.
Implementation note: after the data has passed through the filter, a default NumericCleaner filter is applied to avoid numbers that become too small and might produce NaNs in other schemes.

Valid options are:

 -E <CC|RMSE|RRSE|MAE|RAE|COMB|ACC|KAP>
  Determines the parameter used for evaluation:
  CC = Correlation coefficient
  RMSE = Root mean squared error
  RRSE = Root relative squared error
  MAE = Mean absolute error
  RAE = Relative absolute error
  COMB = Combined = (1-abs(CC)) + RRSE + RAE
  ACC = Accuracy
  KAP = Kappa
  (default: CC)
 -y-property <option>
  The Y option to test (without leading dash).
  (default: classifier.ridge)
 -y-min <num>
  The minimum for Y.
  (default: -10)
 -y-max <num>
  The maximum for Y.
  (default: +5)
 -y-step <num>
  The step size for Y.
  (default: 1)
 -y-base <num>
  The base for Y.
  (default: 10)
 -y-expression <expr>
  The expression for Y.
  Available parameters:
   BASE
   FROM
   TO
   STEP
   I - the current iteration value
   (from 'FROM' to 'TO' with stepsize 'STEP')
  (default: 'pow(BASE,I)')
 -filter <filter specification>
  The filter to use (on X axis). Full classname of filter to include, 
  followed by scheme options.
  (default: weka.filters.supervised.attribute.PLSFilter)
 -x-property <option>
  The X option to test (without leading dash).
  (default: filter.numComponents)
 -x-min <num>
  The minimum for X.
  (default: +5)
 -x-max <num>
  The maximum for X.
  (default: +20)
 -x-step <num>
  The step size for X.
  (default: 1)
 -x-base <num>
  The base for X.
  (default: 10)
 -x-expression <expr>
  The expression for the X value.
  Available parameters:
   BASE
   FROM
   TO
   STEP
   I - the current iteration value
   (from 'FROM' to 'TO' with stepsize 'STEP')
  (default: 'pow(BASE,I)')
 -extend-grid
  Whether the grid can be extended.
  (default: no)
 -max-grid-extensions <num>
  The maximum number of grid extensions (-1 is unlimited).
  (default: 3)
 -sample-size <num>
  The size (in percent) of the sample to search the initial grid with.
  (default: 100)
 -traversal <ROW-WISE|COLUMN-WISE>
  The type of traversal for the grid.
  (default: COLUMN-WISE)
 -log-file <filename>
  The log file to log the messages to.
  (default: none)
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.functions.LinearRegression)
 
 Options specific to classifier weka.classifiers.functions.LinearRegression:
 
 -D
  Produce debugging output.
  (default no debugging output)
 -S <number of selection method>
  Set the attribute selection method to use. 1 = None, 2 = Greedy.
  (default 0 = M5' method)
 -C
  Do not try to eliminate colinear attributes.
 
 -R <double>
  Set ridge parameter (default 1.0e-8).
 
 
 Options specific to filter weka.filters.supervised.attribute.PLSFilter ('-filter'):
 
 -D
  Turns on output of debugging information.
 -C <num>
  The number of components to compute.
  (default: 20)
 -U
  Updates the class attribute as well.
  (default: off)
 -M
  Turns replacing of missing values on.
  (default: off)
 -A <SIMPLS|PLS1>
  The algorithm to use.
  (default: PLS1)
 -P <none|center|standardize>
  The type of preprocessing that is applied to the data.
  (default: center)
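The -x-expression and -y-expression options describe how each axis value is derived from the iteration variable I. A minimal sketch of that enumeration follows, assuming the upper bound is inclusive (as the examples that test values "from 1 to 16" suggest). GridAxisSketch and values are hypothetical names, and the expression is passed as a Java lambda over (BASE, I) rather than parsed from a string as GridSearch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

/** Sketch: enumerating the values tested along one grid axis. */
final class GridAxisSketch {

    /** Evaluates expr(BASE, I) for I = from, from + step, ... up to and including to. */
    static List<Double> values(double from, double to, double step, double base,
                               DoubleBinaryOperator expr) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        List<Double> result = new ArrayList<>();
        // Count the steps as an integer so floating-point error does not accumulate.
        int n = (int) Math.floor((to - from) / step + 1e-9);
        for (int k = 0; k <= n; k++) {
            double i = from + k * step;
            result.add(expr.applyAsDouble(base, i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Default Y axis: pow(BASE,I) with BASE = 10 and I from -10 to 5.
        System.out.println(values(-10, 5, 1, 10, Math::pow));
        // Default X axis range with the expression "I": 5, 6, ..., 20.
        System.out.println(values(5, 20, 1, 10, (b, i) -> i));
    }
}
```

Both default axes yield 16 values, so the initial grid of the default setup has 16 x 16 = 256 parameter pairs.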
Examples:
  • Optimizing SMO with RBFKernel (C and gamma)
    • Set the evaluation to Accuracy.
    • Set the filter to weka.filters.AllFilter, since no special data processing is needed and the filter is not being optimized in this case (the data is always passed through the filter).
    • Set weka.classifiers.functions.SMO as classifier with weka.classifiers.functions.supportVector.RBFKernel as kernel.
    • Set the XProperty to "classifier.c", XMin to "1", XMax to "16", XStep to "1" and the XExpression to "I". This will test the "C" parameter of SMO for the values from 1 to 16.
    • Set the YProperty to "classifier.kernel.gamma", YMin to "-5", YMax to "2", YStep to "1", YBase to "10" and YExpression to "pow(BASE,I)". This will test the gamma of the RBFKernel with the values 10^-5, 10^-4, ..., 10^2.
  • Optimizing PLSFilter with LinearRegression (# of components and ridge) - default setup
    • Set the evaluation to Correlation coefficient.
    • Set the filter to weka.filters.supervised.attribute.PLSFilter.
    • Set weka.classifiers.functions.LinearRegression as classifier and use no attribute selection and no elimination of colinear attributes.
    • Set the XProperty to "filter.numComponents", XMin to "5", XMax to "20" (this depends heavily on your dataset, should be no more than the number of attributes!), XStep to "1" and XExpression to "I". This will test the number of components the PLSFilter will produce from 5 to 20.
    • Set the YProperty to "classifier.ridge", YMin to "-10", YMax to "5", YStep to "1" and YExpression to "pow(BASE,I)". This will try ridge parameters from 10^-10 to 10^5.
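The default-setup example above can also be written as an option array of the kind accepted by setOptions. This is only a sketch that assembles the strings and hands nothing to Weka. The flags come from the option list on this page. Passing the LinearRegression options after a "--" separator follows a common Weka convention that this page does not state, so treat it as an assumption. GridSearchOptionsSketch and its methods are hypothetical names.

```java
/** Sketch: the default PLSFilter / LinearRegression setup as an option array. */
final class GridSearchOptionsSketch {

    static String[] defaultSetupOptions() {
        return new String[] {
            "-E", "CC",                                    // evaluate by correlation coefficient
            "-filter", "weka.filters.supervised.attribute.PLSFilter",
            "-x-property", "filter.numComponents",
            "-x-min", "5", "-x-max", "20", "-x-step", "1",
            "-x-expression", "I",                          // components 5..20
            "-y-property", "classifier.ridge",
            "-y-min", "-10", "-y-max", "5", "-y-step", "1",
            "-y-expression", "pow(BASE,I)",                // ridge 10^-10..10^5 (base left at default 10)
            "-W", "weka.classifiers.functions.LinearRegression",
            // Assumption: options after "--" go to the base classifier.
            "--", "-S", "1",                               // 1 = no attribute selection
            "-C"                                           // do not eliminate colinear attributes
        };
    }

    /** Returns the argument following the given flag, or null if the flag is absent. */
    static String valueOf(String[] options, String flag) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) return options[i + 1];
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", defaultSetupOptions()));
    }
}
```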
General notes:
  • Turn the debug flag on in order to see some progress output in the console
  • If you want to view the fitness landscape that GridSearch explores, select a log file. The log will then contain a Gnuplot data block and a script block for viewing the landscape. Copy and paste those blocks into files named accordingly and run Gnuplot on them.
Version:
$Revision: 9733 $
Author:
Bernhard Pfahringer (bernhard at cs dot waikato dot ac dot nz), Geoff Holmes (geoff at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
  • Field Detail

    • EVALUATION_CC

      public static final int EVALUATION_CC
      evaluation via: Correlation coefficient
    • EVALUATION_RMSE

      public static final int EVALUATION_RMSE
      evaluation via: Root mean squared error
    • EVALUATION_RRSE

      public static final int EVALUATION_RRSE
      evaluation via: Root relative squared error
    • EVALUATION_MAE

      public static final int EVALUATION_MAE
      evaluation via: Mean absolute error
    • EVALUATION_RAE

      public static final int EVALUATION_RAE
      evaluation via: Relative absolute error
    • EVALUATION_COMBINED

      public static final int EVALUATION_COMBINED
      evaluation via: Combined = (1-abs(CC)) + RRSE + RAE
    • EVALUATION_ACC

      public static final int EVALUATION_ACC
      evaluation via: Accuracy
    • EVALUATION_KAPPA

      public static final int EVALUATION_KAPPA
      evaluation via: kappa statistic
    • TAGS_EVALUATION

      public static final Tag[] TAGS_EVALUATION
      evaluation
    • TRAVERSAL_BY_ROW

      public static final int TRAVERSAL_BY_ROW
      row-wise grid traversal
    • TRAVERSAL_BY_COLUMN

      public static final int TRAVERSAL_BY_COLUMN
      column-wise grid traversal
    • TAGS_TRAVERSAL

      public static final Tag[] TAGS_TRAVERSAL
      traversal
    • PREFIX_CLASSIFIER

      public static final String PREFIX_CLASSIFIER
      the prefix to indicate that the option is for the classifier
    • PREFIX_FILTER

      public static final String PREFIX_FILTER
      the prefix to indicate that the option is for the filter
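Of the evaluation constants listed above, EVALUATION_COMBINED is the only one defined by a formula. A minimal sketch of it follows, using the form given in the -E option description, (1-abs(CC)) + RRSE + RAE. CombinedMeasureSketch is a hypothetical name, and the scale on which RRSE and RAE are expressed is not stated on this page, so the sample numbers are purely illustrative.

```java
/** Sketch: the combined evaluation measure, COMB = (1 - |CC|) + RRSE + RAE. */
final class CombinedMeasureSketch {

    static double combined(double cc, double rrse, double rae) {
        // The correlation term uses the absolute value, so a strong negative
        // correlation contributes as little as a strong positive one.
        return (1 - Math.abs(cc)) + rrse + rae;
    }

    public static void main(String[] args) {
        System.out.println(combined(0.9, 0.4, 0.3));
        System.out.println(combined(-0.9, 0.4, 0.3));
    }
}
```

Each term shrinks as predictions improve, so a smaller combined value presumably indicates a better parameter pair.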
  • Constructor Detail

    • GridSearch

      public GridSearch()
      the default constructor
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing the classifier.
      Returns:
      a description suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration listOptions()
      Gets an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      an enumeration of all the available options.
    • getOptions

      public String[] getOptions()
      returns the options of the current setup
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      the current options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses the options for this object.

      Valid options are:

       -E <CC|RMSE|RRSE|MAE|RAE|COMB|ACC|KAP>
        Determines the parameter used for evaluation:
        CC = Correlation coefficient
        RMSE = Root mean squared error
        RRSE = Root relative squared error
        MAE = Mean absolute error
        RAE = Relative absolute error
        COMB = Combined = (1-abs(CC)) + RRSE + RAE
        ACC = Accuracy
        KAP = Kappa
        (default: CC)
       -y-property <option>
        The Y option to test (without leading dash).
        (default: classifier.ridge)
       -y-min <num>
        The minimum for Y.
        (default: -10)
       -y-max <num>
        The maximum for Y.
        (default: +5)
       -y-step <num>
        The step size for Y.
        (default: 1)
       -y-base <num>
        The base for Y.
        (default: 10)
       -y-expression <expr>
        The expression for Y.
        Available parameters:
         BASE
         FROM
         TO
         STEP
         I - the current iteration value
         (from 'FROM' to 'TO' with stepsize 'STEP')
        (default: 'pow(BASE,I)')
       -filter <filter specification>
        The filter to use (on X axis). Full classname of filter to include, 
        followed by scheme options.
        (default: weka.filters.supervised.attribute.PLSFilter)
       -x-property <option>
        The X option to test (without leading dash).
        (default: filter.numComponents)
       -x-min <num>
        The minimum for X.
        (default: +5)
       -x-max <num>
        The maximum for X.
        (default: +20)
       -x-step <num>
        The step size for X.
        (default: 1)
       -x-base <num>
        The base for X.
        (default: 10)
       -x-expression <expr>
        The expression for the X value.
        Available parameters:
         BASE
         FROM
         TO
         STEP
         I - the current iteration value
         (from 'FROM' to 'TO' with stepsize 'STEP')
        (default: 'pow(BASE,I)')
       -extend-grid
        Whether the grid can be extended.
        (default: no)
       -max-grid-extensions <num>
        The maximum number of grid extensions (-1 is unlimited).
        (default: 3)
       -sample-size <num>
        The size (in percent) of the sample to search the initial grid with.
        (default: 100)
       -traversal <ROW-WISE|COLUMN-WISE>
        The type of traversal for the grid.
        (default: COLUMN-WISE)
       -log-file <filename>
        The log file to log the messages to.
        (default: none)
       -S <num>
        Random number seed.
        (default 1)
       -D
        If set, classifier is run in debug mode and
        may output additional info to the console
       -W
        Full name of base classifier.
        (default: weka.classifiers.functions.LinearRegression)
       
       Options specific to classifier weka.classifiers.functions.LinearRegression:
       
       -D
        Produce debugging output.
        (default no debugging output)
       -S <number of selection method>
        Set the attribute selection method to use. 1 = None, 2 = Greedy.
        (default 0 = M5' method)
       -C
        Do not try to eliminate colinear attributes.
       
       -R <double>
        Set ridge parameter (default 1.0e-8).
       
       
       Options specific to filter weka.filters.supervised.attribute.PLSFilter ('-filter'):
       
       -D
        Turns on output of debugging information.
       -C <num>
        The number of components to compute.
        (default: 20)
       -U
        Updates the class attribute as well.
        (default: off)
       -M
        Turns replacing of missing values on.
        (default: off)
       -A <SIMPLS|PLS1>
        The algorithm to use.
        (default: PLS1)
       -P <none|center|standardize>
        The type of preprocessing that is applied to the data.
        (default: center)
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableSingleClassifierEnhancer
      Parameters:
      options - the options to use
      Throws:
      Exception - if setting of options fails
    • setClassifier

      public void setClassifier(Classifier newClassifier)
      Set the base learner.
      Overrides:
      setClassifier in class SingleClassifierEnhancer
      Parameters:
      newClassifier - the classifier to use.
    • filterTipText

      public String filterTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setFilter

      public void setFilter(Filter value)
      Sets the filter (only used for setup).
      Parameters:
      value - the filter.
    • getFilter

      public Filter getFilter()
      Gets the filter.
      Returns:
      the filter
    • evaluationTipText

      public String evaluationTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setEvaluation

      public void setEvaluation(SelectedTag value)
      Sets the criterion to use for evaluating the classifier performance.
      Parameters:
      value - the evaluation criterion.
    • getEvaluation

      public SelectedTag getEvaluation()
      Gets the criterion used for evaluating the classifier performance.
      Returns:
      the current evaluation criterion.
    • YPropertyTipText

      public String YPropertyTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getYProperty

      public String getYProperty()
      Get the Y property (normally the classifier).
      Returns:
      Value of the property.
    • setYProperty

      public void setYProperty(String value)
      Set the Y property (normally the classifier).
      Parameters:
      value - the Y property.
    • YMinTipText

      public String YMinTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getYMin

      public double getYMin()
      Get the value of the minimum of Y.
      Returns:
      Value of the minimum of Y.
    • setYMin

      public void setYMin(double value)
      Set the value of the minimum of Y.
      Parameters:
      value - Value to use as minimum of Y.
    • YMaxTipText

      public String YMaxTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getYMax

      public double getYMax()
      Get the value of the Maximum of Y.
      Returns:
      Value of the Maximum of Y.
    • setYMax

      public void setYMax(double value)
      Set the value of the Maximum of Y.
      Parameters:
      value - Value to use as Maximum of Y.
    • YStepTipText

      public String YStepTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getYStep

      public double getYStep()
      Get the value of the step size for Y.
      Returns:
      Value of the step size for Y.
    • setYStep

      public void setYStep(double value)
      Set the value of the step size for Y.
      Parameters:
      value - Value to use as the step size for Y.
    • YBaseTipText

      public String YBaseTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getYBase

      public double getYBase()
      Get the value of the base for Y.
      Returns:
      Value of the base for Y.
    • setYBase

      public void setYBase(double value)
      Set the value of the base for Y.
      Parameters:
      value - Value to use as the base for Y.
    • YExpressionTipText

      public String YExpressionTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getYExpression

      public String getYExpression()
      Get the expression for the Y value.
      Returns:
      Expression for the Y value.
    • setYExpression

      public void setYExpression(String value)
      Set the expression for the Y value.
      Parameters:
      value - Expression for the Y value.
    • XPropertyTipText

      public String XPropertyTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getXProperty

      public String getXProperty()
      Get the X property to test (normally the filter).
      Returns:
      Value of the X property.
    • setXProperty

      public void setXProperty(String value)
      Set the X property.
      Parameters:
      value - the X property.
    • XMinTipText

      public String XMinTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getXMin

      public double getXMin()
      Get the value of the minimum of X.
      Returns:
      Value of the minimum of X.
    • setXMin

      public void setXMin(double value)
      Set the value of the minimum of X.
      Parameters:
      value - Value to use as minimum of X.
    • XMaxTipText

      public String XMaxTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getXMax

      public double getXMax()
      Get the value of the Maximum of X.
      Returns:
      Value of the Maximum of X.
    • setXMax

      public void setXMax(double value)
      Set the value of the Maximum of X.
      Parameters:
      value - Value to use as Maximum of X.
    • XStepTipText

      public String XStepTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getXStep

      public double getXStep()
      Get the value of the step size for X.
      Returns:
      Value of the step size for X.
    • setXStep

      public void setXStep(double value)
      Set the value of the step size for X.
      Parameters:
      value - Value to use as the step size for X.
    • XBaseTipText

      public String XBaseTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getXBase

      public double getXBase()
      Get the value of the base for X.
      Returns:
      Value of the base for X.
    • setXBase

      public void setXBase(double value)
      Set the value of the base for X.
      Parameters:
      value - Value to use as the base for X.
    • XExpressionTipText

      public String XExpressionTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getXExpression

      public String getXExpression()
      Get the expression for the X value.
      Returns:
      Expression for the X value.
    • setXExpression

      public void setXExpression(String value)
      Set the expression for the X value.
      Parameters:
      value - Expression for the X value.
    • gridIsExtendableTipText

      public String gridIsExtendableTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getGridIsExtendable

      public boolean getGridIsExtendable()
      Get whether the grid can be extended dynamically.
      Returns:
      true if the grid can be extended.
    • setGridIsExtendable

      public void setGridIsExtendable(boolean value)
      Set whether the grid can be extended dynamically.
      Parameters:
      value - whether the grid can be extended dynamically.
    • maxGridExtensionsTipText

      public String maxGridExtensionsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getMaxGridExtensions

      public int getMaxGridExtensions()
      Gets the maximum number of grid extensions, -1 for unlimited.
      Returns:
      the max number of grid extensions
    • setMaxGridExtensions

      public void setMaxGridExtensions(int value)
      Sets the maximum number of grid extensions, -1 for unlimited.
      Parameters:
      value - the maximum of grid extensions.
    • sampleSizePercentTipText

      public String sampleSizePercentTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getSampleSizePercent

      public double getSampleSizePercent()
      Gets the sample size for the initial grid search.
      Returns:
      the sample size.
    • setSampleSizePercent

      public void setSampleSizePercent(double value)
      Sets the sample size for the initial grid search.
      Parameters:
      value - the sample size for the initial grid search.
    • traversalTipText

      public String traversalTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setTraversal

      public void setTraversal(SelectedTag value)
      Sets the type of traversal for the grid.
      Parameters:
      value - the traversal type
    • getTraversal

      public SelectedTag getTraversal()
      Gets the type of traversal for the grid.
      Returns:
      the current traversal type.
    • logFileTipText

      public String logFileTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getLogFile

      public File getLogFile()
      Gets current log file.
      Returns:
      the log file.
    • setLogFile

      public void setLogFile(File value)
      Sets the log file to use.
      Parameters:
      value - the log file.
    • getBestFilter

      public Filter getBestFilter()
      returns the best filter setup
      Returns:
      the best filter setup
    • getBestClassifier

      public Classifier getBestClassifier()
      returns the best Classifier setup
      Returns:
      the best Classifier setup
    • enumerateMeasures

      public Enumeration enumerateMeasures()
      Returns an enumeration of the measure names.
      Specified by:
      enumerateMeasures in interface AdditionalMeasureProducer
      Returns:
      an enumeration of the measure names
    • getMeasure

      public double getMeasure(String measureName)
      Returns the value of the named measure
      Specified by:
      getMeasure in interface AdditionalMeasureProducer
      Parameters:
      measureName - the name of the measure to query for its value
      Returns:
      the value of the named measure
    • getValues

      public weka.classifiers.meta.GridSearch.PointDouble getValues()
      returns the parameter pair that was found to work best
      Returns:
      the best parameter combination
    • getGridExtensionsPerformed

      public int getGridExtensionsPerformed()
      returns the number of grid extensions that took place during the search (only applicable if the grid was extendable).
      Returns:
      the number of grid extensions that were performed
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class SingleClassifierEnhancer
      Returns:
      the capabilities of this classifier
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
      builds the classifier
      Specified by:
      buildClassifier in class Classifier
      Parameters:
      data - the training instances
      Throws:
      Exception - if something goes wrong
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Computes the distribution for a given instance
      Overrides:
      distributionForInstance in class Classifier
      Parameters:
      instance - the instance for which distribution is computed
      Returns:
      the distribution
      Throws:
      Exception - if the distribution can't be computed successfully
    • toString

      public String toString()
      returns a string representation of the classifier
      Overrides:
      toString in class Object
      Returns:
      a string representation of the classifier
    • toSummaryString

      public String toSummaryString()
      Returns a string that summarizes the object.
      Specified by:
      toSummaryString in interface Summarizable
      Returns:
      the object summarized as a string
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for running this classifier from the command line.
      Parameters:
      args - the options
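The property strings accepted by setXProperty and setYProperty (for example "filter.numComponents" or "classifier.kernel.gamma") begin with a prefix that says whether the option belongs to the filter or to the classifier. The sketch below shows how such a string might be split. The literal prefix values are inferred from the examples on this page rather than taken from the PREFIX_CLASSIFIER and PREFIX_FILTER constants themselves, and PropertyPathSketch is a hypothetical name.

```java
/** Sketch: separating the target prefix from the rest of a property path. */
final class PropertyPathSketch {

    // Assumed values, inferred from "classifier.ridge" and "filter.numComponents".
    static final String PREFIX_CLASSIFIER = "classifier.";
    static final String PREFIX_FILTER = "filter.";

    /** Returns {target, path}, e.g. {"classifier", "kernel.gamma"}. */
    static String[] split(String property) {
        if (property.startsWith(PREFIX_CLASSIFIER)) {
            return new String[] {"classifier", property.substring(PREFIX_CLASSIFIER.length())};
        }
        if (property.startsWith(PREFIX_FILTER)) {
            return new String[] {"filter", property.substring(PREFIX_FILTER.length())};
        }
        throw new IllegalArgumentException("Property must start with '"
            + PREFIX_CLASSIFIER + "' or '" + PREFIX_FILTER + "': " + property);
    }

    public static void main(String[] args) {
        String[] parts = split("classifier.kernel.gamma");
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

The remaining path can be nested, as in "kernel.gamma", which addresses the gamma property of the classifier's kernel.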