Class BFTree

All Implemented Interfaces:
Serializable, Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

Class for building a best-first decision tree classifier. This class uses binary splits for both nominal and numeric attributes. Missing values are handled with the method of 'fractional' instances.
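To illustrate what "best-first" refers to: instead of expanding nodes in a fixed depth-first order, the tree grower always expands the open node whose best split gives the largest impurity reduction. The following is a minimal sketch of that ordering only, not BFTree's actual implementation; the class `BestFirstOrderSketch`, the `Candidate` type, and all gain values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class BestFirstOrderSketch {

    /** A node that could be split, with the impurity reduction of its best split (made-up values). */
    static final class Candidate {
        final String name;
        final double gain;
        final List<Candidate> children = new ArrayList<>();

        Candidate(String name, double gain) {
            this.name = name;
            this.gain = gain;
        }

        Candidate add(Candidate child) {
            children.add(child);
            return this;
        }
    }

    /** Returns node names in the order a best-first strategy would expand them. */
    static List<String> expansionOrder(Candidate root, int maxExpansions) {
        // Open nodes, ordered so that the largest impurity reduction is polled first.
        PriorityQueue<Candidate> open =
            new PriorityQueue<>(Comparator.comparingDouble((Candidate c) -> -c.gain));
        open.add(root);
        List<String> order = new ArrayList<>();
        while (!open.isEmpty() && order.size() < maxExpansions) {
            Candidate best = open.poll();
            if (best.gain <= 0.0) {
                break; // the best remaining split does not reduce impurity, so none does
            }
            order.add(best.name);
            open.addAll(best.children); // children become candidates for later expansion
        }
        return order;
    }

    public static void main(String[] args) {
        Candidate root = new Candidate("root", 0.40)
            .add(new Candidate("left", 0.05))
            .add(new Candidate("right", 0.20)
                .add(new Candidate("right.left", 0.10))
                .add(new Candidate("right.right", 0.01)));
        // prints [root, right, right.left, left]
        System.out.println(expansionOrder(root, 4));
    }
}
```

Note that "right.left" is expanded before "left" even though it sits deeper in the tree, which is the difference from a depth-first or breadth-first order.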

For more information, see:

Haijian Shi (2007). Best-first decision tree learning. Hamilton, NZ.

Jerome Friedman, Trevor Hastie, Robert Tibshirani (2000). Additive logistic regression: A statistical view of boosting. Annals of Statistics. 28(2):337-407.

BibTeX:

 @mastersthesis{Shi2007,
    address = {Hamilton, NZ},
    author = {Haijian Shi},
    note = {COMP594},
    school = {University of Waikato},
    title = {Best-first decision tree learning},
    year = {2007}
 }
 
 @article{Friedman2000,
    author = {Jerome Friedman and Trevor Hastie and Robert Tibshirani},
    journal = {Annals of statistics},
    number = {2},
    pages = {337-407},
    title = {Additive logistic regression : A statistical view of boosting},
    volume = {28},
    year = {2000},
    ISSN = {0090-5364}
 }
 

Valid options are:

 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -P <UNPRUNED|POSTPRUNED|PREPRUNED>
  The pruning strategy.
  (default: POSTPRUNED)
 -M <min no>
  The minimal number of instances at the terminal nodes.
  (default 2)
 -N <num folds>
  The number of folds used in the pruning.
  (default 5)
 -H
  If set, heuristic search is not used for nominal attributes
  in multi-class problems (default: heuristic search is used).
 -G
  If set, the Gini index is not used for splitting and
  information gain is used instead (default: Gini index is used).
 -R
  If set, root mean squared error is used instead of error rate
  in the internal cross-validation (default: error rate is used).
 -A
  Use the 1 SE rule to make the pruning decision.
  (default no)
 -C
  Size of the training set as a fraction of the training data, in the range (0, 1].
  (default 1)
Version:
$Revision: 6947 $
Author:
Haijian Shi (hs69@cs.waikato.ac.nz)
  • Field Detail

    • PRUNING_UNPRUNED

      public static final int PRUNING_UNPRUNED
      pruning strategy: un-pruned
    • PRUNING_POSTPRUNING

      public static final int PRUNING_POSTPRUNING
      pruning strategy: post-pruning
    • PRUNING_PREPRUNING

      public static final int PRUNING_PREPRUNING
      pruning strategy: pre-pruning
    • TAGS_PRUNING

      public static final Tag[] TAGS_PRUNING
      pruning strategy
  • Constructor Detail

    • BFTree

      public BFTree()
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing the classifier.
      Returns:
      a description suitable for displaying in the explorer/experimenter GUI
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns a TechnicalInformation object containing detailed information about the technical background of this class, e.g., the paper or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Classifier
      Returns:
      the capabilities of this classifier
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
      Builds a best-first decision tree classifier.
      Specified by:
      buildClassifier in class Classifier
      Parameters:
      data - set of instances serving as training data
      Throws:
      Exception - if the decision tree cannot be built successfully
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Computes class probabilities for an instance using the decision tree.
      Overrides:
      distributionForInstance in class Classifier
      Parameters:
      instance - the instance for which class probabilities are to be computed
      Returns:
      the class probabilities for the given instance
      Throws:
      Exception - if something goes wrong
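      The class description states that missing values are handled with 'fractional' instances. One common reading of that at prediction time is: when the value of a node's split attribute is missing, the instance is sent down both branches and the two class distributions are blended by each branch's share of the training weight. The sketch below assumes that reading and is not taken from the BFTree source; `Node`, `proportionLeft`, and the use of `Double.NaN` to mark a missing value are hypothetical.

```java
class FractionalPredictionSketch {

    /** A binary tree node: leaves carry a class distribution, inner nodes a numeric split. */
    static final class Node {
        double[] leafDistribution; // non-null only for leaves
        int attributeIndex;        // attribute tested at an inner node
        double splitPoint;         // go left if value < splitPoint
        double proportionLeft;     // share of training weight that went left
        Node left;
        Node right;

        static Node leaf(double... distribution) {
            Node n = new Node();
            n.leafDistribution = distribution;
            return n;
        }

        static Node split(int attributeIndex, double splitPoint, double proportionLeft,
                          Node left, Node right) {
            Node n = new Node();
            n.attributeIndex = attributeIndex;
            n.splitPoint = splitPoint;
            n.proportionLeft = proportionLeft;
            n.left = left;
            n.right = right;
            return n;
        }
    }

    /** Class distribution for an instance; Double.NaN marks a missing attribute value. */
    static double[] distribution(Node node, double[] instance) {
        if (node.leafDistribution != null) {
            return node.leafDistribution.clone();
        }
        double value = instance[node.attributeIndex];
        if (Double.isNaN(value)) {
            // Missing value: follow both branches and weight the results.
            double[] l = distribution(node.left, instance);
            double[] r = distribution(node.right, instance);
            double[] mixed = new double[l.length];
            for (int i = 0; i < mixed.length; i++) {
                mixed[i] = node.proportionLeft * l[i] + (1.0 - node.proportionLeft) * r[i];
            }
            return mixed;
        }
        return distribution(value < node.splitPoint ? node.left : node.right, instance);
    }

    public static void main(String[] args) {
        Node tree = Node.split(0, 5.0, 0.75, Node.leaf(1.0, 0.0), Node.leaf(0.0, 1.0));
        double[] d = distribution(tree, new double[] {Double.NaN});
        System.out.println(d[0] + " " + d[1]); // prints 0.75 0.25
    }
}
```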
    • toString

      public String toString()
      Returns a textual description of the decision tree, built with the protected toString method.
      Overrides:
      toString in class Object
      Returns:
      a textual description of the classifier
    • numNodes

      public int numNodes()
      Computes the size of the tree.
      Returns:
      size of the tree
    • numLeaves

      public int numLeaves()
      Computes the number of leaf nodes.
      Returns:
      number of leaf nodes
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableClassifier
      Returns:
      an enumeration describing the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses the options for this object.

      Valid options are:

       -S <num>
        Random number seed.
        (default 1)
       -D
        If set, classifier is run in debug mode and
        may output additional info to the console
       -P <UNPRUNED|POSTPRUNED|PREPRUNED>
        The pruning strategy.
        (default: POSTPRUNED)
       -M <min no>
        The minimal number of instances at the terminal nodes.
        (default 2)
       -N <num folds>
        The number of folds used in the pruning.
        (default 5)
       -H
        If set, heuristic search is not used for nominal attributes
        in multi-class problems (default: heuristic search is used).
       -G
        If set, the Gini index is not used for splitting and
        information gain is used instead (default: Gini index is used).
       -R
        If set, root mean squared error is used instead of error rate
        in the internal cross-validation (default: error rate is used).
       -A
        Use the 1 SE rule to make the pruning decision.
        (default no)
       -C
        Size of the training set as a fraction of the training data, in the range (0, 1].
        (default 1)
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableClassifier
      Parameters:
      options - the options to use
      Throws:
      Exception - if setting of options fails
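      As a small illustration of the argument format, the sketch below assembles a `String[]` from the flags documented above. Only the flags themselves come from this page; the class `BFTreeOptionsSketch` and the helper `buildOptions` are hypothetical, and the actual call to `setOptions(String[])` is left out so that the example compiles without Weka on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

class BFTreeOptionsSketch {

    /** Builds an option array using the -P, -M, -N and -A flags from the option list. */
    static String[] buildOptions(String pruning, int minNumObj, int numFolds, boolean useOneSE) {
        List<String> opts = new ArrayList<>();
        opts.add("-P");
        opts.add(pruning);                      // UNPRUNED, POSTPRUNED or PREPRUNED
        opts.add("-M");
        opts.add(Integer.toString(minNumObj));  // minimal number of instances at terminal nodes
        opts.add("-N");
        opts.add(Integer.toString(numFolds));   // folds used in the pruning
        if (useOneSE) {
            opts.add("-A");                     // flag without an argument
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions("PREPRUNED", 5, 10, true);
        // prints -P PREPRUNED -M 5 -N 10 -A
        System.out.println(String.join(" ", options));
        // With Weka available, this array is what would be passed to setOptions(options).
    }
}
```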
    • getOptions

      public String[] getOptions()
      Gets the current settings of the classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableClassifier
      Returns:
      the current settings of the classifier
    • enumerateMeasures

      public Enumeration enumerateMeasures()
      Returns an enumeration of the measure names.
      Specified by:
      enumerateMeasures in interface AdditionalMeasureProducer
      Returns:
      an enumeration of the measure names
    • measureTreeSize

      public double measureTreeSize()
      Returns the size of the tree.
      Returns:
      the size of the tree
    • getMeasure

      public double getMeasure(String additionalMeasureName)
      Returns the value of the named measure.
      Specified by:
      getMeasure in interface AdditionalMeasureProducer
      Parameters:
      additionalMeasureName - the name of the measure to query for its value
      Returns:
      the value of the named measure
      Throws:
      IllegalArgumentException - if the named measure is not supported
    • pruningStrategyTipText

      public String pruningStrategyTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setPruningStrategy

      public void setPruningStrategy(SelectedTag value)
      Sets the pruning strategy.
      Parameters:
      value - the strategy
    • getPruningStrategy

      public SelectedTag getPruningStrategy()
      Gets the pruning strategy.
      Returns:
      the current strategy
    • minNumObjTipText

      public String minNumObjTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setMinNumObj

      public void setMinNumObj(int value)
      Sets the minimal number of instances at the terminal nodes.
      Parameters:
      value - minimal number of instances at the terminal nodes
    • getMinNumObj

      public int getMinNumObj()
      Gets the minimal number of instances at the terminal nodes.
      Returns:
      minimal number of instances at the terminal nodes
    • numFoldsPruningTipText

      public String numFoldsPruningTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setNumFoldsPruning

      public void setNumFoldsPruning(int value)
      Sets the number of folds in the internal cross-validation.
      Parameters:
      value - the number of folds
    • getNumFoldsPruning

      public int getNumFoldsPruning()
      Gets the number of folds in the internal cross-validation.
      Returns:
      number of folds in the internal cross-validation
    • heuristicTipText

      public String heuristicTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setHeuristic

      public void setHeuristic(boolean value)
      Sets whether heuristic search is used for nominal attributes in multi-class problems.
      Parameters:
      value - true if heuristic search should be used for nominal attributes in multi-class problems
    • getHeuristic

      public boolean getHeuristic()
      Gets whether heuristic search is used for nominal attributes in multi-class problems.
      Returns:
      true if heuristic search is used for nominal attributes in multi-class problems
    • useGiniTipText

      public String useGiniTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setUseGini

      public void setUseGini(boolean value)
      Sets whether the Gini index is used as the splitting criterion.
      Parameters:
      value - true if the Gini index should be used as the splitting criterion
    • getUseGini

      public boolean getUseGini()
      Gets whether the Gini index is used as the splitting criterion.
      Returns:
      true if the Gini index is used as the splitting criterion
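      For orientation, the two splitting criteria selectable here can be compared on a toy binary split. The sketch below uses the textbook definitions of the Gini index and of entropy-based information gain; it assumes that "information" in the option list refers to the latter, and it is not BFTree's own code. The class `ImpuritySketch` and the class counts are hypothetical.

```java
import java.util.function.ToDoubleFunction;

class ImpuritySketch {

    /** Gini index of a class-count vector: 1 - sum of squared class proportions. */
    static double gini(double[] counts) {
        double total = sum(counts);
        double sumSquares = 0.0;
        for (double c : counts) {
            double p = c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    /** Entropy in bits of a class-count vector. */
    static double entropy(double[] counts) {
        double total = sum(counts);
        double h = 0.0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Impurity of the parent minus the size-weighted impurity of the two children. */
    static double reduction(ToDoubleFunction<double[]> impurity,
                            double[] parent, double[] left, double[] right) {
        double n = sum(parent);
        return impurity.applyAsDouble(parent)
            - sum(left) / n * impurity.applyAsDouble(left)
            - sum(right) / n * impurity.applyAsDouble(right);
    }

    private static double sum(double[] xs) {
        double s = 0.0;
        for (double x : xs) {
            s += x;
        }
        return s;
    }

    public static void main(String[] args) {
        double[] parent = {10, 10};
        double[] left = {8, 2};
        double[] right = {2, 8};
        System.out.println("Gini reduction:   " + reduction(ImpuritySketch::gini, parent, left, right));
        System.out.println("Information gain: " + reduction(ImpuritySketch::entropy, parent, left, right));
    }
}
```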
    • useErrorRateTipText

      public String useErrorRateTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setUseErrorRate

      public void setUseErrorRate(boolean value)
      Sets whether error rate is used in the internal cross-validation.
      Parameters:
      value - true if error rate should be used in the internal cross-validation
    • getUseErrorRate

      public boolean getUseErrorRate()
      Gets whether error rate is used in the internal cross-validation.
      Returns:
      true if error rate is used in the internal cross-validation
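      The difference between the two evaluation measures can be seen on a handful of predictions: error rate only counts wrong class labels, while root mean squared error also reflects how confident the predicted probabilities were. The sketch below uses one common definition of RMSE for class probability estimates (averaged over instances and classes); whether BFTree normalizes in exactly this way is an assumption, and the class `CvLossSketch` and the data are hypothetical.

```java
class CvLossSketch {

    /** Fraction of instances whose most probable class differs from the actual class. */
    static double errorRate(double[][] distributions, int[] actual) {
        int errors = 0;
        for (int i = 0; i < actual.length; i++) {
            int predicted = 0;
            for (int c = 1; c < distributions[i].length; c++) {
                if (distributions[i][c] > distributions[i][predicted]) {
                    predicted = c;
                }
            }
            if (predicted != actual[i]) {
                errors++;
            }
        }
        return (double) errors / actual.length;
    }

    /** RMSE between predicted probabilities and 0/1 class indicators. */
    static double rmse(double[][] distributions, int[] actual) {
        double sumSquared = 0.0;
        int terms = 0;
        for (int i = 0; i < actual.length; i++) {
            for (int c = 0; c < distributions[i].length; c++) {
                double target = (c == actual[i]) ? 1.0 : 0.0;
                double diff = distributions[i][c] - target;
                sumSquared += diff * diff;
                terms++;
            }
        }
        return Math.sqrt(sumSquared / terms);
    }

    public static void main(String[] args) {
        double[][] distributions = {{0.9, 0.1}, {0.6, 0.4}, {0.4, 0.6}};
        int[] actual = {0, 1, 1};
        System.out.println("error rate: " + errorRate(distributions, actual));
        System.out.println("RMSE:       " + rmse(distributions, actual));
    }
}
```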
    • useOneSETipText

      public String useOneSETipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setUseOneSE

      public void setUseOneSE(boolean value)
      Sets whether the 1 SE rule is used to choose the final model.
      Parameters:
      value - true if the 1 SE rule should be used to choose the final model
    • getUseOneSE

      public boolean getUseOneSE()
      Gets whether the 1 SE rule is used to choose the final model.
      Returns:
      true if the 1 SE rule is used to choose the final model
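      The 1 SE rule, as it is usually stated for tree pruning, picks the smallest model whose cross-validated error is within one standard error of the minimum, rather than the model with the minimum error itself. The sketch below illustrates that general rule under the assumption that candidates are ordered from smallest to largest; it is not BFTree's implementation, and the class `OneSeRuleSketch` and the numbers are hypothetical.

```java
class OneSeRuleSketch {

    /**
     * Chooses a candidate model. Candidates are assumed to be ordered from
     * smallest to largest tree; returns the index of the chosen candidate.
     */
    static int choose(double[] cvError, double[] stdError, boolean useOneSE) {
        // Index of the candidate with the lowest cross-validated error.
        int best = 0;
        for (int i = 1; i < cvError.length; i++) {
            if (cvError[i] < cvError[best]) {
                best = i;
            }
        }
        if (!useOneSE) {
            return best;
        }
        // 1 SE rule: smallest candidate whose error is within one SE of the minimum.
        double threshold = cvError[best] + stdError[best];
        for (int i = 0; i < cvError.length; i++) {
            if (cvError[i] <= threshold) {
                return i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] cvError = {0.30, 0.22, 0.20, 0.21};
        double[] stdError = {0.03, 0.03, 0.03, 0.03};
        System.out.println(choose(cvError, stdError, false)); // prints 2
        System.out.println(choose(cvError, stdError, true));  // prints 1
    }
}
```

      The rule trades a small, statistically insignificant increase in estimated error for a smaller tree.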
    • sizePerTipText

      public String sizePerTipText()
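      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setSizePer

      public void setSizePer(double value)
      Sets the training set size as a fraction of the training data, in the range (0, 1].
      Parameters:
      value - training set size as a fraction of the training data
    • getSizePer

      public double getSizePer()
      Gets the training set size as a fraction of the training data.
      Returns:
      training set size as a fraction of the training data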
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method.
      Parameters:
      args - the options for the classifier