Class JRip

java.lang.Object
weka.classifiers.Classifier
weka.classifiers.rules.JRip
All Implemented Interfaces:
Serializable, Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP.

The algorithm is briefly described as follows:

Initialize RS = {}, and for each class from the least prevalent to the most frequent, DO:

1. Building stage:
Repeat 1.1 and 1.2 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate is >= 50%.

1.1. Grow phase:
Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with highest information gain: p(log(p/t)-log(P/T)).

1.2. Prune phase:
Incrementally prune each rule and allow the pruning of any final sequences of the antecedents. The pruning metric is (p-n)/(p+n), but since this equals 2p/(p+n) - 1, this implementation simply uses p/(p+n) (actually (p+1)/(p+n+2), so that the value is 0.5 when p+n is 0).

2. Optimization stage:
After generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedures 1.1 and 1.2. One variant is generated from an empty rule, while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N). Then the smallest possible DL for each variant and the original rule is computed. The variant with the minimal DL is selected as the final representative of Ri in the ruleset. After all the rules in {Ri} have been examined, if there are still residual positives, more rules are generated from them using the Building stage again.

3. Delete any rule from the ruleset that would increase the DL of the whole ruleset if it were kept, and add the resultant ruleset to RS.
ENDDO
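
The two metrics named in steps 1.1 and 1.2 can be illustrated with a few lines of Java. This is a minimal sketch, not the Weka source: the class and method names (RipperMetricsSketch, infoGain, pruneValue) are hypothetical, the base-2 logarithm is an assumption (the description does not state the base), and the symbols are read in the usual FOIL-gain sense, with P and T as the positive and total weight covered before a condition is added and p and t as the same quantities afterwards.

```java
public class RipperMetricsSketch {

    /** Base-2 logarithm; the choice of base is an assumption of this sketch. */
    public static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Information gain p * (log(p/t) - log(P/T)) of adding one condition.
     * p, t: positive and total weight covered after adding the condition.
     * bigP, bigT: positive and total weight covered before adding it.
     */
    public static double infoGain(double p, double t, double bigP, double bigT) {
        if (p <= 0.0) {
            // No positives covered: treat the gain as zero.
            // This convention is specific to the sketch and avoids log(0).
            return 0.0;
        }
        return p * (log2(p / t) - log2(bigP / bigT));
    }

    /** Smoothed pruning value (p + 1) / (p + n + 2); it is 0.5 when p + n is 0. */
    public static double pruneValue(double p, double n) {
        return (p + 1.0) / (p + n + 2.0);
    }

    public static void main(String[] args) {
        // A rule covering 50 positives out of 100 is refined to cover 40 out of 50.
        System.out.println(infoGain(40, 50, 50, 100));
        // An empty pruning set gives the neutral value.
        System.out.println(pruneValue(0, 0)); // prints 0.5
        // 8 positives and 2 negatives covered on the pruning set.
        System.out.println(pruneValue(8, 2)); // prints 0.75
    }
}
```

Because p/(p+n) is a monotone transformation of (p-n)/(p+n), ranking candidate prunings by either value picks the same one; the +1 and +2 terms only smooth the estimate for small counts.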

Note that there seem to be two bugs in the original ripper program that slightly affect ruleset size and accuracy. This implementation avoids these bugs and is therefore slightly different from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there still seem to be some trivial differences between this implementation and the original ripper, especially for the audiology data in the UCI repository, where there are many classes with few instances.
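
To see why ties in class frequency matter, consider a sketch that orders class indices from least to most frequent. The class name ClassOrderSketch and the method orderByFrequency are hypothetical, and the tie-break used here (a stable sort that keeps the original index order) is an assumption of the sketch only; it says nothing about how either implementation actually resolves ties.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ClassOrderSketch {

    /** Returns class indices sorted from least to most frequent. */
    public static List<Integer> orderByFrequency(double[] classCounts) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < classCounts.length; i++) {
            order.add(i);
        }
        // List.sort is stable, so classes with equal counts keep their index order.
        // A different tie-break would yield a different, equally valid ordering.
        order.sort(Comparator.comparingDouble(i -> classCounts[i]));
        return order;
    }

    public static void main(String[] args) {
        // Classes 1 and 2 have the same count, so their relative order is a tie-break choice.
        double[] counts = {30, 5, 5, 60};
        System.out.println(orderByFrequency(counts)); // prints [1, 2, 0, 3]
    }
}
```

With many small classes of equal size, as in the audiology data, several orderings satisfy "least frequent first", and rules are learned for the classes in whichever order is chosen.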

For details, see:

William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.

PS. We have compared this implementation with the original ripper implementation in terms of accuracy, ruleset size and running time on both the artificial data "ab+bcd+defg" and UCI datasets. In all these aspects it seems to be quite comparable to the original ripper implementation. However, memory consumption was not optimized in this implementation.

BibTeX:

 @inproceedings{Cohen1995,
    author = {William W. Cohen},
    booktitle = {Twelfth International Conference on Machine Learning},
    pages = {115-123},
    publisher = {Morgan Kaufmann},
    title = {Fast Effective Rule Induction},
    year = {1995}
 }
 

Valid options are:

 -F <number of folds>
  Set number of folds for REP
  One fold is used as pruning set.
  (default 3)
 -N <min. weights>
  Set the minimal weights of instances
  within a split.
  (default 2.0)
 -O <number of runs>
  Set the number of runs of
  optimizations. (Default: 2)
 -D
  Set whether to turn on
  debug mode (Default: false)
 -S <seed>
  The seed of randomization
  (Default: 1)
 -E
  Whether NOT to check for error rate >= 0.5
  in the stopping criteria (default: check)
 -P
  Whether NOT to use pruning
  (default: use pruning)
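
Note that -E and -P are negative switches: supplying them turns the corresponding behaviour off. The following sketch assembles an option array in the form listed above from positively phrased settings. It does not call Weka; the class name JRipOptionsSketch and the helper buildOptions are hypothetical, and the resulting array is only assumed to be suitable for passing to setOptions(String[]).

```java
import java.util.ArrayList;
import java.util.List;

public class JRipOptionsSketch {

    /**
     * Builds an option array from positively phrased settings.
     * -D is emitted only when debug is on; -E and -P are emitted only when
     * the error-rate check or pruning is switched OFF.
     */
    public static String[] buildOptions(int folds, double minNo, int optimizations, long seed,
                                        boolean debug, boolean checkErrorRate, boolean usePruning) {
        List<String> opts = new ArrayList<>();
        opts.add("-F"); opts.add(Integer.toString(folds));
        opts.add("-N"); opts.add(Double.toString(minNo));
        opts.add("-O"); opts.add(Integer.toString(optimizations));
        opts.add("-S"); opts.add(Long.toString(seed));
        if (debug) {
            opts.add("-D");
        }
        if (!checkErrorRate) {
            opts.add("-E");
        }
        if (!usePruning) {
            opts.add("-P");
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The documented defaults: no switches are emitted.
        System.out.println(String.join(" ", buildOptions(3, 2.0, 2, 1L, false, true, true)));
        // prints: -F 3 -N 2.0 -O 2 -S 1

        // Error-rate check and pruning both disabled.
        System.out.println(String.join(" ", buildOptions(5, 2.0, 2, 42L, false, false, false)));
        // prints: -F 5 -N 2.0 -O 2 -S 42 -E -P
    }
}
```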
Version:
$Revision: 8119 $
Author:
Xin Xu (xx5@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
  • Constructor Detail

    • JRip

      public JRip()
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing the classifier.
      Returns:
      a description suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options. Valid options are:

      -F number
      The number of folds for reduced error pruning. One fold is used as the pruning set. (Default: 3)

      -N number
      The minimal weights of instances within a split. (Default: 2)

      -O number
      Set the number of runs of optimizations. (Default: 2)

      -D
      Whether to turn on debug mode.

      -S number
      The seed of randomization used in Ripper. (Default: 1)

      -E
      Whether NOT to check for error rate >= 0.5 in the stopping criteria. (Default: check)

      -P
      Whether NOT to use pruning. (Default: use pruning)

      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Classifier
      Returns:
      an enumeration of all the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -F <number of folds>
        Set number of folds for REP
        One fold is used as pruning set.
        (default 3)
       -N <min. weights>
        Set the minimal weights of instances
        within a split.
        (default 2.0)
       -O <number of runs>
        Set the number of runs of
        optimizations. (Default: 2)
       -D
        Set whether to turn on
        debug mode (Default: false)
       -S <seed>
        The seed of randomization
        (Default: 1)
       -E
        Whether NOT to check for error rate >= 0.5
        in the stopping criteria (default: check)
       -P
        Whether NOT to use pruning
        (default: use pruning)
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Classifier
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Classifier
      Returns:
      an array of strings suitable for passing to setOptions
    • enumerateMeasures

      public Enumeration enumerateMeasures()
      Returns an enumeration of the additional measure names
      Specified by:
      enumerateMeasures in interface AdditionalMeasureProducer
      Returns:
      an enumeration of the measure names
    • getMeasure

      public double getMeasure(String additionalMeasureName)
      Returns the value of the named measure
      Specified by:
      getMeasure in interface AdditionalMeasureProducer
      Parameters:
      additionalMeasureName - the name of the measure to query for its value
      Returns:
      the value of the named measure
      Throws:
      IllegalArgumentException - if the named measure is not supported
    • foldsTipText

      public String foldsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setFolds

      public void setFolds(int fold)
      Sets the number of folds to use
      Parameters:
      fold - the number of folds
    • getFolds

      public int getFolds()
      Gets the number of folds
      Returns:
      the number of folds
    • minNoTipText

      public String minNoTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMinNo

      public void setMinNo(double m)
      Sets the minimum total weight of the instances in a rule
      Parameters:
      m - the minimum total weight of the instances in a rule
    • getMinNo

      public double getMinNo()
      Gets the minimum total weight of the instances in a rule
      Returns:
      the minimum total weight of the instances in a rule
    • seedTipText

      public String seedTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSeed

      public void setSeed(long s)
      Sets the seed value to use in randomizing the data
      Parameters:
      s - the new seed value
    • getSeed

      public long getSeed()
      Gets the current seed value to use in randomizing the data
      Returns:
      the seed value
    • optimizationsTipText

      public String optimizationsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setOptimizations

      public void setOptimizations(int run)
      Sets the number of optimization runs
      Parameters:
      run - the number of optimization runs
    • getOptimizations

      public int getOptimizations()
      Gets the number of optimization runs
      Returns:
      the number of optimization runs
    • debugTipText

      public String debugTipText()
      Returns the tip text for this property
      Overrides:
      debugTipText in class Classifier
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDebug

      public void setDebug(boolean d)
      Sets whether debug information is output to the console
      Overrides:
      setDebug in class Classifier
      Parameters:
      d - whether debug information is output to the console
    • getDebug

      public boolean getDebug()
      Gets whether debug information is output to the console
      Overrides:
      getDebug in class Classifier
      Returns:
      whether debug information is output to the console
    • checkErrorRateTipText

      public String checkErrorRateTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setCheckErrorRate

      public void setCheckErrorRate(boolean d)
      Sets whether the error rate check is included in the stopping criterion
      Parameters:
      d - whether the error rate check is included in the stopping criterion
    • getCheckErrorRate

      public boolean getCheckErrorRate()
      Gets whether the error rate check is included in the stopping criterion
      Returns:
      true if the error rate check is included in the stopping criterion
    • usePruningTipText

      public String usePruningTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setUsePruning

      public void setUsePruning(boolean d)
      Sets whether pruning is performed
      Parameters:
      d - Whether pruning is performed
    • getUsePruning

      public boolean getUsePruning()
      Gets whether pruning is performed
      Returns:
      true if pruning is performed
    • getRuleset

      public FastVector getRuleset()
      Get the ruleset generated by Ripper
      Returns:
      the ruleset
    • getRuleStats

      public RuleStats getRuleStats(int pos)
      Get the statistics of the ruleset in the given position
      Parameters:
      pos - the position of the stats, assumed to be valid
      Returns:
      the statistics of the ruleset in the given position
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Classifier
      Returns:
      the capabilities of this classifier
    • buildClassifier

      public void buildClassifier(Instances instances) throws Exception
      Builds Ripper in the order of class frequencies. For each class it's built in two stages: building and optimization
      Specified by:
      buildClassifier in class Classifier
      Parameters:
      instances - the training data
      Throws:
      Exception - if classifier can't be built successfully
    • distributionForInstance

      public double[] distributionForInstance(Instance datum)
      Classify the test instance with the rule learner and provide the class distributions
      Overrides:
      distributionForInstance in class Classifier
      Parameters:
      datum - the instance to be classified
      Returns:
      the distribution
    • toString

      public String toString()
      Prints all the rules of the rule learner.
      Overrides:
      toString in class Object
      Returns:
      a textual description of the classifier
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method.
      Parameters:
      args - the options for the classifier