Class MINND

All Implemented Interfaces:
Serializable, Cloneable, CapabilitiesHandler, MultiInstanceCapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler

Multiple-Instance Nearest Neighbour with Distribution learner.

It uses gradient descent to find the weight for each dimension of each exemplar, starting from 1.0. To avoid overfitting, it uses a mean-square function (i.e. the Euclidean distance) to search for the weights.
It then uses the weights to cleanse the training data. After that, it searches for the weights again, starting from the weights found in the first search.
Finally, it uses the most recently updated weights to cleanse the test exemplar, and then finds the nearest neighbour of the test exemplar using a partly weighted Kullback distance. The variances used in the Kullback distance, however, are those from before cleansing.

For more information see:

Xin Xu (2001). A nearest distribution approach to multiple-instance learning. Hamilton, NZ.

BibTeX:

 @misc{Xu2001,
    address = {Hamilton, NZ},
    author = {Xin Xu},
    note = {0657.591B},
    school = {University of Waikato},
    title = {A nearest distribution approach to multiple-instance learning},
    year = {2001}
 }
 

Valid options are:

 -K <number of neighbours>
  Set number of nearest neighbour for prediction
  (default 1)
 -S <number of neighbours>
  Set number of nearest neighbour for cleansing the training data
  (default 1)
 -E <number of neighbours>
  Set number of nearest neighbour for cleansing the testing data
  (default 1)
Version:
$Revision: 9144 $
Author:
Xin Xu (xx5@cs.waikato.ac.nz)
  • Constructor Detail

    • MINND

      public MINND()
  • Method Detail

    • globalInfo

      public String globalInfo()
      Returns a string describing this classifier.
      Returns:
      a description of the classifier suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Classifier
      Returns:
      the capabilities of this classifier
    • getMultiInstanceCapabilities

      public Capabilities getMultiInstanceCapabilities()
      Returns the capabilities of this multi-instance classifier for the relational data.
      Specified by:
      getMultiInstanceCapabilities in interface MultiInstanceCapabilitiesHandler
      Returns:
      the capabilities of this object
    • buildClassifier

      public void buildClassifier(Instances exs) throws Exception
      Like the normal Nearest Neighbour algorithm, this classifier is lazy: when building the model it simply records the exemplar information (i.e. the mean and variance for each dimension of each exemplar, and the exemplars' classes). There is no need to store the exemplars themselves.
      Specified by:
      buildClassifier in class Classifier
      Parameters:
      exs - the training exemplars
      Throws:
      Exception - if the model cannot be built properly
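
The per-exemplar statistics mentioned above can be illustrated with a minimal standalone sketch. It does not use Weka: the class name `BagStatsSketch`, the representation of a bag as a `double[][]` (rows are instances, columns are dimensions), and the choice of the sample variance (divisor n - 1) are all assumptions made for illustration.

```java
// Hypothetical helper, not part of Weka: per-dimension statistics of one bag.
final class BagStatsSketch {

    /** Column means of a bag given as rows = instances, columns = dimensions. */
    static double[] mean(double[][] bag) {
        int dims = bag[0].length;
        double[] mu = new double[dims];
        for (double[] instance : bag) {
            for (int j = 0; j < dims; j++) {
                mu[j] += instance[j];
            }
        }
        for (int j = 0; j < dims; j++) {
            mu[j] /= bag.length;
        }
        return mu;
    }

    /** Sample variance per column; all zeros for a single-instance bag (assumption). */
    static double[] variance(double[][] bag) {
        int dims = bag[0].length;
        double[] mu = mean(bag);
        double[] var = new double[dims];
        if (bag.length < 2) {
            return var;
        }
        for (double[] instance : bag) {
            for (int j = 0; j < dims; j++) {
                double d = instance[j] - mu[j];
                var[j] += d * d;
            }
        }
        for (int j = 0; j < dims; j++) {
            var[j] /= (bag.length - 1);
        }
        return var;
    }

    public static void main(String[] args) {
        double[][] bag = { {1.0, 10.0}, {3.0, 10.0}, {5.0, 10.0} };
        System.out.println(java.util.Arrays.toString(mean(bag)));
        System.out.println(java.util.Arrays.toString(variance(bag)));
    }
}
```

Only these two vectors and the class label would need to be kept per exemplar, which is why the exemplars themselves can be discarded.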
    • preprocess

      public Instance preprocess(Instances data, int pos) throws Exception
      Pre-processes the given exemplar using the other exemplars in the given data set. It also updates the noise data statistics.
      Parameters:
      data - the whole exemplars
      pos - the position of given exemplar in data
      Returns:
      the processed exemplar
      Throws:
      Exception - if the returned exemplar is wrong
    • findWeights

      public void findWeights(int row, double[][] mean)
      Uses gradient descent to distort the MU parameter for the exemplar. The exemplar may be in the specified row of the given matrix, which has numExemplar rows and numDimension columns, or it may not be in the matrix at all.
      Parameters:
      row - the given row index
      mean -
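
As a rough illustration of the search described here, the following standalone sketch runs gradient descent from all-ones weights and shrinks the step whenever a move fails to improve the objective. It is not MINND's implementation: the `Objective` interface, the numerical gradient, the initial rate of 0.05 and the halving factor are all assumptions chosen for the example.

```java
// Hypothetical sketch, not Weka code: gradient descent starting from weights of 1.0.
final class GradientDescentSketch {

    /** Objective to minimise; a stand-in for the target function of the classifier. */
    interface Objective {
        double value(double[] w);
    }

    /** Central-difference estimate of the gradient of f at w. */
    static double[] gradient(Objective f, double[] w) {
        double h = 1e-6;
        double[] g = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double saved = w[j];
            w[j] = saved + h;
            double up = f.value(w);
            w[j] = saved - h;
            double down = f.value(w);
            w[j] = saved; // restore the original coordinate
            g[j] = (up - down) / (2 * h);
        }
        return g;
    }

    /** Descends from all-ones weights; halves the step when a move does not improve. */
    static double[] findWeights(Objective f, int dims, int maxIter) {
        double[] w = new double[dims];
        java.util.Arrays.fill(w, 1.0);
        double rate = 0.05; // assumed initial learning rate
        double best = f.value(w);
        for (int it = 0; it < maxIter && rate > 1e-12; it++) {
            double[] g = gradient(f, w);
            double[] candidate = new double[dims];
            for (int j = 0; j < dims; j++) {
                candidate[j] = w[j] - rate * g[j];
            }
            double v = f.value(candidate);
            if (v < best) {
                w = candidate;
                best = v;
            } else {
                rate *= 0.5; // assumed decay factor
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy objective with its minimum at (2.0, 0.5).
        Objective f = w -> (w[0] - 2.0) * (w[0] - 2.0) + (w[1] - 0.5) * (w[1] - 0.5);
        System.out.println(java.util.Arrays.toString(findWeights(f, 2, 1000)));
    }
}
```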
    • target

      public double target(double[] x, double[][] X, int rowpos, double[] Y)
      Computes the target function to be minimized by gradient descent. The formula is:
      1/2*sum[i=1..p](f(X, Xi)-var(Y, Yi))^2

      where p is the number of exemplars and Y is the class label. In the case of X=MU, f() is the Euclidean distance between two exemplars, taking the related weights into account, and var() is sqrt(numDimension)*(Y-Yi), where Y-Yi is either 0 (when Y==Yi) or 1 (when Y!=Yi).

      Parameters:
      x - the weights of the exemplar in question
      rowpos - row index of x in X
      Y - the observed class label
      Returns:
      the result of the target function
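
The formula above can be sketched in standalone Java as follows. This is an illustration under stated assumptions, not the Weka source: in particular, the description does not say exactly how the weights enter the distance, so the sketch assumes each weight multiplies the squared difference in its dimension. The class name `TargetSketch` is hypothetical.

```java
// Hypothetical sketch of 1/2 * sum_i (f(X, Xi) - var(Y, Yi))^2, not the Weka source.
final class TargetSketch {

    /**
     * x      - weights of the exemplar in question
     * X      - matrix of exemplar means (rows = exemplars, columns = dimensions)
     * rowpos - row of the exemplar in question within X
     * Y      - class labels, one per row of X
     */
    static double target(double[] x, double[][] X, int rowpos, double[] Y) {
        int dims = x.length;
        double result = 0.0;
        for (int i = 0; i < X.length; i++) {
            if (i == rowpos) {
                continue; // the exemplar compared with itself contributes 0
            }
            double sq = 0.0;
            for (int j = 0; j < dims; j++) {
                double d = X[rowpos][j] - X[i][j];
                sq += x[j] * d * d; // assumption: weight scales the squared difference
            }
            double f = Math.sqrt(sq);
            // var() is 0 for the same class and sqrt(numDimension) otherwise
            double var = (Y[rowpos] == Y[i]) ? 0.0 : Math.sqrt(dims);
            double r = f - var;
            result += 0.5 * r * r;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] means = { {0.0, 0.0}, {3.0, 4.0} };
        double[] labels = { 0.0, 1.0 };
        System.out.println(target(new double[] {1.0, 1.0}, means, 0, labels));
    }
}
```

Under this reading, the function is small when exemplars of the same class are close together and exemplars of different classes are about sqrt(numDimension) apart.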
    • classifyInstance

      public double classifyInstance(Instance ex) throws Exception
      Uses the Kullback-Leibler distance to find the nearest neighbours of the given exemplar, and then uses the K-Nearest Neighbour algorithm to classify the test exemplar.
      Overrides:
      classifyInstance in class Classifier
      Parameters:
      ex - the given test exemplar
      Returns:
      the classification
      Throws:
      Exception - if the exemplar could not be classified successfully
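
The voting step can be sketched independently of Weka. The example below assumes the distances from the test exemplar to every training exemplar have already been computed, and takes a majority vote among the k smallest. The class name `KnnVoteSketch` and the tie-breaking rule (lowest class index wins) are assumptions.

```java
// Hypothetical sketch, not Weka code: majority vote among the k nearest exemplars.
final class KnnVoteSketch {

    /** Returns the majority class among the k training exemplars with the smallest distances. */
    static int classify(double[] distances, int[] labels, int numClasses, int k) {
        boolean[] used = new boolean[distances.length];
        int[] votes = new int[numClasses];
        int limit = Math.min(k, distances.length);
        for (int n = 0; n < limit; n++) {
            int best = -1;
            // pick the closest exemplar that has not voted yet
            for (int i = 0; i < distances.length; i++) {
                if (!used[i] && (best < 0 || distances[i] < distances[best])) {
                    best = i;
                }
            }
            used[best] = true;
            votes[labels[best]]++;
        }
        int winner = 0;
        for (int c = 1; c < numClasses; c++) {
            if (votes[c] > votes[winner]) { // ties keep the lower class index (assumption)
                winner = c;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        double[] distances = { 0.9, 0.1, 0.4, 2.0 };
        int[] labels = { 0, 1, 1, 0 };
        System.out.println(classify(distances, labels, 2, 3));
    }
}
```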
    • cleanse

      public Instance cleanse(Instance before) throws Exception
      Cleanses the given exemplar according to the valid and noise data statistics.
      Parameters:
      before - the given exemplar
      Returns:
      the processed exemplar
      Throws:
      Exception - if the returned exemplar is wrong
    • kullback

      public double kullback(double[] mu1, double[] mu2, double[] var1, double[] var2, int pos)
      Calculates the Kullback-Leibler distance between two normal distributions. This distance is never negative (it is zero when the two distributions are identical).

      Kullback-Leibler distance = integral{f(X)ln(f(X)/g(X))}

      Note that X is a vector. Since the dimensions are assumed to be independent, f(X) (and likewise g(X)) is the product of the normal density functions of the individual dimensions. Strictly, the formula should use log2 rather than ln; ln is used simply for computational convenience.

      Suppose there are P dimensions, f(X) is the first distribution and g(X) is the second. The result is then:

      Kullback = sum[1..P](ln(SIGMA2/SIGMA1)) + sum[1..P](SIGMA1^2 / (2*(SIGMA2^2))) + sum[1..P]((MU1-MU2)^2 / (2*(SIGMA2^2))) - P/2
      Parameters:
      mu1 - mu of the first normal distribution
      mu2 - mu of the second normal distribution
      var1 - variance(SIGMA^2) of the first normal distribution
      var2 - variance(SIGMA^2) of the second normal distribution
      Returns:
      the Kullback distance of two distributions
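
The closed-form result above translates directly into code. The sketch below is a plain, unweighted version for illustration: it omits the `pos` argument and any weighting that the real method applies, and it assumes every variance is strictly positive. The class name `KullbackSketch` is hypothetical.

```java
// Hypothetical sketch, not the Weka method: unweighted Kullback-Leibler distance
// between two normal distributions with independent dimensions.
final class KullbackSketch {

    /** var1 and var2 hold variances (SIGMA^2), all assumed to be greater than 0. */
    static double kullback(double[] mu1, double[] mu2, double[] var1, double[] var2) {
        double sum = 0.0;
        for (int j = 0; j < mu1.length; j++) {
            double diff = mu1[j] - mu2[j];
            sum += 0.5 * Math.log(var2[j] / var1[j]) // ln(SIGMA2/SIGMA1)
                 + var1[j] / (2.0 * var2[j])         // SIGMA1^2 / (2*SIGMA2^2)
                 + diff * diff / (2.0 * var2[j])     // (MU1-MU2)^2 / (2*SIGMA2^2)
                 - 0.5;                              // one share of P/2 per dimension
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] mu1 = { 0.0 };
        double[] mu2 = { 1.0 };
        double[] var = { 1.0 };
        System.out.println(kullback(mu1, mu2, var, var));
    }
}
```

Note that the distance is not symmetric: swapping the two distributions generally gives a different value, so the order of the arguments matters.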
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Classifier
      Returns:
      an enumeration of all the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -K <number of neighbours>
        Set number of nearest neighbour for prediction
        (default 1)
       -S <number of neighbours>
        Set number of nearest neighbour for cleansing the training data
        (default 1)
       -E <number of neighbours>
        Set number of nearest neighbour for cleansing the testing data
        (default 1)
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Classifier
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
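
To show how an options array of this shape maps onto the three settings, here is a minimal standalone sketch. It is not the parser the class actually uses; `OptionSketch` and `intOption` are hypothetical names, and the only facts taken from the list above are the flags -K, -S and -E and their default of 1.

```java
// Hypothetical sketch, not Weka's option parser: read -K, -S and -E from a String[].
final class OptionSketch {

    /** Returns the integer that follows flag, or fallback when the flag is absent. */
    static int intOption(String flag, String[] options, int fallback) {
        for (int i = 0; i + 1 < options.length; i++) {
            if (options[i].equals(flag)) {
                return Integer.parseInt(options[i + 1]);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String[] options = { "-K", "3", "-S", "2" };
        int prediction = intOption("-K", options, 1); // neighbours for prediction
        int training = intOption("-S", options, 1);   // neighbours for cleansing training data
        int testing = intOption("-E", options, 1);    // neighbours for cleansing testing data
        System.out.println(prediction + " " + training + " " + testing);
    }
}
```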
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Classifier
      Returns:
      an array of strings suitable for passing to setOptions
    • numNeighboursTipText

      public String numNeighboursTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumNeighbours

      public void setNumNeighbours(int numNeighbour)
      Sets the number of nearest neighbours used to estimate the class prediction of test bags.
      Parameters:
      numNeighbour - the number of neighbours
    • getNumNeighbours

      public int getNumNeighbours()
      Returns the number of nearest neighbours used to estimate the class prediction of test bags.
      Returns:
      the number of neighbours
    • numTrainingNoisesTipText

      public String numTrainingNoisesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumTrainingNoises

      public void setNumTrainingNoises(int numTraining)
      Sets the number of nearest neighbour instances in the selection of noises in the training data
      Parameters:
      numTraining - the number of noises in training data
    • getNumTrainingNoises

      public int getNumTrainingNoises()
      Returns the number of nearest neighbour instances in the selection of noises in the training data
      Returns:
      the number of noises in training data
    • numTestingNoisesTipText

      public String numTestingNoisesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getNumTestingNoises

      public int getNumTestingNoises()
      Returns the number of nearest neighbour instances in the selection of noises in the test data
      Returns:
      the number of noises in test data
    • setNumTestingNoises

      public void setNumTestingNoises(int numTesting)
      Sets the number of nearest neighbour exemplars in the selection of noises in the test data
      Parameters:
      numTesting - the number of noises in test data
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for testing.
      Parameters:
      args - the options for the classifier