Package weka.classifiers.mi
Class MINND
- java.lang.Object
-
- weka.classifiers.Classifier
-
- weka.classifiers.mi.MINND
-
- All Implemented Interfaces:
  java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, MultiInstanceCapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler
public class MINND extends Classifier implements OptionHandler, MultiInstanceCapabilitiesHandler, TechnicalInformationHandler
Multiple-Instance Nearest Neighbour with Distribution learner.
It uses gradient descent to find the weight for each dimension of each exemplar, starting from 1.0. In order to avoid overfitting, it uses a mean-square function (i.e. the Euclidean distance) to search for the weights.
It then uses the weights to cleanse the training data. After that it searches for the weights again from the starting points of the weights searched before.
Finally it uses the most recently updated weights to cleanse the test exemplar and then finds the nearest neighbour of the test exemplar using a partly-weighted Kullback distance. However, the variances used in the Kullback distance are those computed before cleansing.
For more information see:
Xin Xu (2001). A nearest distribution approach to multiple-instance learning. Hamilton, NZ.

BibTeX:
@misc{Xu2001,
  address = {Hamilton, NZ},
  author = {Xin Xu},
  note = {0657.591B},
  school = {University of Waikato},
  title = {A nearest distribution approach to multiple-instance learning},
  year = {2001}
}
Valid options are:

-K <number of neighbours>
 Set the number of nearest neighbours for prediction (default 1)

-S <number of neighbours>
 Set the number of nearest neighbours for cleansing the training data (default 1)

-E <number of neighbours>
 Set the number of nearest neighbours for cleansing the testing data (default 1)
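Putting the options together with the classifier's public API (documented below), a minimal usage sketch might look as follows. This assumes weka.jar is on the classpath; the class and file names, the `train`/`testBag` placeholders, and the way the data is obtained are illustrative, not part of the MINND API.

```java
import weka.classifiers.mi.MINND;
import weka.core.Instance;
import weka.core.Instances;

public class MINNDExample {
    public static void main(String[] args) throws Exception {
        // Placeholders: a multi-instance Instances object (class index set)
        // and one test bag. Loading them is left out of this sketch.
        Instances train = null;   // e.g. a multi-instance ARFF loaded elsewhere
        Instance testBag = null;  // one bag from a test set

        MINND minnd = new MINND();
        // -K/-S/-E as documented above
        minnd.setOptions(new String[]{"-K", "3", "-S", "1", "-E", "1"});
        minnd.buildClassifier(train);
        double label = minnd.classifyInstance(testBag);
        System.out.println("predicted class index: " + label);
    }
}
```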
- Version:
- $Revision: 9144 $
- Author:
- Xin Xu (xx5@cs.waikato.ac.nz)
- See Also:
- Serialized Form
-
-
Constructor Summary
- MINND()
-
Method Summary
- void buildClassifier(Instances exs)
  As a normal Nearest Neighbour algorithm does, it is lazy and simply records the exemplar information when building the model.
- double classifyInstance(Instance ex)
  Uses the Kullback-Leibler distance to find the nearest neighbours of the given exemplar.
- Instance cleanse(Instance before)
  Cleanses the given exemplar according to the valid and noise data statistics.
- void findWeights(int row, double[][] mean)
  Uses gradient descent to distort the MU parameter for the exemplar.
- Capabilities getCapabilities()
  Returns the default capabilities of the classifier.
- Capabilities getMultiInstanceCapabilities()
  Returns the capabilities of this multi-instance classifier for the relational data.
- int getNumNeighbours()
  Returns the number of nearest neighbours used to estimate the class prediction of test bags.
- int getNumTestingNoises()
  Returns the number of nearest-neighbour instances used in the selection of noises in the test data.
- int getNumTrainingNoises()
  Returns the number of nearest-neighbour instances used in the selection of noises in the training data.
- java.lang.String[] getOptions()
  Gets the current settings of the Classifier.
- java.lang.String getRevision()
  Returns the revision string.
- TechnicalInformation getTechnicalInformation()
  Returns a TechnicalInformation object containing detailed information about the technical background of this class.
- java.lang.String globalInfo()
  Returns a string describing this classifier.
- double kullback(double[] mu1, double[] mu2, double[] var1, double[] var2, int pos)
  Calculates the Kullback-Leibler distance between two normal distributions.
- java.util.Enumeration listOptions()
  Returns an enumeration describing the available options.
- static void main(java.lang.String[] args)
  Main method for testing.
- java.lang.String numNeighboursTipText()
  Returns the tip text for this property.
- java.lang.String numTestingNoisesTipText()
  Returns the tip text for this property.
- java.lang.String numTrainingNoisesTipText()
  Returns the tip text for this property.
- Instance preprocess(Instances data, int pos)
  Pre-processes the given exemplar according to the other exemplars and updates the noise data statistics.
- void setNumNeighbours(int numNeighbour)
  Sets the number of nearest neighbours used to estimate the class prediction of test bags.
- void setNumTestingNoises(int numTesting)
  Sets the number of nearest-neighbour exemplars used in the selection of noises in the test data.
- void setNumTrainingNoises(int numTraining)
  Sets the number of nearest-neighbour instances used in the selection of noises in the training data.
- void setOptions(java.lang.String[] options)
  Parses a given list of options.
- double target(double[] x, double[][] X, int rowpos, double[] Y)
  Computes the target function to minimize in gradient descent: 1/2 * sum[i=1..p](f(X, Xi) - var(Y, Yi))^2.
Methods inherited from class weka.classifiers.Classifier
debugTipText, distributionForInstance, forName, getDebug, makeCopies, makeCopy, setDebug
Method Detail
-
globalInfo
public java.lang.String globalInfo()
Returns a string describing this classifier.
- Returns:
  - a description suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper reference or book this class is based on.
- Specified by:
  - getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
getCapabilities
public Capabilities getCapabilities()
Returns the default capabilities of the classifier.
- Specified by:
  - getCapabilities in interface CapabilitiesHandler
- Overrides:
  - getCapabilities in class Classifier
- Returns:
- the capabilities of this classifier
- See Also:
Capabilities
-
getMultiInstanceCapabilities
public Capabilities getMultiInstanceCapabilities()
Returns the capabilities of this multi-instance classifier for the relational data.
- Specified by:
  - getMultiInstanceCapabilities in interface MultiInstanceCapabilitiesHandler
- Returns:
- the capabilities of this object
- See Also:
Capabilities
-
buildClassifier
public void buildClassifier(Instances exs) throws java.lang.Exception
As a normal Nearest Neighbour algorithm does, it is lazy and simply records the exemplar information (i.e. the mean and variance of each dimension of each exemplar, and their classes) when building the model. There is actually no need to store the exemplars themselves.
- Specified by:
  - buildClassifier in class Classifier
- Parameters:
  - exs - the training exemplars
- Throws:
  - java.lang.Exception - if the model cannot be built properly
-
preprocess
public Instance preprocess(Instances data, int pos) throws java.lang.Exception
Pre-processes the given exemplar according to the other exemplars in the given exemplars. It also updates the noise data statistics.
- Parameters:
  - data - the whole exemplars
  - pos - the position of the given exemplar in data
- Returns:
  - the processed exemplar
- Throws:
  - java.lang.Exception - if the returned exemplar is wrong
-
findWeights
public void findWeights(int row, double[][] mean)
Use gradient descent to distort the MU parameter for the exemplar. The exemplar can be in the specified row of the given matrix, which has numExemplar rows and numDimension columns, or not in the matrix.
- Parameters:
  - row - the given row index
  - mean -
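The idea behind findWeights can be illustrated in isolation. The following self-contained sketch runs gradient descent with a numerically estimated gradient, starting each per-dimension weight from 1.0 as the class description states; the quadratic objective here is a hypothetical stand-in, not MINND's actual target function.

```java
// Minimal gradient-descent sketch: refine per-dimension weights from 1.0.
// The objective is a hypothetical stand-in for MINND's target function.
public class WeightSearchSketch {

    // Stand-in objective: squared gap between the weighted distance of two
    // fixed points and a desired distance of 1.0.
    static double objective(double[] w) {
        double[] a = {0.0, 0.0}, b = {2.0, 1.0};
        double dist = 0.0;
        for (int j = 0; j < w.length; j++) {
            double d = a[j] - b[j];
            dist += w[j] * d * d;          // weighted squared Euclidean distance
        }
        double r = Math.sqrt(dist) - 1.0;
        return 0.5 * r * r;
    }

    // Gradient descent with a central-difference numerical gradient.
    static double[] findWeights(int steps, double lr) {
        double[] w = {1.0, 1.0};           // starting point of 1.0 per dimension
        double eps = 1e-6;
        for (int s = 0; s < steps; s++) {
            double[] grad = new double[w.length];
            for (int j = 0; j < w.length; j++) {
                double old = w[j];
                w[j] = old + eps;
                double up = objective(w);
                w[j] = old - eps;
                double down = objective(w);
                w[j] = old;
                grad[j] = (up - down) / (2 * eps);
            }
            for (int j = 0; j < w.length; j++) {
                w[j] -= lr * grad[j];      // descend along the gradient
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double before = objective(new double[]{1.0, 1.0});
        double after = objective(findWeights(200, 0.05));
        System.out.println(before + " -> " + after);
    }
}
```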
-
target
public double target(double[] x, double[][] X, int rowpos, double[] Y)
Compute the target function to minimize in gradient descent. The formula is:
1/2 * sum[i=1..p](f(X, Xi) - var(Y, Yi))^2
where p is the number of exemplars and Y is the class label. In the case of X = MU, f() is the Euclidean distance between two exemplars together with the related weights, and var() is sqrt(numDimension)*(Y-Yi), where Y-Yi is either 0 (when Y == Yi) or 1 (when Y != Yi).
- Parameters:
  - x - the weights of the exemplar in question
  - rowpos - row index of x in X
  - Y - the observed class label
- Returns:
  - the result of the target function
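A direct, self-contained reading of that formula can be sketched as below. The exact way MINND applies the weights inside the Euclidean distance is an assumption of this sketch, as is skipping the exemplar's own row.

```java
// Sketch of the target function described above.
public class TargetSketch {

    // x: weights for the exemplar at row `rowpos`; X: matrix of exemplar means
    // (p rows, numDimension columns); Y: class labels of the p exemplars.
    static double target(double[] x, double[][] X, int rowpos, double[] Y) {
        int p = X.length;
        int numDim = X[0].length;
        double sum = 0.0;
        for (int i = 0; i < p; i++) {
            if (i == rowpos) continue;     // skip the exemplar itself (assumption)
            double f = 0.0;                // weighted Euclidean distance f(X, Xi)
            for (int j = 0; j < numDim; j++) {
                double d = X[rowpos][j] - X[i][j];
                f += x[j] * d * d;
            }
            f = Math.sqrt(f);
            // var(Y, Yi) = sqrt(numDimension) * (0 if Y == Yi, else 1)
            double v = (Y[rowpos] == Y[i]) ? 0.0 : Math.sqrt(numDim);
            double r = f - v;
            sum += 0.5 * r * r;            // 1/2 * sum (f - var)^2
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] X = {{0.0, 0.0}, {1.0, 1.0}};
        double[] Y = {0.0, 1.0};
        // Here f = sqrt(2) and var = sqrt(2) (labels differ), so the target is 0.
        System.out.println(target(new double[]{1.0, 1.0}, X, 0, Y));
    }
}
```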
-
classifyInstance
public double classifyInstance(Instance ex) throws java.lang.Exception
Use the Kullback-Leibler distance to find the nearest neighbours of the given exemplar. It also uses the K-Nearest Neighbour algorithm to classify the test exemplar.
- Overrides:
  - classifyInstance in class Classifier
- Parameters:
  - ex - the given test exemplar
- Returns:
  - the classification
- Throws:
  - java.lang.Exception - if the exemplar could not be classified successfully
-
cleanse
public Instance cleanse(Instance before) throws java.lang.Exception
Cleanse the given exemplar according to the valid and noise data statistics.
- Parameters:
  - before - the given exemplar
- Returns:
  - the processed exemplar
- Throws:
  - java.lang.Exception - if the returned exemplar is wrong
-
kullback
public double kullback(double[] mu1, double[] mu2, double[] var1, double[] var2, int pos)
This function calculates the Kullback-Leibler distance between two normal distributions. This distance is always positive.

Kullback-Leibler distance = integral{ f(X) ln(f(X)/g(X)) }

Note that X is a vector. Since we assume the dimensions are independent, f(X) (and likewise g(X)) is actually the product of the normal density functions of the individual dimensions. Also note that the formula should use log2 instead of ln, but ln is used simply for computational convenience. The result is as follows, supposing there are P dimensions, f(X) is the first distribution and g(X) is the second:

Kullback = sum[1..P](ln(SIGMA2/SIGMA1)) + sum[1..P](SIGMA1^2 / (2*SIGMA2^2)) + sum[1..P]((MU1-MU2)^2 / (2*SIGMA2^2)) - P/2

- Parameters:
  - mu1 - mu of the first normal distribution
  - mu2 - mu of the second normal distribution
  - var1 - variance (SIGMA^2) of the first normal distribution
  - var2 - variance (SIGMA^2) of the second normal distribution
- Returns:
  - the Kullback distance of the two distributions
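The closed-form sum above translates directly into code. The following self-contained sketch takes per-dimension means and variances (SIGMA^2) under the stated independence assumption; the real method's `pos` parameter is omitted here, which is a simplification of this sketch.

```java
// Sketch of the closed-form Kullback-Leibler distance given above.
public class KullbackSketch {

    static double kullback(double[] mu1, double[] mu2, double[] var1, double[] var2) {
        int p = mu1.length;
        double sum = 0.0;
        for (int i = 0; i < p; i++) {
            sum += 0.5 * Math.log(var2[i] / var1[i])       // ln(SIGMA2/SIGMA1)
                 + var1[i] / (2.0 * var2[i])               // SIGMA1^2 / (2*SIGMA2^2)
                 + (mu1[i] - mu2[i]) * (mu1[i] - mu2[i])
                   / (2.0 * var2[i]);                      // (MU1-MU2)^2 / (2*SIGMA2^2)
        }
        return sum - p / 2.0;                              // - P/2
    }

    public static void main(String[] args) {
        double[] mu = {0.0, 1.0}, var = {1.0, 2.0};
        // Identical distributions: the log and P/2 terms cancel, giving 0.
        System.out.println(kullback(mu, mu, var, var));
        // Shifting one mean makes the distance strictly positive.
        System.out.println(kullback(mu, new double[]{1.0, 1.0}, var, var) > 0.0);
    }
}
```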
-
listOptions
public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.
- Specified by:
  - listOptions in interface OptionHandler
- Overrides:
  - listOptions in class Classifier
- Returns:
  - an enumeration of all the available options
-
setOptions
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Parses a given list of options. Valid options are:

-K <number of neighbours>
 Set the number of nearest neighbours for prediction (default 1)

-S <number of neighbours>
 Set the number of nearest neighbours for cleansing the training data (default 1)

-E <number of neighbours>
 Set the number of nearest neighbours for cleansing the testing data (default 1)

- Specified by:
  - setOptions in interface OptionHandler
- Overrides:
  - setOptions in class Classifier
- Parameters:
  - options - the list of options as an array of strings
- Throws:
  - java.lang.Exception - if an option is not supported
-
getOptions
public java.lang.String[] getOptions()
Gets the current settings of the Classifier.
- Specified by:
  - getOptions in interface OptionHandler
- Overrides:
  - getOptions in class Classifier
- Returns:
  - an array of strings suitable for passing to setOptions
-
numNeighboursTipText
public java.lang.String numNeighboursTipText()
Returns the tip text for this property.
- Returns:
  - tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNumNeighbours
public void setNumNeighbours(int numNeighbour)
Sets the number of nearest neighbours used to estimate the class prediction of test bags.
- Parameters:
  - numNeighbour - the number of neighbours
-
getNumNeighbours
public int getNumNeighbours()
Returns the number of nearest neighbours used to estimate the class prediction of test bags.
- Returns:
  - the number of neighbours
-
numTrainingNoisesTipText
public java.lang.String numTrainingNoisesTipText()
Returns the tip text for this property.
- Returns:
  - tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNumTrainingNoises
public void setNumTrainingNoises(int numTraining)
Sets the number of nearest-neighbour instances used in the selection of noises in the training data.
- Parameters:
  - numTraining - the number of noises in the training data
-
getNumTrainingNoises
public int getNumTrainingNoises()
Returns the number of nearest-neighbour instances used in the selection of noises in the training data.
- Returns:
  - the number of noises in the training data
-
numTestingNoisesTipText
public java.lang.String numTestingNoisesTipText()
Returns the tip text for this property.
- Returns:
  - tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNumTestingNoises
public int getNumTestingNoises()
Returns the number of nearest-neighbour instances used in the selection of noises in the test data.
- Returns:
  - the number of noises in the test data
-
setNumTestingNoises
public void setNumTestingNoises(int numTesting)
Sets the number of nearest-neighbour exemplars used in the selection of noises in the test data.
- Parameters:
  - numTesting - the number of noises in the test data
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
  - getRevision in interface RevisionHandler
- Overrides:
  - getRevision in class Classifier
- Returns:
  - the revision
-
main
public static void main(java.lang.String[] args)
Main method for testing.
- Parameters:
  - args - the options for the classifier
-
-