Class sIB

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

    public class sIB
    extends RandomizableClusterer
    implements TechnicalInformationHandler
    Cluster data using the sequential information bottleneck algorithm.

    Note: only the hard clustering scheme is supported. sIB assigns each instance to the cluster that has the minimum cost/distance to that instance. The trade-off parameter beta is set to infinity, so 1/beta is zero.
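
    The hard-assignment rule in the note above can be illustrated with a minimal, standalone sketch. It is not Weka's implementation: the class and method names (HardAssignSketch, kl, js, cost, assign) are hypothetical, and the cost is assumed to take the form (p(x) + p(t)) * JS(p(y|x), p(y|t)), with JS the Jensen-Shannon divergence weighted by the two priors, as described in the paper cited below for the case where 1/beta is zero.

```java
public class HardAssignSketch {

    // Kullback-Leibler divergence KL(p || q); terms with p[i] == 0 contribute nothing.
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return sum;
    }

    // Jensen-Shannon divergence between p and q with mixing weights w1 + w2 = 1.
    static double js(double[] p, double w1, double[] q, double w2) {
        double[] mix = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            mix[i] = w1 * p[i] + w2 * q[i];
        }
        return w1 * kl(p, mix) + w2 * kl(q, mix);
    }

    // Assumed cost of putting an instance into a cluster:
    // (p(x) + p(t)) * JS(p(y|x), p(y|t)), weighted by the normalized priors.
    static double cost(double[] instance, double instancePrior,
                       double[] centroid, double clusterPrior) {
        double total = instancePrior + clusterPrior;
        return total * js(instance, instancePrior / total, centroid, clusterPrior / total);
    }

    // Hard assignment: return the index of the cluster with the minimum cost.
    static int assign(double[] instance, double instancePrior,
                      double[][] centroids, double[] clusterPriors) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int t = 0; t < centroids.length; t++) {
            double c = cost(instance, instancePrior, centroids[t], clusterPriors[t]);
            if (c < bestCost) {
                bestCost = c;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.8, 0.1, 0.1}, {0.1, 0.1, 0.8}};
        double[] priors = {0.5, 0.5};
        double[] x = {0.7, 0.2, 0.1};
        // x is much closer to the first centroid, so this prints 0.
        System.out.println(assign(x, 0.01, centroids, priors));
    }
}
```

    Because the assignment is hard, each instance ends up in exactly one cluster; there is no soft membership vector to inspect.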

    For more information, see:

    Noam Slonim, Nir Friedman, Naftali Tishby: Unsupervised document classification using sequential information maximization. In: Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, 129-136, 2002.

    BibTeX:

     @inproceedings{Slonim2002,
        author = {Noam Slonim and Nir Friedman and Naftali Tishby},
        booktitle = {Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval},
        pages = {129-136},
        title = {Unsupervised document classification using sequential information maximization},
        year = {2002}
     }
     

    Valid options are:

     -I <num>
      maximum number of iterations
      (default 100).
     -M <num>
      minimum number of changes in a single iteration
      (default 0).
     -N <num>
      number of clusters.
      (default 2).
     -R <num>
      number of restarts.
      (default 5).
     -U
      set not to normalize the data
      (default true).
     -V
      set to output debug info
      (default false).
     -S <num>
      Random number seed.
      (default 1)
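
    How the iteration limit (-I), the minimum number of changes (-M), the number of restarts (-R) and the seed (-S) might interact can be sketched with a toy, standalone control loop. This is an illustration under assumptions, not Weka's code: RestartLoopSketch and its methods are hypothetical, the toy objective is a one-dimensional sum of squared distances rather than the information bottleneck cost, a run is assumed to stop once a sweep makes no more than the minimum number of changes, and the restart count is treated as the total number of runs.

```java
import java.util.Random;

public class RestartLoopSketch {

    // Toy objective: sum of squared distances of each point to its cluster mean.
    static double loss(double[] pts, int[] labels, int k) {
        double[] sum = new double[k];
        int[] n = new int[k];
        for (int i = 0; i < pts.length; i++) {
            sum[labels[i]] += pts[i];
            n[labels[i]]++;
        }
        double total = 0.0;
        for (int i = 0; i < pts.length; i++) {
            double d = pts[i] - sum[labels[i]] / n[labels[i]];
            total += d * d;
        }
        return total;
    }

    // One sequential sweep: each point in turn moves to the cluster that
    // lowers the objective most. Returns the number of points that moved.
    static int sweep(double[] pts, int[] labels, int k) {
        int changes = 0;
        for (int i = 0; i < pts.length; i++) {
            int old = labels[i];
            int best = old;
            double bestLoss = loss(pts, labels, k);
            for (int t = 0; t < k; t++) {
                labels[i] = t;
                double l = loss(pts, labels, k);
                if (l < bestLoss) {
                    bestLoss = l;
                    best = t;
                }
            }
            labels[i] = best;
            if (best != old) {
                changes++;
            }
        }
        return changes;
    }

    // Runs numRestarts randomly initialized runs and keeps the best labeling.
    static int[] cluster(double[] pts, int k, int maxIterations,
                         int minChange, int numRestarts, long seed) {
        Random rnd = new Random(seed);
        int[] best = null;
        double bestLoss = Double.POSITIVE_INFINITY;
        for (int r = 0; r < numRestarts; r++) {
            int[] labels = new int[pts.length];
            for (int i = 0; i < labels.length; i++) {
                labels[i] = rnd.nextInt(k);
            }
            for (int iter = 0; iter < maxIterations; iter++) {
                if (sweep(pts, labels, k) <= minChange) {
                    break; // converged: too few changes in this sweep
                }
            }
            double l = loss(pts, labels, k);
            if (l < bestLoss) {
                bestLoss = l;
                best = labels;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] pts = {0.0, 0.1, 0.2, 10.0, 10.1, 10.2};
        // Uses the documented defaults: -I 100, -M 0, -N 2, -R 5, -S 1.
        int[] labels = cluster(pts, 2, 100, 0, 5, 1L);
        System.out.println(java.util.Arrays.toString(labels));
    }
}
```

    Raising the minimum number of changes ends each run earlier, while more restarts reduce the chance that a poor random start determines the final result.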
    Version:
    $Revision: 5538 $
    Author:
    Noam Slonim, Anna Huang
    See Also:
    Serialized Form
    • Constructor Detail

      • sIB

        public sIB()
    • Method Detail

      • buildClusterer

        public void buildClusterer(Instances data)
                            throws java.lang.Exception
        Generates a clusterer.
        Specified by:
        buildClusterer in interface Clusterer
        Specified by:
        buildClusterer in class AbstractClusterer
        Parameters:
        data - the training instances
        Throws:
        java.lang.Exception - if something goes wrong
      • clusterInstance

        public int clusterInstance(Instance instance)
                            throws java.lang.Exception
        Clusters a given instance. This is the method defined in the Clusterer interface; it does nothing but return the cluster assigned to the instance.
        Specified by:
        clusterInstance in interface Clusterer
        Overrides:
        clusterInstance in class AbstractClusterer
        Parameters:
        instance - the instance to be assigned to a cluster
        Returns:
        the number of the assigned cluster as an integer
        Throws:
        java.lang.Exception - if instance could not be clustered successfully
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -I <num>
          maximum number of iterations
          (default 100).
         -M <num>
          minimum number of changes in a single iteration
          (default 0).
         -N <num>
          number of clusters.
          (default 2).
         -R <num>
          number of restarts.
          (default 5).
         -U
          set not to normalize the data
          (default true).
         -V
          set to output debug info
          (default false).
         -S <num>
          Random number seed.
          (default 1)
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableClusterer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
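
        The shape of the options array (a switch followed by its value for numeric options, a bare switch for flags) can be illustrated with a small standalone parser. This is a hypothetical sketch, not the code behind setOptions: OptionArraySketch and parse are invented names, the numeric defaults are copied from the list above, and an unsupported switch is reported with an IllegalArgumentException rather than a plain java.lang.Exception.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OptionArraySketch {

    // Numeric options, pre-filled with the documented defaults.
    final Map<String, Integer> values = new HashMap<>();
    // Flags (-U, -V) that were present in the array.
    final Set<String> flags = new HashSet<>();

    static OptionArraySketch parse(String[] options) {
        OptionArraySketch parsed = new OptionArraySketch();
        parsed.values.put("-I", 100); // maximum number of iterations
        parsed.values.put("-M", 0);   // minimum number of changes
        parsed.values.put("-N", 2);   // number of clusters
        parsed.values.put("-R", 5);   // number of restarts
        parsed.values.put("-S", 1);   // random number seed

        for (int i = 0; i < options.length; i++) {
            String opt = options[i];
            if (parsed.values.containsKey(opt)) {
                // Numeric option: the next array element is its value.
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("Missing value for " + opt);
                }
                parsed.values.put(opt, Integer.parseInt(options[++i]));
            } else if (opt.equals("-U") || opt.equals("-V")) {
                // Flag: its presence alone carries the meaning.
                parsed.flags.add(opt);
            } else {
                throw new IllegalArgumentException("Unsupported option: " + opt);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        OptionArraySketch o = parse(new String[] {"-N", "3", "-R", "10", "-V"});
        System.out.println(o.values.get("-N") + " clusters, debug=" + o.flags.contains("-V"));
    }
}
```

        Each switch and each value is a separate array element, so "-N 3" is passed as the two strings "-N" and "3".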
      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class RandomizableClusterer
        Returns:
        an enumeration of all the available options.
      • debugTipText

        public java.lang.String debugTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDebug

        public void setDebug(boolean v)
        Set debug mode - verbose output
        Parameters:
        v - true for verbose output
      • getDebug

        public boolean getDebug()
        Get debug mode
        Returns:
        true if debug mode is set
      • maxIterationsTipText

        public java.lang.String maxIterationsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setMaxIterations

        public void setMaxIterations(int i)
        Set the max number of iterations
        Parameters:
        i - max number of iterations
      • getMaxIterations

        public int getMaxIterations()
        Get the max number of iterations
        Returns:
        max number of iterations
      • minChangeTipText

        public java.lang.String minChangeTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setMinChange

        public void setMinChange(int m)
        Set the minimum number of changes
        Parameters:
        m - the minimum number of changes
      • getMinChange

        public int getMinChange()
        Get the minimum number of changes
        Returns:
        the minimum number of changes
      • numClustersTipText

        public java.lang.String numClustersTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setNumClusters

        public void setNumClusters(int n)
        Set the number of clusters
        Parameters:
        n - number of clusters
      • getNumClusters

        public int getNumClusters()
        Get the number of clusters
        Returns:
        the number of clusters
      • numRestartsTipText

        public java.lang.String numRestartsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setNumRestarts

        public void setNumRestarts(int i)
        Set the number of restarts
        Parameters:
        i - number of restarts
      • getNumRestarts

        public int getNumRestarts()
        Get the number of restarts
        Returns:
        number of restarts
      • notUnifyNormTipText

        public java.lang.String notUnifyNormTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property
      • setNotUnifyNorm

        public void setNotUnifyNorm(boolean b)
        Set whether to normalize instances to unify prior probability before building the clusterer
        Parameters:
        b - true to normalize, otherwise false
      • getNotUnifyNorm

        public boolean getNotUnifyNorm()
        Get whether to normalize instances to unify prior probability before building the clusterer
        Returns:
        true if set to normalize, false otherwise
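
        What normalizing instances might look like can be shown with a short standalone sketch. It assumes that "unify prior probability" means rescaling every instance so that its attribute values sum to the same total (1 here), which turns each row into a distribution and gives all instances equal weight. NormalizeSketch and normalize are hypothetical names, and the sketch does not reproduce Weka's internal code.

```java
public class NormalizeSketch {

    // Returns a copy of the data in which every row with a positive sum
    // is rescaled to sum to 1. All-zero rows are copied unchanged.
    static double[][] normalize(double[][] data) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            double sum = 0.0;
            for (double v : data[i]) {
                sum += v;
            }
            out[i] = new double[data[i].length];
            for (int j = 0; j < data[i].length; j++) {
                out[i][j] = sum > 0.0 ? data[i][j] / sum : data[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two toy instances with very different totals (for example, term counts).
        double[][] counts = {{2, 6, 2}, {20, 60, 120}};
        for (double[] row : normalize(counts)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

        Without such rescaling, an instance with large raw values (for example, a long document represented by term counts) would carry more weight than a small one.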
      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this clusterer
        Returns:
        a description of the clusterer suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • main

        public static void main(java.lang.String[] argv)