weka.attributeSelection
Class RerankingSearch

java.lang.Object
  extended by weka.attributeSelection.ASSearch
      extended by weka.attributeSelection.RerankingSearch
All Implemented Interfaces:
java.io.Serializable, weka.core.OptionHandler, weka.core.RevisionHandler

public class RerankingSearch
extends weka.attributeSelection.ASSearch
implements weka.core.OptionHandler

It first creates an univariate ranking of all attributes in decreasing order given an information-theory-based AttributeEvaluator; then, the ranking is split in block of size B, and a ASSearch is run for the first block. Given the selected attributes, the rest of the ranking is re-ranked and ASSearch is run again on the first current block. Search stops when no attribute is selected in current block. For more information, see

Pablo Bermejo et. al. Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowledge-Based Systems. doi:10.1016/j.knosys.2011.01.015.

BibTeX:

  @article{BermejoRerank,
 author = "Pablo Bermejo and Luis de la Ossa and Jose A. Gamez and Jose M. Puerta",   
 title = "Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking",
 journal = "Knowledge-Based Systems",
 number = "",
 pages = " - ",
 year = "2011",
 issn = "0950-7051",
 doi = "DOI: 10.1016/j.knosys.2011.01.015",
 }
 
 

Valid options are:

  -method 
  Specifies the method used to re-ranking attributes
  (default 0: CMIM)
 
  -blockSize 
  Specifies the size of blocks over which search is performed
  (default 20)
 
 -rankingMeasure 
  information-theory-based univariate attribute evaluator to create first ranking
  (default 0: Information Gain)
 
 -search 
 Class name of ASSearch search algorithm to be used over blocks.
 Place any options of the search algorithm LAST on the command line
 following a "--". eg.:
 -search weka.attributeSelection.GreedyStepwise ... -- -C
 

Version:
$Revision: 1.0 $
Author:
Pablo Bermejo (Pablo.Bermejo@uclm.es)
See Also:
Serialized Form

Field Summary
protected static int CMIM
          Fleuret's Conditional Mutual Information Maximization
protected static int IG
           
protected  weka.attributeSelection.ASEvaluation m_ASEval
          attribute set evaluator applied in search
protected  double[] m_attributes_merits_globalIndexes
          merit of each attribute evaluated by m_univariateEvaluator position i referes to attribute i in training data
protected  int m_B
          Block size
protected  int m_blocksSearched
          Total number of blocks over which search was performed
protected  int m_informationBasedEvaluator
          Uni-varaite Attribute evaluator respect to the class.
protected  int[] m_ranking
          ranking of attributes in decreasing order given m_univariateEvaluator
protected  double m_rerankingTime_ms
          Time (ms) spent in re-ranking
protected  int m_rerankMethod
          Re-rank method
protected  weka.attributeSelection.ASSearch m_searchAlgorithm
          search algorithm applied over block to select attributes
protected  double m_searchTime_ms
          Total time (ms) spent in search
protected  int[] m_selected
          selected attributes
protected static int MIFS
          Battiti's Mutual Information-Based Feature Selection
protected static int MRMR
          Peng's Max-Relevance and Min-Redundancy
(package private) static long serialVersionUID
          for serialisation
protected static int SU
           
static weka.core.Tag[] TAGS_INFORMATION_BASED_EVAL
           
static weka.core.Tag[] TAGS_RERANK
           
 
Constructor Summary
RerankingSearch()
           
 
Method Summary
 java.lang.String bTipText()
          Returns the tip text for this property.
protected  void createUnivariateRanking(weka.core.Instances data)
          It creates a ranking of attributes in decreasing order of merit, given the chosen attribute evaluator.
protected  boolean different(int[] a, int[] b)
          check if two arrays contain the same values, in any order
 int getB()
          get size of blocks over which search is performed
 int getBlocksSearched()
          get number of blocks attributes in ranking over which search has been performed.
protected  double getConditionalMutualInformation(int posX, int posY, int posZ, weka.core.Instances data)
          Computes I(X;Y|Z)
protected  int[] getGlobalIndexes(int[] relative, weka.core.Instances relativeData, weka.core.Instances globalData)
          It converts the indexes of attributes in relative[] refering to relativeData, to the indexes of the refered attributes in globalInstances data.
 weka.core.SelectedTag getInformationBasedEvaluator()
          * Get method used to crate first univariate ranking
protected  double[][] getJointXY(int pX, int pY, weka.core.Instances data)
          Compute joint probability por attribute pX and pY in data
protected  double[][][] getJointXYZ(int pX, int pY, int pZ, weka.core.Instances data)
          Compute joint probability por attribute pX, pY and pZ in data
protected  double[] getMarginalProb(int pos, weka.core.Instances data)
          Computes marginal probability for attribute with index pos, in data
protected  double getMutualInformation(int posX, int posY, weka.core.Instances data)
          Computes I(X;Y)
 java.lang.String[] getOptions()
          get a String[] describing the value set for all options
 double getRerankingTime_ms()
          time in milliseconds spent during the search in re-ranking computations
 weka.core.SelectedTag getRerankMethod()
          Get method used for re-ranking
 weka.attributeSelection.ASSearch getSearchAlgorithm()
          Get a deep copy of the search algorithm used over blocks
 double getSearchTime_ms()
          total time in milliseconds spent during the search
 int[] getSelected()
          get list of attributes selected in search
protected  java.lang.String getStartSetString(int numberSelected)
          Creates a string of attributes indexes separated by commas.
 java.lang.String globalInfo()
          Returns a string describing this search method
 java.lang.String informationBasedEvaluatorTipText()
          Returns the tip text for m_univariteEvaluator
protected  boolean isIn(int n, int[] array)
          Check if value n is in array
 java.util.Enumeration<weka.core.Option> listOptions()
          Returns an enumeration describing the available options.
protected  int[] minMaxOrder(double[][] I_XiC_givenXj)
           
protected  weka.core.Instances projectBlock(weka.core.Instances data)
          It projects data over attributes selected up to know + first m_B attributes in ranking + the class attribute
protected  void rerank(weka.core.Instances data)
          re-rank remaining attributes in m_ranking[] given m_selected[]
protected  void rerankCMIM(weka.core.Instances data)
           
 java.lang.String rerankMethodTipText()
          Returns the tip text for this property.
protected  void rerankMIFS_MRMR(weka.core.Instances data, double factor)
           
 void resetOptions()
          reset all options to their default values
 int[] search(weka.attributeSelection.ASEvaluation ASEval, weka.core.Instances data)
          Performs search
 java.lang.String searchAlgorithmTipText()
          Returns the tip text for this property.
 void setB(int B)
          set size of blocks (cardinality) over which search is performed
 void setInformationBasedEvaluator(weka.core.SelectedTag newType)
          set evaluator to create univariate ranking
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setRerankMethod(weka.core.SelectedTag newType)
          Set method to use for re-ranking
 void setSearchAlgorithm(weka.attributeSelection.ASSearch m_searchAlgorithm)
          set the search algorithm to use over blocks in ranking
 java.lang.String toString()
          Description of the search
 
Methods inherited from class weka.attributeSelection.ASSearch
forName, getRevision, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

static final long serialVersionUID
for serialisation

See Also:
Constant Field Values

m_rerankMethod

protected int m_rerankMethod
Re-rank method


m_B

protected int m_B
Block size


m_informationBasedEvaluator

protected int m_informationBasedEvaluator
Uni-varaite Attribute evaluator respect to the class. This is used to create the first ranking of attributes. It should be an information-based evaluation


m_searchAlgorithm

protected weka.attributeSelection.ASSearch m_searchAlgorithm
search algorithm applied over block to select attributes


m_ASEval

protected weka.attributeSelection.ASEvaluation m_ASEval
attribute set evaluator applied in search


m_searchTime_ms

protected double m_searchTime_ms
Total time (ms) spent in search


m_rerankingTime_ms

protected double m_rerankingTime_ms
Time (ms) spent in re-ranking


m_blocksSearched

protected int m_blocksSearched
Total number of blocks over which search was performed


m_selected

protected int[] m_selected
selected attributes


CMIM

protected static final int CMIM
Fleuret's Conditional Mutual Information Maximization

See Also:
Constant Field Values

MIFS

protected static final int MIFS
Battiti's Mutual Information-Based Feature Selection

See Also:
Constant Field Values

MRMR

protected static final int MRMR
Peng's Max-Relevance and Min-Redundancy

See Also:
Constant Field Values

IG

protected static final int IG
See Also:
Constant Field Values

SU

protected static final int SU
See Also:
Constant Field Values

m_ranking

protected int[] m_ranking
ranking of attributes in decreasing order given m_univariateEvaluator


m_attributes_merits_globalIndexes

protected double[] m_attributes_merits_globalIndexes
merit of each attribute evaluated by m_univariateEvaluator position i referes to attribute i in training data


TAGS_RERANK

public static final weka.core.Tag[] TAGS_RERANK

TAGS_INFORMATION_BASED_EVAL

public static final weka.core.Tag[] TAGS_INFORMATION_BASED_EVAL
Constructor Detail

RerankingSearch

public RerankingSearch()
Method Detail

search

public int[] search(weka.attributeSelection.ASEvaluation ASEval,
                    weka.core.Instances data)
             throws java.lang.Exception
Performs search

Specified by:
search in class weka.attributeSelection.ASSearch
Parameters:
ASEval - the attribute evaluator to guide the search
data - the training instances.
Returns:
an array (not necessarily ordered) of selected attribute indexes
Throws:
java.lang.Exception - if the search can't be completed

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

  -method 
  Specifies the method used to re-ranking attributes
  (default 0: CMIM)
 
  -blockSize 
  Specifies the size of blocks over which search is performed
  (default 20)
 
 -rankingMeasure 
  information-theory-based univariate attribute evaluator to create first ranking
  (default 0: Information Gain)
 
 -search 
 Class name of ASSearch search algorithm to be used over blocks.
 Place any options of the search algorithm LAST on the command line
 following a "--". eg.:
 -search weka.attributeSelection.GreedyStepwise ... -- -C
 

Specified by:
setOptions in interface weka.core.OptionHandler
Throws:
java.lang.Exception

getOptions

public java.lang.String[] getOptions()
get a String[] describing the value set for all options

Specified by:
getOptions in interface weka.core.OptionHandler
Returns:
String[] describing the options

createUnivariateRanking

protected void createUnivariateRanking(weka.core.Instances data)
                                throws java.lang.Exception
It creates a ranking of attributes in decreasing order of merit, given the chosen attribute evaluator. It sets object "ranking" to ranked atts indexes in decreasing order of merit It sets object "ranking_merits" to ranked atts merit evaluated in decreasing oder

Parameters:
data - from which to build evaluator
Throws:
java.lang.Exception

projectBlock

protected weka.core.Instances projectBlock(weka.core.Instances data)
                                    throws java.lang.Exception
It projects data over attributes selected up to know + first m_B attributes in ranking + the class attribute

Parameters:
data - from which to project attributes to create a block
Returns:
Intances object with block over which to perform search
Throws:
java.lang.Exception

getGlobalIndexes

protected int[] getGlobalIndexes(int[] relative,
                                 weka.core.Instances relativeData,
                                 weka.core.Instances globalData)
It converts the indexes of attributes in relative[] refering to relativeData, to the indexes of the refered attributes in globalInstances data.

Parameters:
relative - attributes indexes selected in previous block
relativeData - previous block
globalData - original data with all the attributes
Returns:
the global index of attributes selected in previous block

listOptions

public java.util.Enumeration<weka.core.Option> listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface weka.core.OptionHandler
Returns:
an enumeration of all the available options.

rerank

protected void rerank(weka.core.Instances data)
               throws java.lang.Exception
re-rank remaining attributes in m_ranking[] given m_selected[]

Parameters:
data - original training data
Throws:
java.lang.Exception

getStartSetString

protected java.lang.String getStartSetString(int numberSelected)
Creates a string of attributes indexes separated by commas. Since attributes selected in the preivous block are appended at the beginning of the new block, we know the relative indexes start by 0.

Parameters:
numberSelected - cardinaility of attributes selected in previous block
Returns:
String with start set indexes for next block

different

protected boolean different(int[] a,
                            int[] b)
check if two arrays contain the same values, in any order

Parameters:
a - []
b - []
Returns:
true if a[] and b[] do not have the same integer values

isIn

protected boolean isIn(int n,
                       int[] array)
Check if value n is in array

Parameters:
n -
array -
Returns:
true if n is in array in any position

getSearchAlgorithm

public weka.attributeSelection.ASSearch getSearchAlgorithm()
                                                    throws java.lang.Exception
Get a deep copy of the search algorithm used over blocks

Returns:
deep copy of current m_searchAlgorithm
Throws:
java.lang.Exception

setSearchAlgorithm

public void setSearchAlgorithm(weka.attributeSelection.ASSearch m_searchAlgorithm)
set the search algorithm to use over blocks in ranking

Parameters:
m_searchAlgorithm - new search algorithm to be used over blocks

getRerankMethod

public weka.core.SelectedTag getRerankMethod()
Get method used for re-ranking

Returns:
SelectedTag indicating the type of re-ranking

setRerankMethod

public void setRerankMethod(weka.core.SelectedTag newType)
                     throws java.lang.Exception
Set method to use for re-ranking

Parameters:
newType - the type of re-rerank method desired
Throws:
java.lang.Exception

getSearchTime_ms

public double getSearchTime_ms()
total time in milliseconds spent during the search

Returns:
double search time in milliseconds

getRerankingTime_ms

public double getRerankingTime_ms()
time in milliseconds spent during the search in re-ranking computations

Returns:
double time spent in re-ranking

getBlocksSearched

public int getBlocksSearched()
get number of blocks attributes in ranking over which search has been performed. This is the number of re-rankings performed + 1

Returns:
int number of blocks searched

getB

public int getB()
get size of blocks over which search is performed

Returns:
m_B size (cardinality) of blocks over which search is performed

setB

public void setB(int B)
set size of blocks (cardinality) over which search is performed

Parameters:
B -

getInformationBasedEvaluator

public weka.core.SelectedTag getInformationBasedEvaluator()
* Get method used to crate first univariate ranking

Returns:
SelectedTag evaluator used to generate the original ranking

setInformationBasedEvaluator

public void setInformationBasedEvaluator(weka.core.SelectedTag newType)
                                  throws java.lang.Exception
set evaluator to create univariate ranking

Parameters:
newType - the information-based univariate evaluation to create first ranking
Throws:
java.lang.Exception

resetOptions

public void resetOptions()
reset all options to their default values


rerankMethodTipText

public java.lang.String rerankMethodTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSelected

public int[] getSelected()
get list of attributes selected in search

Returns:
int[] of attributes selected

bTipText

public java.lang.String bTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

informationBasedEvaluatorTipText

public java.lang.String informationBasedEvaluatorTipText()
Returns the tip text for m_univariteEvaluator

Returns:
tip text for displaying in the explorer/experimenter gui

searchAlgorithmTipText

public java.lang.String searchAlgorithmTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

globalInfo

public java.lang.String globalInfo()
Returns a string describing this search method

Returns:
a description of the search suitable for displaying in the explorer/experimenter gui

toString

public java.lang.String toString()
Description of the search

Overrides:
toString in class java.lang.Object
Returns:
String

rerankCMIM

protected void rerankCMIM(weka.core.Instances data)
                   throws java.lang.Exception
Parameters:
data - from which to compute conditional mutual informations Modifies m_ranking ordering attributes in decreasing order of Fleuret's CMIM approximation of I(Xi;C|m_selected) for all Xi in m_ranking. CMIM approximates this value with formula: max_Xi min_Xj I(Xi;C|m_selected) for all Xi in m_ranking and all Xj in m:selected
Throws:
java.lang.Exception

rerankMIFS_MRMR

protected void rerankMIFS_MRMR(weka.core.Instances data,
                               double factor)
                        throws java.lang.Exception
Parameters:
data - to compute mutual information values
factor - double value used to multiply by mutual informations Modifies m_ranking ordering attributes in decreasing order of Battiti's MIFS approximation of I(Xi;C|m_selected) for all Xi in m_ranking, and all Xj in m_selected. MIFS approximates this value with formula: I(Xi,C) - (0.5* sum I(Xi,Xj)) MRMR approximates this value with formula: I(Xi,C) - (1/|S| * sum I(Xi,Xj)) thus, the only difference between both methods is the multiplicaton factor
Throws:
java.lang.Exception

minMaxOrder

protected int[] minMaxOrder(double[][] I_XiC_givenXj)
Parameters:
I_XiC_givenXj - conditional information I(X;C|m_selected) for all Xi in training data
Returns:
int[] indexes in first dimension of I_XiC_givenXj , sorted in decreasing order of max_Xi min_Xj I(Xi;C|m_selected)

getConditionalMutualInformation

protected double getConditionalMutualInformation(int posX,
                                                 int posY,
                                                 int posZ,
                                                 weka.core.Instances data)
                                          throws java.lang.Exception
Computes I(X;Y|Z)

Parameters:
posX - index of att X
posY - index of att Y
posZ - index of conditioning attribute Z
data - training data
Returns:
double conditional mutual information I(X;Y|Z)
Throws:
java.lang.Exception

getMutualInformation

protected double getMutualInformation(int posX,
                                      int posY,
                                      weka.core.Instances data)
Computes I(X;Y)

Parameters:
posX - index of att X
posY - index of att Y
data - training data
Returns:
double mutual information I(X;Y)
Throws:
java.lang.Exception

getJointXY

protected double[][] getJointXY(int pX,
                                int pY,
                                weka.core.Instances data)
Compute joint probability por attribute pX and pY in data

Returns:
double[][] joint probabilities

getJointXYZ

protected double[][][] getJointXYZ(int pX,
                                   int pY,
                                   int pZ,
                                   weka.core.Instances data)
Compute joint probability por attribute pX, pY and pZ in data

Returns:
double[][] joint probabilities

getMarginalProb

protected double[] getMarginalProb(int pos,
                                   weka.core.Instances data)
Computes marginal probability for attribute with index pos, in data

Returns:
double[] marginal probabilities of index pos