Feature Selection Toolbox FST3 Library / Documentation

demo31.cpp File Reference

Example 31: Individual ranking (BIF) in very high-dimensional feature selection.

#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_multinom_bhattacharyya.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_multinom_naivebayes.hpp"
#include "search_bif.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_os.hpp"

Functions

int main ()

Detailed Description

Example 31: Individual ranking (BIF) in very high-dimensional feature selection.


Function Documentation

int main ()

Example 31: Individual ranking (BIF) in very high-dimensional feature selection

Very high-dimensional feature selection arises, e.g., in text categorization, where dimensionality is on the order of 10000 or 100000 features. Individual feature ranking (Best Individual Feature, BIF) is the most commonly applied approach in this setting because of its key advantages: speed and high stability. This example illustrates a less common but very effective variant based on the Multinomial Bhattacharyya distance feature selection criterion. Multinomial Bhattacharyya has been shown capable of outperforming traditional tools such as Information Gain, cf. Novovicova et al., LNCS 4109, 2006. A randomly sampled 50% of the data is used to estimate the multinomial model parameters on which the actual feature selection relies; another, disjoint 40% of the data is randomly sampled for testing. The selected subset is finally validated: a multinomial Naive Bayes classifier is trained on the training data restricted to the selected features, and its classification accuracy is estimated on the test data.
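The steps described above (disjoint random split, multinomial parameter estimation, individual ranking) can be sketched in plain C++11 without the FST3 classes. This is a minimal illustration under stated assumptions, not the code of demo31.cpp and not the FST3 API: the names random_split, estimate_multinomial, individual_scores and rank_bif are hypothetical, the estimates use Laplace smoothing, only two classes are considered, and the per-feature score 0.5 * (sqrt(p_t) - sqrt(q_t))^2 is an assumed stand-in for the library's criterion. It is chosen because these terms sum to 1 - sum_t sqrt(p_t * q_t), i.e. one minus the Bhattacharyya coefficient of the two class-conditional multinomial distributions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: disjoint random train/test index sets (e.g. 50% / 40%).
struct Split { std::vector<std::size_t> train, test; };

Split random_split(std::size_t n, double train_frac, double test_frac, unsigned seed)
{
    assert(train_frac >= 0.0 && test_frac >= 0.0 && train_frac + test_frac <= 1.0);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t(0));
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    const std::size_t n_train = static_cast<std::size_t>(train_frac * n);
    // Clamp so that the two parts can never overlap or run past the end.
    const std::size_t n_test =
        std::min(static_cast<std::size_t>(test_frac * n), n - n_train);
    Split s;
    s.train.assign(idx.begin(), idx.begin() + n_train);
    s.test.assign(idx.begin() + n_train, idx.begin() + n_train + n_test);
    return s;
}

// Laplace-smoothed multinomial estimate for one class (an assumption):
// p[t] = (counts[t] + 1) / (sum of counts + number of features).
std::vector<double> estimate_multinomial(const std::vector<unsigned long>& counts)
{
    const double total = std::accumulate(counts.begin(), counts.end(), 0.0);
    const double dim = static_cast<double>(counts.size());
    std::vector<double> p(counts.size());
    for (std::size_t t = 0; t < counts.size(); ++t)
        p[t] = (static_cast<double>(counts[t]) + 1.0) / (total + dim);
    return p;
}

// Assumed individual score of feature t for two classes:
// 0.5 * (sqrt(p[t]) - sqrt(q[t]))^2, summing over t to 1 - sum_t sqrt(p[t]*q[t]).
std::vector<double> individual_scores(const std::vector<double>& p,
                                      const std::vector<double>& q)
{
    assert(p.size() == q.size());
    std::vector<double> s(p.size());
    for (std::size_t t = 0; t < p.size(); ++t) {
        const double d = std::sqrt(p[t]) - std::sqrt(q[t]);
        s[t] = 0.5 * d * d;
    }
    return s;
}

// BIF: sort features by individual score (descending) and keep the best k.
std::vector<std::size_t> rank_bif(const std::vector<double>& scores, std::size_t k)
{
    std::vector<std::size_t> idx(scores.size());
    std::iota(idx.begin(), idx.end(), std::size_t(0));
    std::stable_sort(idx.begin(), idx.end(),
                     [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    if (k < idx.size()) idx.resize(k);
    return idx;
}
```

In such a sketch, the per-class counts would be accumulated over the training indices only, rank_bif would return the selected feature subset, and a classifier restricted to those features would then be evaluated on the test indices. Because each feature is scored independently, the cost grows linearly with the dimensionality apart from the final sort, which is what makes individual ranking practical at 10000 or 100000 features.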

References FST::Search_BIF< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::search(), and FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail().


Generated on Thu Mar 31 11:36:04 2011 for FST3Library by  doxygen 1.6.1