Example 31: Individual ranking (BIF) in very high-dimensional feature selection.
#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_multinom_bhattacharyya.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_multinom_naivebayes.hpp"
#include "search_bif.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_os.hpp"
Functions

int main()
Example 31: Individual ranking (BIF) in very high-dimensional feature selection.
int main()
Very high-dimensional feature selection arises, e.g., in text categorization, where dimensionality is on the order of 10000 or 100000. Individual feature ranking (also called Best Individual Feature, BIF) is the most commonly applied approach in this setting because of its key advantages: speed and high stability. This example illustrates a less common but very effective variant based on the Multinomial Bhattacharyya distance feature selection criterion. Multinomial Bhattacharyya has been shown capable of outperforming traditional tools such as Information Gain; cf. Novovicova et al., LNCS 4109, 2006. A randomly sampled 50% of the data is used to estimate the multinomial model parameters employed in the actual feature selection process; another (disjoint) 40% of the data is randomly sampled for testing. The selected subset is eventually validated: a multinomial Naive Bayes classifier is trained on the training data, restricted to the selected subset, and its classification accuracy is finally estimated on the test data.
References FST::Search_BIF< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::search(), and FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail().