Example 33: Oscillating Search in very high-dimensional feature selection.
#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_multinom_bhattacharyya.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_multinom_naivebayes.hpp"
#include "search_bif.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_os.hpp"
Functions
    int main ()

int main ()
Very high-dimensional feature selection in text categorization, with dimensionality on the order of 10000 to 100000. The standard approach is BIF (Best Individual Features), yet we show here that a non-trivial search procedure (OS, Oscillating Search) can remain feasible. OS is applied in its fastest form (delta=1) and initialized by the BIF result. We use Multinomial Bhattacharyya distance as the feature selection criterion; it has been shown capable of outperforming traditional measures such as Information Gain (cf. Novovicova et al., LNCS 4109, 2006). A randomly sampled 50% of the data is used to estimate the multinomial model parameters that drive the actual feature selection process; another disjoint, randomly sampled 40% of the data is held out for testing. The selected subset is eventually used for validation: a multinomial Naive Bayes classifier is trained on the training data restricted to the selected subset, and classification accuracy is finally estimated on the test data.
References FST::Search_OS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::search(), FST::Search_BIF< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::search(), and FST::Search_OS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::set_delta().