Feature Selection Toolbox 3 (FST3) Library / Documentation

demo30.cpp File Reference

Example 30: Feature selection on binary and/or natural-valued data. More...

#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_cv.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_multinom_naivebayes.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_dos.hpp"
Include dependency graph for demo30.cpp:

Functions

int main ()

Detailed Description

Example 30: Feature selection on binary and/or natural-valued data.


Function Documentation

int main ()

Example 30: Feature selection on binary and/or natural-valued data.

The mushroom dataset from the UCI repository is originally 22-dimensional, with categorical features. Here we use a transformed version of the dataset in which the features have been expanded to 125 binary ones. Features are selected using the DOS algorithm (search extent restricted by Delta=10), with Multinomial Naive Bayes wrapper classification accuracy as the criterion. 50% of the data is randomly chosen to form the training set, which remains fixed throughout the run; a further 40% of the data is randomly chosen and used at the end to validate classification performance on the finally selected subspace. (The selected training and test parts are disjoint and together cover 90% of the original data.) Classification accuracy (i.e., the FS wrapper criterion value) is estimated on the training part of the data by means of 5-fold cross-validation.

Note:
This approach is applicable to problems of high dimensionality, but may prove too slow once the dimensionality reaches thousands or tens of thousands, because DOS is applied directly, starting from zero cardinality. For approaches better suited to very-high-dimensional problems, see Example 31: Individual ranking (BIF) in very high-dimensional feature selection; Example 32t: Threaded individual ranking (BIF) with SVM wrapper in very high-dimensional feature selection; Example 33: Oscillating Search in very high-dimensional feature selection; Example 33t: Threaded Oscillating Search in very high-dimensional feature selection; Example 34: Dependency-Aware Feature Ranking (DAF0); and Example 35t: Dependency-Aware Feature Ranking (DAF1) to enable wrapper-based FS on very-high-dimensional data.

References FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::search(), FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::set_delta(), and FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail().


Generated on Thu Mar 31 11:36:01 2011 for FST3 Library by doxygen 1.6.1