Example 30: Feature selection on binary and/or natural-valued data.
#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_cv.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_multinom_naivebayes.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_dos.hpp"
Functions
int main ()
    Example 30: Feature selection on binary and/or natural-valued data.

int main ()
The mushroom dataset from the UCI repository is originally 22-dimensional, with categorical features. Here we use a transformed version of the dataset in which the features are expanded into 125 binary ones. Features are selected using the DOS (Dynamic Oscillating Search) algorithm, with the search extent restricted by Delta=10. The criterion is the wrapper classification accuracy of a multinomial naive-Bayes-like classifier. 50% of the data is randomly chosen to form the training set, which stays fixed for the whole run. Another 40% of the data is randomly chosen and held out to validate, at the end, the classification performance on the finally selected subspace. The training and test parts are disjoint and together cover 90% of the original data. Classification accuracy (i.e., the FS wrapper criterion value) is estimated on the training part of the data by means of 5-fold cross-validation.
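The data partitioning described above can be illustrated with a small standalone sketch. It does not use the FST classes included by this example (such as the splitters from data_splitter_randrand.hpp and data_splitter_cv.hpp), and it is not how the library implements them. The names Split, random_split and k_folds are hypothetical helpers introduced only for illustration, and the sample count is an arbitrary parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper type (not part of FST): sample indices of one split.
struct Split {
    std::vector<std::size_t> train;  // indices used during the search
    std::vector<std::size_t> test;   // indices held out for final validation
};

// Randomly choose disjoint training and test parts that cover
// train_pct% and test_pct% of n samples, respectively.
inline Split random_split(std::size_t n, unsigned train_pct,
                          unsigned test_pct, unsigned seed) {
    assert(train_pct + test_pct <= 100);  // the parts must not overlap
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    const auto n_train = static_cast<std::ptrdiff_t>(n * train_pct / 100);
    const auto n_test = static_cast<std::ptrdiff_t>(n * test_pct / 100);
    Split s;
    // Consecutive, non-overlapping slices of one shuffled index list
    // are disjoint by construction.
    s.train.assign(idx.begin(), idx.begin() + n_train);
    s.test.assign(idx.begin() + n_train, idx.begin() + n_train + n_test);
    return s;
}

// Partition the training indices into k folds whose sizes differ by at
// most one. In k-fold cross-validation each fold serves once as the
// validation part while the remaining k-1 folds are used for training.
inline std::vector<std::vector<std::size_t>>
k_folds(const std::vector<std::size_t>& train, std::size_t k) {
    assert(k > 0);
    std::vector<std::vector<std::size_t>> folds(k);
    for (std::size_t i = 0; i < train.size(); ++i)
        folds[i % k].push_back(train[i]);
    return folds;
}
```

With the proportions from this example, random_split(n, 50, 40, seed) followed by k_folds(split.train, 5) gives the fixed training part, the held-out test part, and the five folds on which a wrapper criterion could be averaged. The remaining 10% of the samples stay unused.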
References FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::search(), FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::set_delta(), and FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail().