Feature Selection Toolbox FST3 Library / Documentation

demo62.cpp File Reference

Example 62: (Missing data substitution) Combined feature subset contents, size and SVM parameters optimization.

#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_cv.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_svm.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_dos.hpp"
[Figure: Include dependency graph for demo62.cpp]

Functions

int main ()

Detailed Description

Example 62: (Missing data substitution) Combined feature subset contents, size and SVM parameters optimization.


Function Documentation

int main ()

In many tasks some values in the data set are missing. Provided such values are marked by a unique value, FST can handle such data: in the pre-processing phase each missing value is replaced by the average of all valid values of the same feature.

This example processes incomplete data in the same way as Example 23 (Combined feature subset contents, size and SVM parameters optimization), by repeatedly alternating two operations: feature subset search and SVM parameter optimization for the current subset. Features are selected using the DOS (Dynamic Oscillating Search) algorithm, with the classification accuracy of an SVM wrapper (sigmoid kernel) as the FS criterion.

50% of the data is randomly chosen to form the training set, which stays the same throughout. A further 40% is randomly chosen and used at the end to validate classification performance on the finally selected subspace. The training and test parts are disjoint and together cover 90% of the original data. During the search, the training part is accessed by means of 3-fold cross-validation.

The optimization process consists of repeated consecutive calls of two procedures: SVM parameter optimization followed by DOS feature subset optimization. SVM parameters are optimized on the currently best known feature subset, which is then used to initialize the next DOS search. The calls are repeated as long as better SVM performance (on the training data) is achieved.
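The mean substitution described above can be illustrated without any FST3 code. The following is only a minimal sketch of the idea, not the library's implementation: the function name `substitute_missing_by_mean`, the row-major `std::vector` layout, and the choice to leave a column untouched when it has no valid value are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of FST3.
// data[sample][feature] is a row-major table; every entry equal to
// missing_marker is replaced by the mean of the valid entries of the
// same feature (column).
void substitute_missing_by_mean(std::vector<std::vector<double>>& data,
                                double missing_marker)
{
    if (data.empty()) return;
    const std::size_t n_features = data.front().size();

    for (std::size_t f = 0; f < n_features; ++f) {
        // First pass: accumulate the valid values of feature f.
        double sum = 0.0;
        std::size_t valid = 0;
        for (const auto& row : data) {
            assert(row.size() == n_features);  // all rows must have equal length
            if (row[f] != missing_marker) {
                sum += row[f];
                ++valid;
            }
        }
        if (valid == 0) continue;  // assumption: nothing to average, leave as is

        // Second pass: overwrite the marked entries with the column mean.
        const double mean = sum / static_cast<double>(valid);
        for (auto& row : data) {
            if (row[f] == missing_marker) row[f] = mean;
        }
    }
}
```

Called on a table in which, say, -999 marks a missing entry, the function fills each marked cell with the average of the remaining cells in its column. The marker must be a value that cannot occur as genuine data, which matches the requirement above that missing values be marked by a unique value.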

References FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::search(), and FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::set_delta().
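The control flow of the alternating optimization described above can be sketched independently of FST3. This is a hedged illustration only: `alternate_until_no_gain`, the `State` type, and the two callables are hypothetical stand-ins for "SVM parameter optimization" and "DOS feature subset search", and they do not reproduce the signatures of `search()` or `set_delta()`.

```cpp
#include <cassert>
#include <limits>

// Hypothetical driver, not part of FST3.
// State holds whatever is being optimized (e.g. SVM parameters plus the
// current feature subset) and must be copyable.
// tune_parameters(State&) adjusts the parameters for the current subset.
// search_subset(State&) refines the subset, starting from the one stored in
// the state, and returns the criterion value reached (higher is better).
template <class State, class Tune, class Search>
double alternate_until_no_gain(State& state,
                               Tune tune_parameters,
                               Search search_subset,
                               int max_rounds = 100)
{
    assert(max_rounds > 0);
    double best = -std::numeric_limits<double>::infinity();
    State best_state = state;

    for (int round = 0; round < max_rounds; ++round) {
        State candidate = best_state;      // start from the best known result
        tune_parameters(candidate);        // step 1: parameter optimization
        const double value = search_subset(candidate);  // step 2: subset search
        if (value <= best) break;          // no improvement: stop repeating
        best = value;
        best_state = candidate;
    }

    state = best_state;                    // hand back the best state found
    return best;
}
```

The `max_rounds` cap is an added safety measure that the description does not mention. The loop itself stops as soon as a round fails to improve the criterion, mirroring the rule that the calls are repeated only as long as better performance on the training data is achieved.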


Generated on Thu Mar 31 11:36:54 2011 for FST3Library by doxygen 1.6.1