Feature Selection Toolbox FST3 Library / Documentation

demo35t.cpp File Reference

Example 35t: Dependency-Aware Feature Ranking (DAF1) to enable Wrapper based FS on very-high-dimensional data.

#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_5050.hpp"
#include "data_splitter_cv.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_svm.hpp"
#include "search_monte_carlo_threaded.hpp"
#include "result_tracker_feature_stats.hpp"

Functions

int main ()

Detailed Description

Example 35t: Dependency-Aware Feature Ranking (DAF1) to enable Wrapper based FS on very-high-dimensional data.


Function Documentation

int main ()

Dependency-Aware Feature Ranking (DAF) is a novel approach to feature selection that is especially suitable for very-high-dimensional problems and for feature selection scenarios prone to over-fitting. DAF evaluates a chosen criterion on a series of probe subsets and then ranks features according to their estimated contextual quality. This makes it possible to apply even complex Wrapper feature selection criteria to very-high-dimensional problems. DAF has been shown to outperform BIF significantly in many cases in terms of the quality of the selected feature subsets, while its stability and resistance to over-fitting remain on par with BIF. For details see UTIA Technical Report No. 2295 from February 2011.

We demonstrate two slightly different forms of DAF (DAF0 and DAF1) in Example 34: Dependency-Aware Feature Ranking (DAF0) and in Example 35t: Dependency-Aware Feature Ranking (DAF1) to enable Wrapper based FS on very-high-dimensional data. Example 34 illustrates the approach with a k-NN accuracy wrapper criterion. This example, 35t, illustrates DAF with an SVM wrapper applied to a very-high-dimensional (more than 10000 dimensions) text categorization problem.
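The probe-based ranking idea described above can be illustrated with a small standalone sketch. It does not use the FST3 API, and every name in it (Subset, Criterion, probe_scores, rank_features) is hypothetical. It also assumes one simple scoring rule: a feature's score is the mean criterion value over the random probe subsets that contain it, minus the mean over the probes that do not. The exact DAF0 and DAF1 definitions are given in the cited technical report and may differ from this simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper types; none of these names come from FST3.
using Subset = std::vector<std::size_t>;                 // indices of selected features
using Criterion = std::function<double(const Subset&)>;  // higher value = better subset

// Draws `probes` random subsets with sizes in [min_size, max_size], evaluates the
// criterion on each, and scores every feature as
//   (mean value over probes containing it) - (mean value over probes without it).
// This scoring rule is an assumption made for illustration.
std::vector<double> probe_scores(std::size_t dim, std::size_t probes,
                                 std::size_t min_size, std::size_t max_size,
                                 const Criterion& crit, unsigned seed)
{
    assert(dim > 0 && min_size >= 1 && min_size <= max_size && max_size <= dim);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> size_dist(min_size, max_size);
    std::vector<std::size_t> perm(dim);
    std::iota(perm.begin(), perm.end(), std::size_t{0});

    std::vector<double> sum_in(dim, 0.0);     // sum of values over probes containing f
    std::vector<std::size_t> cnt_in(dim, 0);  // number of probes containing f
    double sum_all = 0.0;                     // sum of values over all probes

    for (std::size_t p = 0; p < probes; ++p) {
        std::shuffle(perm.begin(), perm.end(), rng);
        const std::size_t d = size_dist(rng);
        const Subset probe(perm.begin(),
                           perm.begin() + static_cast<std::ptrdiff_t>(d));
        const double value = crit(probe);
        sum_all += value;
        for (std::size_t f : probe) {
            sum_in[f] += value;
            ++cnt_in[f];
        }
    }

    std::vector<double> score(dim, 0.0);
    for (std::size_t f = 0; f < dim; ++f) {
        const std::size_t cnt_out = probes - cnt_in[f];
        if (cnt_in[f] == 0 || cnt_out == 0) continue;  // too little evidence: keep 0
        score[f] = sum_in[f] / static_cast<double>(cnt_in[f])
                 - (sum_all - sum_in[f]) / static_cast<double>(cnt_out);
    }
    return score;
}

// Returns feature indices ordered from highest to lowest score.
std::vector<std::size_t> rank_features(const std::vector<double>& score)
{
    std::vector<std::size_t> order(score.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&score](std::size_t a, std::size_t b) { return score[a] > score[b]; });
    return order;
}
```

In a real wrapper setting the criterion would train and validate a classifier on each probe subset, which is why the number of probes that fit into the time budget dominates the running time. Because each feature is scored only from probes in which it appears together with other features, the score reflects its usefulness in context rather than in isolation.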

Note:
DAF (like BIF) ranks features but does not determine the final subset size.
To achieve reasonable results at extreme dimensionality such as here, DAF requires at least hours of computation. (Standard wrapper-based methods would need several orders of magnitude more time in a similar setting.) It is beneficial to allow for as many probes as possible. For instance, setting max_search_time to 20 hours instead of the 200 minutes used in this example improves the final accuracy on independent test data by roughly 3%.
Warning:
This example needs a large amount of RAM (4 GB may not be enough).
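Since a ranking alone does not fix the subset size, a size still has to be chosen in a separate step. One common option, sketched here as an assumption rather than as what demo35t.cpp does, is to evaluate the nested prefixes of the ranking and keep the smallest prefix with the best criterion value. The function name best_prefix_size and its parameters are hypothetical and not part of FST3.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Evaluates the nested prefixes ranking[0..d) for d = 1..max_size and returns the
// smallest d whose criterion value is highest. The criterion is supplied by the
// caller; to limit over-fitting it would normally be computed on held-out data.
std::size_t best_prefix_size(
    const std::vector<std::size_t>& ranking,
    std::size_t max_size,
    const std::function<double(const std::vector<std::size_t>&)>& crit)
{
    assert(!ranking.empty() && max_size >= 1 && max_size <= ranking.size());
    std::vector<std::size_t> prefix;
    prefix.reserve(max_size);

    std::size_t best_d = 1;
    double best_value = 0.0;
    for (std::size_t d = 1; d <= max_size; ++d) {
        prefix.push_back(ranking[d - 1]);  // grow the prefix by the next-ranked feature
        const double value = crit(prefix);
        if (d == 1 || value > best_value) {  // strict '>' keeps the smallest size on ties
            best_value = value;
            best_d = d;
        }
    }
    return best_d;
}
```

This costs only max_size criterion evaluations, which is small compared with the number of probes spent on producing the ranking itself.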

References FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::enable_result_tracking(), FST::Search_Monte_Carlo_Threaded< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, max_threads >::search(), FST::Search_Monte_Carlo_Threaded< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, max_threads >::set_cardinality_randomization(), FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail(), and FST::Search_Monte_Carlo_Threaded< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, max_threads >::set_stopping_condition().


Generated on Thu Mar 31 11:36:16 2011 for FST3Library by  doxygen 1.6.1