Example 35t: Dependency-Aware Feature Ranking (DAF1) to enable Wrapper based FS on very-high-dimensional data.
#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_5050.hpp"
#include "data_splitter_cv.hpp"
#include "data_splitter_randrand.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_wrapper.hpp"
#include "classifier_svm.hpp"
#include "search_monte_carlo_threaded.hpp"
#include "result_tracker_feature_stats.hpp"
Functions
int main ()
Dependency-Aware Feature Ranking (DAF) is a novel approach to feature selection that is especially suitable for very-high-dimensional problems and for feature selection scenarios prone to over-fitting. DAF evaluates a chosen criterion on a series of probe subsets and then ranks features according to their estimated contextual quality. This makes it possible to apply even complex wrapper feature selection criteria to problems of very high dimensionality. DAF has been shown to outperform BIF significantly in many cases in terms of the quality of the selected feature subsets, while its stability and resistance to over-fitting remain on par with BIF. For details see UTIA Technical Report No. 2295 from February 2011.
We demonstrate two slightly different forms of DAF (DAF0 and DAF1) in Example 34: Dependency-Aware Feature Ranking (DAF0) and Example 35t: Dependency-Aware Feature Ranking (DAF1) to enable Wrapper based FS on very-high-dimensional data. Example 34 illustrates the approach with a k-NN accuracy wrapper criterion. This example, 35t, illustrates DAF with an SVM wrapper applied to a very-high-dimensional (more than 10000 dimensions) text categorization problem.
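The probe-subset idea described above can be illustrated with a small standalone sketch. It does not use the FST classes, and every name in it (Subset, Criterion, probe_scores, rank_features) is hypothetical. The text above does not give the ranking formula, so the sketch assumes one plausible score: the mean criterion value over probes that contain a feature minus the mean over probes that do not. See the technical report for the actual DAF0 and DAF1 definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical types for illustration only; this is not the FST API.
using Subset = std::vector<std::size_t>;                 // indices of selected features
using Criterion = std::function<double(const Subset&)>;  // higher value = better subset

// Evaluate the criterion on n_probes random subsets whose sizes are drawn
// uniformly from [min_size, max_size], then score each feature as
//   (mean criterion over probes containing it) - (mean over probes without it).
// ASSUMPTION: this scoring rule is chosen for the sketch, not taken from the source.
std::vector<double> probe_scores(std::size_t n_features, std::size_t n_probes,
                                 std::size_t min_size, std::size_t max_size,
                                 const Criterion& criterion, unsigned seed)
{
    assert(min_size <= max_size && max_size <= n_features);

    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> size_dist(min_size, max_size);

    std::vector<std::size_t> all(n_features);
    std::iota(all.begin(), all.end(), std::size_t{0});

    std::vector<double> sum_in(n_features, 0.0);     // criterion sum over probes with f
    std::vector<std::size_t> cnt_in(n_features, 0);  // number of probes with f
    double sum_all = 0.0;                            // criterion sum over all probes

    for (std::size_t p = 0; p < n_probes; ++p) {
        // A random probe: the first k entries of a shuffled index list.
        std::shuffle(all.begin(), all.end(), rng);
        const std::size_t k = size_dist(rng);
        const Subset probe(all.begin(),
                           all.begin() + static_cast<std::ptrdiff_t>(k));

        const double value = criterion(probe);
        sum_all += value;
        for (std::size_t f : probe) {
            sum_in[f] += value;
            ++cnt_in[f];
        }
    }

    std::vector<double> score(n_features, 0.0);
    for (std::size_t f = 0; f < n_features; ++f) {
        const std::size_t cnt_out = n_probes - cnt_in[f];
        if (cnt_in[f] == 0 || cnt_out == 0) {
            continue;  // feature never (or always) probed: leave score at 0
        }
        const double mean_in = sum_in[f] / static_cast<double>(cnt_in[f]);
        const double mean_out = (sum_all - sum_in[f]) / static_cast<double>(cnt_out);
        score[f] = mean_in - mean_out;
    }
    return score;
}

// Return feature indices ordered from highest to lowest score.
std::vector<std::size_t> rank_features(const std::vector<double>& score)
{
    std::vector<std::size_t> order(score.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&score](std::size_t a, std::size_t b) {
                         return score[a] > score[b];
                     });
    return order;
}
```

In a wrapper setting the criterion would be a classifier's estimated accuracy on the probe subset. Calling probe_scores and then rank_features yields the ranking, from which the leading features can be taken. Because each probe is small, the cost of a criterion evaluation stays low even when the total number of features is very large.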
References FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::enable_result_tracking(), FST::Search_Monte_Carlo_Threaded< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, max_threads >::search(), FST::Search_Monte_Carlo_Threaded< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, max_threads >::set_cardinality_randomization(), FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail(), and FST::Search_Monte_Carlo_Threaded< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, max_threads >::set_stopping_condition().