Example 51: (DOS) Result regularization using secondary criterion.
#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_5050.hpp"
#include "data_splitter_cv.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_wrapper.hpp"
#include "criterion_subsetsize.hpp"
#include "criterion_negative.hpp"
#include "distance_euclid.hpp"
#include "classifier_knn.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_dos.hpp"
#include "result_tracker_regularizer.hpp"
Functions
int main ()
Example 51: (DOS) Result regularization using secondary criterion.
int main ()
Feature selection can over-fit. Just as over-trained classifiers generalize poorly, so can over-selected feature subsets, and the effect can seriously degrade generalization ability, i.e., the behavior of the model or decision rule on previously unseen data. It has been suggested (Raudys: Feature Over-Selection, LNCS 4109, 2006, or Somol et al., ICPR 2010) that preferring a subset with a slightly-worse-than-maximal criterion value can actually improve generalization. FST3 makes this possible through result tracking, followed by selection of an alternative solution that maximizes a secondary criterion.

This example shows a 3-Nearest Neighbor wrapper-based feature selection process in which the final result is chosen from a group of solutions close enough to the achieved maximum, so as to optimize the secondary criterion. The group of candidate solutions is defined by a user-selected margin: the permitted difference between a solution's primary criterion value and the known maximum. The example shows that even the simplest secondary criterion (a mere preference for smaller subsets) can improve classification accuracy on previously unseen data.
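The selection rule described above can be illustrated with a minimal, library-independent sketch. It is not the FST3 implementation and does not use FST3 classes: `TrackedResult` and `select_regularized` are hypothetical names introduced only for illustration, and the tie-breaking rule (prefer the higher primary value among equally small subsets) is an assumption. Given a list of tracked solutions, the sketch keeps those whose primary criterion value lies within `margin` of the best value found, then picks the one with the fewest features.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record of one tracked solution (not an FST3 type).
struct TrackedResult {
    double primary_value;       // primary criterion, e.g. estimated wrapper accuracy
    std::vector<int> features;  // indices of the selected features
};

// Returns the index of the smallest subset whose primary value is within
// `margin` of the maximum, or -1 if `results` is empty.
// Assumption: ties in subset size are broken by the higher primary value.
int select_regularized(const std::vector<TrackedResult>& results, double margin)
{
    if (results.empty()) return -1;

    // Find the best primary criterion value among all tracked solutions.
    double best = results[0].primary_value;
    for (const auto& r : results) best = std::max(best, r.primary_value);

    int chosen = -1;
    for (std::size_t i = 0; i < results.size(); ++i) {
        const TrackedResult& r = results[i];
        if (r.primary_value < best - margin) continue;  // outside the margin

        if (chosen < 0) { chosen = static_cast<int>(i); continue; }

        const TrackedResult& c = results[chosen];
        const bool smaller = r.features.size() < c.features.size();
        const bool same_size_better = r.features.size() == c.features.size()
                                      && r.primary_value > c.primary_value;
        if (smaller || same_size_better) chosen = static_cast<int>(i);
    }
    return chosen;
}
```

With a margin of zero the rule reduces to picking the primary-criterion maximum. Widening the margin trades a small loss in the primary criterion for a smaller subset, which is the regularizing effect the example demonstrates.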
References FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::search(), FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::set_delta(), and FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail().