demo51.cpp File Reference

Example 51: (DOS) Result regularization using secondary criterion. More...

#include <boost/smart_ptr.hpp>
#include <exception>
#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include "error.hpp"
#include "global.hpp"
#include "subset.hpp"
#include "data_intervaller.hpp"
#include "data_splitter.hpp"
#include "data_splitter_5050.hpp"
#include "data_splitter_cv.hpp"
#include "data_scaler.hpp"
#include "data_scaler_void.hpp"
#include "data_accessor_splitting_memTRN.hpp"
#include "data_accessor_splitting_memARFF.hpp"
#include "criterion_wrapper.hpp"
#include "criterion_subsetsize.hpp"
#include "criterion_negative.hpp"
#include "distance_euclid.hpp"
#include "classifier_knn.hpp"
#include "seq_step_straight.hpp"
#include "search_seq_dos.hpp"
#include "result_tracker_regularizer.hpp"

Include dependency graph for demo51.cpp:

Functions
int	main ()

Detailed Description

Example 51: (DOS) Result regularization using secondary criterion.

Function Documentation

int main ( )

Example 51: (DOS) Result regularization using secondary criterion.

It is known that feature selection may over-fit. As in the case of over-trained classifiers, over-selected feature subsets may generalize poorly. This unwanted effect can lead to serious degradation of generalization ability, i.e., model or decision-rule behavior on previously unknown data. It has been suggested (Raudys: Feature Over-Selection, LNCS 4109, 2006, or Somol et al., ICPR 2010) that preferring a subset with slightly-worse-than-maximal criterion value can actually improve generalization. FST3 makes this possible through result tracking and subsequent selection of alternative solution by means of secondary criterion maximization. In this example we show a 3-Nearest Neighbor Wrapper based feature selection process, where the final result is eventually chosen among a group of solutions close enough to the achieved maximum, so as to optimize the secondary criterion. The group of solutions to select from is defined by means of a user-selected margin value (permitted primary criterion value difference from the known maximum). In this case we show that even the simplest secondary criterion (mere preference of smaller subsets) can improve classifcation accuracy on previously unknown data.

References FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::search(), FST::Search_DOS< RETURNTYPE, DIMTYPE, SUBSET, CRITERION, EVALUATOR >::set_delta(), and FST::Search< RETURNTYPE, DIMTYPE, SUBSET, CRITERION >::set_output_detail().

demo51.cpp File Reference

Functions

Detailed Description

Function Documentation

Example 51: (DOS) Result regularization using secondary criterion.