Feature Selection Toolbox FST3 Library / Documentation

FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER > Class Template Reference

partly-abstract class, defines support for data splitting

#include <data_accessor_splitting.hpp>

Classes

class  DataSplit
 Data splitting support structure; holds one set of intervals (train, test) per each splitting depth and (data)class.

Public Types

typedef boost::shared_ptr< Data_Splitter< INTERVALCONTAINER, IDXTYPE > > PSplitter
typedef boost::shared_ptr< std::vector< PSplitter > > PSplitters
typedef const DATATYPE * PPattern

Public Member Functions

 Data_Accessor_Splitting (const PSplitters _dsp)
virtual unsigned int getNoOfClasses () const
 returns number of classes
virtual unsigned int getNoOfFeatures () const
 returns data dimensionality
virtual IDXTYPE getClassSize (const unsigned int c) const
 returns size (number of samples in) of class c
virtual IDXTYPE getClassSizeSum () const
 returns summed size (number of samples in) of all classes, i.e., no. of all patterns in data
virtual void setClass (const int c)
 sets active class -> from now on only data from class c will be considered
virtual int getClass () const
 returns active class
void setSplittingDepth (const unsigned int depth)
unsigned int getSplittingDepth () const
virtual unsigned int getNoOfSplits () const
 data access iteration (to support, e.g., loops in cross-validation)
virtual bool getFirstSplit ()
 data access iteration (to support, e.g., loops in cross-validation)
virtual bool getNextSplit ()
 data access iteration (to support, e.g., loops in cross-validation)
virtual unsigned int getSplitIndex () const
 data access iteration (to support, e.g., loops in cross-validation)
virtual IDXTYPE getNoOfBlocks (const DataPart ofwhat) const
virtual bool getFirstBlock (const DataPart ofwhat, PPattern &firstpattern, IDXTYPE &patterns, const unsigned int loopdepth=0)=0
 returns pointer to first consecutive block of data of requested DataPart type in the current split (access iteration)
virtual bool getNextBlock (const DataPart ofwhat, PPattern &firstpattern, IDXTYPE &patterns, const unsigned int loopdepth=0)=0
 returns pointer to next consecutive block of data of requested DataPart type in the current split (access iteration)
virtual IDXTYPE getBlockIndex (const unsigned int loopdepth=0) const
 returns index of the current consecutive block of data of requested DataPart type in the current split (access iteration)
virtual IDXTYPE getNoOfPatterns (const DataPart ofwhat) const
 returns number of patterns in all consecutive blocks of data of requested DataPart type in the current split (access iteration)
virtual void substitute (const DataPart source, const DataPart target)
 enables change of meaning of DataPart types, for use in specialized data access scenarios like in bias predicting wrappers
virtual void resubstitute ()
 resets standard DataPart types' meaning
virtual std::ostream & print (std::ostream &os) const
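
The split-iteration methods listed above (getFirstSplit(), getNextSplit(), getSplitIndex()) suggest a loop pattern such as the one used in cross-validation. The following is only a minimal sketch under stated assumptions: ToySplitAccessor and visit_all_splits are hypothetical stand-ins written for this illustration, not part of FST3, and the 1-based split index with 0 meaning "none" is assumed from the enum_split description.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a concrete accessor; it mimics only the
// split-iteration part of the interface listed above.
class ToySplitAccessor {
public:
    explicit ToySplitAccessor(unsigned int no_of_splits)
        : no_of_splits_(no_of_splits), current_(0) {}

    unsigned int getNoOfSplits() const { return no_of_splits_; }

    // Positions the accessor on the first split; false if there is none.
    bool getFirstSplit() {
        current_ = (no_of_splits_ > 0) ? 1 : 0;
        return current_ != 0;
    }

    // Advances to the next split; false once all splits have been visited.
    bool getNextSplit() {
        if (current_ == 0 || current_ >= no_of_splits_) {
            current_ = 0;
            return false;
        }
        ++current_;
        return true;
    }

    // Assumed convention: 1-based index, 0 meaning "no split selected".
    unsigned int getSplitIndex() const { return current_; }

private:
    unsigned int no_of_splits_;
    unsigned int current_;
};

// Visits every split once and records the indexes seen.
std::vector<unsigned int> visit_all_splits(ToySplitAccessor& da) {
    assert(da.getSplitIndex() <= da.getNoOfSplits());
    std::vector<unsigned int> seen;
    for (bool ok = da.getFirstSplit(); ok; ok = da.getNextSplit()) {
        seen.push_back(da.getSplitIndex());
    }
    return seen;
}
```

In a real setting the loop body would train on the TRAIN part and evaluate on the TEST part of the current split; here it only records the split index.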

Protected Types

typedef std::vector< unsigned int > CLASSSIZES
typedef const Data_Interval< IDXTYPE > * DATAINTERVAL
typedef boost::shared_ptr< INTERVALCONTAINER > PIntervaller

Protected Member Functions

 Data_Accessor_Splitting (const Data_Accessor_Splitting &da)
void initialize (const unsigned int _features, const CLASSSIZES &_classes)
DataPart mappedDataPart (const DataPart ofwhat) const
virtual bool getFirstBlock (const DataPart ofwhat, DATAINTERVAL &tmp, const unsigned int loopdepth=0)
 returns Data_Interval record representing the first consecutive block of data of requested DataPart type in the current split (access iteration)
virtual bool getNextBlock (const DataPart ofwhat, DATAINTERVAL &tmp, const unsigned int loopdepth=0)
 returns Data_Interval record representing the next consecutive block of data of requested DataPart type in the current split (access iteration)
bool is_initialized () const
void assert_splits (const int splitting_check=-1) const

Protected Attributes

CLASSSIZES classes
unsigned int features
DataPart mappedTRAIN
DataPart mappedTEST
DataPart mappedTRAINTEST
DataPart mappedALL
PSplitters dsp
std::vector< std::vector< DataSplit > > splits
 one set of splitters per each splitting depth and class
std::vector< IDXTYPE > enum_split
 current split; 0 means none
std::vector< std::vector< IDXTYPE > > enum_block
 current block loop; 0 means none
std::vector< std::vector< DataPart > > tt_phase
 in the current block loop, for DataPart==TRAINTEST indicates: 0 - no loop, 1 - train loop, 2 - test loop
unsigned int splitting_depth
 switch between inner and outer loop get*Train*, get*Test* functionality
int active_class
 denotes from which class the get*Train* get*Validate* get*Test* methods return patterns, to be set using setClass()
bool _initialize_called

Detailed Description

template<typename DATATYPE, typename IDXTYPE, class INTERVALCONTAINER>
class FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >

partly-abstract class, defines support for data splitting

Data structures in Data_Accessor_Splitting are used directly in the splitting mechanism that enables structured access to data. To keep the implementation of data splitters (specializations of Data_Splitter) as simple as possible for the user, we moved most of the technicalities here. The correct state of the key data structures in Data_Accessor_Splitting is as follows:

  • dsp represents a list of Data_Splitters. For each class, each splitter keeps two pointers: one to the list of train data and one to the list of test data (both are lists of data intervals). The splitters do not, however, own the actual train and test lists; these are allocated within Data_Accessor_Splitting.

The splitting mechanism needs not just one train-test data structure pair, but two; the second pair is denoted _reduced_train and _reduced_test. The "reduced" pair is constructed from the "base" pair separately at each splitting level, so as to correctly represent the subset of data defined by the respective splitter (each deeper level further reduces access to the data visible at the preceding level). Actual access to data through getFirstBlock() and getNextBlock() is governed only by the data intervals stored in the "reduced" pair of interval lists.

The two pairs of interval lists exist separately for each splitting level and each data class. In Data_Accessor_Splitting they are collected in the nested DataSplit class, of which the required number of instances is kept in the "splits" container. In a correct representation, "splits" must contain [number of classes]*[number of splitting levels] DataSplit instances.
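
The layout described above can be sketched with plain standard containers. This is an illustration only: ToyInterval, ToyDataSplit, make_splits and count_splits are hypothetical names, an interval is assumed to be a (start, length) pair, and the level-then-class nesting order is an assumption rather than something the description above states.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simplification: an interval is (start index, length).
typedef std::pair<std::size_t, std::size_t> ToyInterval;
typedef std::vector<ToyInterval> ToyIntervalList;

// Mirrors the idea of DataSplit: a "base" and a "reduced" train/test pair.
struct ToyDataSplit {
    ToyIntervalList train, test;                  // "base" pair
    ToyIntervalList reduced_train, reduced_test;  // "reduced" pair
};

typedef std::vector<std::vector<ToyDataSplit> > ToySplits;

// Builds splits[level][class]; only the two counts are needed, no data.
ToySplits make_splits(std::size_t levels, std::size_t classes) {
    assert(levels > 0 && classes > 0);
    return ToySplits(levels, std::vector<ToyDataSplit>(classes));
}

// Total number of ToyDataSplit instances held in the container.
std::size_t count_splits(const ToySplits& s) {
    std::size_t n = 0;
    for (std::size_t level = 0; level < s.size(); ++level) {
        n += s[level].size();
    }
    return n;
}
```

With two splitting levels and three classes, count_splits(make_splits(2, 3)) yields six instances, matching the [number of classes]*[number of splitting levels] requirement.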

The DataSplit instances that represent the top splitting level differ from those at deeper levels: their "reduced" pair of lists merely references the "base" pair. This is because the data indexes produced by top-level splitters are valid indexes directly usable to access the data. At deeper splitting levels this is not so, because splitter indexes must be treated as relative to the data, which may be restricted at a higher level. Transforming the relative indexes to absolute indexes is achieved through the "reduce" method implemented in Data_Intervaller. At non-top splitting levels, before data can be accessed, the "base" indexes/intervals are first transformed using the "reduce" method, with the result stored in the "_reduced" train and test lists.
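
The relative-to-absolute transformation can be illustrated as follows. This is a sketch of the general idea only, not the "reduce" method of Data_Intervaller: to_absolute, ToyInterval and ToyIntervalList are hypothetical names, and intervals are assumed to be (start, length) pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simplification: an interval is (start index, length).
typedef std::pair<std::size_t, std::size_t> ToyInterval;
typedef std::vector<ToyInterval> ToyIntervalList;

// Maps intervals given relative to the visible data (as a deeper-level
// splitter would produce them) onto absolute data indexes.
// 'visible' holds the absolute intervals accessible at the higher level.
ToyIntervalList to_absolute(const ToyIntervalList& visible,
                            const ToyIntervalList& relative) {
    ToyIntervalList result;
    for (std::size_t r = 0; r < relative.size(); ++r) {
        const std::size_t rel_begin = relative[r].first;
        const std::size_t rel_end = rel_begin + relative[r].second;
        std::size_t offset = 0;  // relative position where visible[v] starts
        for (std::size_t v = 0; v < visible.size(); ++v) {
            // Overlap of the relative interval with this visible block.
            const std::size_t lo = std::max(rel_begin, offset);
            const std::size_t hi = std::min(rel_end, offset + visible[v].second);
            if (lo < hi) {
                result.push_back(
                    ToyInterval(visible[v].first + (lo - offset), hi - lo));
            }
            offset += visible[v].second;
        }
        assert(rel_end <= offset);  // relative interval must fit the visible data
    }
    return result;
}
```

For example, if the higher level exposes absolute intervals (0,3) and (7,3), the relative interval (2,3) covers the third to fifth visible samples and maps to the absolute intervals (2,1) and (7,2).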

The "base" train and test lists allocated here in Data_Accessor_Splitting need to be interlinked with the respective data splitters. The splitters do not hold any allocated structures; they redirect their output to the "base" train and test lists kept in Data_Accessor_Splitting. This enables the data accessing routines to transform the indexes by means of "reduce" whenever needed and subsequently to access the correct subset of data.


Member Function Documentation

template<typename DATATYPE , typename IDXTYPE , class INTERVALCONTAINER >
IDXTYPE FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::getNoOfBlocks ( const DataPart  ofwhat  )  const [inline, virtual]
Note:
in block-accessing methods, use a different loopdepth whenever two or more loops (of any DataPart type) overlap; otherwise the behaviour is undefined
Warning:
getNoOfBlocks is not to be relied upon due to possible limitations in some implementations
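
The loopdepth note can be illustrated with a toy accessor that keeps one independent block cursor per loop depth. This is a sketch under assumptions, not FST3 code: ToyBlockAccessor and count_block_pairs are hypothetical, the DataPart and pattern-pointer arguments of the real methods are omitted, and the behaviour shown for a shared loopdepth is just one possible failure mode of this toy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy accessor with one independent block cursor per loopdepth.
class ToyBlockAccessor {
public:
    ToyBlockAccessor(const std::vector<std::size_t>& block_sizes,
                     unsigned int max_loops)
        : sizes_(block_sizes), cursor_(max_loops, 0) {}

    // Resets the cursor of the given loopdepth and fetches the first block.
    bool getFirstBlock(std::size_t& patterns, unsigned int loopdepth = 0) {
        assert(loopdepth < cursor_.size());
        cursor_[loopdepth] = 0;
        return fetch(patterns, loopdepth);
    }

    // Advances the cursor of the given loopdepth and fetches that block.
    bool getNextBlock(std::size_t& patterns, unsigned int loopdepth = 0) {
        assert(loopdepth < cursor_.size());
        ++cursor_[loopdepth];
        return fetch(patterns, loopdepth);
    }

private:
    bool fetch(std::size_t& patterns, unsigned int loopdepth) const {
        if (cursor_[loopdepth] >= sizes_.size()) return false;
        patterns = sizes_[cursor_[loopdepth]];
        return true;
    }

    std::vector<std::size_t> sizes_;   // number of patterns in each block
    std::vector<std::size_t> cursor_;  // current block index per loopdepth
};

// Counts the (outer block, inner block) pairs visited by two nested loops.
// The outer loop uses loopdepth 0; the inner loop uses inner_depth.
std::size_t count_block_pairs(ToyBlockAccessor& da, unsigned int inner_depth) {
    std::size_t pairs = 0, outer = 0, inner = 0;
    for (bool o = da.getFirstBlock(outer, 0); o; o = da.getNextBlock(outer, 0)) {
        for (bool i = da.getFirstBlock(inner, inner_depth); i;
             i = da.getNextBlock(inner, inner_depth)) {
            ++pairs;
        }
    }
    return pairs;
}
```

With three blocks, using loopdepth 1 for the inner loop visits all nine pairs. Reusing loopdepth 0 makes the inner loop consume the outer cursor in this toy, so only three pairs are visited.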

Implements FST::Data_Accessor< DATATYPE, IDXTYPE >.

References FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::active_class, FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::splits, and FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::splitting_depth.

template<typename DATATYPE , typename IDXTYPE , class INTERVALCONTAINER >
void FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::initialize ( const unsigned int  _features, const CLASSSIZES &  _classes ) [inline, protected]

sets up the memory structures needed in the splitting mechanism. Data access is not needed here; the structures work with indexes only, so the only information needed is the dimensionality and the sizes of the data classes

References FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::splits.

Referenced by FST::Data_Accessor_Splitting_Mem< DATATYPE, IDXTYPE, INTERVALCONTAINER >::initialize().


The documentation for this class was generated from the following file:

Generated on Thu Mar 31 11:37:58 2011 for FST3Library by  doxygen 1.6.1