Feature Selection Toolbox 3 (FST3) Library / History

Version history

  • 3.1.1.beta
    • Fixed a bug in the Sparse ARFF filter that prevented correct Sparse ARFF files from being read.
    • The standard ARFF filter is now less sensitive to header formatting and accepts more ARFF files straight away.
      Note: there is no difference between FST 3.1.0 and 3.1.1 except the ARFF filter (_src_dataio/data_file_ARFF.cpp) and the Reuters ARFF sample data.
  • 3.1.0.beta
    • optimal Branch & Bound methods
      • BBB, Basic Branch & Bound
      • IBB, Improved Branch & Bound
      • BBPP, Branch & Bound with Partial Prediction (averaging predictor)
      • FBB, Fast Branch & Bound (averaging predictor)
    • DAF, Dependency-Aware Feature Ranking, is a new, highly efficient method for very-high-dimensional FS; unlike BIF it does not ignore contextual information and is consequently capable of yielding considerably better results (enables wrapper-based feature selection with dimensionality on the order of 10^5-10^6; works with an arbitrary wrapper)
      • DAF0 (standard)
      • DAF1 (normalized)
    • SFRS/SBRS - the Sequential Retreating Search algorithm is related to Floating Search but more thorough; also suitable for use with a secondary criterion (result regularization)
    • 'generalized' variants of all sequential methods, enabling a more thorough search by testing feature g-tuples instead of just single features per step (see the 1982 book by Devijver and Kittler)
      • (G)SFS, (G)SBS,
      • (G)SFFS, (G)SBFS,
      • (G)OS,
      • (G)DOS,
      • (G)SFRS, (G)SBRS
    • all sequential methods now allow starting from an arbitrary subset (useful for tuning results obtained by several different methods)
    • threaded implementation of individual ranking (BIF), handy in very-high-dimensional tasks
    • Monte Carlo and threaded Monte Carlo methods select the best of a random sequence of feature subsets
    • SFS/SBS, SFFS/SBFS, and SFRS/SBRS now enable post-search retrieval of the best result of each subset size as observed in the course of the search
    • modified SFFS implementation to match the original definition more closely (now runs faster)
    • re-implemented threading in sequential methods, now more efficient due to a reduced number of thread creations/destructions
    • search method output is now redirectable to an arbitrary output stream
    • search method output can be switched off (introduced output levels SILENT, NORMAL, DETAILED)
    • improved result trackers (cloning, joining, etc.)
    • arbitrary data part access substitution (TEST for TRAIN, etc.) to enable bias estimation
    • bias estimating wrapper
    • cleaner stopwatch implementation
    • now permits missing values in data; such values are substituted per feature by the mean over valid values
    • classifiers now implement method classify(), enabling classification of an arbitrary sample
    • refactored directory structure
    • lots of new demos showing broader variety of usage scenarios
    • demos grouped according to purpose (for easier orientation especially of novice users)
    • various minor improvements and additions (e.g., alternative random initialization of subsets)
    • corrected several bugs and minor issues
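The missing-value handling introduced in 3.1.0 (substitution by the per-feature mean over valid values) can be sketched generically. This is an illustration only, not FST3 code; the function name `impute_missing` and the row-major `vector<vector<double>>` layout are assumptions for the example, with NaN standing in for a missing value:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Generic sketch (not FST3 API): replace missing values (NaN) in each
// feature column by the mean of that feature's valid (non-NaN) values.
// data[sample][feature], all rows assumed to have equal length.
void impute_missing(std::vector<std::vector<double>>& data)
{
    if (data.empty()) return;
    const std::size_t dim = data[0].size();
    for (std::size_t f = 0; f < dim; ++f) {
        double sum = 0.0;
        std::size_t valid = 0;
        for (const auto& sample : data)
            if (!std::isnan(sample[f])) { sum += sample[f]; ++valid; }
        const double mean = valid ? sum / valid : 0.0; // all-missing feature -> 0
        for (auto& sample : data)
            if (std::isnan(sample[f])) sample[f] = mean;
    }
}
```

A scheme like this keeps every sample usable by wrappers and filters alike, at the cost of slightly biasing per-feature statistics toward the mean.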
  • 3.0.2.beta
    • added an Exhaustive Search procedure, in both sequential and threaded implementations, to enable optimal feature selection
    • corrected minor issues to support LibSVM 3.0
    • result trackers now support cloning and memory usage limits
    • added logfile with captured output of all demos for verification purposes (rundemos.log)
    • corrected several minor issues
  • 3.0.1.beta
    • added support for reading ARFF (Waikato Weka) data files
    • corrected minor issues to enable compilation in Visual C++
  • 3.0.0.beta
    • initial public release
    • templated C++ code, using Boost library
    • feature selection criteria
      • classification accuracy estimation based (wrappers), see data access options below
        • normal Bayes classifier
        • k-Nearest Neighbor classifier (based on various L-distances)
        • Support Vector Machine (optional; depends on the external LibSVM library)
      • normal model based (filter)
        • Bhattacharyya distance
        • Divergence
        • Generalized Mahalanobis distance
      • multinomial model based (filter) - Bhattacharyya, Mutual Information
      • criteria ensembles
      • hybrids
    • feature selection methods
      • ranking (BIF, best individual features)
      • sequential search (hill-climbing)
        • sequential selection (SFS/SBS, restricted/unrestricted)
        • floating search (SFFS/SBFS, restricted/unrestricted)
        • oscillating search (OS, deterministic, randomized, restricted/unrestricted)
        • dynamic oscillating search (DOS, deterministic, randomized, restricted/unrestricted)
        • in any of the above: threaded, sequential, hybrid or ensemble based feature preference evaluation
      • supporting techniques (freely combinable with methods above)
        • subset size optimization vs. subset size as a user parameter
        • result regularization (preference of solutions with slightly lower criterion value to counter over-fitting)
        • feature acquisition cost minimization
        • feature selection process stability evaluation
        • two-process similarity evaluation (to determine impact of parameter change etc.)
    • flexible data processing
      • nested multi-level sampling (splitting into training, validation, test and possibly other data parts)
      • sampling through extendable objects (includes re-substitution, cross-validation, hold-out, leave-one-out, random sampling, etc.)
      • normalization through extendable objects (interval shrinking, whitening)
      • support for textual flat data format TRN (see FST1)
  • pre-3.0.0