Feature Selection Toolbox

Frequently Asked Questions

GENERAL

What does "threaded" processing mean? Does it simply speed up the search while the search parameters stay the same? Does it make any difference whether I use a 32-bit or 64-bit Windows OS?
It is only about speed. 32-bit vs. 64-bit is unimportant, but the number of CPU cores matters. Feature selection is often very time-consuming, and threaded search can substantially reduce search time on multi-core and multi-processor systems.
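As a rough illustration of why the core count matters, the following C++ sketch queries how many hardware threads the machine reports. It is not FST3 code: `usable_threads` is a hypothetical helper name, and how FST3 itself chooses its number of threads is not shown here.

```cpp
#include <algorithm>
#include <thread>

// Number of worker threads worth using on this machine.
// std::thread::hardware_concurrency() may return 0 when the value
// cannot be determined, so fall back to a single thread in that case.
unsigned usable_threads(unsigned reported = std::thread::hardware_concurrency())
{
    return std::max(1u, reported);
}
```

If this reports 1, a threaded demo is unlikely to run faster than its non-threaded counterpart.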

COMPILATION

Why does FST3 not compile/link under my Linux distribution?
The most likely reason is a minor difference in the names of the library files to be linked. When linking the Boost library files, try modifying the Makefile to use whichever of these is right for your system: "-l boost_thread", "-l boost_thread-mt", "-l boost_thread-mt32", "-l boost_thread-mt64", or whatever is valid on your system with your current configuration. (Sorry, automatic configuration is not available for FST3 yet.)
Are there any guides on how to integrate Visual Studio 2008, Boost and FST3.1? I have followed the steps in the Boost documentation to create a simple "example.cpp". How does it relate to FST?
A short but complete installation guide is in the readme.txt file in the root directory of the FST3 sources.
Are there any examples I can follow to run the demos in the Visual Studio environment using FST?
See the previous answer.
Once the "Empty Project" was created in Microsoft Visual Studio 2010, several folders were generated under the project. I put all the .hpp files into the Header Files folder and one single demo.cpp file into the Source Files folder. Is this correct, or should I put all those files under the Source Files folder?
It is correct (hpp files belong to Header Files, cpp files belong to Source Files).
The readme.txt says "In case of threaded demos add FST_THREADED to Preprocessor Definitions under Configuration Properties - C/C++ - Preprocessor. In case of debug target add DEBUG therein, in case of the release target add NDEBUG". I don't understand when I need to add definitions to the Preprocessor settings.
If you intend just to use the library and not to develop on top of it, set NDEBUG in all cases and ignore DEBUG. As for the FST_THREADED definition, use it whenever the name of the demo to be compiled ends with 't' (e.g. for demo24 do not set FST_THREADED, but for demo24t do set it).
Regarding the NDEBUG setting, do you mean that I have to set it in VC++ under both the Debug and Release configurations if I am not using threaded demos?
Set DEBUG under the Debug configuration and NDEBUG under the Release configuration (regardless of which demo you use).
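What such definitions do can be seen in a minimal sketch that assumes only standard C++ behaviour: NDEBUG disables assert() checks, and a macro such as FST_THREADED selects a code path at compile time. The function names and returned strings below are hypothetical illustrations and are not part of FST3.

```cpp
#include <cassert>
#include <string>

// Reports which code path the FST_THREADED definition selects.
// FST_THREADED is the macro named in readme.txt; the returned
// strings are purely illustrative.
std::string threading_mode()
{
#ifdef FST_THREADED
    return "threaded";
#else
    return "sequential";
#endif
}

// Reports whether assert() checks are compiled in.
// Defining NDEBUG before <cassert> is included turns assert() into a no-op.
bool assertions_enabled()
{
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}
```

Because the choice is made by the preprocessor, changing a definition only takes effect after the project is rebuilt.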
For LibSVM, I extracted svm.cpp and svm.h and put them into the sourcecode\libsvm folder as instructed. I also added these two files to the Source Files folder in Visual C++. Is this correct?
cpp files belong to Source Files, but the h file is a header and belongs to Header Files. Note that this might not be necessary at all if LibSVM is already installed on your system; in that case it should be enough to manually add the include path to the FST3 Makefile.
After everything was done according to the instructions, I first added demo11.cpp to the Visual Studio project and compiled it. The following error regarding "error.hpp" popped up: ..\demo10.cpp(62): fatal error C1083: Cannot open include file: 'error.hpp': No such file or directory. I checked that "error.hpp" is in the project. I have tried to compile other demos, and the same error message pops up. Please advise.
That only means that Visual C++ does not see the file 'error.hpp'. Has it been correctly added among the Header Files?
Where should I put my TRN/ARFF file so that the demos can find it?
Access to data is defined in each demo??.cpp file as follows:

boost::shared_ptr<DATAACCESSOR> da(new DATAACCESSOR("data/speech_15.trn",splitters,dsc));

You can change the filename and path in this line to match the actual location of your data file. If you just want to run the default FST3 examples, then, as the line shows, the data files should be placed in the 'data' subdirectory of the directory from which the demo is executed.
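Because a relative path such as "data/speech_15.trn" is resolved against the directory the demo is started from, a quick check like the following sketch can confirm that the file is where the demo expects it. `data_file_readable` is a hypothetical helper, not an FST3 function.

```cpp
#include <fstream>
#include <string>

// Returns true if the file at `path` can be opened for reading.
// Relative paths are resolved against the current working directory,
// i.e. the directory the demo is started from.
bool data_file_readable(const std::string& path)
{
    std::ifstream in(path.c_str());
    return in.good();
}
```

Calling it with the same string that is passed to DATAACCESSOR, before the accessor is constructed, makes a wrong working directory easy to spot.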
After VC++ successfully compiled demo11, I can't find demo11.exe in my project folder. I can only find demo11.obj under the Debug folder of my project. Am I missing a step that produces the exe file?
Look elsewhere as well. The location may differ depending on your settings, but VC++ tends to place debug executables in the Debug subdirectory and optimized executables in the Release subdirectory.
Although VC++ said that it compiled the demo successfully, there seemed to be many warnings during the process. Are they important?
This is no problem, just ignore them. They only say that some of the called functions are old-style (in C++ vs. C terms), which is meant as a suggestion to rewrite the code using newer notation, and they relate only to the least important part of the library, the import filters. We did not bother with the code that just reads numeric values from files; all our effort was concentrated on the FS methods themselves.
When I tried to use demo26, it failed to compile in VC++ with an error message related to the header file classifier_normal_bayes.hpp. It said, "classifier_normal_bayes.hpp(255): error C2065: 'M_PI' : undeclared identifier". I have no idea how to fix this. Is there any way to do so?
Add _USE_MATH_DEFINES to Preprocessor Definitions under Configuration Properties - C/C++ - Preprocessor. (This is a Visual C++ related problem: Microsoft does not define the value of PI by default.)
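The same effect can be sketched in source code, assuming you prefer a per-file fix over a project-wide setting. The macro has to be defined before <cmath> is first included, and the fallback covers compilers that still do not provide M_PI. `circle_area` is only a hypothetical example function and is not taken from FST3.

```cpp
// Must appear before the first inclusion of <cmath> (or <math.h>)
// for Visual C++ to define M_PI and related constants.
#ifndef _USE_MATH_DEFINES
#define _USE_MATH_DEFINES
#endif
#include <cmath>

// Portable fallback in case the compiler still does not provide M_PI.
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Area of a circle, used here only to show M_PI in an expression.
double circle_area(double radius)
{
    return M_PI * radius * radius;
}
```

The project-wide Preprocessor Definition remains the simpler route for FST3, since it needs no change to the library headers.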

USAGE

A sentence in the demo script says "SFFS is called here in d-parametrized setting, invoked by nonzero parameter d in search(d,...)". Once the search starts, I guess it will by default go through the different subset sizes in the range 1<d<D. Is it possible to set a particular range of subset sizes of interest, like in FST1? If yes, how can I do that?
In the case of floating search (SFFS) just call it with 0, i.e., search(0,...), and it will produce results for all sizes 1 to D. It will point out the solution (and subset size) that yielded the highest value, but you can also check the results for all subset sizes. That is the advantage of floating search: it searches through all subset sizes at once.
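If only a particular range of subset sizes is of interest, one option is to run search(0,...) once and pick the best result within that range afterwards. The sketch below assumes that your own code has collected the per-size criterion values into a vector (how FST3 exposes them is not shown here); `best_size_in_range` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// values[i] holds the criterion value of the best subset of size i + 1,
// as collected from a search over all sizes 1..D.
// Returns the subset size in [d_min, d_max] with the highest value;
// ties go to the smaller size.
std::size_t best_size_in_range(const std::vector<double>& values,
                               std::size_t d_min, std::size_t d_max)
{
    assert(d_min >= 1 && d_min <= d_max && d_max <= values.size());
    std::size_t best = d_min;
    for (std::size_t d = d_min + 1; d <= d_max; ++d) {
        if (values[d - 1] > values[best - 1]) {
            best = d;
        }
    }
    return best;
}
```

The search itself still visits all sizes; only the reporting is restricted to the chosen range.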
In the demo scripts, I can see some optional items after the step "run the search". If I don't want them, can I simply delete them without affecting the whole process?
Yes, what is marked as optional is not part of the search process.
During the data splitting process, the program uses SPLITTER5050. Does it split the data randomly? Can I change the ratio to 70:30?
SPLITTER5050 splits the data exactly into the first and second halves. If you need a random choice of data samples in your training and testing parts, you need SPLITTERRANDRAND as in Example22. The ratio can then be set arbitrarily.
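To make the difference from a fixed first-half/second-half split concrete, here is a minimal, self-contained sketch of a random 70:30 split over sample indices. It only illustrates the idea and is not how FST3 implements SPLITTERRANDRAND; `random_split` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Randomly partitions sample indices 0..n-1 into a training part holding
// roughly `train_fraction` of the samples and a test part holding the rest.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
random_split(std::size_t n, double train_fraction, unsigned seed)
{
    assert(train_fraction >= 0.0 && train_fraction <= 1.0);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);          // random order
    // Round to the nearest whole number of training samples.
    const std::size_t n_train =
        static_cast<std::size_t>(train_fraction * static_cast<double>(n) + 0.5);
    std::vector<std::size_t> train(idx.begin(), idx.begin() + n_train);
    std::vector<std::size_t> test(idx.begin() + n_train, idx.end());
    return {train, test};
}
```

With n = 10 and train_fraction = 0.7 this yields 7 training and 3 test indices; a fixed seed makes the split repeatable.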
The parameters in the demos are pre-defined. Would it be difficult to change the parameter settings? Take demo11 as an example: what if I want to change to leave-one-out cross-validation and an SVM classifier...
Note that SVM depends on an external library and thus the FST3 Makefile needs to take it into account. Instead of explaining this in detail, I suggest you find an example that looks similar to what you need, is not threaded, and already contains SVM. If you modify such an example with your data splitting and perhaps your search procedure, it should compile fine.
In most cases, the demos are set to use the FORWARD selection method. My experience with FST1 was that BACKWARD search is more time-consuming. Is there any reason why we might need BACKWARD search? Would the subsets selected by FORWARD and BACKWARD search differ very much?
That's a theoretical question. There are cases when forward is better and others when backward may be better. I believe we discuss it to some extent in one of the papers that can be found under References on fst.utia.cz (look for the INTECH book chapter; the pdf is available for download there).
After the search has finished or I have terminated it, where can I find the log file with the results?
There is no default log file; all supplied demos just print everything to the screen. Textual output can easily be redirected, though, by supplying the desired output stream variable to any 'search()' call. E.g., in demo20.cpp, in the line srch.search(0,critval_train,sub,wknn,std::cout), replace std::cout with your output stream variable.
I've tested a few demos (22,26,10,53) using the supplied data and the original source code. When I used sample data with only TWO classes, like sonar_60.trn or speech_15.trn, the above demos all ran fine. However, when I tried data with more than two classes, like optdigits_64.trn or waveform_40.trn, the program failed. I don't know if my observation is correct. I wonder whether those demos are for two-class problems only. Can I set the demos to run with more than two classes?
There are two possible problems. First, all normal-model-based criteria like Bhattacharyya, Divergence, or Mahalanobis are defined for 2-class cases only. FST3 has no workaround for k-class (k>2) data in this case and just produces meaningless output. That is the case with demo10, demo22 and demo53. Second, even if a tool is not restricted to 2-class data, it may produce useless output due to numerical problems. This is the case with demo26, which illustrates the use of the normal Bayes classifier. If the data is far from normally distributed and the number of samples per class is relatively small, it may not be possible to estimate the normal model parameters reliably. This is likely to be the case with the optdigits_64 data.
Remark: k-NN and SVM wrappers are considerably more robust than the Bayes classifier in this sense.
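For reference, the textbook form of the Bhattacharyya distance between two normal class models illustrates both points of the answer above. Whether FST3 evaluates exactly this expression is an assumption; the symbols (class means \mu_1, \mu_2 and class covariance matrices \Sigma_1, \Sigma_2) are introduced here only for illustration.

```latex
B = \frac{1}{8}(\mu_1-\mu_2)^{T}\,\Sigma^{-1}(\mu_1-\mu_2)
  + \frac{1}{2}\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}},
\qquad \Sigma = \frac{\Sigma_1+\Sigma_2}{2}
```

The expression takes the parameters of exactly two classes, and it relies on the inverse and determinants of covariance matrices, which become unreliable or singular when a class has few samples relative to the number of features.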