Performance does not scale with the number of threads as expected
Increasing the number of threads does not consistently improve performance. Beyond a certain threshold, adding more threads appears to degrade performance. The effect is more pronounced when fcml is compiled with the LAPACK library. The cause is not yet clear.
For now, setting the number of threads to its optimum value (which is not the maximum number of cores) can yield considerable performance gains.
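Since the optimum has to be found empirically, one way to locate it is to time the same workload at several thread counts and keep the fastest. The following is a minimal sketch, not part of fcml: `run_workload` is a hypothetical callable that you would write to run your own fcml job with a given number of threads, and `find_best_thread_count` is an illustrative name.

```python
import time


def find_best_thread_count(run_workload, thread_counts, repeats=3,
                           timer=time.perf_counter):
    """Time run_workload(n) for each n and return (best_n, timings).

    run_workload is a hypothetical user-supplied callable that runs the
    job with n threads. timings maps each thread count to the fastest
    wall time observed over `repeats` runs, in the timer's units.
    """
    timings = {}
    for n in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = timer()
            run_workload(n)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, timer() - start)
        timings[n] = best
    # The thread count with the smallest recorded time wins.
    best_n = min(timings, key=timings.get)
    return best_n, timings
```

Sweeping every value from 1 up to the number of cores, rather than only powers of two, makes it easier to see where the threshold lies.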
A side issue regarding correctness: when fcml is compiled with LAPACK, setting #threads = #cores/2 (and no other value) can lead to completely wrong results.
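Until the cause is understood, it may be worth checking the results of a chosen thread count against a reference run. The sketch below assumes that a single-threaded run can be trusted as the reference, which the observations above suggest but do not prove. `compute` is a hypothetical callable that runs your fcml job with a given number of threads and returns a flat sequence of floats; it is not an fcml function.

```python
import math


def find_incorrect_thread_counts(compute, thread_counts, reference_threads=1,
                                 rel_tol=1e-9, abs_tol=1e-12):
    """Return the thread counts whose results differ from the reference run.

    compute is a hypothetical user-supplied callable: compute(n) runs the
    job with n threads and returns a flat sequence of floats.
    """
    # Assumption: the run with reference_threads threads is correct.
    reference = compute(reference_threads)
    incorrect = []
    for n in thread_counts:
        result = compute(n)
        matches = len(result) == len(reference) and all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(result, reference)
        )
        if not matches:
            incorrect.append(n)
    return incorrect
```

The tolerances are placeholders; parallel reductions can legitimately change rounding, so they may need loosening for your data.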