Has anyone used the n_jobs parameter of the sklearn classes? I am using sklearn with Anaconda 3.4 64-bit. My Spyder version is 2.3.8. After setting the n_jobs parameter of some sklearn class to a non-zero value, my script never finishes executing. Why does this happen?
Some scikit-learn tools such as GridSearchCV and cross_val_score rely internally on Python's multiprocessing module to parallelize execution across several Python processes when n_jobs > 1 is passed as an argument.
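As a minimal sketch of what that looks like (the dataset, parameter grid, and estimator here are placeholders; the import paths assume scikit-learn >= 0.18, while older releases ship GridSearchCV in sklearn.grid_search), note the `if __name__ == '__main__':` guard. On platforms where multiprocessing spawns workers by re-importing the main module (e.g. Windows), omitting that guard is a classic cause of a script that hangs and never finishes, which matches the symptom described above:

```python
# Sketch: GridSearchCV with n_jobs > 1 (imports assume scikit-learn >= 0.18;
# older versions provide GridSearchCV via sklearn.grid_search instead).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

if __name__ == '__main__':
    # The __main__ guard is required where multiprocessing starts workers
    # by re-importing the main module (e.g. Windows); without it the
    # parallel fit can hang or spawn processes endlessly.
    iris = load_iris()
    param_grid = {'C': [0.1, 1.0, 10.0]}
    search = GridSearchCV(SVC(), param_grid, n_jobs=2)  # 2 worker processes
    search.fit(iris.data, iris.target)
    print(search.best_params_)
```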
Taken from the scikit-learn documentation:
The problem is that Python multiprocessing does a fork system call without following it with an exec system call for performance reasons. Many libraries like (some versions of) Accelerate / vecLib under OSX, (some versions of) MKL, the OpenMP runtime of GCC, nvidia’s Cuda (and probably many others), manage their own internal thread pool. Upon a call to fork, the thread pool state in the child process is corrupted: the thread pool believes it has many threads while only the main thread state has been forked. It is possible to change the libraries to make them detect when a fork happens and reinitialize the thread pool in that case: we did that for OpenBLAS (merged upstream in master since 0.2.10) and we contributed a patch to GCC’s OpenMP runtime (not yet reviewed).
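If the hang comes from the thread-pool corruption described above, one commonly suggested mitigation (a sketch under the assumption that an OpenMP- or MKL-backed BLAS is the culprit, not a guaranteed fix) is to cap the native thread pools before numpy and scikit-learn are imported, so a forked worker never inherits a multi-threaded pool in a corrupt state:

```python
# Mitigation sketch: force the native linear-algebra libraries down to a
# single thread *before* they are imported. These environment variables
# are read once at library initialization, so they must be set first.
import os
os.environ['OMP_NUM_THREADS'] = '1'  # OpenMP runtime (e.g. GCC's)
os.environ['MKL_NUM_THREADS'] = '1'  # Intel MKL (common in Anaconda)

import numpy as np                   # must come after the env vars
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

if __name__ == '__main__':
    iris = load_iris()
    scores = cross_val_score(SVC(), iris.data, iris.target, n_jobs=2)
    print(scores)
```

If neither the __main__ guard nor capping the thread pools helps, setting n_jobs=1 trades the parallelism away for reliable execution.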