python - Finding the optimal combination of algorithms in an sklearn machine learning toolchain


In sklearn it is possible to create a pipeline to optimize the complete tool chain of a machine learning setup, as shown in the following sample:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
# Each step is a (name, estimator) pair; the steps run in the order listed.
estimators = [('reduce_dim', PCA()), ('svm', SVC())]
clf = Pipeline(estimators)

Now, a pipeline by definition represents a serial process. But what if I want to compare different algorithms on the same level of a pipeline? Say I want to try another feature transformation algorithm in addition to PCA, and another machine learning algorithm such as trees in addition to SVM, and get the best of the 4 possible combinations. Can this be represented by some kind of parallel pipe, or is there a meta algorithm for this in sklearn?
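For reference, one way the four combinations could be compared directly is a grid search in which the pipeline steps themselves are the parameters. This is only a sketch under assumptions not in the question: `SelectKBest` stands in for the second feature transformation, `DecisionTreeClassifier` for "trees", the iris data is a placeholder dataset, and the step name `clf` is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data (assumption): any labelled dataset would do here.
X, y = load_iris(return_X_y=True)

# The initial estimators are just defaults; the grid below replaces them.
pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# A step name is itself a settable parameter, so whole estimators can be
# swapped in and out. 2 transformers x 2 classifiers = 4 candidates.
param_grid = {
    "reduce_dim": [PCA(n_components=2), SelectKBest(k=2)],
    "clf": [SVC(), DecisionTreeClassifier(random_state=0)],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
best_steps = search.best_estimator_.named_steps
print("candidates tried:", n_candidates)
print("best transformer:", type(best_steps["reduce_dim"]).__name__)
print("best classifier:", type(best_steps["clf"]).__name__)
print("best CV accuracy:", search.best_score_)
```

This picks a single winning combination by cross-validated score, which is different from combining the candidates into an ensemble.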

A pipeline is sequential:

data -> process input with algorithm A -> process input with algorithm B -> ...
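To make the "sequential" part concrete, here is a small sketch showing that a pipeline does the same thing as calling the steps by hand, one after the other. The iris data and `n_components=2` are my own assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Placeholder data (assumption).
X, y = load_iris(return_X_y=True)

# Pipeline version: the output of PCA is fed into the SVC.
clf = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])
clf.fit(X, y)
pipeline_pred = clf.predict(X)

# Manual version: the same two steps, chained explicitly.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)          # step A: transform the input
svm = SVC().fit(X_reduced, y)             # step B: learn on step A's output
manual_pred = svm.predict(pca.transform(X))

same = bool(np.array_equal(pipeline_pred, manual_pred))
print("pipeline and manual chain agree:", same)
```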

What you describe is something parallel, and I think what you're looking for is called an "ensemble". For example, in a classification context you can train several SVMs on different features:

      |-SVM A gets features x_1, ..., x_n      -> vote for class 1 -|
data -|-SVM B gets features x_{n+1}, ..., x_m  -> vote for class 1 -| -> classify
      |-SVM C gets features x_{m+1}, ..., x_p  -> vote for class 0 -|

In this small example, 2 of the 3 classifiers voted for class 1 and the 3rd voted for class 0. By majority vote, the ensemble classifies the data as class 1. (Here, the classifiers are executed in parallel.)
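The voting scheme above can be sketched by hand in a few lines. Everything specific here is an assumption for illustration: the synthetic data from `make_classification`, the split into three groups of three features, and the helper name `majority_vote`, which is hypothetical and not part of sklearn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def majority_vote(votes):
    """Most frequent label per sample for a (n_classifiers, n_samples) array.

    Hypothetical helper; assumes labels are non-negative integers.
    """
    votes = np.asarray(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# The example from the text: two votes for class 1, one vote for class 0.
toy = majority_vote([[1], [1], [0]])
print("toy example ->", toy)

# Synthetic binary problem with 9 features (assumption).
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVM A, B and C each see a different slice of the feature columns.
feature_groups = [slice(0, 3), slice(3, 6), slice(6, 9)]
svms = [SVC().fit(X_train[:, cols], y_train) for cols in feature_groups]

# One row of votes per classifier, then a majority decision per sample.
votes = np.array(
    [svm.predict(X_test[:, cols]) for svm, cols in zip(svms, feature_groups)]
)
ensemble_pred = majority_vote(votes)
print("ensemble accuracy:", float(np.mean(ensemble_pred == y_test)))
```

With three voters and two classes there is never a tie; with an even number of voters a tie-breaking rule would be needed (here `argmax` would simply pick the lowest label).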

Of course, you can have several pipelines in an ensemble.
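As a sketch of what that could look like, sklearn's `VotingClassifier` accepts pipelines as its members. The particular pipelines, the iris data, and the names `pca_svm`, `scaled_svm` and `pca_tree` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data (assumption).
X, y = load_iris(return_X_y=True)

# Three independent pipelines, each a serial chain of its own.
pca_svm = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])
scaled_svm = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
pca_tree = Pipeline(
    [
        ("reduce_dim", PCA(n_components=2)),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ]
)

# voting="hard" means each pipeline casts one vote and the majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("pca_svm", pca_svm),
        ("scaled_svm", scaled_svm),
        ("pca_tree", pca_tree),
    ],
    voting="hard",
)
ensemble.fit(X, y)

train_accuracy = ensemble.score(X, y)
print("fitted members:", len(ensemble.estimators_))
print("training accuracy:", train_accuracy)
```

The score here is measured on the training data only, so it says nothing about generalization; a held-out split or cross-validation would be needed for that.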

See sklearn's ensemble methods for a pretty good summary.

A short image summary I made a while ago of different ensemble methods: (image not included)

