python - Finding the optimal combination of algorithms in an sklearn machine learning toolchain
In sklearn it is possible to create a pipeline to optimize the complete tool chain of a machine learning setup, as shown in the following sample:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA

estimators = [('reduce_dim', PCA()), ('svm', SVC())]
clf = Pipeline(estimators)
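To see the pipeline above actually run, here is a minimal sketch that fits it on a toy dataset. The iris dataset, the train/test split and PCA(n_components=2) are illustrative assumptions, not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data (assumption): iris has 150 samples and 4 features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same two-step chain as in the question: reduce dimensions, then classify
clf = Pipeline([('reduce_dim', PCA(n_components=2)), ('svm', SVC())])
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Calling fit on the pipeline fits PCA, transforms the data, and then fits the SVC on the transformed data, so the whole chain is treated as a single estimator.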
Now a pipeline by definition represents a serial process. But what if I want to compare different algorithms on the same level of a pipeline? Say I want to try another feature transformation algorithm in addition to PCA, and another machine learning algorithm such as trees in addition to SVM, and get the best of the 4 possible combinations? Can this be represented by some kind of parallel pipe, or is there a meta algorithm for this in sklearn?
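One possible way to get "the best of the 4 combinations", sketched here as an alternative that the answer below does not use: sklearn's GridSearchCV can swap entire pipeline steps, because a step name used as a grid key is replaced through set_params. SelectKBest stands in for "another feature transformation algorithm" and DecisionTreeClassifier for "trees"; those choices, the step name 'clf' (renamed from 'svm' since it may hold a tree) and the iris dataset are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The initial step objects are placeholders; the grid replaces them
pipe = Pipeline([('reduce_dim', PCA(n_components=2)), ('clf', SVC())])

# 2 transformers x 2 classifiers = 4 candidate pipelines
param_grid = {
    'reduce_dim': [PCA(n_components=2), SelectKBest(k=2)],
    'clf': [SVC(), DecisionTreeClassifier(random_state=0)],
}

# Each combination is scored with 5-fold cross-validation
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

Per-step hyperparameters can be added to the same grid with the usual step__param keys (for example 'clf__C' would only apply to an SVC); if the candidates need different hyperparameters, param_grid can also be a list of dicts.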
A pipeline is sequential:
data -> process input with algorithm A -> process input with algorithm B -> ...
Something parallel, which I think is what you're looking for, is called an "ensemble". For example, in a classification context you can train several SVMs on different features:
        |- SVM A gets features x_1, ..., x_n       -> vote class 1 -|
data  --|- SVM B gets features x_{n+1}, ..., x_m   -> vote class 1 -|--> classify
        |- SVM C gets features x_{m+1}, ..., x_p   -> vote class 0 -|
In this small example, 2 of the 3 classifiers voted for class 1 and the 3rd voted for class 0. By majority vote, the ensemble classifies the data as class 1. (Here, the classifiers are executed in parallel.)
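The majority vote in this example is simple counting. A minimal sketch, where majority_vote is a hypothetical helper name and the votes list mirrors the three SVMs in the diagram:

```python
from collections import Counter

# Votes from the three SVMs in the diagram: A -> 1, B -> 1, C -> 0
votes = [1, 1, 0]

def majority_vote(votes):
    """Return the most common label; on a tie, the label seen first wins."""
    return Counter(votes).most_common(1)[0][0]

print(majority_vote(votes))  # 1
```

Two votes for class 1 against one for class 0 gives class 1, matching the result described above.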
Of course, you can have several pipelines in an ensemble.
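A sketch of the diagram above built from several pipelines, assuming sklearn's VotingClassifier with voting='hard' (majority vote on predicted labels) and ColumnTransformer to give each SVM its own feature subset. The iris data and the column split ([0, 1], [2], [3]) are illustrative assumptions, and svm_on_columns is a hypothetical helper.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 4 features: columns 0..3

def svm_on_columns(columns):
    """Pipeline that keeps only the given feature columns, then fits an SVC."""
    # 'passthrough' keeps the listed columns unchanged; the rest are dropped
    select = ColumnTransformer([('select', 'passthrough', columns)])
    return Pipeline([('select', select), ('svm', SVC())])

# Three SVMs on disjoint feature subsets, combined by majority vote
ensemble = VotingClassifier(
    estimators=[
        ('svm_a', svm_on_columns([0, 1])),
        ('svm_b', svm_on_columns([2])),
        ('svm_c', svm_on_columns([3])),
    ],
    voting='hard',
)

scores = cross_val_score(ensemble, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Any of the member pipelines could just as well contain a different transformer or classifier, since VotingClassifier only needs each member to be an estimator with fit and predict.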
See sklearn's documentation on ensemble methods for a pretty good summary.
Here is a short image summary of different ensemble methods that I made a while ago: