12. Choosing the right estimator

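Often the hardest part of solving a machine learning problem is finding the right estimator for the job. Different estimators are better suited to different types of data and different problems.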

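The flowchart below is meant as a rough guide to approaching a problem and to choosing which estimators to try on your data. Click any estimator in the chart to see its documentation. The orange "try next" arrows should be read as "if this estimator does not achieve the desired result, follow the arrow and try the next one". Use the scroll wheel to zoom in and out, and click and drag to pan. You can also download the chart: ml_map.svg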
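The "try next" reading of the arrows can be sketched in code. The example below is only an illustration under stated assumptions: the helper `first_good_estimator`, the 0.9 accuracy target, the iris dataset, and the particular candidate order are all choices made for this sketch, not something the chart prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC


def first_good_estimator(candidates, X, y, target=0.9, cv=5):
    """Hypothetical helper: evaluate candidates in order and stop at the
    first one whose mean cross-validated accuracy reaches `target`.
    If none reaches it, the best-scoring candidate seen is returned."""
    best_name, best_score = None, float("-inf")
    for name, estimator in candidates:
        score = cross_val_score(estimator, X, y, cv=cv).mean()
        if score > best_score:
            best_name, best_score = name, score
        if score >= target:
            break  # good enough: no need to follow the next arrow
    return best_name, best_score


# Assumed candidate order for a small labeled classification problem.
candidates = [
    ("LinearSVC", make_pipeline(StandardScaler(),
                                LinearSVC(max_iter=10000, random_state=0))),
    ("KNeighborsClassifier", make_pipeline(StandardScaler(),
                                           KNeighborsClassifier())),
    ("SVC", make_pipeline(StandardScaler(), SVC())),
]

X, y = load_iris(return_X_y=True)
name, score = first_good_estimator(candidates, X, y)
print(f"selected {name} with mean CV accuracy {score:.3f}")
```

In practice the stopping criterion would depend on the problem; a fixed accuracy threshold is just the simplest stand-in for "the desired result".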

[Figure: scikit-learn algorithm cheat sheet (ml_map.svg). The flowchart begins at START and asks whether more than 50 samples are available (if not: get more data). It then branches on the task (predicting a category, with or without labeled data; predicting a quantity; just looking; predicting structure) and on further questions (<10K or <100K samples, text data, number of categories known, few features should be important) toward four groups of estimators, linked by "try next" arrows.]

classification: SGD Classifier, Linear SVC, Kernel Approximation, KNeighbors Classifier, SVC, Ensemble Classifiers, Naive Bayes
clustering: MeanShift, VBGMM, MiniBatch KMeans, KMeans, Spectral Clustering, GMM
regression: SGD Regressor, Lasso, ElasticNet, RidgeRegression, SVR(kernel="linear"), SVR(kernel="rbf"), Ensemble Regressors
dimensionality reduction: Randomized PCA, Kernel Approximation, IsoMap, Spectral Embedding, LLE
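The chart's top-level questions (sample count, category versus quantity, labeled data, just looking) can be written as a small decision function. This is a minimal sketch of one reading of those branches; the function name `suggest_task` and its parameters are hypothetical and exist only for illustration. It stops at the task family and does not model the per-family choices or the "try next" arrows.

```python
def suggest_task(n_samples, predicting_category=False, has_labels=False,
                 predicting_quantity=False, just_looking=False):
    """Return the task family suggested by the chart's first questions.

    Hypothetical helper: the argument names mirror the chart's node labels.
    """
    if n_samples <= 50:
        # The ">50 samples" node answers NO.
        return "get more data"
    if predicting_category:
        # Labeled data leads to classification, unlabeled to clustering.
        return "classification" if has_labels else "clustering"
    if predicting_quantity:
        return "regression"
    if just_looking:
        return "dimensionality reduction"
    # Remaining branch: "predicting structure".
    return "tough luck"


print(suggest_task(1000, predicting_category=True, has_labels=True))
print(suggest_task(1000, predicting_category=True))
print(suggest_task(30, predicting_quantity=True))
```

Keeping the questions as plain boolean arguments makes the order of the checks explicit, which is the part of the chart most easily lost when it is read as a list of estimators.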