Caching nearest neighbors
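This example demonstrates how to precompute the k nearest neighbors before using them in KNeighborsClassifier. KNeighborsClassifier can compute the nearest neighbors internally, but precomputing them has several benefits: finer parameter control, caching for multiple uses, or a custom implementation.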
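The following minimal sketch illustrates the "custom implementation" benefit by feeding a hand-built graph to KNeighborsClassifier with metric="precomputed". It assumes a sparse CSR graph with the following layout: rows are query samples, columns are training samples, and the stored values are distances. The helper `brute_force_knn_graph` is hypothetical. It is a plain NumPy brute-force search that stands in for any custom neighbors code. The train/test split is an illustration choice and is not part of this example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def brute_force_knn_graph(X_query, X_fit, n_neighbors):
    """Hypothetical custom neighbors search returning a CSR distance graph.

    Row i holds the Euclidean distances from X_query[i] to its n_neighbors
    nearest rows of X_fit, in increasing order.
    """
    # Squared distances via |a|^2 - 2 a.b + |b|^2, clipped at 0 for round-off.
    sq_dist = (
        (X_query**2).sum(axis=1)[:, None]
        - 2 * X_query @ X_fit.T
        + (X_fit**2).sum(axis=1)[None, :]
    )
    dist = np.sqrt(np.maximum(sq_dist, 0))
    # Column indices of the n_neighbors closest training samples per row.
    idx = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbors]
    data = np.take_along_axis(dist, idx, axis=1)
    n_query = X_query.shape[0]
    indptr = np.arange(0, n_query * n_neighbors + 1, n_neighbors)
    return csr_matrix(
        (data.ravel(), idx.ravel(), indptr), shape=(n_query, X_fit.shape[0])
    )


X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the graphs once, sized for the largest n_neighbors we plan to try.
k_max = 9
G_train = brute_force_knn_graph(X_train, X_train, k_max)  # square: (n_train, n_train)
G_test = brute_force_knn_graph(X_test, X_train, k_max)  # (n_test, n_train)

# Reuse the same precomputed graphs for several classifiers.
scores = {}
for k in [1, 3, 5, 7, 9]:
    clf = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
    clf.fit(G_train, y_train)
    scores[k] = clf.score(G_test, y_test)
print(scores)
```

All five classifiers reuse the same two graphs. Each classifier reads only the first n_neighbors entries of every row, so the graphs must store at least max(n_neighbors) entries per row.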
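Here we use the `memory` parameter of Pipeline to cache the nearest neighbors graph between multiple fits of KNeighborsClassifier. On each cross-validation split, the first fit is slow because it computes the neighbors graph. Subsequent fits with other values of n_neighbors are faster because they load the graph from the cache instead of recomputing it. The durations here are short because the dataset is small, but the gain grows as the dataset gets larger or the parameter grid to search gets bigger.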
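Pipeline's `memory` parameter accepts either a directory path or an object with the joblib.Memory interface. The sketch below illustrates the caching idea; it does not reproduce Pipeline's internals. It wraps a hypothetical `compute_graph` function with joblib.Memory directly and counts how often the graph is actually computed. Only the first call with a given input does the work; repeated calls with the same arguments load the stored result.

```python
from tempfile import TemporaryDirectory

from joblib import Memory
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsTransformer

# Number of times the graph was actually computed (cache misses).
n_computations = [0]


def compute_graph(X, n_neighbors):
    """Hypothetical expensive step: build the training neighbors graph."""
    n_computations[0] += 1
    transformer = KNeighborsTransformer(n_neighbors=n_neighbors, mode="distance")
    return transformer.fit_transform(X)


X, _ = load_digits(return_X_y=True)

with TemporaryDirectory(prefix="graph_cache_demo_") as tmpdir:
    memory = Memory(location=tmpdir, verbose=0)
    cached_compute_graph = memory.cache(compute_graph)
    # The first call computes and stores the graph; later calls with the
    # same arguments load it from tmpdir instead of running compute_graph.
    graphs = [cached_compute_graph(X, n_neighbors=9) for _ in range(3)]

print("graph computed", n_computations[0], "time(s) for", len(graphs), "calls")
```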
```python
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

from tempfile import TemporaryDirectory

import matplotlib.pyplot as plt

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier, KNeighborsTransformer
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
n_neighbors_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The transformer computes the nearest neighbors graph using the maximum number
# of neighbors necessary in the grid search. The classifier model filters the
# nearest neighbors graph as required by its own n_neighbors parameter.
graph_model = KNeighborsTransformer(n_neighbors=max(n_neighbors_list), mode="distance")
classifier_model = KNeighborsClassifier(metric="precomputed")

# Note that we give `memory` a directory to cache the graph computation
# that will be used several times when tuning the hyperparameters of the
# classifier.
with TemporaryDirectory(prefix="sklearn_graph_cache_") as tmpdir:
    full_model = Pipeline(
        steps=[("graph", graph_model), ("classifier", classifier_model)], memory=tmpdir
    )

    param_grid = {"classifier__n_neighbors": n_neighbors_list}
    grid_model = GridSearchCV(full_model, param_grid)
    grid_model.fit(X, y)

# Plot the results of the grid search.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].errorbar(
    x=n_neighbors_list,
    y=grid_model.cv_results_["mean_test_score"],
    yerr=grid_model.cv_results_["std_test_score"],
)
axes[0].set(xlabel="n_neighbors", title="Classification accuracy")
axes[1].errorbar(
    x=n_neighbors_list,
    y=grid_model.cv_results_["mean_fit_time"],
    yerr=grid_model.cv_results_["std_fit_time"],
    color="r",
)
axes[1].set(xlabel="n_neighbors", title="Fit time (with caching)")
fig.tight_layout()
plt.show()
```
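A comment in the script states that the classifier filters the graph down to its own n_neighbors. The following minimal sketch shows what that implies, using a simple train/test split in place of the grid search. A single graph sized for max(n_neighbors_list) serves every value in the grid. A larger value fails with a ValueError at prediction time because the rows do not store enough neighbors. According to the KNeighborsTransformer documentation, mode="distance" stores one extra neighbor per row, so the value 20 is chosen to be well above that limit. The helper `score_with_graph` is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsTransformer

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_neighbors_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Compute the graphs once, sized for the largest value in the grid.
transformer = KNeighborsTransformer(n_neighbors=max(n_neighbors_list), mode="distance")
G_train = transformer.fit_transform(X_train)  # (n_train, n_train)
G_test = transformer.transform(X_test)  # (n_test, n_train)


def score_with_graph(k):
    """Fit and score a classifier that reads its neighbors from the graphs."""
    clf = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
    return clf.fit(G_train, y_train).score(G_test, y_test)


# Every value in the grid fits within the stored neighbors.
scores = {k: score_with_graph(k) for k in n_neighbors_list}

# A value larger than the number of stored neighbors is rejected.
try:
    score_with_graph(20)
    too_many_rejected = False
except ValueError as exc:
    too_many_rejected = True
    print(exc)
```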
Total running time of the script: (0 minutes 1.454 seconds)