Advanced Plotting With Partial Dependence#

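A PartialDependenceDisplay object can be used for plotting without recomputing the partial dependence. In this example, we show how to draw partial dependence plots and how to quickly customize them with the visualization API.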

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import pandas as pd

from sklearn.datasets import load_diabetes
from sklearn.inspection import PartialDependenceDisplay
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

Train models on the diabetes dataset#

First, we train a decision tree and a multi-layer perceptron on the diabetes dataset.

diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = diabetes.target

tree = DecisionTreeRegressor()
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100, 100), tol=1e-2, max_iter=500, random_state=0),
)
tree.fit(X, y)
mlp.fit(X, y)

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpregressor',
                 MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=500,
                              random_state=0, tol=0.01))])

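Plotting partial dependence for two features#

We plot partial dependence curves for the features "age" and "bmi" (body mass index) for the decision tree. With two features, from_estimator expects to plot two curves. Here, the plot function places a grid of two plots in the space defined by ax.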

fig, ax = plt.subplots(figsize=(12, 6))
ax.set_title("Decision Tree")
tree_disp = PartialDependenceDisplay.from_estimator(tree, X, ["age", "bmi"], ax=ax)

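The partial dependence curves can also be plotted for the multi-layer perceptron. In this case, line_kw is passed to from_estimator to change the color of the curve.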

fig, ax = plt.subplots(figsize=(12, 6))
ax.set_title("Multi-layer Perceptron")
mlp_disp = PartialDependenceDisplay.from_estimator(
    mlp, X, ["age", "bmi"], ax=ax, line_kw={"color": "red"}
)

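Plotting partial dependence of the two models together#

The tree_disp and mlp_disp PartialDependenceDisplay objects contain all the computed information needed to recreate the partial dependence curves. This means we can easily create additional plots without recomputing the curves.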

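One way to plot the curves is to place them in the same figure, with the curves of each model on their own row. First, we create a figure with two axes arranged in two rows and one column. The two axes are passed to the plot functions of tree_disp and mlp_disp, which use the given axes to draw the partial dependence. The resulting plot places the partial dependence curves of the decision tree in the first row and those of the multi-layer perceptron in the second row.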

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
tree_disp.plot(ax=ax1)
ax1.set_title("Decision Tree")
mlp_disp.plot(ax=ax2, line_kw={"color": "red"})
ax2.set_title("Multi-layer Perceptron")

Text(0.5, 1.0, 'Multi-layer Perceptron')

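Another way to compare the curves is to plot them on top of each other. Here, we create a figure with one row and two columns. The axes are passed to the plot function as a list, which draws the partial dependence curves of each model on the same axes. The length of the axes list must equal the number of plots drawn.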

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
tree_disp.plot(ax=[ax1, ax2], line_kw={"label": "Decision Tree"})
mlp_disp.plot(
    ax=[ax1, ax2], line_kw={"label": "Multi-layer Perceptron", "color": "red"}
)
ax1.legend()
ax2.legend()

<matplotlib.legend.Legend object at 0x7fad10a96050>

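tree_disp.axes_ is a NumPy array containing the axes used to draw the partial dependence plots. It can be passed to mlp_disp to achieve the same effect of drawing the plots on top of each other. Furthermore, the figure_ attribute of a display stores the figure, which allows resizing the figure after calling plot. In this case, tree_disp.axes_ has two dimensions, so plot only shows the y label and y ticks on the leftmost plot.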

tree_disp.plot(line_kw={"label": "Decision Tree"})
mlp_disp.plot(
    line_kw={"label": "Multi-layer Perceptron", "color": "red"}, ax=tree_disp.axes_
)
tree_disp.figure_.set_size_inches(10, 6)
tree_disp.axes_[0, 0].legend()
tree_disp.axes_[0, 1].legend()
plt.show()

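Plotting partial dependence for one feature#

Here, we plot the partial dependence curves for a single feature, "age", on the same axes. In this case, tree_disp.axes_ is passed to the second plot function.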

tree_disp = PartialDependenceDisplay.from_estimator(tree, X, ["age"])
mlp_disp = PartialDependenceDisplay.from_estimator(
    mlp, X, ["age"], ax=tree_disp.axes_, line_kw={"color": "red"}
)

Total running time of the script: (0 minutes 2.362 seconds)