
Partial Dependence and Individual Conditional Expectation Plots#

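Partial dependence plots show the dependence between the target function [2] and a set of features of interest, marginalizing over the values of all other features (the complement features). Due to the limits of human perception, the set of features of interest must be small (usually one or two), so these features are usually chosen among the most important ones.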

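Similarly, an individual conditional expectation (ICE) plot [3] shows the dependence between the target function and a feature of interest. However, unlike partial dependence plots, which show the average effect of the feature of interest, ICE plots visualize the dependence of the prediction on that feature for each sample separately, with one line per sample. Only one feature of interest is supported for ICE plots.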

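This example shows how to obtain partial dependence and ICE plots from an MLPRegressor and a HistGradientBoostingRegressor trained on the bike sharing dataset. The example is inspired by [1].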

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

1-way partial dependence with different models#

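In this section, we will compute 1-way partial dependence with two different machine-learning models: (i) a multi-layer perceptron and (ii) a gradient-boosting model. With these two models, we illustrate how to compute and interpret both partial dependence plots (PDP), for numerical and categorical features, and individual conditional expectation (ICE).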

Multi-layer perceptron#

Let's fit an MLPRegressor and compute single-variable partial dependence plots.

from time import time

from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

print("Training MLPRegressor...")
tic = time()
mlp_model = make_pipeline(
    mlp_preprocessor,
    MLPRegressor(
        hidden_layer_sizes=(30, 15),
        learning_rate_init=0.01,
        early_stopping=True,
        random_state=0,
    ),
)
mlp_model.fit(X_train, y_train)
print(f"done in {time() - tic:.3f}s")
print(f"Test R2 score: {mlp_model.score(X_test, y_test):.2f}")

Training MLPRegressor...
done in 0.591s
Test R2 score: 0.61

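We configured a pipeline using the preprocessor that we created specifically for the neural network, and tuned the neural network size and learning rate to get a reasonable compromise between training time and predictive performance on the test set.

Importantly, the features of this tabular dataset have very different dynamic ranges. Neural networks tend to be very sensitive to features with varying scales, and forgetting to preprocess the numeric features would lead to a very poor model.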
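As a rough illustration of what such preprocessing does, the sketch below rescales two synthetic numeric columns with very different ranges. It assumes a quantile-based scaler, `QuantileTransformer`; the actual `mlp_preprocessor` is defined earlier in the example and may differ. The data and variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Two synthetic numeric features with very different dynamic ranges
# (an assumption for illustration, not the bike sharing columns).
rng = np.random.RandomState(0)
X_num = np.column_stack(
    [rng.uniform(0, 1, size=500), rng.exponential(scale=1000, size=500)]
)
raw_ranges = X_num.max(axis=0) - X_num.min(axis=0)

# Map every feature to a uniform distribution on [0, 1].
scaler = QuantileTransformer(n_quantiles=100, random_state=0)
X_scaled = scaler.fit_transform(X_num)
scaled_ranges = X_scaled.max(axis=0) - X_scaled.min(axis=0)

print("raw ranges:", raw_ranges)
print("scaled ranges:", scaled_ranges)
```

After the transformation both columns span the same interval, so neither feature dominates the gradient updates simply because of its units.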

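It would be possible to get even higher predictive performance with a larger neural network, but training would also be significantly more expensive.

Note that it is important to check that the model is accurate enough on a test set before plotting the partial dependence, since there would be little use in explaining the impact of a given feature on the prediction function of a model with poor predictive performance. In this regard, our MLP model works reasonably well.

We will plot the averaged partial dependence.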

import matplotlib.pyplot as plt

from sklearn.inspection import PartialDependenceDisplay

common_params = {
    "subsample": 50,
    "n_jobs": 2,
    "grid_resolution": 20,
    "random_state": 0,
}

print("Computing partial dependence plots...")
features_info = {
    # features of interest
    "features": ["temp", "humidity", "windspeed", "season", "weather", "hour"],
    # type of partial dependence plot
    "kind": "average",
    # information regarding categorical features
    "categorical_features": categorical_features,
}
tic = time()
_, ax = plt.subplots(ncols=3, nrows=2, figsize=(9, 8), constrained_layout=True)
display = PartialDependenceDisplay.from_estimator(
    mlp_model,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
)
print(f"done in {time() - tic:.3f}s")
_ = display.figure_.suptitle(
    (
        "Partial dependence of the number of bike rentals\n"
        "for the bike rental dataset with an MLPRegressor"
    ),
    fontsize=16,
)

Partial dependence of the number of bike rentals for the bike rental dataset with an MLPRegressor

Computing partial dependence plots...
done in 0.549s

Gradient boosting#

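Let's now fit a HistGradientBoostingRegressor and compute the partial dependence on the same features. We also use the specific preprocessor that we created for this model.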

from sklearn.ensemble import HistGradientBoostingRegressor

print("Training HistGradientBoostingRegressor...")
tic = time()
hgbdt_model = make_pipeline(
    hgbdt_preprocessor,
    HistGradientBoostingRegressor(
        categorical_features=categorical_features,
        random_state=0,
        max_iter=50,
    ),
)
hgbdt_model.fit(X_train, y_train)
print(f"done in {time() - tic:.3f}s")
print(f"Test R2 score: {hgbdt_model.score(X_test, y_test):.2f}")

Training HistGradientBoostingRegressor...
done in 0.123s
Test R2 score: 0.62

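Here, we used the gradient boosting model with mostly default hyperparameters (only max_iter is set) and without rescaling the numerical features, since tree-based models are naturally robust to monotonic transformations of numerical features.

Note that on this tabular dataset, the gradient boosting model trains noticeably faster than the neural network (0.123s vs. 0.591s above) and reaches a comparable, slightly higher test score (R2 of 0.62 vs. 0.61). It is also significantly cheaper to tune its hyperparameters (the defaults tend to work well, which is often not the case for neural networks).

We will plot the partial dependence for some of the numerical and categorical features.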

print("Computing partial dependence plots...")
tic = time()
_, ax = plt.subplots(ncols=3, nrows=2, figsize=(9, 8), constrained_layout=True)
display = PartialDependenceDisplay.from_estimator(
    hgbdt_model,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
)
print(f"done in {time() - tic:.3f}s")
_ = display.figure_.suptitle(
    (
        "Partial dependence of the number of bike rentals\n"
        "for the bike rental dataset with a gradient boosting"
    ),
    fontsize=16,
)

Partial dependence of the number of bike rentals for the bike rental dataset with a gradient boosting

Computing partial dependence plots...
done in 1.015s

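Analysis of the plots#

We will first look at the PDPs for the numerical features. For both models, the general trend of the temperature PDP is that the number of bike rentals increases with temperature. We can make a similar analysis for the humidity feature, but with the opposite trend: the number of bike rentals decreases as humidity increases. Finally, we see the same trend for the wind speed feature: the number of bike rentals decreases as wind speed increases, for both models. We also observe that MLPRegressor has much smoother predictions than HistGradientBoostingRegressor.

Now, we will look at the partial dependence plots for the categorical features.

We observe that spring is the lowest bar for the season feature. For the weather feature, the rain category is the lowest bar. Regarding the hour feature, we see two peaks, around 7 am and 6 pm. These findings are in line with the observations we made earlier on the dataset.

However, it is worth noting that we may create potentially meaningless synthetic samples if features are correlated, because each grid value of the feature of interest is combined with the observed values of all other features.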
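The caveat about correlated features can be illustrated with a small NumPy sketch (the two synthetic features and all variable names are assumptions for illustration). When two features are strongly correlated, forcing one of them to a grid value for every sample produces feature combinations that never occur in the observed data.

```python
import numpy as np

# Two strongly correlated synthetic features (hypothetical, for illustration).
rng = np.random.RandomState(0)
x1 = rng.uniform(0, 30, size=500)
x2 = x1 + rng.normal(scale=1.0, size=500)
X = np.column_stack([x1, x2])

# In the observed data, the two features always stay close to each other.
gap_observed = np.abs(X[:, 0] - X[:, 1]).max()

# Partial dependence evaluates the model on copies of the data where the
# feature of interest is forced to a grid value, here the upper end of its range.
X_synth = X.copy()
X_synth[:, 0] = 30.0
gap_synth = np.abs(X_synth[:, 0] - X_synth[:, 1]).max()

print(f"largest gap in observed data: {gap_observed:.1f}")
print(f"largest gap in synthetic samples: {gap_synth:.1f}")
```

The model is then queried far from the region where it saw training data, so the averaged predictions for such grid values should be interpreted with care.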

ICE vs. PDP#

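A PDP is an average of the marginal effects of the features: we average the response over all samples of the provided set. Some effects can therefore be hidden. To reveal them, it is possible to plot each individual response. This representation is called an individual conditional expectation (ICE) plot. In the plot below, we show 50 randomly selected ICE lines for the temperature and humidity features.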

print("Computing partial dependence plots and individual conditional expectation...")
tic = time()
_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)

features_info = {
    "features": ["temp", "humidity"],
    "kind": "both",
    "centered": True,
}

display = PartialDependenceDisplay.from_estimator(
    hgbdt_model,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
)
print(f"done in {time() - tic:.3f}s")
_ = display.figure_.suptitle("ICE and PDP representations", fontsize=16)

Computing partial dependence plots and individual conditional expectation...
done in 0.471s

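We see that the ICE for the temperature feature gives us some additional information: some of the ICE lines are flat, while others show a decrease of the dependence for temperatures above 35 degrees Celsius. We observe a similar pattern for the humidity feature: some of the ICE lines show a sharp decrease when the humidity is above 80%.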

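Not all ICE lines are parallel, which indicates that the model finds interactions between features. We can repeat the experiment by constraining the gradient boosting model not to use any interactions between features, using the parameter interaction_cst.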

from sklearn.base import clone

interaction_cst = [[i] for i in range(X_train.shape[1])]
hgbdt_model_without_interactions = (
    clone(hgbdt_model)
    .set_params(histgradientboostingregressor__interaction_cst=interaction_cst)
    .fit(X_train, y_train)
)
print(f"Test R2 score: {hgbdt_model_without_interactions.score(X_test, y_test):.2f}")

Test R2 score: 0.38

_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)

features_info["centered"] = False
display = PartialDependenceDisplay.from_estimator(
    hgbdt_model_without_interactions,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
)
_ = display.figure_.suptitle("ICE and PDP representations", fontsize=16)

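2D interaction plots#

PDPs with two features of interest enable us to visualize interactions among them. However, ICEs cannot be plotted and interpreted as easily in this setting. We will show the representation available in PartialDependenceDisplay.from_estimator, which is a 2D heatmap.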

print("Computing partial dependence plots...")
features_info = {
    "features": ["temp", "humidity", ("temp", "humidity")],
    "kind": "average",
}
_, ax = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True)
tic = time()
display = PartialDependenceDisplay.from_estimator(
    hgbdt_model,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
)
print(f"done in {time() - tic:.3f}s")
_ = display.figure_.suptitle(
    "1-way vs 2-way of numerical PDP using gradient boosting", fontsize=16
)

1-way vs 2-way of numerical PDP using gradient boosting

Computing partial dependence plots...
done in 7.227s

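The two-way partial dependence plot shows the dependence of the number of bike rentals on joint values of temperature and humidity. We clearly see an interaction between the two features. For temperatures higher than 20 degrees Celsius, the effect of humidity on the number of bike rentals seems independent of the temperature.

On the other hand, for temperatures lower than 20 degrees Celsius, both the temperature and the humidity continuously impact the number of bike rentals.

Furthermore, the slope of the impact ridge at the 20 degrees Celsius threshold depends strongly on the humidity level: the ridge is steep under dry conditions but much smoother under wetter conditions, above 70% humidity.

We now contrast those results with the same plots computed for the model constrained to learn a prediction function that does not depend on such non-linear feature interactions.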

print("Computing partial dependence plots...")
features_info = {
    "features": ["temp", "humidity", ("temp", "humidity")],
    "kind": "average",
}
_, ax = plt.subplots(ncols=3, figsize=(10, 4), constrained_layout=True)
tic = time()
display = PartialDependenceDisplay.from_estimator(
    hgbdt_model_without_interactions,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
)
print(f"done in {time() - tic:.3f}s")
_ = display.figure_.suptitle(
    "1-way vs 2-way of numerical PDP using gradient boosting", fontsize=16
)

Computing partial dependence plots...
done in 6.518s

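The 1D partial dependence plots for the model constrained not to model feature interactions show local spikes for each feature individually, in particular for the "humidity" feature. Those spikes might reflect a degraded behavior of the model, which attempts to somehow compensate for the forbidden interactions by overfitting particular training points. Note that the predictive performance of this model, as measured on the test set, is significantly worse than that of the original, unconstrained model (R2 of 0.38 vs. 0.62).

Also note that the number of local spikes visible on those plots depends on the grid resolution parameter of the PD plot itself.

Those local spikes result in a noisily gridded 2D PD plot. It is quite challenging to tell whether or not there is an interaction between those features because of the high-frequency oscillations in the humidity feature. However, it can clearly be seen that the simple interaction effect observed when the temperature crosses the 20 degrees boundary is no longer visible for this model.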

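The partial dependence between categorical features provides a discrete representation that can be shown as a heatmap. For instance, the interaction between the season, the weather, and the target would be as follows: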

print("Computing partial dependence plots...")
features_info = {
    "features": ["season", "weather", ("season", "weather")],
    "kind": "average",
    "categorical_features": categorical_features,
}
_, ax = plt.subplots(ncols=3, figsize=(14, 6), constrained_layout=True)
tic = time()
display = PartialDependenceDisplay.from_estimator(
    hgbdt_model,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
)

print(f"done in {time() - tic:.3f}s")
_ = display.figure_.suptitle(
    "1-way vs 2-way PDP of categorical features using gradient boosting", fontsize=16
)

1-way vs 2-way PDP of categorical features using gradient boosting

Computing partial dependence plots...
done in 0.317s

3D representation#

Let's make the same partial dependence plot for the interaction between the two features, this time in 3 dimensions.

# unused but required import for doing 3d projections with matplotlib < 3.2
import mpl_toolkits.mplot3d  # noqa: F401
import numpy as np

from sklearn.inspection import partial_dependence

fig = plt.figure(figsize=(5.5, 5))

features = ("temp", "humidity")
pdp = partial_dependence(
    hgbdt_model, X_train, features=features, kind="average", grid_resolution=10
)
XX, YY = np.meshgrid(pdp["grid_values"][0], pdp["grid_values"][1])
Z = pdp.average[0].T
ax = fig.add_subplot(projection="3d")
fig.add_axes(ax)

surf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1, cmap=plt.cm.BuPu, edgecolor="k")
ax.set_xlabel(features[0])
ax.set_ylabel(features[1])
fig.suptitle(
    "PD of number of bike rentals on\nthe temperature and humidity GBDT model",
    fontsize=16,
)
# pretty init view
ax.view_init(elev=22, azim=122)
clb = plt.colorbar(surf, pad=0.08, shrink=0.6, aspect=10)
clb.ax.set_title("Partial\ndependence")
plt.show()

PD of number of bike rentals on the temperature and humidity GBDT model, Partial dependence
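The transpose applied to the averaged partial dependence before calling `plot_surface` follows from array shapes, as this small NumPy sketch shows (the grids and the stand-in array `Z` are made up for illustration). The 2-way result is indexed as `[first feature, second feature]`, whereas `np.meshgrid` with its default indexing returns arrays indexed as `[second feature, first feature]`.

```python
import numpy as np

# Hypothetical grids for two features, with different lengths on purpose.
grid_0 = np.linspace(0.0, 1.0, 4)
grid_1 = np.linspace(0.0, 1.0, 3)

# Stand-in for a 2-way partial dependence array: shape (len(grid_0), len(grid_1)).
Z = np.add.outer(grid_0, 10 * grid_1)

# meshgrid uses "xy" indexing by default: shape (len(grid_1), len(grid_0)).
XX, YY = np.meshgrid(grid_0, grid_1)

print(Z.shape, XX.shape)  # the shapes differ until Z is transposed
print(np.allclose(Z.T, XX + 10 * YY))
```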

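Custom inspection points#

None of the examples so far specify which points are evaluated to create the partial dependence plots. By default, we use percentiles defined by the input dataset. In some cases, it can be helpful to specify the exact points at which the model should be evaluated. This is useful, for instance, when a user wants to test the model behavior on out-of-distribution data, or to compare two models that were fit on slightly different data. The custom_values parameter allows the user to pass in the values at which the model should be evaluated. It overrides the grid_resolution and percentiles parameters. Let's return to our gradient boosting example above, this time with custom values.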

print("Computing partial dependence plots with custom evaluation values...")
tic = time()
_, ax = plt.subplots(ncols=2, figsize=(6, 4), sharey=True, constrained_layout=True)

features_info = {
    "features": ["temp", "humidity"],
    "kind": "both",
}

display = PartialDependenceDisplay.from_estimator(
    hgbdt_model,
    X_train,
    **features_info,
    ax=ax,
    **common_params,
    # we set custom values for temp feature -
    # all other features are evaluated based on the data
    custom_values={"temp": np.linspace(0, 40, 10)},
)
print(f"done in {time() - tic:.3f}s")
_ = display.figure_.suptitle(
    (
        "Partial dependence of the number of bike rentals\n"
        "for the bike rental dataset with a gradient boosting"
    ),
    fontsize=16,
)

Computing partial dependence plots with custom evaluation values...
done in 0.425s

Total running time of the script: (0 minutes 22.278 seconds)
