6. Visualizations

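Scikit-learn defines a simple API for creating visualizations for machine learning. The key feature of this API is that it allows quick plotting and visual adjustments without recalculation. We provide Display classes that expose two methods for creating plots: from_estimator and from_predictions.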

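The from_estimator method takes a fitted estimator and input data (X, y), creates a Display object, and draws the plot. The from_predictions method creates a Display object from true and predicted values (y_test, y_pred) and draws the plot.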

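Using from_predictions avoids recomputing the predictions, but the user must make sure that the prediction values passed correspond to pos_label. For predict_proba, select the column corresponding to the pos_label class; for decision_function, negate the scores (i.e. multiply by -1) if pos_label is not the last class in the estimator's classes_ attribute.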

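The Display object stores the computed values (e.g. metric values or feature importances) required for plotting with Matplotlib. These values are derived from the raw predictions passed to from_predictions, or from the estimator and X passed to from_estimator.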

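Once a Display object has been initialized, its plot method creates a Matplotlib plot (note that we recommend creating Display objects via from_estimator or from_predictions rather than initializing them directly). The plot method can add to an existing plot when that plot's matplotlib.axes.Axes is passed to the ax parameter.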

In the following example, we use from_estimator to plot a ROC curve for a fitted logistic regression model.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
y = y == 2  # make binary
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.8, random_state=42
)
clf = LogisticRegression(random_state=42, C=.01)
clf.fit(X_train, y_train)

clf_disp = RocCurveDisplay.from_estimator(clf, X_test, y_test)

If you already have the prediction values, you can instead use from_predictions to do the same thing (and save on compute).

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
y = y == 2  # make binary
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.8, random_state=42
)
clf = LogisticRegression(random_state=42, C=.01)
clf.fit(X_train, y_train)

# select the probability of the class that we considered to be the positive label
y_pred = clf.predict_proba(X_test)[:, 1]

clf_disp = RocCurveDisplay.from_predictions(y_test, y_pred)

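The returned clf_disp object allows us to add another curve to the already computed ROC curve. In this case, clf_disp is a RocCurveDisplay that stores the computed values as attributes named roc_auc, fpr, and tpr.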


Next, we train a random forest classifier and plot the previously computed ROC curve again by using the plot method of the Display object.

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=10, random_state=42)
rfc.fit(X_train, y_train)

ax = plt.gca()
rfc_disp = RocCurveDisplay.from_estimator(
    rfc, X_test, y_test, ax=ax, curve_kwargs={"alpha": 0.8}
)
clf_disp.plot(ax=ax, curve_kwargs={"alpha": 0.8})





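Notice that we pass alpha=0.8 to the plot functions to adjust the alpha values of the curves.

Examples

ROC Curve with Visualization API
Advanced Plotting With Partial Dependence
Visualizations with Display Objects
Comparison of Calibration of Classifiers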


6.1. Available Plotting Utilities

6.1.1. Display Objects


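calibration.CalibrationDisplay(prob_true, ...)
Calibration curve (also known as reliability diagram) visualization.

inspection.PartialDependenceDisplay(...[, ...])
Partial Dependence Plot (PDP) and Individual Conditional Expectation (ICE).

inspection.DecisionBoundaryDisplay(*, xx0, ...)
Decision boundary visualization.

metrics.ConfusionMatrixDisplay(...[, ...])
Confusion matrix visualization.

metrics.DetCurveDisplay(*, fpr, fnr[, ...])
Detection Error Tradeoff (DET) curve visualization.

metrics.PrecisionRecallDisplay(precision, ...)
Precision-recall visualization.

metrics.PredictionErrorDisplay(*, y_true, y_pred)
Visualization of the prediction error of a regression model.

metrics.RocCurveDisplay(*, fpr, tpr[, ...])
ROC curve visualization.

model_selection.LearningCurveDisplay(*, ...)
Learning curve visualization.

model_selection.ValidationCurveDisplay(*, ...)
Validation curve visualization.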


              
              
              
                
                  
  
    
