calibration_curve

sklearn.calibration.calibration_curve(y_true, y_prob, *, pos_label=None, n_bins=5, strategy='uniform')

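Compute true and predicted probabilities for a calibration curve.

The method assumes the inputs come from a binary classifier, and discretizes the [0, 1] interval into bins.

Calibration curves may also be referred to as reliability diagrams.

Read more in the User Guide.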
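
As a minimal sketch of typical usage, the snippet below fits a binary classifier and passes its positive-class probabilities to calibration_curve. The synthetic dataset, the choice of LogisticRegression, and the random_state values are assumptions made purely for illustration.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary problem (illustrative data, not from the documentation).
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# Columns of predict_proba follow clf.classes_; with labels {0, 1},
# column 1 holds the probability of the positive class.
y_prob = clf.predict_proba(X_test)[:, 1]

prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=5)

# One (mean predicted probability, observed fraction of positives) pair
# per non-empty bin.
for p_pred, p_true in zip(prob_pred, prob_true):
    print(f"predicted={p_pred:.2f}  observed={p_true:.2f}")
```

For a well-calibrated classifier the two columns should be close to each other in every bin.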

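Parameters:

y_true : array-like of shape (n_samples,)

True targets.

y_prob : array-like of shape (n_samples,)

Probabilities of the positive class.

pos_label : int, float, bool or str, default=None

The label of the positive class.

Added in version 1.1.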

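n_bins : int, default=5

Number of bins used to discretize the [0, 1] interval. A larger number requires more data. Bins that contain no samples (i.e. no corresponding values in y_prob) are not returned, so the returned arrays may have fewer than n_bins values.

strategy : {'uniform', 'quantile'}, default='uniform'

Strategy used to define the widths of the bins.

uniform: The bins have identical widths.
quantile: The bins have the same number of samples and depend on y_prob.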
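
The effect of strategy can be sketched on a small, deliberately skewed set of probabilities (the data below are made up for illustration). With 'uniform', one of the four equal-width bins receives no samples and is dropped from the output, which also shows why fewer than n_bins values can come back. With 'quantile', the bin edges follow the data, so all four bins are populated.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Illustrative data: most predictions are clustered near 0.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1])
y_prob = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.55, 0.9])

# Equal-width bins: nothing falls between 0.25 and 0.5, so that bin is omitted.
true_uniform, pred_uniform = calibration_curve(
    y_true, y_prob, n_bins=4, strategy="uniform"
)

# Equal-count bins: edges are taken from the quantiles of y_prob.
true_quantile, pred_quantile = calibration_curve(
    y_true, y_prob, n_bins=4, strategy="quantile"
)

print(len(true_uniform), len(true_quantile))
```

On skewed predictions like these, 'quantile' tends to give more evenly supported estimates per bin, at the cost of bins whose widths differ.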

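Returns:

prob_true : ndarray of shape (n_bins,) or smaller

The proportion of samples whose class is the positive class, in each bin (fraction of positives).

prob_pred : ndarray of shape (n_bins,) or smaller

The mean predicted probability in each bin.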
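
To make the two return values concrete, here is a sketch in plain NumPy of how they can be computed for equal-width bins. It illustrates the definitions above and is not claimed to be the library's exact implementation; in particular, the way values lying exactly on a bin edge are assigned is an assumption here.

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9, 1.0])
n_bins = 3

# Equal-width bin edges over [0, 1]; the interior edges decide the bin index.
edges = np.linspace(0.0, 1.0, n_bins + 1)
bin_ids = np.searchsorted(edges[1:-1], y_prob)

# Per-bin sample counts, number of positives, and sum of predicted probabilities.
counts = np.bincount(bin_ids, minlength=n_bins)
sum_true = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
sum_prob = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)

# Keep only bins that actually contain samples.
nonempty = counts > 0
prob_true = sum_true[nonempty] / counts[nonempty]  # fraction of positives
prob_pred = sum_prob[nonempty] / counts[nonempty]  # mean predicted probability

print(prob_true, prob_pred)
```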

See also

CalibrationDisplay.from_predictions: Plot a calibration curve using true labels and predicted probabilities.
CalibrationDisplay.from_estimator: Plot a calibration curve using an estimator and data.

References

Alexandru Niculescu-Mizil and Rich Caruana (2005) Predicting Good Probabilities With Supervised Learning, in Proceedings of the 22nd International Conference on Machine Learning (ICML). See section 4 (Qualitative Analysis of Predictions).

Examples

>>> import numpy as np
>>> from sklearn.calibration import calibration_curve
>>> y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
>>> y_pred = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9,  1.])
>>> prob_true, prob_pred = calibration_curve(y_true, y_pred, n_bins=3)
>>> prob_true
array([0. , 0.5, 1. ])
>>> prob_pred
array([0.2  , 0.525, 0.85 ])
