ValidationCurveDisplay

class sklearn.model_selection.ValidationCurveDisplay(*, param_name, param_range, train_scores, test_scores, score_name=None)

Validation Curve visualization.

It is recommended to use from_estimator to create a ValidationCurveDisplay instance. All parameters are stored as attributes.

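Read more in the User Guide for general information about the visualization API, and in the detailed documentation on the validation curve visualization.

Added in version 1.3.

Parameters: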

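param_name (str): Name of the parameter that has been varied.
param_range (array-like of shape (n_ticks,)): The values of the parameter that have been evaluated.
train_scores (ndarray of shape (n_ticks, n_cv_folds)): Scores on training sets.
test_scores (ndarray of shape (n_ticks, n_cv_folds)): Scores on test sets.
score_name (str, default=None): The name of the score used in validation_curve. It will override the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name: we replace _ by spaces and capitalize the first letter. We remove neg_ and replace it by "Negative" if negate_score is False, or just remove it otherwise.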
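The naming rule above can be sketched as a small pure-Python function. infer_score_name is a hypothetical helper written for illustration: it follows only the rules stated in this paragraph, plus one labeled assumption (the name of a callable is taken from its __name__), and it is not scikit-learn's internal implementation.

```python
def infer_score_name(score_name=None, scoring=None, negate_score=False):
    """Hypothetical helper mirroring the documented score-name rules."""
    if score_name is not None:
        # An explicit score_name always overrides the inferred one.
        return score_name
    if scoring is None:
        return "Negative score" if negate_score else "Score"
    # Assumption: a callable contributes its __name__; a string is used as-is.
    name = scoring.__name__ if callable(scoring) else scoring
    if name.startswith("neg_"):
        # Drop the prefix; spell it out as "Negative" only if scores are not negated.
        name = name[len("neg_"):]
        if not negate_score:
            name = "Negative " + name
    name = name.replace("_", " ")
    return name[:1].upper() + name[1:]  # capitalize the first letter only


print(infer_score_name(scoring="neg_mean_squared_error"))                     # Negative mean squared error
print(infer_score_name(scoring="neg_mean_squared_error", negate_score=True))  # Mean squared error
print(infer_score_name(scoring="balanced_accuracy"))                           # Balanced accuracy
print(infer_score_name(negate_score=True))                                     # Negative score
```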

Attributes:

ax_ (matplotlib Axes): Axes with the validation curve.
figure_ (matplotlib Figure): Figure containing the validation curve.
errorbar_ (list of matplotlib Artist or None): When std_display_style is "errorbar", this is a list of matplotlib.container.ErrorbarContainer objects. If another style is used, errorbar_ is None.
lines_ (list of matplotlib Artist or None): When std_display_style is "fill_between", this is a list of matplotlib.lines.Line2D objects corresponding to the mean train and test scores. If another style is used, lines_ is None.
fill_between_ (list of matplotlib Artist or None): When std_display_style is "fill_between", this is a list of matplotlib.collections.PolyCollection objects. If another style is used, fill_between_ is None.

See also

sklearn.model_selection.validation_curve: Compute the validation curve.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import ValidationCurveDisplay, validation_curve
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(n_samples=1_000, random_state=0)
>>> logistic_regression = LogisticRegression()
>>> param_name, param_range = "C", np.logspace(-8, 3, 10)
>>> train_scores, test_scores = validation_curve(
...     logistic_regression, X, y, param_name=param_name, param_range=param_range
... )
>>> display = ValidationCurveDisplay(
...     param_name=param_name, param_range=param_range,
...     train_scores=train_scores, test_scores=test_scores, score_name="Score"
... )
>>> display.plot()
<...>
>>> plt.show()


classmethod from_estimator(estimator, X, y, *, param_name, param_range, groups=None, cv=None, scoring=None, n_jobs=None, pre_dispatch='all', verbose=0, error_score=nan, fit_params=None, ax=None, negate_score=False, score_name=None, score_type='both', std_display_style='fill_between', line_kw=None, fill_between_kw=None, errorbar_kw=None)

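Create a validation curve display from an estimator.

Read more in the User Guide for general information about the visualization API, and in the detailed documentation on the validation curve visualization.

Parameters: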

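estimator : object type that implements the "fit" and "predict" methods

An object of that type which is cloned for each validation.

X : array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,) or (n_samples, n_outputs) or None

Target relative to X for classification or regression; None for unsupervised learning.

param_name : str

Name of the parameter that will be varied.

param_range : array-like of shape (n_values,)

The values of the parameter that will be evaluated.

groups : array-like of shape (n_samples,), default=None

Group labels for the samples used while splitting the dataset into train/test sets. Only used in conjunction with a "Group" cv instance (e.g., GroupKFold).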

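cv : int, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

None, to use the default 5-fold cross-validation,
int, to specify the number of folds in a (Stratified)KFold,
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False, so the splits will be the same across calls.

Refer to the User Guide for the various cross-validation strategies that can be used here.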
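The int/None rule above can be inspected with sklearn.model_selection.check_cv, the public helper that turns a cv argument into a splitter. A minimal sketch; the small label arrays are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y_binary = np.array([0, 1] * 10)          # classification target
y_continuous = np.linspace(0.0, 1.0, 20)  # regression target

# Classifier with a binary/multiclass y: StratifiedKFold (cv=None means 5 folds).
cv_clf = check_cv(None, y_binary, classifier=True)
print(type(cv_clf).__name__, cv_clf.get_n_splits(), cv_clf.shuffle)  # StratifiedKFold 5 False

# Any other case (here a non-classifier): plain KFold.
cv_reg = check_cv(3, y_continuous, classifier=False)
print(type(cv_reg).__name__, cv_reg.get_n_splits(), cv_reg.shuffle)  # KFold 3 False
```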

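scoring : str or callable, default=None

Scoring method to use when computing the validation curve. Options:

str: see String name scorers for options.
callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See Callable scorers for details.
None: the estimator's default evaluation criterion is used.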

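n_jobs : int, default=None

Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the different training and test sets. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the Glossary for more details.

pre_dispatch : int or str, default='all'

Number of predispatched jobs for parallel execution (default is all). This option can reduce the allocated memory. The str can be an expression like '2*n_jobs'.

verbose : int, default=0

Controls the verbosity: the higher, the more messages.

error_score : 'raise' or numeric, default=np.nan

Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, FitFailedWarning is raised.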

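fit_params : dict, default=None

Parameters to pass to the fit method of the estimator.

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes is created.

negate_score : bool, default=False

Whether or not to negate the scores obtained through validation_curve. This is particularly useful when using the error denoted by neg_* in scikit-learn.

score_name : str, default=None

The name of the score used to decorate the y-axis of the plot. It will override the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name: we replace _ by spaces and capitalize the first letter. We remove neg_ and replace it by "Negative" if negate_score is False, or just remove it otherwise.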

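score_type : {"test", "train", "both"}, default="both"

The type of score to plot. Can be one of "test", "train", or "both".

std_display_style : {"errorbar", "fill_between"} or None, default="fill_between"

The style used to display the score standard deviation around the mean score. If None, no representation of the standard deviation is displayed.

line_kw : dict, default=None

Additional keyword arguments passed to the plt.plot call used to draw the mean score.

fill_between_kw : dict, default=None

Additional keyword arguments passed to the plt.fill_between call used to draw the score standard deviation.

errorbar_kw : dict, default=None

Additional keyword arguments passed to the plt.errorbar call used to draw the mean score and its standard deviation.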

Returns:

display (ValidationCurveDisplay): Object that stores computed values.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import ValidationCurveDisplay
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(n_samples=1_000, random_state=0)
>>> logistic_regression = LogisticRegression()
>>> param_name, param_range = "C", np.logspace(-8, 3, 10)
>>> ValidationCurveDisplay.from_estimator(
...     logistic_regression, X, y, param_name=param_name,
...     param_range=param_range,
... )
<...>
>>> plt.show()


plot(ax=None, *, negate_score=False, score_name=None, score_type='both', std_display_style='fill_between', line_kw=None, fill_between_kw=None, errorbar_kw=None)

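Plot visualization.

Parameters:

ax (matplotlib Axes, default=None): Axes object to plot on. If None, a new figure and axes is created.
negate_score (bool, default=False): Whether or not to negate the scores obtained through validation_curve. This is particularly useful when using the error denoted by neg_* in scikit-learn.
score_name (str, default=None): The name of the score used to decorate the y-axis of the plot. It will override the name inferred from the scoring parameter. If scoring is None, we use "Score" if negate_score is False and "Negative score" otherwise. If scoring is a string or a callable, we infer the name: we replace _ by spaces and capitalize the first letter. We remove neg_ and replace it by "Negative" if negate_score is False, or just remove it otherwise.
score_type ({"test", "train", "both"}, default="both"): The type of score to plot. Can be one of "test", "train", or "both".
std_display_style ({"errorbar", "fill_between"} or None, default="fill_between"): The style used to display the score standard deviation around the mean score. If None, no standard deviation representation is displayed.
line_kw (dict, default=None): Additional keyword arguments passed to the plt.plot call used to draw the mean score.
fill_between_kw (dict, default=None): Additional keyword arguments passed to the plt.fill_between call used to draw the score standard deviation.
errorbar_kw (dict, default=None): Additional keyword arguments passed to the plt.errorbar call used to draw the mean score and its standard deviation.

Returns:

display (ValidationCurveDisplay): Object that stores computed values.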

Gallery examples

Effect of model regularization on training and test error

Release Highlights for scikit-learn 1.3
