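Statistical comparison of models using grid search#

This example illustrates how to statistically compare the performance of models trained and evaluated using GridSearchCV.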

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

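We will start by simulating moon-shaped data (where the ideal separation between classes is non-linear), adding a moderate degree of noise. Data points will belong to one of two possible classes, to be predicted from two features. We will simulate 50 samples for each class: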

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import make_moons

X, y = make_moons(noise=0.352, random_state=1, n_samples=100)

sns.scatterplot(
    x=X[:, 0], y=X[:, 1], hue=y, marker="o", s=25, edgecolor="k", legend=False
).set_title("Data")
plt.show()

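We will compare the performance of SVC estimators that differ in their kernel parameter, to decide which choice of this hyperparameter predicts our simulated data best. We will evaluate the models using RepeatedStratifiedKFold, repeating a 10-fold stratified cross-validation 10 times with a different randomization of the data in each repetition. Performance will be measured with roc_auc_score.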

from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

param_grid = [
    {"kernel": ["linear"]},
    {"kernel": ["poly"], "degree": [2, 3]},
    {"kernel": ["rbf"]},
]

svc = SVC(random_state=0)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

search = GridSearchCV(estimator=svc, param_grid=param_grid, scoring="roc_auc", cv=cv)
search.fit(X, y)

GridSearchCV(cv=RepeatedStratifiedKFold(n_repeats=10, n_splits=10, random_state=0),
             estimator=SVC(random_state=0),
             param_grid=[{'kernel': ['linear']},
                         {'degree': [2, 3], 'kernel': ['poly']},
                         {'kernel': ['rbf']}],
             scoring='roc_auc')

We can now inspect the results of our search, sorted by their mean_test_score:

import pandas as pd

results_df = pd.DataFrame(search.cv_results_)
results_df = results_df.sort_values(by=["rank_test_score"])
results_df = results_df.set_index(
    results_df["params"].apply(lambda x: "_".join(str(val) for val in x.values()))
).rename_axis("kernel")
results_df[["params", "rank_test_score", "mean_test_score", "std_test_score"]]

	params	rank_test_score	mean_test_score	std_test_score
kernel
rbf	{'kernel': 'rbf'}	1	0.9400	0.079297
linear	{'kernel': 'linear'}	2	0.9300	0.077846
3_poly	{'degree': 3, 'kernel': 'poly'}	3	0.9044	0.098776
2_poly	{'degree': 2, 'kernel': 'poly'}	4	0.6852	0.169106

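We can see that the estimator using the 'rbf' kernel performed best, closely followed by 'linear'. Both estimators with a 'poly' kernel performed worse, with the one using a second-degree polynomial scoring far below all other models.

Usually the analysis ends here, but half the story is missing. The output of GridSearchCV does not tell us how certain the differences between the models are. We don't know whether these differences are statistically significant. To evaluate this, we need to conduct a statistical test. Specifically, to contrast the performance of two models we should statistically compare their AUC scores. Since we repeated a 10-fold cross-validation 10 times, there are 100 samples (AUC scores) for each model.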
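As a quick sanity check on the figure of 100 scores per model, the following minimal sketch counts the splits produced by the cross-validation scheme described above. The placeholder arrays `X_demo` and `y_demo` are assumptions standing in for the simulated data; only their shape and class balance matter here.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Placeholder data (assumption): 100 samples, 2 features, 50 samples per class.
X_demo = np.zeros((100, 2))
y_demo = np.repeat([0, 1], 50)

cv_demo = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

# One (train, test) pair of index arrays per fold and repetition.
split_sizes = [
    (len(train), len(test)) for train, test in cv_demo.split(X_demo, y_demo)
]

n_scores = len(split_sizes)
print(n_scores)          # number of test scores obtained per model
print(set(split_sizes))  # distinct (n_train, n_test) pairs
```

Each model is therefore scored once per split, and every split trains on 90 samples and tests on 10.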

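However, the scores of the models are not independent: all models are evaluated on the same 100 partitions, which increases the correlation between their performance. Since some partitions of the data can make the classes particularly easy or hard to separate for all models, the model scores will co-vary.

Let's inspect this partition effect by plotting the performance of all models in each fold, and calculating the correlation between models across folds: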

# create df of model scores ordered by performance
model_scores = results_df.filter(regex=r"split\d*_test_score")

# plot 30 examples of dependency between cv fold and AUC scores
fig, ax = plt.subplots()
sns.lineplot(
    data=model_scores.transpose().iloc[:30],
    dashes=False,
    palette="Set1",
    marker="o",
    alpha=0.5,
    ax=ax,
)
ax.set_xlabel("CV test fold", size=12, labelpad=10)
ax.set_ylabel("Model AUC", size=12)
ax.tick_params(bottom=True, labelbottom=False)
plt.show()

# print correlation of AUC scores across folds
print(f"Correlation of models:\n {model_scores.transpose().corr()}")

Correlation of models:
 kernel       rbf    linear    3_poly    2_poly
kernel
rbf     1.000000  0.882561  0.783392  0.351390
linear  0.882561  1.000000  0.746492  0.298688
3_poly  0.783392  0.746492  1.000000  0.355440
2_poly  0.351390  0.298688  0.355440  1.000000

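We can observe that the performance of the models depends strongly on the fold.

As a consequence, if we assume independence between samples we will underestimate the variance computed in our statistical tests, increasing the number of false positive errors (i.e. detecting a significant difference between models when none exists) [1].

Several variance-corrected statistical tests have been developed for these cases. In this example we will show how to implement one of them (Nadeau and Bengio's corrected t-test) under two different statistical frameworks: frequentist and Bayesian.

Comparing two models: frequentist approach#

We can start by asking: "Is the first model significantly better than the second model (when ranked by mean_test_score)?"

To answer this question using a frequentist approach we could run a paired t-test and compute the p-value. This is also known as the Diebold-Mariano test in the forecasting literature [5]. Many variants of this t-test have been developed to account for the "non-independence of samples problem" described in the previous section. We will use the variant shown to obtain the highest replicability scores (which rate how similar the performance of a model is when it is evaluated on different random partitions of the same dataset) while maintaining low rates of false positives and false negatives: Nadeau and Bengio's corrected t-test [2], applied to a 10 times repeated 10-fold cross-validation [3].

This corrected paired t-test is computed as: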

\[t=\frac{\frac{1}{k \cdot r}\sum_{i=1}^{k}\sum_{j=1}^{r}x_{ij}} {\sqrt{(\frac{1}{k \cdot r}+\frac{n_{test}}{n_{train}})\hat{\sigma}^2}}\]

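where \(k\) is the number of folds, \(r\) is the number of repetitions in the cross-validation, \(x\) is the difference in performance between the models, \(n_{test}\) is the number of samples used for testing, \(n_{train}\) is the number of samples used for training, and \(\hat{\sigma}^2\) is the variance of the observed differences.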
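To get a feel for the size of the correction, the sketch below compares the factor multiplying \(\hat{\sigma}^2\) in the corrected statistic with the factor \(1/(k \cdot r)\) of an ordinary paired t-test. The split sizes of 90 training and 10 test samples are an assumption that follows from running 10-fold cross-validation on 100 samples.

```python
import numpy as np

k, r = 10, 10             # folds and repetitions used in this example
n_train, n_test = 90, 10  # assumption: 100 samples split 90/10 in each fold

uncorrected_factor = 1 / (k * r)
corrected_factor = 1 / (k * r) + n_test / n_train

# Ratio between the corrected and uncorrected standard errors
# (the variance of the differences cancels out of the ratio).
inflation = np.sqrt(corrected_factor / uncorrected_factor)
print(f"{inflation:.2f}")  # 3.48
```

Under these assumptions the corrected standard error is roughly 3.5 times the uncorrected one, so the t-statistic shrinks by the same factor.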

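Let's implement a corrected right-tailed paired t-test to evaluate whether the performance of the first model is significantly better than that of the second model. Our null hypothesis is that the second model performs at least as well as the first model.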

import numpy as np
from scipy.stats import t


def corrected_std(differences, n_train, n_test):
    """Corrects standard deviation using Nadeau and Bengio's approach.

    Parameters
    ----------
    differences : ndarray of shape (n_samples,)
        Vector containing the differences in the score metrics of two models.
    n_train : int
        Number of samples in the training set.
    n_test : int
        Number of samples in the testing set.

    Returns
    -------
    corrected_std : float
        Variance-corrected standard deviation of the set of differences.
    """
    # kr = k times r, r times repeated k-fold crossvalidation,
    # kr equals the number of times the model was evaluated
    kr = len(differences)
    corrected_var = np.var(differences, ddof=1) * (1 / kr + n_test / n_train)
    corrected_std = np.sqrt(corrected_var)
    return corrected_std


def compute_corrected_ttest(differences, df, n_train, n_test):
    """Computes right-tailed paired t-test with corrected variance.

    Parameters
    ----------
    differences : array-like of shape (n_samples,)
        Vector containing the differences in the score metrics of two models.
    df : int
        Degrees of freedom.
    n_train : int
        Number of samples in the training set.
    n_test : int
        Number of samples in the testing set.

    Returns
    -------
    t_stat : float
        Variance-corrected t-statistic.
    p_val : float
        Variance-corrected p-value.
    """
    mean = np.mean(differences)
    std = corrected_std(differences, n_train, n_test)
    t_stat = mean / std
    p_val = t.sf(np.abs(t_stat), df)  # right-tailed t-test
    return t_stat, p_val

model_1_scores = model_scores.iloc[0].values  # scores of the best model
model_2_scores = model_scores.iloc[1].values  # scores of the second-best model

differences = model_1_scores - model_2_scores

n = differences.shape[0]  # number of test sets
df = n - 1
n_train = len(next(iter(cv.split(X, y)))[0])
n_test = len(next(iter(cv.split(X, y)))[1])

t_stat, p_val = compute_corrected_ttest(differences, df, n_train, n_test)
print(f"Corrected t-value: {t_stat:.3f}\nCorrected p-value: {p_val:.3f}")

Corrected t-value: 0.750
Corrected p-value: 0.227

We can compare the corrected t- and p-values with the uncorrected ones:

t_stat_uncorrected = np.mean(differences) / np.sqrt(np.var(differences, ddof=1) / n)
p_val_uncorrected = t.sf(np.abs(t_stat_uncorrected), df)

print(
    f"Uncorrected t-value: {t_stat_uncorrected:.3f}\n"
    f"Uncorrected p-value: {p_val_uncorrected:.3f}"
)

Uncorrected t-value: 2.611
Uncorrected p-value: 0.005

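At the conventional significance level of alpha = 0.05, the uncorrected t-test concludes that the first model is significantly better than the second.

With the corrected approach, in contrast, we fail to detect this difference.

In the latter case, however, the frequentist approach does not let us conclude that the first and second model perform equivalently. To make that assertion we need to use a Bayesian approach.

Comparing two models: Bayesian approach#

We can use Bayesian estimation to calculate the probability that the first model is better than the second. Bayesian estimation outputs a distribution over the mean \(\mu\) of the differences in performance between the two models.

To obtain the posterior distribution we need to define a prior that models our beliefs about how the mean is distributed before looking at the data, and multiply it by a likelihood function that computes how likely our observed differences are, given the values that the mean of the differences could take.

Bayesian estimation can be carried out in many forms to answer our question, but in this example we will implement the approach suggested by Benavoli and colleagues [4].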

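One way of defining our posterior with a closed-form expression is to select a prior that is conjugate to the likelihood function. Benavoli and colleagues [4] show that when comparing the performance of two classifiers we can model the prior as a Normal-Gamma distribution (with both mean and variance unknown), which is conjugate to a normal likelihood, so that the posterior is available in closed form. By marginalizing out the variance from this posterior, we can define the posterior of the mean parameter as a Student's t-distribution. Specifically: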

\[St(\mu;n-1,\overline{x},(\frac{1}{n}+\frac{n_{test}}{n_{train}}) \hat{\sigma}^2)\]

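where \(n\) is the total number of samples (here, the number of score differences), \(\overline{x}\) is the mean difference in the scores, \(n_{test}\) is the number of samples used for testing, \(n_{train}\) is the number of samples used for training, and \(\hat{\sigma}^2\) is the variance of the observed differences.

Notice that we are using Nadeau and Bengio's corrected variance in our Bayesian approach as well.

Let's compute the posterior: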

# initialize random variable
t_post = t(
    df, loc=np.mean(differences), scale=corrected_std(differences, n_train, n_test)
)

Let's plot the posterior distribution:

x = np.linspace(t_post.ppf(0.001), t_post.ppf(0.999), 100)

plt.plot(x, t_post.pdf(x))
plt.xticks(np.arange(-0.04, 0.06, 0.01))
plt.fill_between(x, t_post.pdf(x), 0, facecolor="blue", alpha=0.2)
plt.ylabel("Probability density")
plt.xlabel(r"Mean difference ($\mu$)")
plt.title("Posterior distribution")
plt.show()

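We can calculate the probability that the first model is better than the second by computing the area under the curve of the posterior distribution from zero to infinity. Conversely, we can calculate the probability that the second model is better than the first by computing the area under the curve from minus infinity to zero.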

better_prob = 1 - t_post.cdf(0)

print(
    f"Probability of {model_scores.index[0]} being more accurate than "
    f"{model_scores.index[1]}: {better_prob:.3f}"
)
print(
    f"Probability of {model_scores.index[1]} being more accurate than "
    f"{model_scores.index[0]}: {1 - better_prob:.3f}"
)

Probability of rbf being more accurate than linear: 0.773
Probability of linear being more accurate than rbf: 0.227

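In contrast with the frequentist approach, we can compute the probability that one model is better than the other.

Note that we obtained results similar to those of the frequentist approach. Given our choice of priors, we are in essence performing the same computations, but we are allowed to make different assertions.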
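The match between the two approaches can be checked directly. The posterior is a Student's t-distribution centered at the mean difference and scaled by the corrected standard deviation, so by symmetry its mass below zero equals the right-tailed p-value of the corrected t-statistic. The sketch below verifies this on synthetic data; the array `diffs_demo`, the standard error `se_demo`, and the 90/10 split sizes are assumptions made only for illustration.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
# Synthetic score differences (assumption): 100 values with a small positive mean.
diffs_demo = rng.normal(loc=0.01, scale=0.03, size=100)

n = diffs_demo.shape[0]
dof = n - 1
n_train, n_test = 90, 10  # assumed split sizes

# Corrected standard error of the mean difference.
se_demo = np.sqrt(np.var(diffs_demo, ddof=1) * (1 / n + n_test / n_train))

# Frequentist side: right-tailed p-value of the corrected t-statistic.
t_stat = np.mean(diffs_demo) / se_demo
p_right = t.sf(t_stat, dof)

# Bayesian side: posterior probability that the mean difference is below zero.
posterior = t(dof, loc=np.mean(diffs_demo), scale=se_demo)
prob_below_zero = posterior.cdf(0)

print(np.isclose(p_right, prob_below_zero))
```

Note that compute_corrected_ttest above takes the absolute value of the statistic, so its p-value coincides with this posterior mass only when the mean difference is positive, as it is for 'rbf' versus 'linear'.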

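Region of Practical Equivalence#

Sometimes we are interested in the probability that our models perform equivalently, where "equivalent" is defined in a practical way. A naive approach [4] is to define estimators as practically equivalent when they differ by less than 1% in accuracy. But we could also define this practical equivalence in terms of the problem we are trying to solve. For example, a difference of 5% in accuracy might mean an increase of $1000 in sales, and we would consider any difference above that relevant for our business.

In this example we define the Region of Practical Equivalence (ROPE) to be \([-0.01, 0.01]\). That is, we consider two models practically equivalent if they differ by less than 1% in their performance.

To compute the probability that the classifiers are practically equivalent, we calculate the area under the curve of the posterior over the ROPE interval: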

rope_interval = [-0.01, 0.01]
rope_prob = t_post.cdf(rope_interval[1]) - t_post.cdf(rope_interval[0])

print(
    f"Probability of {model_scores.index[0]} and {model_scores.index[1]} "
    f"being practically equivalent: {rope_prob:.3f}"
)

Probability of rbf and linear being practically equivalent: 0.432

We can plot how the posterior is distributed over the ROPE interval:

x_rope = np.linspace(rope_interval[0], rope_interval[1], 100)

plt.plot(x, t_post.pdf(x))
plt.xticks(np.arange(-0.04, 0.06, 0.01))
plt.vlines([-0.01, 0.01], ymin=0, ymax=(np.max(t_post.pdf(x)) + 1))
plt.fill_between(x_rope, t_post.pdf(x_rope), 0, facecolor="blue", alpha=0.2)
plt.ylabel("Probability density")
plt.xlabel(r"Mean difference ($\mu$)")
plt.title("Posterior distribution under the ROPE")
plt.show()

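As suggested in [4], we can further interpret these probabilities using the same criterion as in the frequentist approach: is the probability of falling inside the ROPE greater than 95% (an alpha value of 5%)? If so, we can conclude that both models are practically equivalent.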
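A decision rule along these lines could be sketched as follows. The helper `rope_decision` is hypothetical and not part of scikit-learn, and applying the same 95% threshold to the regions on either side of the ROPE is an assumption that extends the criterion stated above. The probabilities passed in are the ones this example reports for 'rbf' versus 'linear'.

```python
def rope_decision(worse_prob, rope_prob, better_prob, threshold=0.95):
    """Hypothetical helper: label a comparison using a probability threshold.

    The three inputs are the posterior masses below, inside and above the ROPE.
    """
    if rope_prob > threshold:
        return "practically equivalent"
    if better_prob > threshold:
        return "first model better"
    if worse_prob > threshold:
        return "second model better"
    return "no decision"


# Posterior masses reported in this example for 'rbf' versus 'linear'.
decision = rope_decision(worse_prob=0.068, rope_prob=0.432, better_prob=0.500)
print(decision)  # no decision
```

None of the three regions holds more than 95% of the posterior mass here, so under this rule the data do not settle the comparison either way.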

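The Bayesian estimation approach also lets us quantify how uncertain we are about our estimate of the difference. This can be done using credible intervals. For a given probability, they show the range of values that the estimated quantity (in our case, the mean difference in performance) can take. For example, a 50% credible interval [x, y] tells us that there is a 50% probability that the true (mean) difference in performance between the models lies between x and y.

Let's determine the credible intervals of our data using 50%, 75% and 95%: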

cred_intervals = []
intervals = [0.5, 0.75, 0.95]

for interval in intervals:
    cred_interval = list(t_post.interval(interval))
    cred_intervals.append([interval, cred_interval[0], cred_interval[1]])

cred_int_df = pd.DataFrame(
    cred_intervals, columns=["interval", "lower value", "upper value"]
).set_index("interval")
cred_int_df

	lower value	upper value
interval
0.50	0.000977	0.019023
0.75	-0.005422	0.025422
0.95	-0.016445	0.036445

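As shown in the table, there is a 50% probability that the true mean difference between the models lies between 0.000977 and 0.019023, a 75% probability that it lies between -0.005422 and 0.025422, and a 95% probability that it lies between -0.016445 and 0.036445.

Pairwise comparison of all models: frequentist approach#

We could also be interested in comparing the performance of all the models evaluated with GridSearchCV. In this case we would run our statistical test multiple times, which leads to the multiple comparisons problem.

There are many possible ways to tackle this problem, but a standard approach is to apply a Bonferroni correction. The Bonferroni correction can be computed by multiplying each p-value by the number of comparisons we are testing.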
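The arithmetic of the correction can be sketched on its own. With four models there are six unordered pairs, so each p-value is multiplied by six and capped at 1. The raw p-values below are hypothetical and chosen only for illustration.

```python
from math import comb

import numpy as np

n_models = 4
n_comparisons = comb(n_models, 2)  # number of unordered pairs of models

# Hypothetical uncorrected p-values, for illustration only.
raw_p_values = np.array([0.001, 0.02, 0.2])

# Bonferroni: scale by the number of comparisons, then cap at 1.
adjusted_p_values = np.minimum(raw_p_values * n_comparisons, 1.0)

print(n_comparisons)
print(adjusted_p_values)
```

In this sketch the second p-value would pass a 0.05 threshold before correction but not after it, which is the conservative behavior the correction is designed to have.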

Let's compare the performance of the models using our corrected t-test:

from itertools import combinations
from math import factorial

n_comparisons = factorial(len(model_scores)) / (
    factorial(2) * factorial(len(model_scores) - 2)
)
pairwise_t_test = []

for model_i, model_k in combinations(range(len(model_scores)), 2):
    model_i_scores = model_scores.iloc[model_i].values
    model_k_scores = model_scores.iloc[model_k].values
    differences = model_i_scores - model_k_scores
    t_stat, p_val = compute_corrected_ttest(differences, df, n_train, n_test)
    p_val *= n_comparisons  # implement Bonferroni correction
    # Bonferroni can output p-values higher than 1
    p_val = 1 if p_val > 1 else p_val
    pairwise_t_test.append(
        [model_scores.index[model_i], model_scores.index[model_k], t_stat, p_val]
    )

pairwise_comp_df = pd.DataFrame(
    pairwise_t_test, columns=["model_1", "model_2", "t_stat", "p_val"]
).round(3)
pairwise_comp_df

	model_1	model_2	t_stat	p_val
0	rbf	linear	0.750	1.000
1	rbf	3_poly	1.657	0.302
2	rbf	2_poly	4.565	0.000
3	linear	3_poly	1.111	0.807
4	linear	2_poly	4.276	0.000
5	3_poly	2_poly	3.851	0.001

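We observe that after correcting for multiple comparisons, the only model that differs significantly from the others is '2_poly'. 'rbf', the model ranked first by GridSearchCV, does not differ significantly from 'linear' or '3_poly'.

Pairwise comparison of all models: Bayesian approach#

When using Bayesian estimation to compare multiple models, we don't need to correct for multiple comparisons (for the reasons, see [4]).

We can carry out our pairwise comparisons the same way as in the first section: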

pairwise_bayesian = []

for model_i, model_k in combinations(range(len(model_scores)), 2):
    model_i_scores = model_scores.iloc[model_i].values
    model_k_scores = model_scores.iloc[model_k].values
    differences = model_i_scores - model_k_scores
    t_post = t(
        df, loc=np.mean(differences), scale=corrected_std(differences, n_train, n_test)
    )
    worse_prob = t_post.cdf(rope_interval[0])
    better_prob = 1 - t_post.cdf(rope_interval[1])
    rope_prob = t_post.cdf(rope_interval[1]) - t_post.cdf(rope_interval[0])

    pairwise_bayesian.append([worse_prob, better_prob, rope_prob])

pairwise_bayesian_df = pd.DataFrame(
    pairwise_bayesian, columns=["worse_prob", "better_prob", "rope_prob"]
).round(3)

pairwise_comp_df = pairwise_comp_df.join(pairwise_bayesian_df)
pairwise_comp_df

	model_1	model_2	t_stat	p_val	worse_prob	better_prob	rope_prob
0	rbf	linear	0.750	1.000	0.068	0.500	0.432
1	rbf	3_poly	1.657	0.302	0.018	0.882	0.100
2	rbf	2_poly	4.565	0.000	0.000	1.000	0.000
3	linear	3_poly	1.111	0.807	0.063	0.750	0.187
4	linear	2_poly	4.276	0.000	0.000	1.000	0.000
5	3_poly	2_poly	3.851	0.001	0.000	1.000	0.000

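Using the Bayesian approach we can compute the probability that a model performs better than, worse than, or practically equivalently to another.

The results show that 'rbf', the model ranked first by GridSearchCV, has approximately a 6.8% chance of being worse than 'linear' and a 1.8% chance of being worse than '3_poly'. 'rbf' and 'linear' have a 43% probability of being practically equivalent, while 'rbf' and '3_poly' have a 10% probability of being so.

Similarly to the conclusions obtained with the frequentist approach, all models have a 100% probability of being better than '2_poly', and none of them performs practically equivalently to it.

Take-home messages#

Small differences in performance measures can easily turn out to be due to chance alone rather than to one model systematically predicting better than the other. As shown in this example, statistics can tell you how likely that is.
When statistically comparing the performance of two models evaluated in GridSearchCV, it is necessary to correct the calculated variance, which could otherwise be underestimated because the scores of the models are not independent of each other.
A frequentist approach that uses a (variance-corrected) paired t-test can tell us whether the performance of one model is better than another with a degree of certainty above chance.
A Bayesian approach can provide the probabilities of one model being better than, worse than, or practically equivalent to another. It can also tell us how confident we are that the true difference between our models falls within a certain range of values.
If multiple models are statistically compared, a multiple comparisons correction is needed when using the frequentist approach.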

References
