Balance model complexity and cross-validated score#

This example demonstrates how to balance model complexity and cross-validated score by finding an accuracy within 1 standard deviation of the best accuracy score while minimizing the number of PCA components [1]. It uses GridSearchCV with a custom refit callable to select the optimal model.

The figure shows the trade-off between the cross-validated score and the number of PCA components. The balanced case is n_components=10 with accuracy=0.88, which falls within 1 standard deviation of the best accuracy score.

References#

[1] Hastie, T., Tibshirani, R., Friedman, J. (2001). Model Assessment and Selection. The Elements of Statistical Learning (pp. 219-260). New York, NY, USA: Springer New York Inc.

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np
import polars as pl

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.pipeline import Pipeline

Introduction#

When tuning hyperparameters, we often want to balance model complexity and performance. The "one-standard-error" rule is a common approach: choose the simplest model whose performance is within one standard error of the best model's performance. This helps avoid overfitting by preferring a simpler model when its performance is statistically comparable to that of a more complex one.
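
As a quick illustration of the rule (a hypothetical toy example with made-up scores, unrelated to the grid search below), the threshold is the best mean score minus its standard deviation, and the simplest model at or above that threshold is kept:

# Hypothetical mean CV scores for three models, ordered simplest to most complex
scores = [0.90, 0.93, 0.94]
std_of_best = 0.02  # standard deviation of the best (third) model's CV scores
threshold = max(scores) - std_of_best  # 0.94 - 0.02 = 0.92
# 0.90 < 0.92 is rejected, 0.93 >= 0.92 is accepted, so the second (simpler)
# model is preferred over the marginally better third one.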

Helper functions#

We define two helper functions:

  1. lower_bound: computes the threshold for acceptable performance (best score - 1 standard deviation)

  2. best_low_complexity: selects the model with the fewest PCA components whose score is above this threshold

def lower_bound(cv_results):
    """
    Calculate the lower bound within 1 standard deviation
    of the best `mean_test_scores`.

    Parameters
    ----------
    cv_results : dict of numpy(masked) ndarrays
        See attribute cv_results_ of `GridSearchCV`

    Returns
    -------
    float
        Lower bound within 1 standard deviation of the
        best `mean_test_score`.
    """
    best_score_idx = np.argmax(cv_results["mean_test_score"])

    return (
        cv_results["mean_test_score"][best_score_idx]
        - cv_results["std_test_score"][best_score_idx]
    )


def best_low_complexity(cv_results):
    """
    Balance model complexity with cross-validated score.

    Parameters
    ----------
    cv_results : dict of numpy(masked) ndarrays
        See attribute cv_results_ of `GridSearchCV`.

    Returns
    -------
    int
        Index of the model that has the fewest PCA components
        while keeping its test score within 1 standard deviation of the best
        `mean_test_score`.
    """
    threshold = lower_bound(cv_results)
    candidate_idx = np.flatnonzero(cv_results["mean_test_score"] >= threshold)
    best_idx = candidate_idx[
        cv_results["param_reduce_dim__n_components"][candidate_idx].argmin()
    ]
    return best_idx
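
As a quick sanity check (a toy cv_results-like dict with made-up values, not real GridSearchCV output), the two helpers behave as expected:

# Toy results for three candidate models with increasing numbers of PCA components
toy_results = {
    "mean_test_score": np.array([0.86, 0.88, 0.89]),
    "std_test_score": np.array([0.02, 0.02, 0.02]),
    "param_reduce_dim__n_components": np.array([6, 10, 20]),
}
print(lower_bound(toy_results))  # 0.89 - 0.02 = 0.87
print(best_low_complexity(toy_results))  # 1, i.e. the 10-component model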

Set up the pipeline and parameter grid#

We create a pipeline with two steps:

  1. Dimensionality reduction with PCA

  2. Classification with LogisticRegression

We will search over different numbers of PCA components to find the optimal complexity.

pipe = Pipeline(
    [
        ("reduce_dim", PCA(random_state=42)),
        ("classify", LogisticRegression(random_state=42, C=0.01, max_iter=1000)),
    ]
)

param_grid = {"reduce_dim__n_components": [6, 8, 10, 15, 20, 25, 35, 45, 55]}

Perform the search with GridSearchCV#

We use GridSearchCV with our custom best_low_complexity function as the refit parameter. This function selects the model with the fewest PCA components whose performance is still within one standard deviation of the best model.

grid = GridSearchCV(
    pipe,
    # Use a non-stratified CV strategy to make sure that the inter-fold
    # standard deviation of the test scores is informative.
    cv=ShuffleSplit(n_splits=30, random_state=0),
    n_jobs=1,  # increase this on your machine to use more physical cores
    param_grid=param_grid,
    scoring="accuracy",
    refit=best_low_complexity,
    return_train_score=True,
)

Load the digits dataset and fit the model#

X, y = load_digits(return_X_y=True)
grid.fit(X, y)
GridSearchCV(cv=ShuffleSplit(n_splits=30, random_state=0, test_size=None, train_size=None),
             estimator=Pipeline(steps=[('reduce_dim', PCA(random_state=42)),
                                       ('classify',
                                        LogisticRegression(C=0.01,
                                                           max_iter=1000,
                                                           random_state=42))]),
             n_jobs=1,
             param_grid={'reduce_dim__n_components': [6, 8, 10, 15, 20, 25, 35,
                                                      45, 55]},
             refit=<function best_low_complexity at 0x7fb4a1b64b80>,
             return_train_score=True, scoring='accuracy')
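Since refit is a callable here, grid.best_index_ and grid.best_params_ reflect the model chosen by best_low_complexity rather than the highest-scoring one. A quick check (the exact numbers depend on the run):

# Which configuration did the custom refit callable pick?
print("Selected n_components:", grid.best_params_["reduce_dim__n_components"])
print(
    "Mean test score of the selected model:",
    grid.cv_results_["mean_test_score"][grid.best_index_],
)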


Visualize the results#

We create a plot showing the individual and mean train/test scores for each number of PCA components, along with horizontal lines marking the best score and the one-standard-deviation threshold.

n_components = grid.cv_results_["param_reduce_dim__n_components"]
test_scores = grid.cv_results_["mean_test_score"]

# Create a polars DataFrame for better data manipulation and visualization
results_df = pl.DataFrame(
    {
        "n_components": n_components,
        "mean_test_score": test_scores,
        "std_test_score": grid.cv_results_["std_test_score"],
        "mean_train_score": grid.cv_results_["mean_train_score"],
        "std_train_score": grid.cv_results_["std_train_score"],
        "mean_fit_time": grid.cv_results_["mean_fit_time"],
        "rank_test_score": grid.cv_results_["rank_test_score"],
    }
)

# Sort by number of components
results_df = results_df.sort("n_components")

# Calculate the lower bound threshold
lower = lower_bound(grid.cv_results_)

# Get the best model information
best_index_ = grid.best_index_
best_components = n_components[best_index_]
best_score = grid.cv_results_["mean_test_score"][best_index_]

# Add a column to mark the selected model
results_df = results_df.with_columns(
    pl.when(pl.col("n_components") == best_components)
    .then(pl.lit("Selected"))
    .otherwise(pl.lit("Regular"))
    .alias("model_type")
)

# Get the number of CV splits from the results
n_splits = sum(
    1
    for key in grid.cv_results_.keys()
    if key.startswith("split") and key.endswith("test_score")
)

# Extract individual scores for each split
test_scores = np.array(
    [
        [grid.cv_results_[f"split{i}_test_score"][j] for i in range(n_splits)]
        for j in range(len(n_components))
    ]
)
train_scores = np.array(
    [
        [grid.cv_results_[f"split{i}_train_score"][j] for i in range(n_splits)]
        for j in range(len(n_components))
    ]
)

# Calculate mean and std of test scores
mean_test_scores = np.mean(test_scores, axis=1)
std_test_scores = np.std(test_scores, axis=1)

# Find best score and threshold
best_mean_score = np.max(mean_test_scores)
threshold = best_mean_score - std_test_scores[np.argmax(mean_test_scores)]

# Create a single figure for visualization
fig, ax = plt.subplots(figsize=(12, 8))

# Plot individual points
for i, comp in enumerate(n_components):
    # Plot individual test points
    plt.scatter(
        [comp] * n_splits,
        test_scores[i],
        alpha=0.2,
        color="blue",
        s=20,
        label="Individual test scores" if i == 0 else "",
    )
    # Plot individual train points
    plt.scatter(
        [comp] * n_splits,
        train_scores[i],
        alpha=0.2,
        color="green",
        s=20,
        label="Individual train scores" if i == 0 else "",
    )

# Plot mean lines with error bands
plt.plot(
    n_components,
    np.mean(test_scores, axis=1),
    "-",
    color="blue",
    linewidth=2,
    label="Mean test score",
)
plt.fill_between(
    n_components,
    np.mean(test_scores, axis=1) - np.std(test_scores, axis=1),
    np.mean(test_scores, axis=1) + np.std(test_scores, axis=1),
    alpha=0.15,
    color="blue",
)

plt.plot(
    n_components,
    np.mean(train_scores, axis=1),
    "-",
    color="green",
    linewidth=2,
    label="Mean train score",
)
plt.fill_between(
    n_components,
    np.mean(train_scores, axis=1) - np.std(train_scores, axis=1),
    np.mean(train_scores, axis=1) + np.std(train_scores, axis=1),
    alpha=0.15,
    color="green",
)

# Add threshold lines
plt.axhline(
    best_mean_score,
    color="#9b59b6",  # Purple
    linestyle="--",
    label="Best score",
    linewidth=2,
)
plt.axhline(
    threshold,
    color="#e67e22",  # Orange
    linestyle="--",
    label="Best score - 1 std",
    linewidth=2,
)

# Highlight selected model
plt.axvline(
    best_components,
    color="#9b59b6",  # Purple
    alpha=0.2,
    linewidth=8,
    label="Selected model",
)

# Set titles and labels
plt.xlabel("Number of PCA components", fontsize=12)
plt.ylabel("Score", fontsize=12)
plt.title("Model Selection: Balancing Complexity and Performance", fontsize=14)
plt.grid(True, linestyle="--", alpha=0.7)
plt.legend(
    bbox_to_anchor=(1.02, 1),
    loc="upper left",
    borderaxespad=0,
)

# Set axis properties
plt.xticks(n_components)
plt.ylim((0.85, 1.0))

# Adjust layout
plt.tight_layout()
[Figure: Model Selection: Balancing Complexity and Performance]

Conclusion#

The one-standard-error rule helps us select a simpler model (fewer PCA components) while keeping its performance statistically comparable to the best model. This approach helps prevent overfitting and improves model interpretability and efficiency.

In this example, we showed how to implement this rule with a custom refit callable in GridSearchCV.

Key takeaways:

  1. The one-standard-error rule provides a useful rule of thumb for selecting simpler models

  2. Custom refit callables in GridSearchCV allow flexible model selection strategies

  3. Visualizing both train and test scores helps identify potential overfitting

This approach can be applied to other model selection scenarios where complexity and performance need to be balanced, or whenever a use-case-specific definition of the "best" model is required.

# Display the figure
plt.show()

Total running time of the script: (0 minutes 18.332 seconds)

Related examples

Pipelining: chaining a PCA and a logistic regression

Custom refit strategy of a grid search with cross-validation

Lagged features for time series forecasting

Recursive feature elimination with cross-validation
