Isotonic Regression

An illustration of isotonic regression on generated data (a non-linear monotonic trend with homoscedastic uniform noise).

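The isotonic regression algorithm finds a non-decreasing approximation of a function while minimizing the mean squared error on the training data. The benefit of such a non-parametric model is that it does not assume any shape for the target function besides monotonicity. For comparison, a linear regression is also shown.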
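
For training points sorted by x (and assumed distinct here) with targets $y_1, \dots, y_n$, the fit solves $\min_{\hat y} \sum_{i=1}^{n} (y_i - \hat y_i)^2$ subject to $\hat y_1 \le \hat y_2 \le \dots \le \hat y_n$. A classic way to solve this is the pool adjacent violators algorithm (PAVA): scan left to right and, whenever a block's mean exceeds the next block's mean, merge them into one block fitted by their common mean. The sketch below is a minimal, unweighted, from-scratch version for illustration only; the function name `pava` is hypothetical and this is not scikit-learn's internal code.

```python
import numpy as np


def pava(y):
    """Least-squares non-decreasing fit of y, assuming y is already ordered by x.

    Hypothetical helper: minimal pool-adjacent-violators sketch with unit weights.
    """
    # Each block stores [sum of its values, number of points]; its fitted value is the mean.
    blocks = []
    for value in np.asarray(y, dtype=float):
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to one fitted value per input point.
    return np.concatenate([np.full(count, total / count) for total, count in blocks])


print(pava([1.0, 3.0, 2.0, 4.0]))  # fitted values: 1.0, 2.5, 2.5, 4.0
```

The violating pair (3.0, 2.0) is pooled into its mean 2.5, which is the least-squares value for those two points under the ordering constraint.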

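The plot on the right-hand side shows the model's prediction function, obtained by linear interpolation between the threshold points. The threshold points are a subset of the training input observations, and their matching target values are computed by the non-parametric isotonic fit.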
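
The interpolation described above can be checked directly. The sketch below reuses this example's data-generation lines and rebuilds the prediction function from the fitted `X_thresholds_` and `y_thresholds_` attributes alone, using `np.interp`. Since `np.interp` returns the first or last threshold value outside the threshold range, it should mirror `out_of_bounds="clip"`; treat this as an illustrative consistency check, not a description of how scikit-learn implements `predict`.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.utils import check_random_state

# Same synthetic data as in this example.
n = 100
x = np.arange(n)
rs = check_random_state(0)
y = rs.randint(-50, 50, size=(n,)) + 50.0 * np.log1p(np.arange(n))

ir = IsotonicRegression(out_of_bounds="clip").fit(x, y)

# Rebuild the prediction function from the stored threshold points only.
x_test = np.linspace(-10, 110, 1000)
manual = np.interp(x_test, ir.X_thresholds_, ir.y_thresholds_)

print(len(ir.X_thresholds_), np.allclose(manual, ir.predict(x_test)))
```

Only the threshold points need to be stored: training points lying inside a flat run of the fit are recovered exactly by interpolating between the run's endpoints.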

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression
from sklearn.utils import check_random_state

n = 100
x = np.arange(n)
rs = check_random_state(0)
# Monotonic logarithmic trend plus uniform integer noise drawn from [-50, 49]
y = rs.randint(-50, 50, size=(n,)) + 50.0 * np.log1p(np.arange(n))

Fit IsotonicRegression and LinearRegression models:

ir = IsotonicRegression(out_of_bounds="clip")
y_ = ir.fit_transform(x, y)

lr = LinearRegression()
lr.fit(x[:, np.newaxis], y)  # x needs to be 2d for LinearRegression
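
Because the isotonic fit minimizes squared error over all non-decreasing sequences, any non-decreasing alternative cannot do better on the training data. A linear fit with a non-negative slope is such an alternative, so on this data the isotonic training MSE should be no larger than the linear one. The sketch below verifies this, under the stated assumption that the fitted slope is non-negative (it checks that too); `mean_squared_error` from `sklearn.metrics` is an addition not used in the original example.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.utils import check_random_state

# Same synthetic data as in this example.
n = 100
x = np.arange(n)
rs = check_random_state(0)
y = rs.randint(-50, 50, size=(n,)) + 50.0 * np.log1p(np.arange(n))

ir = IsotonicRegression(out_of_bounds="clip")
y_iso = ir.fit_transform(x, y)
lr = LinearRegression().fit(x[:, np.newaxis], y)
y_lin = lr.predict(x[:, np.newaxis])

# The fitted step values stored on the model are non-decreasing by construction.
is_monotone = bool(np.all(np.diff(ir.y_thresholds_) >= 0))

# A non-negative slope makes the linear fit a feasible non-decreasing candidate,
# so the isotonic fit cannot have a larger training MSE.
slope_non_negative = bool(lr.coef_[0] >= 0)
mse_iso = mean_squared_error(y, y_iso)
mse_lin = mean_squared_error(y, y_lin)
print(is_monotone, slope_non_negative, mse_iso, mse_lin)
```

A lower training error is not evidence of better generalization: the isotonic model has far more freedom than a straight line and can follow the noise between threshold points.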


Plot results:

# One vertical segment per sample, joining the observed value y[i] to its isotonic fit y_[i]
segments = [[[i, y[i]], [i, y_[i]]] for i in range(n)]
lc = LineCollection(segments, zorder=0)
lc.set_array(np.ones(len(y)))
lc.set_linewidths(np.full(n, 0.5))

fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(12, 6))

ax0.plot(x, y, "C0.", markersize=12)
ax0.plot(x, y_, "C1.-", markersize=12)
ax0.plot(x, lr.predict(x[:, np.newaxis]), "C2-")
ax0.add_collection(lc)
ax0.legend(("Training data", "Isotonic fit", "Linear fit"), loc="lower right")
ax0.set_title("Isotonic regression fit on noisy data (n=%d)" % n)

x_test = np.linspace(-10, 110, 1000)
ax1.plot(x_test, ir.predict(x_test), "C1-")
ax1.plot(ir.X_thresholds_, ir.y_thresholds_, "C1.", markersize=12)
ax1.set_title("Prediction function (%d thresholds)" % len(ir.X_thresholds_))

plt.show()

Figure: left panel "Isotonic regression fit on noisy data (n=100)"; right panel "Prediction function (36 thresholds)".

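Note that we explicitly passed out_of_bounds="clip" to the constructor of IsotonicRegression to control how the model extrapolates outside the range of data observed in the training set. This "clipping" extrapolation can be seen in the plot of the prediction function on the right-hand side.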
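
To make the effect of `out_of_bounds` concrete, the sketch below fits a tiny hand-made training set (hypothetical data, for illustration only) and predicts one point below, one inside, and one above the training range. With `"clip"` the out-of-range points receive the fitted value at the nearest end of the range; with `"nan"` (the default) they become NaN; with `"raise"`, prediction fails with a `ValueError`.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical toy training set covering x in [0, 4].
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
x_new = np.array([-1.0, 2.0, 10.0])  # below, inside, and above the training range

preds = {}
for mode in ("clip", "nan"):
    model = IsotonicRegression(out_of_bounds=mode).fit(x_train, y_train)
    preds[mode] = model.predict(x_new)
    print(mode, preds[mode])

# "raise" rejects inputs outside the training range.
strict = IsotonicRegression(out_of_bounds="raise").fit(x_train, y_train)
raised = False
try:
    strict.predict(x_new)
except ValueError as exc:
    raised = True
    print("raise:", exc)
```

Here the fitted values are 1.0, 2.5, 2.5, 4.0, 5.0, so clipping maps x = -1 to 1.0 and x = 10 to 5.0, while the in-range point x = 2 gets 2.5 in every mode.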

Total running time of the script: (0 minutes 0.119 seconds)
