Non-negative least squares
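In this example, we fit a linear model with a positivity constraint on the regression coefficients and compare the estimated coefficients with those of a classic linear regression.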
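For reference, the constrained problem can be written as below. This is a sketch of the standard non-negative least squares objective, not a formula taken from this example: the symbols X (design matrix), y (targets), w (coefficients) and p (number of features) are notation introduced here. Note that the constraint is, strictly speaking, non-negativity, so coefficients may be exactly zero.

```latex
\min_{w \in \mathbb{R}^{p}} \; \lVert X w - y \rVert_2^2
\quad \text{subject to} \quad w_j \ge 0, \quad j = 1, \dots, p
```

Ordinary least squares solves the same problem without the constraint.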
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import r2_score
Generate some random data
np.random.seed(42)
n_samples, n_features = 200, 50
X = np.random.randn(n_samples, n_features)
true_coef = 3 * np.random.randn(n_features)
# Threshold coefficients to render them non-negative
true_coef[true_coef < 0] = 0
y = np.dot(X, true_coef)
# Add some noise
y += 5 * np.random.normal(size=(n_samples,))
Split the data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
Fit the non-negative least squares model.
from sklearn.linear_model import LinearRegression
reg_nnls = LinearRegression(positive=True)
y_pred_nnls = reg_nnls.fit(X_train, y_train).predict(X_test)
r2_score_nnls = r2_score(y_test, y_pred_nnls)
print("NNLS R2 score", r2_score_nnls)
NNLS R2 score 0.8225220806196525
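As a cross-check, the constrained fit can be compared with SciPy's dedicated solver. The sketch below assumes that, with the intercept disabled, LinearRegression(positive=True) and scipy.optimize.nnls solve the same problem. The small synthetic dataset and the names X_demo, y_demo and w_demo are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression

# Small synthetic problem (hypothetical sizes, chosen only for illustration)
rng = np.random.RandomState(0)
X_demo = rng.randn(100, 10)
w_demo = np.maximum(rng.randn(10), 0)  # non-negative ground-truth coefficients
y_demo = X_demo @ w_demo + 0.1 * rng.randn(100)

# Disable the intercept so that both solvers see exactly the same problem
reg_demo = LinearRegression(positive=True, fit_intercept=False)
coef_sklearn = reg_demo.fit(X_demo, y_demo).coef_

# scipy.optimize.nnls returns the solution and the residual norm
coef_scipy, residual_norm = nnls(X_demo, y_demo)

print("max abs difference:", np.abs(coef_sklearn - coef_scipy).max())
```

With the default fit_intercept=True, the constraint applies to the coefficients only, so the two results are not expected to match without first centering the data.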
Fit an OLS model.
reg_ols = LinearRegression()
y_pred_ols = reg_ols.fit(X_train, y_train).predict(X_test)
r2_score_ols = r2_score(y_test, y_pred_ols)
print("OLS R2 score", r2_score_ols)
OLS R2 score 0.7436926291700353
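Both scores above are coefficients of determination computed on the held-out test set. As a reminder of what r2_score measures, the sketch below recomputes it by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The toy arrays y_true_demo and y_pred_demo are hypothetical values, not data from this example.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy targets and predictions (hypothetical values for illustration)
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares: error of the model's predictions
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
# Total sum of squares: error of always predicting the mean of y
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
print("manual R2:", r2_manual)
print("sklearn R2:", r2_score(y_true_demo, y_pred_demo))
```

A plausible reading of the gap between the two scores is that the true coefficients were thresholded to be non-negative, so the constraint matches the data-generating process and reduces variance on the test set.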
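Comparing the regression coefficients of OLS and NNLS, we can observe that they are highly correlated (the dashed line is the identity relation), but the non-negative constraint shrinks some coefficients to 0. Non-negative least squares inherently yields sparse results.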
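The sparsity and correlation claims can also be checked numerically rather than visually. The sketch below repeats the data-generating recipe of this example and counts the NNLS coefficients that are (numerically) zero. The summary variables n_zero_nnls, n_neg_ols and corr are names introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data-generating recipe as in the example
np.random.seed(42)
n_samples, n_features = 200, 50
X = np.random.randn(n_samples, n_features)
true_coef = 3 * np.random.randn(n_features)
true_coef[true_coef < 0] = 0
y = np.dot(X, true_coef)
y += 5 * np.random.normal(size=(n_samples,))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

reg_nnls = LinearRegression(positive=True).fit(X_train, y_train)
reg_ols = LinearRegression().fit(X_train, y_train)

# Coefficients pinned at zero by the constraint
n_zero_nnls = int(np.sum(np.isclose(reg_nnls.coef_, 0.0)))
# Unconstrained OLS is free to go negative instead
n_neg_ols = int(np.sum(reg_ols.coef_ < 0))
# Pearson correlation between the two coefficient vectors
corr = np.corrcoef(reg_ols.coef_, reg_nnls.coef_)[0, 1]

print("NNLS coefficients equal to 0:", n_zero_nnls, "of", n_features)
print("OLS coefficients below 0:", n_neg_ols)
print("correlation between OLS and NNLS coefficients:", corr)
```

Unlike an L1 penalty, this sparsity comes purely from the constraint: no regularization strength has to be tuned.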
fig, ax = plt.subplots()
ax.plot(reg_ols.coef_, reg_nnls.coef_, linewidth=0, marker=".")
low_x, high_x = ax.get_xlim()
low_y, high_y = ax.get_ylim()
low = max(low_x, low_y)
high = min(high_x, high_y)
ax.plot([low, high], [low, high], ls="--", c=".3", alpha=0.5)
ax.set_xlabel("OLS regression coefficients", fontweight="bold")
ax.set_ylabel("NNLS regression coefficients", fontweight="bold")