Joint feature selection with multi-task Lasso
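The multi-task Lasso fits several regression problems jointly while forcing the selected features to be the same for every task. This example simulates sequential measurements. Each task is one time point, and the relevant features change in amplitude over time while the set of relevant features stays fixed. The multi-task Lasso requires that a feature selected at one time point is selected at all time points. This makes its feature selection more stable than fitting an independent Lasso to each task.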
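For reference, the scikit-learn user guide states the two objectives as follows. Here W is the n_features × n_tasks coefficient matrix, which is the transpose of the fitted `coef_` attribute, and w_{ij} is its entry for feature i and task j:

```latex
\text{Lasso (one task } y\text{):}\quad
\min_{w}\; \frac{1}{2 n_{\text{samples}}} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1

\text{MultiTaskLasso:}\quad
\min_{W}\; \frac{1}{2 n_{\text{samples}}} \lVert Y - XW \rVert_{\text{Fro}}^2 + \alpha \lVert W \rVert_{21},
\qquad
\lVert W \rVert_{21} = \sum_{i} \sqrt{\sum_{j} w_{ij}^2}
```

The ℓ2,1 norm sums the Euclidean norms of the rows of W, one row per feature. As a result, the penalty can only drive a feature's coefficients to zero for all tasks at once, which is what enforces a shared support.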
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
Generate data
import numpy as np
rng = np.random.RandomState(42)
# Generate some 2D coefficients with sine waves with random frequency and phase
n_samples, n_features, n_tasks = 100, 30, 40
n_relevant_features = 5
coef = np.zeros((n_tasks, n_features))
times = np.linspace(0, 2 * np.pi, n_tasks)
for k in range(n_relevant_features):
    coef[:, k] = np.sin((1.0 + rng.randn(1)) * times + 3 * rng.randn(1))
X = rng.randn(n_samples, n_features)
Y = np.dot(X, coef.T) + rng.randn(n_samples, n_tasks)
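The sketch below makes the shapes concrete by rebuilding the data and checking them:

- Each row of `coef` is one task (time point).
- Each relevant feature k follows sin(f_k · t + φ_k), where f_k is 1 plus a standard normal draw and φ_k is 3 times a standard normal draw.
- Only the first `n_relevant_features` columns are non-zero.
- `Y` holds one column per task.

The name `true_support` is our own and is not part of the original example.

```python
import numpy as np

# Rebuild the synthetic data exactly as in the example (same seed, same draws)
rng = np.random.RandomState(42)
n_samples, n_features, n_tasks = 100, 30, 40
n_relevant_features = 5
coef = np.zeros((n_tasks, n_features))
times = np.linspace(0, 2 * np.pi, n_tasks)
for k in range(n_relevant_features):
    coef[:, k] = np.sin((1.0 + rng.randn(1)) * times + 3 * rng.randn(1))
X = rng.randn(n_samples, n_features)
Y = np.dot(X, coef.T) + rng.randn(n_samples, n_tasks)

# Ground-truth support: features whose coefficient is non-zero at any time point
true_support = np.flatnonzero(np.any(coef != 0, axis=0))
print("X:", X.shape, "coef:", coef.shape, "Y:", Y.shape)
print("relevant features:", true_support)
```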
Fit models
from sklearn.linear_model import Lasso, MultiTaskLasso
coef_lasso_ = np.array([Lasso(alpha=0.5).fit(X, y).coef_ for y in Y.T])
coef_multi_task_lasso_ = MultiTaskLasso(alpha=1.0).fit(X, Y).coef_
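The ℓ2,1 penalty treats a feature's coefficients across all tasks as one group. The sketch below refits both models on the same data and then does three things:

- It computes each feature's group norm from `coef_`, which has shape (n_tasks, n_features), so a feature's group is a column.
- It sums those norms into the ℓ2,1 penalty.
- It checks which features are "all or nothing", meaning zero for every task or non-zero for every task.

The helper `all_or_nothing` is hypothetical, written for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso, MultiTaskLasso

# Same synthetic data as in the example
rng = np.random.RandomState(42)
n_samples, n_features, n_tasks = 100, 30, 40
n_relevant_features = 5
coef = np.zeros((n_tasks, n_features))
times = np.linspace(0, 2 * np.pi, n_tasks)
for k in range(n_relevant_features):
    coef[:, k] = np.sin((1.0 + rng.randn(1)) * times + 3 * rng.randn(1))
X = rng.randn(n_samples, n_features)
Y = np.dot(X, coef.T) + rng.randn(n_samples, n_tasks)

coef_lasso_ = np.array([Lasso(alpha=0.5).fit(X, y).coef_ for y in Y.T])
coef_multi_task_lasso_ = MultiTaskLasso(alpha=1.0).fit(X, Y).coef_

# Column j of coef_ holds feature j for every task; the l2,1 norm sums
# the Euclidean norms of these columns
group_norms = np.linalg.norm(coef_multi_task_lasso_, axis=0)
l21_penalty = group_norms.sum()


def all_or_nothing(W):
    """Hypothetical helper: True for each feature that is zero in every
    task or non-zero in every task."""
    nonzero = W != 0
    return nonzero.all(axis=0) | (~nonzero).all(axis=0)


print("features kept by MultiTaskLasso:", np.flatnonzero(group_norms))
print("l2,1 norm of its coefficients:", l21_penalty)
print("all-or-nothing features, Lasso:",
      all_or_nothing(coef_lasso_).sum(), "/", n_features)
print("all-or-nothing features, MultiTaskLasso:",
      all_or_nothing(coef_multi_task_lasso_).sum(), "/", n_features)
```

The block coordinate descent behind `MultiTaskLasso` updates one feature's whole column at a time, so a column is either zeroed as a group or left entirely non-zero. Independent `Lasso` fits have no such coupling.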
Plot the support (non-zero coefficients) and time series
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 5))
plt.subplot(1, 2, 1)
plt.spy(coef_lasso_)
plt.xlabel("Feature")
plt.ylabel("Time (or Task)")
plt.text(10, 5, "Lasso")
plt.subplot(1, 2, 2)
plt.spy(coef_multi_task_lasso_)
plt.xlabel("Feature")
plt.ylabel("Time (or Task)")
plt.text(10, 5, "MultiTaskLasso")
fig.suptitle("Coefficient non-zero location")
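The two spy plots can also be summarized numerically. The sketch below counts how many features each model selects at every time point (one row of `coef_` per task). It also computes the fraction of time points at which each feature is selected. The variable names are our own.

```python
import numpy as np
from sklearn.linear_model import Lasso, MultiTaskLasso

# Same synthetic data and fits as in the example
rng = np.random.RandomState(42)
n_samples, n_features, n_tasks = 100, 30, 40
n_relevant_features = 5
coef = np.zeros((n_tasks, n_features))
times = np.linspace(0, 2 * np.pi, n_tasks)
for k in range(n_relevant_features):
    coef[:, k] = np.sin((1.0 + rng.randn(1)) * times + 3 * rng.randn(1))
X = rng.randn(n_samples, n_features)
Y = np.dot(X, coef.T) + rng.randn(n_samples, n_tasks)
coef_lasso_ = np.array([Lasso(alpha=0.5).fit(X, y).coef_ for y in Y.T])
coef_multi_task_lasso_ = MultiTaskLasso(alpha=1.0).fit(X, Y).coef_

# Number of selected features at each time point (rows of the spy plots)
n_selected_lasso = np.count_nonzero(coef_lasso_, axis=1)
n_selected_mtl = np.count_nonzero(coef_multi_task_lasso_, axis=1)

# Fraction of time points at which each feature is selected (columns of the spy plots)
freq_lasso = np.mean(coef_lasso_ != 0, axis=0)
freq_mtl = np.mean(coef_multi_task_lasso_ != 0, axis=0)

print("features per task, Lasso:", n_selected_lasso)
print("features per task, MultiTaskLasso:", n_selected_mtl)
print("selection frequency, Lasso:", np.round(freq_lasso, 2))
print("selection frequency, MultiTaskLasso:", freq_mtl)
```

For `MultiTaskLasso` every task selects the same number of features and each frequency is either 0 or 1. For the per-task `Lasso` fits, both quantities are free to vary over time.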
feature_to_plot = 0
plt.figure()
lw = 2
plt.plot(coef[:, feature_to_plot], color="seagreen", linewidth=lw, label="Ground truth")
plt.plot(
    coef_lasso_[:, feature_to_plot], color="cornflowerblue", linewidth=lw, label="Lasso"
)
plt.plot(
    coef_multi_task_lasso_[:, feature_to_plot],
    color="gold",
    linewidth=lw,
    label="MultiTaskLasso",
)
plt.legend(loc="upper center")
plt.axis("tight")
plt.ylim([-1.1, 1.1])
plt.show()
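The line plot can be complemented with a single number per method: the root-mean-square error (RMSE) over time points between the ground-truth coefficient of `feature_to_plot` and each estimate. RMSE and the helper `rmse` are our own choice of summary, not part of the original example. Which method scores lower depends on the data and on the chosen `alpha` values.

```python
import numpy as np
from sklearn.linear_model import Lasso, MultiTaskLasso

# Same synthetic data and fits as in the example
rng = np.random.RandomState(42)
n_samples, n_features, n_tasks = 100, 30, 40
n_relevant_features = 5
coef = np.zeros((n_tasks, n_features))
times = np.linspace(0, 2 * np.pi, n_tasks)
for k in range(n_relevant_features):
    coef[:, k] = np.sin((1.0 + rng.randn(1)) * times + 3 * rng.randn(1))
X = rng.randn(n_samples, n_features)
Y = np.dot(X, coef.T) + rng.randn(n_samples, n_tasks)
coef_lasso_ = np.array([Lasso(alpha=0.5).fit(X, y).coef_ for y in Y.T])
coef_multi_task_lasso_ = MultiTaskLasso(alpha=1.0).fit(X, Y).coef_

feature_to_plot = 0


def rmse(estimate, truth):
    """Hypothetical helper: root-mean-square error over the time points."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


truth = coef[:, feature_to_plot]
err_lasso = rmse(coef_lasso_[:, feature_to_plot], truth)
err_mtl = rmse(coef_multi_task_lasso_[:, feature_to_plot], truth)
print(f"RMSE for feature {feature_to_plot}: Lasso={err_lasso:.3f}, "
      f"MultiTaskLasso={err_mtl:.3f}")
```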
Total running time of the script: (0 minutes 0.320 seconds)