ridge_regression
- sklearn.linear_model.ridge_regression(X, y, alpha, *, sample_weight=None, solver='auto', max_iter=None, tol=0.0001, verbose=0, positive=False, random_state=None, return_n_iter=False, return_intercept=False, check_input=True)[source]
Solve the ridge equation by the method of normal equations.
Read more in the User Guide.
- Parameters:
- X{array-like, sparse matrix, LinearOperator} of shape (n_samples, n_features)
Training data.
- yarray-like of shape (n_samples,) or (n_samples, n_targets)
Target values.
- alphafloat or array-like of shape (n_targets,)
Constant that multiplies the L2 term, controlling regularization strength.
alpha must be a non-negative float, i.e. in [0, inf). When alpha = 0, the objective is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Ridge object is not advised. Instead, you should use the LinearRegression object. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number (see the sketch at the end of this parameter list).
- sample_weightfloat or array-like of shape (n_samples,), default=None
Individual weights for each sample. If given a float, every sample will have the same weight. If sample_weight is not None and solver=’auto’, the solver will be set to ‘cholesky’.
Added in version 0.17.
- solver{‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’, ‘lbfgs’}, default=’auto’
Solver to use in the computational routines:
‘auto’ chooses the solver automatically based on the type of data.
‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than ‘cholesky’, at the cost of being slower.
‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution via a Cholesky decomposition of dot(X.T, X).
‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that fast convergence of ‘sag’ and ‘saga’ is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
‘lbfgs’ uses the L-BFGS-B algorithm implemented in scipy.optimize.minimize. It can be used only when positive is True.
All solvers except ‘svd’ support both dense and sparse data. However, only ‘lsqr’, ‘sag’, ‘sparse_cg’, and ‘lbfgs’ support sparse input when fit_intercept is True.
Added in version 0.17: Stochastic Average Gradient descent solver.
Added in version 0.19: SAGA solver.
- max_iterint, default=None
Maximum number of iterations for the conjugate gradient solver. For the ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For the ‘sag’ and ‘saga’ solvers, the default value is 1000. For the ‘lbfgs’ solver, the default value is 15000.
- tolfloat, default=1e-4
Precision of the solution. Note that tol has no effect for solvers ‘svd’ and ‘cholesky’.
Changed in version 1.2: The default value changed from 1e-3 to 1e-4 for consistency with other linear models.
- verboseint, default=0
Verbosity level. Setting verbose > 0 will display additional information depending on the solver used.
- positivebool, default=False
When set to True, forces the coefficients to be positive. Only the ‘lbfgs’ solver is supported in this case.
- random_stateint, RandomState instance, default=None
Used to shuffle the data when solver == ‘sag’ or ‘saga’. See the Glossary for details.
- return_n_iterbool, default=False
If True, the method also returns n_iter, the actual number of iterations performed by the solver.
Added in version 0.17.
- return_interceptbool, default=False
If True and if X is sparse, the method also returns the intercept, and the solver is automatically changed to ‘sag’. This is only a temporary fix for fitting the intercept with sparse data. For dense data, use sklearn.linear_model._preprocess_data before your regression.
Added in version 0.17.
- check_inputbool, default=True
If False, the input arrays X and y will not be checked.
Added in version 0.21.
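As a quick, non-authoritative sketch of the alpha and positive parameters above (synthetic data, arbitrary values):

import numpy as np
from sklearn.linear_model import ridge_regression

rng = np.random.RandomState(0)
X = rng.randn(50, 3)

# Per-target penalties: with a two-column target, alpha may be an array
# of shape (n_targets,), giving each output its own regularization strength.
Y = rng.randn(50, 2)
coef = ridge_regression(X, Y, alpha=np.array([0.5, 10.0]))
print(coef.shape)  # (2, 3): one weight vector per target

# positive=True forces non-negative coefficients; only the 'lbfgs' solver
# supports this, and solver='auto' resolves to it automatically.
y = rng.randn(50)
coef_pos = ridge_regression(X, y, alpha=1.0, positive=True)
print(bool((coef_pos >= 0).all()))  # True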
- Returns:
- coefndarray of shape (n_features,) or (n_targets, n_features)
Weight vector(s).
- n_iterint, optional
The actual number of iterations performed by the solver. Only returned if return_n_iter is True.
- interceptfloat or ndarray of shape (n_targets,)
The intercept of the model. Only returned if return_intercept is True and if X is a scipy sparse array.
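A minimal sketch of these optional return values, using synthetic sparse data; the triple unpacking assumes the optional values come back in the order (coef, n_iter, intercept):

import numpy as np
from scipy import sparse
from sklearn.linear_model import ridge_regression

rng = np.random.RandomState(0)
X = sparse.random(100, 4, density=0.5, format="csr", random_state=rng)
y = rng.randn(100)

# return_intercept=True with sparse X also returns the intercept (the
# solver is switched to 'sag' internally, as noted above); combined with
# return_n_iter=True, three values come back instead of one.
coef, n_iter, intercept = ridge_regression(
    X, y, alpha=1.0, return_n_iter=True, return_intercept=True, random_state=0
)
print(coef.shape, n_iter, intercept)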
Notes
This function won’t compute the intercept.
Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.
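To make the normal-equations view concrete, here is a minimal sketch (synthetic data, no intercept) that checks the function against the closed-form solution of the penalized least-squares problem:

import numpy as np
from sklearn.linear_model import ridge_regression

rng = np.random.RandomState(0)
X = rng.randn(30, 3)
y = rng.randn(30)
alpha = 1.0

# Without an intercept, ridge minimizes ||y - X w||^2 + alpha * ||w||^2;
# the normal equations give w = (X^T X + alpha * I)^{-1} X^T y.
manual = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
coef = ridge_regression(X, y, alpha=alpha)
print(np.allclose(coef, manual))  # True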
Examples

>>> import numpy as np
>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import ridge_regression
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(100, 4)
>>> y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
>>> coef, intercept = ridge_regression(X, y, alpha=1.0, return_intercept=True,
...                                    random_state=0)
>>> coef
array([ 1.97, -1., -2.69e-3, -9.27e-4 ])
>>> intercept
np.float64(-0.0012)
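Since this function does not fit an intercept, its coefficients should agree with a Ridge estimator configured with fit_intercept=False; a minimal sketch on synthetic data:

import numpy as np
from sklearn.linear_model import Ridge, ridge_regression

rng = np.random.RandomState(0)
X = rng.randn(100, 4)
y = X @ np.array([2.0, -1.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(100)

# No intercept is fit here, so the coefficients should match a Ridge
# estimator with fit_intercept=False on the same data.
est = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)
print(np.allclose(est.coef_, ridge_regression(X, y, alpha=1.0)))  # True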