GraphicalLasso

class sklearn.covariance.GraphicalLasso(alpha=0.01, *, mode='cd', covariance=None, tol=0.0001, enet_tol=0.0001, max_iter=100, verbose=False, eps=np.float64(2.220446049250313e-16), assume_centered=False)

Sparse inverse covariance estimation with an l1-penalized estimator.
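For orientation, the optimization problem behind the estimator can be written as follows, where S is the empirical covariance, K is the precision (inverse covariance) matrix, and the l1 penalty is applied to off-diagonal entries only. This follows the formulation given in the scikit-learn User Guide; treat the exact penalty convention as an assumption if you are on a different version.

$$
\hat{K} = \underset{K \succ 0}{\operatorname{arg\,min}} \left( \operatorname{tr}(S K) - \log \det K + \alpha \sum_{i \neq j} \lvert K_{ij} \rvert \right)
$$

Raising alpha pushes more off-diagonal entries of K to exactly zero, which is the sparsity that the alpha parameter controls.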

For a usage example, see Visualizing the stock market structure.

Read more in the User Guide.

Changed in version 0.20: GraphLasso has been renamed to GraphicalLasso.

Parameters:

alpha (float, default=0.01): The regularization parameter: the higher alpha, the more regularization, the sparser the inverse covariance. Range is (0, inf].
mode ({'cd', 'lars'}, default='cd'): The Lasso solver to use: coordinate descent or LARS. Use LARS for very sparse underlying graphs, where the number of features p exceeds the number of samples n. Elsewhere prefer cd, which is more numerically stable.
covariance ("precomputed", default=None): If covariance is "precomputed", the input data in fit is assumed to be the covariance matrix. If None, the empirical covariance is estimated from the data X.

Added in version 1.3.
tol (float, default=1e-4): The tolerance to declare convergence: if the dual gap goes below this value, iterations are stopped. Range is (0, inf].
enet_tol (float, default=1e-4): The tolerance for the elastic net solver used to calculate the descent direction. This parameter controls the accuracy of the search direction for a given column update, not of the overall parameter estimate. Only used for mode='cd'. Range is (0, inf].
max_iter (int, default=100): The maximum number of iterations.
verbose (bool, default=False): If verbose is True, the objective function and dual gap are printed at each iteration.
eps (float, default=eps): The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Default is np.finfo(np.float64).eps.

Added in version 1.3.
assume_centered (bool, default=False): If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False, data are centered before computation.
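To see the effect of alpha on sparsity, the sketch below fits the estimator at a few alpha values and counts the off-diagonal entries of precision_ that come out as zero. The helper count_offdiag_zeros and the chosen alpha values are illustrative assumptions, not part of the scikit-learn API. The largest alpha is chosen well above every off-diagonal entry of the empirical covariance, which should force a fully diagonal precision matrix.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Same 4x4 covariance as the Examples section of this page.
true_cov = np.array([[0.8, 0.0, 0.2, 0.0],
                     [0.0, 0.4, 0.0, 0.0],
                     [0.2, 0.0, 0.3, 0.1],
                     [0.0, 0.0, 0.1, 0.7]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(4), cov=true_cov, size=200)


def count_offdiag_zeros(precision, atol=1e-10):
    """Hypothetical helper: count off-diagonal entries with magnitude below atol."""
    off_diag = ~np.eye(precision.shape[0], dtype=bool)
    return int(np.sum(np.abs(precision[off_diag]) < atol))


zeros_by_alpha = {}
for alpha in (0.01, 0.05, 0.9):
    model = GraphicalLasso(alpha=alpha).fit(X)
    zeros_by_alpha[alpha] = count_offdiag_zeros(model.precision_)
    print(f"alpha={alpha}: {zeros_by_alpha[alpha]} zero off-diagonal entries")
```

A 4x4 matrix has 12 off-diagonal entries, so 12 is the maximum possible count.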

Attributes:

location_ (ndarray of shape (n_features,)): Estimated location, i.e. the estimated mean.
covariance_ (ndarray of shape (n_features, n_features)): Estimated covariance matrix.
precision_ (ndarray of shape (n_features, n_features)): Estimated pseudo-inverse matrix.
n_iter_ (int): Number of iterations run.
costs_ (list of (objective, dual_gap) pairs): The list of values of the objective function and the dual gap at each iteration, stored after fit. (The return_costs flag applies only to the graphical_lasso function; this estimator has no such parameter.)

Added in version 1.3.
n_features_in_ (int): Number of features seen during fit.

Added in version 0.24.
feature_names_in_ (ndarray of shape (n_features_in_,)): Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

See also

graphical_lasso: L1-penalized covariance estimator.
GraphicalLassoCV: Sparse inverse covariance with cross-validated choice of the l1 penalty.

Examples

>>> import numpy as np
>>> from sklearn.covariance import GraphicalLasso
>>> true_cov = np.array([[0.8, 0.0, 0.2, 0.0],
...                      [0.0, 0.4, 0.0, 0.0],
...                      [0.2, 0.0, 0.3, 0.1],
...                      [0.0, 0.0, 0.1, 0.7]])
>>> np.random.seed(0)
>>> X = np.random.multivariate_normal(mean=[0, 0, 0, 0],
...                                   cov=true_cov,
...                                   size=200)
>>> cov = GraphicalLasso().fit(X)
>>> np.around(cov.covariance_, decimals=3)
array([[0.816, 0.049, 0.218, 0.019],
       [0.049, 0.364, 0.017, 0.034],
       [0.218, 0.017, 0.322, 0.093],
       [0.019, 0.034, 0.093, 0.69 ]])
>>> np.around(cov.location_, decimals=3)
array([0.073, 0.04 , 0.038, 0.143])

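error_norm(comp_cov, norm='frobenius', scaling=True, squared=True)

Compute the Mean Squared Error between two covariance estimators.

Parameters:

comp_cov (array-like of shape (n_features, n_features)): The covariance to compare with.
norm ({"frobenius", "spectral"}, default="frobenius"): The type of norm used to compute the error. Available error types:
- 'frobenius' (default): sqrt(tr(A^t.A))
- 'spectral': sqrt(max(eigenvalues(A^t.A)))
where A is the error (comp_cov - self.covariance_).
scaling (bool, default=True): If True (default), the squared error norm is divided by n_features. If False, the squared error norm is not rescaled.
squared (bool, default=True): Whether to compute the squared error norm or the error norm. If True (default), the squared error norm is returned. If False, the error norm is returned.

Returns:

result (float): The Mean Squared Error (in the sense of the selected norm, Frobenius by default) between the self and comp_cov covariance estimators.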
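The norm definitions above can be checked by hand. This sketch recomputes both norms from the error matrix A = comp_cov - covariance_ with NumPy and compares them to error_norm; using the identity matrix as comp_cov (the true covariance of the simulated data) is just an illustrative choice.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
model = GraphicalLasso(alpha=0.05).fit(X)

# Compare against the identity, which is the true covariance of X here.
comp_cov = np.eye(3)

# A is the error matrix: comp_cov - self.covariance_.
A = comp_cov - model.covariance_
n_features = A.shape[0]

# Defaults (norm="frobenius", scaling=True, squared=True):
# squared Frobenius norm tr(A^T A), divided by n_features.
fro_manual = np.trace(A.T @ A) / n_features
fro_lib = model.error_norm(comp_cov)

# Spectral norm, neither scaled nor squared: sqrt of the largest eigenvalue of A^T A.
spec_manual = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))
spec_lib = model.error_norm(comp_cov, norm="spectral", scaling=False, squared=False)

print(fro_lib, fro_manual)
print(spec_lib, spec_manual)
```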

fit(X, y=None)

Fit the GraphicalLasso model to X.

Parameters:

X (array-like of shape (n_samples, n_features)): Data from which to compute the covariance estimate.
y (Ignored): Not used, present for API consistency by convention.

Returns:

self (object): Returns the instance itself.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

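get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.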

get_precision()

Getter for the precision matrix.

Returns:

precision_ (array-like of shape (n_features, n_features)): The precision matrix associated with the current covariance object.
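A small sketch of get_precision, assuming the estimator keeps its precision matrix stored (the EmpiricalCovariance default that GraphicalLasso inherits), in which case the getter returns precision_ itself. It also checks that the result is positive definite via its eigenvalues, as a valid precision matrix should be.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
model = GraphicalLasso(alpha=0.1).fit(X)

P = model.get_precision()

# A valid precision matrix is symmetric positive definite.
eigenvalues = np.linalg.eigvalsh(P)
print(eigenvalues.min() > 0)
```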

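mahalanobis(X)

Compute the squared Mahalanobis distances of given observations.

For a detailed example of how outliers affect the Mahalanobis distance, see Robust covariance estimation and Mahalanobis distances relevance.

Parameters:

X (array-like of shape (n_samples, n_features)): The observations whose Mahalanobis distances we compute. Observations are assumed to be drawn from the same distribution as the data used in fit.

Returns:

dist (ndarray of shape (n_samples,)): Squared Mahalanobis distances of the observations.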
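The squared Mahalanobis distance of a row x is (x - mu)^T P (x - mu), with mu = location_ and P the fitted precision matrix. This sketch recomputes it with np.einsum and compares it with mahalanobis; the simulated data is illustrative only.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
model = GraphicalLasso(alpha=0.05).fit(X)

X_new = rng.normal(size=(5, 3))
dist_lib = model.mahalanobis(X_new)

# Manual squared distance: (x - mu)^T P (x - mu) for each row of X_new.
centered = X_new - model.location_
P = model.get_precision()
dist_manual = np.einsum("ij,jk,ik->i", centered, P, centered)

print(dist_lib)
print(dist_manual)
```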

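score(X_test, y=None)

Compute the log-likelihood of X_test under the estimated Gaussian model.

The Gaussian model is defined by its mean and covariance matrix, which are represented by self.location_ and self.covariance_ respectively.

Parameters:

X_test (array-like of shape (n_samples, n_features)): Test data of which we compute the likelihood, where n_samples is the number of samples and n_features is the number of features. X_test is assumed to be drawn from the same distribution as the data used in fit (including centering).
y (Ignored): Not used, present for API consistency by convention.

Returns:

res (float): The log-likelihood of X_test, with self.location_ and self.covariance_ as estimators of the Gaussian model mean and covariance matrix respectively.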
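The check below rests on one assumption: score returns the average per-sample Gaussian log-density, evaluated with location_ as the mean and the fitted precision matrix P as the inverse covariance. Under that assumption the value is 0.5 * (log det P - p log(2 pi) - mean of (x - mu)^T P (x - mu)), which the sketch computes directly with NumPy.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 4))
X_test = rng.normal(size=(50, 4))

model = GraphicalLasso(alpha=0.05).fit(X_train)
score_lib = model.score(X_test)

# Average Gaussian log-density of the test rows under (location_, precision).
P = model.get_precision()
centered = X_test - model.location_
n_features = X_test.shape[1]
_, logdet_P = np.linalg.slogdet(P)
quad = np.einsum("ij,jk,ik->i", centered, P, centered)
score_manual = 0.5 * (logdet_P - n_features * np.log(2 * np.pi) - quad.mean())

print(score_lib, score_manual)
```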

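set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.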
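The <component>__<parameter> syntax can be illustrated with a pipeline built by make_pipeline, which names each step after its lowercased class name (so the estimator step is "graphicallasso"). The StandardScaler step and the chosen values are illustrative only; nothing is fitted here, only parameters are updated and read back.

```python
from sklearn.covariance import GraphicalLasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# make_pipeline names steps "standardscaler" and "graphicallasso".
pipe = make_pipeline(StandardScaler(), GraphicalLasso())

# Update nested parameters with the <component>__<parameter> syntax.
pipe.set_params(graphicallasso__alpha=0.2, graphicallasso__mode="lars")

print(pipe.named_steps["graphicallasso"].alpha)   # 0.2
print(pipe.get_params()["graphicallasso__mode"])  # lars
```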
