spectral_clustering

sklearn.cluster.spectral_clustering(affinity, *, n_clusters=8, n_components=None, eigen_solver=None, random_state=None, n_init=10, eigen_tol='auto', assign_labels='kmeans', verbose=False)

Apply clustering to a projection of the normalized Laplacian.

In practice Spectral Clustering is very useful when the structure of the individual clusters is highly non-convex or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster. For instance, when clusters are nested circles on the 2D plane.
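The nested-circles case can be sketched as follows. This is a minimal illustration, assuming synthetic data from `make_circles` and an RBF affinity; the value `gamma=50.0` is an arbitrary choice for this toy data, not a recommended setting.

```python
import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

# Two concentric circles: no centroid describes either cluster well.
X, y_true = make_circles(n_samples=200, factor=0.4, noise=0.03, random_state=0)

# Dense, symmetric affinity. gamma is an illustrative assumption: large
# enough that points on different circles have near-zero similarity.
affinity = rbf_kernel(X, gamma=50.0)

labels = spectral_clustering(affinity, n_clusters=2, random_state=0)

# Cluster sizes; label values are arbitrary, so compare partitions
# (for example with adjusted_rand_score) rather than raw labels.
print(np.bincount(labels))
```

A centroid-based method such as k-means applied directly to `X` would generally struggle here, since both circles share the same center.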

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts [1], [2].
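As a small sketch of the graph-cut use case, the adjacency matrix below (an invented toy graph, not taken from the references) describes two triangles joined by a single bridge edge. Cutting that bridge is the natural two-way partition.

```python
import numpy as np
from sklearn.cluster import spectral_clustering

# Undirected toy graph: triangle on nodes 0-2, triangle on nodes 3-5,
# and one bridge edge between nodes 2 and 3.
adjacency = np.array(
    [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ],
    dtype=float,
)

# The affinity passed to spectral_clustering must be symmetric.
assert np.array_equal(adjacency, adjacency.T)

labels = spectral_clustering(adjacency, n_clusters=2, random_state=0)
print(labels)
```

The two triangles should end up in different clusters; which one receives label 0 is arbitrary.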

Read more in the User Guide.

Parameters:

affinity{array-like, sparse matrix} of shape (n_samples, n_samples)

The affinity matrix describing the relationship of the samples to embed. Must be symmetric.

Possible examples

adjacency matrix of a graph,
heat kernel of the pairwise distance matrix of the samples,
symmetric k-nearest neighbours connectivity matrix of the samples.
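Two of the affinities listed above can be built along these lines. This is a sketch under stated assumptions: the heat-kernel bandwidth `delta` (set here to the standard deviation of the distances) and `n_neighbors=2` are illustrative choices, and the averaging step is just one common way to symmetrize a k-nearest-neighbours graph.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import kneighbors_graph

X = np.array([[1, 1], [2, 1], [1, 0], [4, 7], [3, 5], [3, 6]], dtype=float)

# Heat kernel of the pairwise distance matrix.
distances = pairwise_distances(X)
delta = distances.std()  # assumed bandwidth, tune for real data
heat_affinity = np.exp(-(distances**2) / (2.0 * delta**2))

# k-nearest-neighbours connectivity is not symmetric in general
# (i may be a neighbour of j without the converse), so average it
# with its transpose.
knn = kneighbors_graph(X, n_neighbors=2, mode="connectivity", include_self=False)
knn_affinity = 0.5 * (knn + knn.T)

print(heat_affinity.shape, knn_affinity.shape)
```

Either matrix can then be passed as the `affinity` argument; an adjacency matrix of an undirected graph can be passed directly.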

n_clustersint, default=8

Number of clusters to extract.

n_componentsint, default=n_clusters

Number of eigenvectors to use for the spectral embedding.

eigen_solver{None, ‘arpack’, ‘lobpcg’, or ‘amg’}

The eigenvalue decomposition method. If None then 'arpack' is used. See [4] for more details regarding 'lobpcg'. Eigensolver 'amg' runs 'lobpcg' with optional Algebraic MultiGrid preconditioning and requires pyamg to be installed. It can be faster on very large sparse problems [6] and [7].

random_stateint, RandomState instance, default=None

A pseudo-random number generator used to initialize the lobpcg eigenvector decomposition when eigen_solver == 'amg', and to initialize K-Means. Use an int to make the results deterministic across calls (see Glossary).
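A quick way to check the effect of a fixed seed is to call the function twice with the same integer. The sketch below assumes the default solver and k-means label assignment; the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[1, 1], [2, 1], [1, 0], [4, 7], [3, 5], [3, 6]])
affinity = pairwise_kernels(X, metric="rbf")

# Same integer seed on both calls, so the eigensolver start vector and
# the k-means initialization are drawn identically.
first = spectral_clustering(affinity, n_clusters=2, random_state=42)
second = spectral_clustering(affinity, n_clusters=2, random_state=42)

print(np.array_equal(first, second))
```

With an integer seed the two label arrays are expected to match exactly; with random_state=None the labels may be permuted between calls.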


n_initint, default=10

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia. Only used if assign_labels='kmeans'.

eigen_tolfloat, default="auto"

Stopping criterion for eigendecomposition of the Laplacian matrix. If eigen_tol="auto" then the passed tolerance will depend on the eigen_solver:

If eigen_solver="arpack", then eigen_tol=0.0;

If eigen_solver="lobpcg" or eigen_solver="amg", then eigen_tol=None, which configures the underlying lobpcg solver to automatically resolve the value according to its heuristics. See scipy.sparse.linalg.lobpcg for details.

Note that when using eigen_solver="lobpcg" or eigen_solver="amg", values of tol<1e-5 may lead to convergence issues and should be avoided.

assign_labels{‘kmeans’, ‘discretize’, ‘cluster_qr’}, default=‘kmeans’

The strategy to use to assign labels in the embedding space. There are three ways to assign labels after the Laplacian embedding. k-means can be applied and is a popular choice. But it can also be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization [3]. The cluster_qr method [5] directly extracts clusters from eigenvectors in spectral clustering. In contrast to k-means and discretization, cluster_qr has no tuning parameters and is not an iterative method, yet may outperform k-means and discretization in terms of both quality and speed. For a detailed comparison of clustering strategies, refer to the following example: Segmenting the picture of greek coins in regions.

Changed in version 1.1: Added new labeling method ‘cluster_qr’.

verbosebool, default=False

Verbosity mode.

Added in version 0.24.

Returns:

labelsarray of integers, shape: n_samples

The labels of the clusters.

Notes

The graph should contain only one connected component; otherwise the results make little sense.

This algorithm solves the normalized cut for k=2: it is a normalized spectral clustering.
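One way to check the connectivity requirement before clustering is to count connected components with SciPy. The sketch below uses an invented 4-node affinity matrix with two disconnected pairs; the check itself is a suggestion, not something this function requires you to run.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

# Toy affinity: nodes 0-1 are linked, nodes 2-3 are linked, and there is
# no edge between the two pairs (zeros are treated as missing edges).
affinity = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

n_graph_components, component_labels = connected_components(
    affinity, directed=False
)
print(n_graph_components, component_labels)

if n_graph_components > 1:
    print("Affinity graph is not connected; consider handling each component separately.")
```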

References

[1]

If the pyamg package is installed, it is used: this greatly speeds up computation.

[2]

Normalized cuts and image segmentation, 2000 Jianbo Shi, Jitendra Malik

[3]

Multiclass spectral clustering, 2003 Stella X. Yu, Jianbo Shi

[4]

Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001 A. V. Knyazev SIAM Journal on Scientific Computing 23, no. 2, pp. 517-541.

[5]

Simple, direct, and efficient multi-way spectral clustering, 2019 Anil Damle, Victor Minden, Lexing Ying

[6]

Multiscale Spectral Image Segmentation Multiscale preconditioning for computing eigenvalues of graph Laplacians in image segmentation, 2006 Andrew Knyazev

[7]

Preconditioned spectral clustering for stochastic block partition streaming graph challenge (Preliminary version at arXiv.) David Zhuzhunashvili, Andrew Knyazev

Examples

>>> import numpy as np
>>> from sklearn.metrics.pairwise import pairwise_kernels
>>> from sklearn.cluster import spectral_clustering
>>> X = np.array([[1, 1], [2, 1], [1, 0],
...               [4, 7], [3, 5], [3, 6]])
>>> affinity = pairwise_kernels(X, metric='rbf')
>>> spectral_clustering(
...     affinity=affinity, n_clusters=2, assign_labels="discretize", random_state=0
... )
array([1, 1, 1, 0, 0, 0])

Gallery examples

Segmenting the picture of greek coins in regions

Spectral clustering for image segmentation
