kmeans_plusplus

sklearn.cluster.kmeans_plusplus(X, n_clusters, *, sample_weight=None, x_squared_norms=None, random_state=None, n_local_trials=None)

Initialize n_clusters seeds according to k-means++.

Added in version 0.24.

Parameters:

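X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The data to pick seeds from.
n_clusters : int
    The number of centroids to initialize.
sample_weight : array-like of shape (n_samples,), default=None
    The weights for each observation in X. If None, all observations are assigned equal weight. sample_weight is ignored if init is a callable or a user-provided array.

    Added in version 1.3.
x_squared_norms : array-like of shape (n_samples,), default=None
    Squared Euclidean norm of each data point.
random_state : int or RandomState instance, default=None
    Determines random number generation for centroid initialization. Pass an int for reproducible output across multiple function calls. See Glossary.
n_local_trials : int, default=None
    The number of seeding trials for each center (except the first), of which the one reducing inertia the most is greedily chosen. Set to None to make the number of trials depend logarithmically on the number of seeds (2+log(k)), which is the recommended setting. Setting it to 1 disables the greedy cluster selection and recovers the vanilla k-means++ algorithm, which was empirically shown to work less well than its greedy variant.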
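As an illustration of the optional parameters above, the following minimal sketch precomputes the squared row norms and compares the default greedy seeding with n_local_trials=1. The toy data and the variable names (x_sq, centers_greedy, centers_plain) are assumptions made for this example only.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

# Toy data (illustrative): two well-separated groups of three points.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Squared Euclidean norm of each row, shape (n_samples,).
x_sq = (X ** 2).sum(axis=1)

# Default n_local_trials=None: greedy variant with several candidates per center.
centers_greedy, idx_greedy = kmeans_plusplus(
    X, n_clusters=2, x_squared_norms=x_sq, random_state=0
)

# n_local_trials=1: a single candidate per center (vanilla k-means++).
centers_plain, idx_plain = kmeans_plusplus(
    X, n_clusters=2, x_squared_norms=x_sq, random_state=0, n_local_trials=1
)
```

Passing x_squared_norms is optional; it only saves recomputing the norms when they are already available.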

Returns:

centers : ndarray of shape (n_clusters, n_features)
    The initial centers for k-means.
indices : ndarray of shape (n_clusters,)
    The index location of the chosen centers in the data array X. For a given index and center, X[index] = center.
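One possible use of the returned values, sketched under the assumption that the seeds are then handed to sklearn.cluster.KMeans as an explicit init array (the toy data and the name km are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans, kmeans_plusplus

# Toy data (illustrative): two well-separated groups of three points.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)

# Each returned center is a row of X, located by the matching index.
rows_match = np.array_equal(X[indices], centers)

# Use the seeds as the starting centers for a single k-means run.
km = KMeans(n_clusters=2, init=centers, n_init=1).fit(X)
```

With an explicit init array a single run (n_init=1) is sufficient, because the starting point is no longer random.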

Notes

Selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See: Arthur, D. and Vassilvitskii, S. "k-means++: the advantages of careful seeding". ACM-SIAM Symposium on Discrete Algorithms. 2007
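The core idea of the seeding can be sketched in plain NumPy. This is a simplified illustration of vanilla k-means++ (one candidate per step, no sample weights), not scikit-learn's implementation; the function name kmeanspp_seed_sketch is hypothetical, and the sketch assumes a dense array containing at least k distinct points.

```python
import numpy as np

def kmeanspp_seed_sketch(X, k, seed=0):
    """Hypothetical helper: vanilla k-means++ seeding (D^2 sampling)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # First center: chosen uniformly at random.
    chosen = [int(rng.integers(n))]
    # Squared distance from every point to its closest chosen center.
    d2 = ((X - X[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Sample the next center with probability proportional to d2.
        idx = int(rng.choice(n, p=d2 / d2.sum()))
        chosen.append(idx)
        # Refresh the closest-center distances with the new center.
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    chosen = np.array(chosen)
    return X[chosen], chosen

X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])
sketch_centers, sketch_indices = kmeanspp_seed_sketch(X, k=2)
```

Points that are already centers have zero squared distance, so they are never drawn again; points far from every current center are the most likely to be picked next.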

Examples

>>> from sklearn.cluster import kmeans_plusplus
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)
>>> centers
array([[10,  2],
       [ 1,  0]])
>>> indices
array([3, 2])

Gallery examples

An example of K-Means++ initialization