make_blobs

sklearn.datasets.make_blobs(n_samples=100, n_features=2, *, centers=None, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, return_centers=False)

Generate isotropic Gaussian blobs for clustering.

Read more in the User Guide.

Parameters:

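n_samples (int or array-like, default=100): If int, it is the total number of points, divided equally among the clusters. If array-like, each element of the sequence gives the number of samples for the corresponding cluster.

Changed in version 0.20: an array-like can now be passed to the n_samples parameter.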
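n_features (int, default=2): The number of features for each sample.
centers (int or array-like of shape (n_centers, n_features), default=None): The number of centers to generate, or the fixed center locations. If n_samples is an int and centers is None, 3 centers are generated. If n_samples is array-like, centers must be either None or an array whose length equals the length of n_samples.
cluster_std (float or array-like of float, default=1.0): The standard deviation of the clusters. A single float applies to every cluster; an array-like gives one value per cluster.
center_box (tuple of float (min, max), default=(-10.0, 10.0)): The bounding box for each cluster center when centers are generated at random.
shuffle (bool, default=True): Whether to shuffle the samples.
random_state (int, RandomState instance or None, default=None): Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See Glossary.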
return_centers (bool, default=False): If True, also return the centers of each cluster.

Added in version 0.23.
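
To make the interaction between centers and cluster_std concrete, here is a minimal sketch that passes fixed center locations together with one standard deviation per cluster. The center coordinates, sample counts, and the name fixed_centers are illustrative choices, not values taken from the reference above.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Illustrative fixed centers: one row per cluster, one column per feature.
fixed_centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])

# One standard deviation per cluster (tight, medium, wide).
# n_samples is array-like, so its length must match the number of centers.
X, y = make_blobs(
    n_samples=[200, 200, 200],
    centers=fixed_centers,
    cluster_std=[0.1, 0.5, 1.0],
    random_state=0,
)

# Compare the empirical mean and spread of each cluster with the inputs.
for label in range(len(fixed_centers)):
    points = X[y == label]
    print(label, points.mean(axis=0).round(2), points.std(axis=0).round(2))
```

With these settings the empirical means should land close to the rows of fixed_centers, and the per-feature spread should grow from the first cluster to the last.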

Returns:

X (ndarray of shape (n_samples, n_features)): The generated samples.
y (ndarray of shape (n_samples,)): The integer labels for the cluster membership of each sample.
centers (ndarray of shape (n_centers, n_features)): The centers of each cluster. Only returned if return_centers=True.
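
The optional third return value can be sketched as follows. The sample count, number of centers, narrower center_box, and seed are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_blobs

# Ask for the randomly drawn centers in addition to the samples and labels.
X, y, centers = make_blobs(
    n_samples=300,
    centers=4,
    n_features=3,
    center_box=(-5.0, 5.0),
    return_centers=True,
    random_state=42,
)

print(X.shape)        # (300, 3)
print(y.shape)        # (300,)
print(centers.shape)  # (4, 3)

# Randomly generated centers should fall inside the requested bounding box.
print(centers.min() >= -5.0 and centers.max() <= 5.0)
```

Keeping the true centers is handy when checking a clustering result against the ground truth that produced the data.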

See also

make_classification: A more intricate variant.

Examples

>>> from sklearn.datasets import make_blobs
>>> X, y = make_blobs(n_samples=10, centers=3, n_features=2,
...                   random_state=0)
>>> print(X.shape)
(10, 2)
>>> y
array([0, 0, 1, 0, 2, 2, 2, 1, 1, 0])
>>> X, y = make_blobs(n_samples=[3, 3, 4], centers=None, n_features=2,
...                   random_state=0)
>>> print(X.shape)
(10, 2)
>>> y
array([0, 1, 2, 0, 2, 2, 2, 1, 1, 0])
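
As a further sketch building on the first example above, the snippet below checks two properties described in the parameter list: an integer random_state gives reproducible output, and an integer n_samples is spread as evenly as possible across the clusters. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two calls with the same integer seed.
X1, y1 = make_blobs(n_samples=10, centers=3, n_features=2, random_state=0)
X2, y2 = make_blobs(n_samples=10, centers=3, n_features=2, random_state=0)

# Same seed, same samples and labels.
print(np.array_equal(X1, X2) and np.array_equal(y1, y2))

# Number of samples per cluster; 10 points over 3 clusters cannot be split
# exactly evenly, so the sizes differ by at most one.
sizes = np.bincount(y1)
print(sizes)  # [4 3 3], matching the labels shown in the first example
```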