Online learning of a dictionary of parts of faces

This example uses a large dataset of faces to learn a set of 20x20 image patches that constitute faces.

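From a programming standpoint, it is interesting because it shows how to use the online API of scikit-learn to process a very large dataset in chunks. We load one image at a time and extract 50 random patches from it. Once we have accumulated 500 patches (from 10 images), we call the partial_fit method of the online KMeans object, MiniBatchKMeans.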
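The buffering arithmetic described above can be checked in isolation. The following is a minimal sketch, assuming a random 64x64 array as a stand-in for one face image (the stand-in data and the variable names `patches` and `batch` are illustrative and not part of the original example):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.RandomState(0)
patch_size = (20, 20)

buffer = []
for _ in range(10):
    # Assumption: a random 64x64 array stands in for one face image.
    img = rng.rand(64, 64)
    # Draw 50 random 20x20 patches from the image.
    patches = extract_patches_2d(img, patch_size, max_patches=50, random_state=rng)
    # Flatten each patch into a 400-dimensional row vector.
    buffer.append(patches.reshape(len(patches), -1))

# Ten images of 50 patches each give one batch of 500 samples.
batch = np.concatenate(buffer, axis=0)
print(patches.shape)  # (50, 20, 20)
print(batch.shape)    # (500, 400)
```

Each row of `batch` is one flattened patch, which is the layout that partial_fit expects (samples by features).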

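The verbose setting on MiniBatchKMeans lets us see that some clusters are reassigned during the successive calls to partial_fit. This happens when the number of patches a cluster represents has become too low, in which case it is better to choose a new random cluster.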
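How readily low-count centers are reassigned is governed by the `reassignment_ratio` parameter of MiniBatchKMeans (default 0.01). The sketch below streams synthetic Gaussian chunks through partial_fit with verbose output enabled. The synthetic data, the chunk sizes, and the number of clusters are assumptions chosen for illustration, and whether any reassignment message is printed depends on the data.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)

# reassignment_ratio is shown at its default value; a higher value
# makes low-count centers easier to reassign.
kmeans = MiniBatchKMeans(
    n_clusters=8,
    random_state=0,
    n_init=3,
    reassignment_ratio=0.01,
    verbose=True,
)

# Assumption: five synthetic chunks stand in for successive patch batches.
for _ in range(5):
    chunk = rng.normal(size=(500, 16))
    kmeans.partial_fit(chunk)

# One center per cluster, each with as many features as the input.
print(kmeans.cluster_centers_.shape)  # (8, 16)
```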

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

Load the data

from sklearn import datasets

faces = datasets.fetch_olivetti_faces()

Learn the dictionary of images

import time

import numpy as np

from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d

print("Learning the dictionary... ")
rng = np.random.RandomState(0)
kmeans = MiniBatchKMeans(n_clusters=81, random_state=rng, verbose=True, n_init=3)
patch_size = (20, 20)

buffer = []
t0 = time.time()

# The online learning part: cycle over the whole dataset 6 times
index = 0
for _ in range(6):
    for img in faces.images:
        data = extract_patches_2d(img, patch_size, max_patches=50, random_state=rng)
        data = np.reshape(data, (len(data), -1))
        buffer.append(data)
        index += 1
        if index % 10 == 0:
            data = np.concatenate(buffer, axis=0)
            data -= np.mean(data, axis=0)
            data /= np.std(data, axis=0)
            kmeans.partial_fit(data)
            buffer = []
        if index % 100 == 0:
            print("Partial fit of %4i out of %i" % (index, 6 * len(faces.images)))

dt = time.time() - t0
print("done in %.2fs." % dt)

Learning the dictionary...
[MiniBatchKMeans] Reassigning 8 cluster centers.
[MiniBatchKMeans] Reassigning 5 cluster centers.
Partial fit of  100 out of 2400
[MiniBatchKMeans] Reassigning 3 cluster centers.
Partial fit of  200 out of 2400
[MiniBatchKMeans] Reassigning 1 cluster centers.
Partial fit of  300 out of 2400
[MiniBatchKMeans] Reassigning 3 cluster centers.
Partial fit of  400 out of 2400
Partial fit of  500 out of 2400
Partial fit of  600 out of 2400
Partial fit of  700 out of 2400
Partial fit of  800 out of 2400
Partial fit of  900 out of 2400
Partial fit of 1000 out of 2400
Partial fit of 1100 out of 2400
Partial fit of 1200 out of 2400
Partial fit of 1300 out of 2400
Partial fit of 1400 out of 2400
Partial fit of 1500 out of 2400
Partial fit of 1600 out of 2400
Partial fit of 1700 out of 2400
Partial fit of 1800 out of 2400
Partial fit of 1900 out of 2400
Partial fit of 2000 out of 2400
Partial fit of 2100 out of 2400
Partial fit of 2200 out of 2400
Partial fit of 2300 out of 2400
Partial fit of 2400 out of 2400
done in 1.16s.
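In the loop above, each batch of 500 patches is standardized with its own per-feature mean and standard deviation before it is passed to partial_fit, so the scaling is local to a batch rather than global. A minimal NumPy sketch of that step follows, assuming random data in place of real patches. The guard against zero standard deviation is an addition for illustration and is not in the original code.

```python
import numpy as np

rng = np.random.RandomState(0)
# Assumption: random values stand in for 500 flattened 20x20 patches.
batch = rng.rand(500, 400)

# Per-feature (per-pixel) statistics, computed on this batch only.
mean = batch.mean(axis=0)
std = batch.std(axis=0)

# Illustrative guard (not in the original): avoid dividing by zero
# for a pixel position that is constant across the batch.
std[std == 0] = 1.0

standardized = (batch - mean) / std
```

After this step every column of `standardized` has mean close to 0 and standard deviation close to 1 within the batch.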

Plot the results

import matplotlib.pyplot as plt

plt.figure(figsize=(4.2, 4))
for i, patch in enumerate(kmeans.cluster_centers_):
    plt.subplot(9, 9, i + 1)
    plt.imshow(patch.reshape(patch_size), cmap=plt.cm.gray, interpolation="nearest")
    plt.xticks(())
    plt.yticks(())


# 6 passes over the dataset, 50 patches extracted per image
n_patches = 6 * 50 * len(faces.images)
plt.suptitle(
    "Patches of faces\nTrain time %.1fs on %d patches" % (dt, n_patches),
    fontsize=16,
)
plt.subplots_adjust(0.08, 0.02, 0.92, 0.85, 0.08, 0.23)

plt.show()

Patches of faces Train time 1.2s on 120000 patches

Total running time of the script: (0 minutes 2.129 seconds)

Related examples

Faces dataset decompositions

Image denoising using dictionary learning

Feature agglomeration

Comparison of the K-Means and MiniBatchKMeans clustering algorithms
