Online learning of a dictionary of parts of faces
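This example uses a large dataset of faces to learn a set of 20 x 20 pixel image patches that constitute faces.
From a programming standpoint, it is interesting because it shows how to use the online API of scikit-learn to process a very large dataset in chunks. We load one image at a time and extract 50 random patches from it. Once 500 patches have accumulated (from 10 images), we call the partial_fit method of the online KMeans object, MiniBatchKMeans.
The verbose setting on MiniBatchKMeans lets us see that some clusters are reassigned during the successive calls to partial_fit. This happens when the number of patches a cluster represents has become too low, in which case it is better to choose a new random cluster.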
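The buffering pattern described above can be sketched on synthetic data. This is a minimal illustration rather than the example's own code: the helper `iter_chunks`, the name `kmeans_sketch`, the chunk sizes, and the random data are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def iter_chunks(n_chunks, chunk_size, n_features, seed=0):
    """Hypothetical data source: yield one small chunk of samples at a time."""
    rng = np.random.RandomState(seed)
    for _ in range(n_chunks):
        yield rng.standard_normal((chunk_size, n_features))


kmeans_sketch = MiniBatchKMeans(n_clusters=8, random_state=0)

buffer = []
n_calls = 0
for chunk in iter_chunks(n_chunks=40, chunk_size=50, n_features=16):
    buffer.append(chunk)
    # Once 10 chunks (500 samples) have accumulated, update the model
    # incrementally and release the buffer so memory use stays bounded.
    if len(buffer) == 10:
        kmeans_sketch.partial_fit(np.concatenate(buffer, axis=0))
        buffer = []
        n_calls += 1

print(n_calls, kmeans_sketch.cluster_centers_.shape)
```

Only one buffer of 500 samples is held in memory at any time, which is what makes this approach suitable for datasets that do not fit in memory.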
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause
Load the data
from sklearn import datasets
faces = datasets.fetch_olivetti_faces()
Learn the dictionary of images
import time
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d
print("Learning the dictionary... ")
rng = np.random.RandomState(0)
kmeans = MiniBatchKMeans(n_clusters=81, random_state=rng, verbose=True, n_init=3)
patch_size = (20, 20)
buffer = []
t0 = time.time()
# The online learning part: cycle over the whole dataset 6 times
index = 0
for _ in range(6):
    for img in faces.images:
        data = extract_patches_2d(img, patch_size, max_patches=50, random_state=rng)
        data = np.reshape(data, (len(data), -1))
        buffer.append(data)
        index += 1
        if index % 10 == 0:
            data = np.concatenate(buffer, axis=0)
            data -= np.mean(data, axis=0)
            data /= np.std(data, axis=0)
            kmeans.partial_fit(data)
            buffer = []
        if index % 100 == 0:
            print("Partial fit of %4i out of %i" % (index, 6 * len(faces.images)))
dt = time.time() - t0
print("done in %.2fs." % dt)
Learning the dictionary...
[MiniBatchKMeans] Reassigning 8 cluster centers.
[MiniBatchKMeans] Reassigning 5 cluster centers.
Partial fit of 100 out of 2400
[MiniBatchKMeans] Reassigning 3 cluster centers.
Partial fit of 200 out of 2400
[MiniBatchKMeans] Reassigning 1 cluster centers.
Partial fit of 300 out of 2400
[MiniBatchKMeans] Reassigning 3 cluster centers.
Partial fit of 400 out of 2400
Partial fit of 500 out of 2400
Partial fit of 600 out of 2400
Partial fit of 700 out of 2400
Partial fit of 800 out of 2400
Partial fit of 900 out of 2400
Partial fit of 1000 out of 2400
Partial fit of 1100 out of 2400
Partial fit of 1200 out of 2400
Partial fit of 1300 out of 2400
Partial fit of 1400 out of 2400
Partial fit of 1500 out of 2400
Partial fit of 1600 out of 2400
Partial fit of 1700 out of 2400
Partial fit of 1800 out of 2400
Partial fit of 1900 out of 2400
Partial fit of 2000 out of 2400
Partial fit of 2100 out of 2400
Partial fit of 2200 out of 2400
Partial fit of 2300 out of 2400
Partial fit of 2400 out of 2400
done in 1.38s.
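The shapes and counts involved in the loop above can be checked without downloading the dataset. The sketch below uses a random 64 x 64 array as a stand-in for one face image (the Olivetti images are 64 x 64; the random content and the variable names are assumptions for illustration only).

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.RandomState(0)
image = rng.rand(64, 64)  # stand-in for a single face image (assumption)

# Draw 50 random 20 x 20 patches, then flatten each one into a 400-vector
patches = extract_patches_2d(image, (20, 20), max_patches=50, random_state=rng)
flat = patches.reshape(len(patches), -1)
print(patches.shape, flat.shape)

# Number of distinct 20 x 20 patches available in a 64 x 64 image
n_possible = (64 - 20 + 1) ** 2

# Bookkeeping for 400 images, 6 passes, 50 patches per image, 10 images per batch
n_images, n_passes = 400, 6
n_partial_fit_calls = n_passes * n_images // 10
n_patches_total = n_passes * n_images * 50
print(n_possible, n_partial_fit_calls, n_patches_total)
```

Each call to partial_fit therefore receives a 500 x 400 array, and the six passes over the dataset amount to 240 such calls.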
Plot the results
import matplotlib.pyplot as plt
plt.figure(figsize=(4.2, 4))
for i, patch in enumerate(kmeans.cluster_centers_):
    plt.subplot(9, 9, i + 1)
    plt.imshow(patch.reshape(patch_size), cmap=plt.cm.gray, interpolation="nearest")
    plt.xticks(())
    plt.yticks(())
# Total patches seen during training: 6 passes x 50 patches per image
plt.suptitle(
    "Patches of faces\nTrain time %.1fs on %d patches"
    % (dt, 6 * 50 * len(faces.images)),
    fontsize=16,
)
plt.subplots_adjust(0.08, 0.02, 0.92, 0.85, 0.08, 0.23)
plt.show()
Total running time of the script: (0 minutes 2.729 seconds)