Fowlkes-Mallows score

sklearn.metrics.fowlkes_mallows_score(labels_true, labels_pred, *, sparse=False)

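Measure the similarity of two clusterings of a set of points.

Added in version 0.18.

The Fowlkes-Mallows index (FMI) is defined as the geometric mean of the pairwise precision and recall: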

FMI = TP / sqrt((TP + FP) * (TP + FN))

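where TP is the number of true positives (the number of pairs of points that belong to the same cluster in both labels_true and labels_pred), FP is the number of false positives (the number of pairs of points that belong to the same cluster in labels_pred but not in labels_true), and FN is the number of false negatives (the number of pairs of points that belong to the same cluster in labels_true but not in labels_pred).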
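As a rough illustration of the pair counts defined above, the following sketch enumerates every pair of points and classifies it as TP, FP, or FN. The helper name `pair_counting_fmi` is hypothetical and is not part of scikit-learn; it runs in quadratic time and is meant only to make the definition concrete, not to mirror the library's implementation.

```python
from itertools import combinations
from math import sqrt


def pair_counting_fmi(labels_true, labels_pred):
    """Hypothetical helper: FMI computed by brute-force pair counting."""
    tp = fp = fn = 0
    # Visit every unordered pair of points exactly once.
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # grouped together in both labelings
        elif same_pred:
            fp += 1  # grouped together only in labels_pred
        elif same_true:
            fn += 1  # grouped together only in labels_true
    # Assumption: return 0.0 when no pair agrees, which also avoids 0/0.
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))
```

For `labels_true = [0, 0, 1, 1]` and `labels_pred = [0, 0, 0, 1]`, the six pairs give TP = 1, FP = 2, and FN = 1, so the sketch returns 1 / sqrt(3 * 2), roughly 0.408.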

The score ranges from 0 to 1. A high value indicates good similarity between the two clusterings.

Read more in the User Guide.

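Parameters:

labels_true (array-like of shape (n_samples,), dtype=int): A clustering of the data into disjoint subsets.
labels_pred (array-like of shape (n_samples,), dtype=int): A clustering of the data into disjoint subsets.
sparse (bool, default=False): Compute the contingency matrix internally with a sparse matrix.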
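The `sparse` option refers to a contingency table, that is, a table counting how many points fall into each (true class, predicted cluster) combination. The sketch below, which assumes a hypothetical helper `contingency_fmi` built only on the standard library, shows how the pair counts in the FMI formula can be derived from such a table without enumerating pairs. It is an illustration under these assumptions, not scikit-learn's actual code.

```python
from collections import Counter
from math import comb, sqrt


def contingency_fmi(labels_true, labels_pred):
    """Hypothetical helper: FMI derived from contingency-table counts."""
    # Non-zero cells of the contingency table, keyed by (true, pred) label.
    cells = Counter(zip(labels_true, labels_pred))
    # Row and column totals: sizes of each true class and predicted cluster.
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    # Pairs grouped together in both labelings (TP).
    tp = sum(comb(n, 2) for n in cells.values())
    # Pairs grouped together in labels_true (TP + FN).
    tp_fn = sum(comb(n, 2) for n in rows.values())
    # Pairs grouped together in labels_pred (TP + FP).
    tp_fp = sum(comb(n, 2) for n in cols.values())
    # Assumption: return 0.0 when no pair agrees, which also avoids 0/0.
    if tp == 0:
        return 0.0
    return tp / sqrt(tp_fp * tp_fn)
```

Only the non-zero cells are stored, which is the same idea that makes a sparse representation attractive when there are many classes and clusters.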

Returns:

score (float): The resulting Fowlkes-Mallows score.

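References

[1]

E. B. Fowlkes and C. L. Mallows, 1983. "A method for comparing two hierarchical clusterings". Journal of the American Statistical Association.

[2]

Wikipedia entry for the Fowlkes-Mallows index.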

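Examples

Perfect labelings are both homogeneous and complete, hence have a score of 1.0. Permuting the label values does not change the score.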

>>> from sklearn.metrics.cluster import fowlkes_mallows_score
>>> fowlkes_mallows_score([0, 0, 1, 1], [0, 0, 1, 1])
np.float64(1.0)
>>> fowlkes_mallows_score([0, 0, 1, 1], [1, 1, 0, 0])
np.float64(1.0)

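If class members are completely split across different clusters, no pair of points is grouped together in both labelings (TP = 0), so the FMI is zero.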

>>> fowlkes_mallows_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0