adjusted_rand_score#

sklearn.metrics.adjusted_rand_score(labels_true, labels_pred)[source]#

经过偶然性调整的Rand指数。

Rand指数通过考虑所有样本对，并计算在预测聚类和真实聚类中分配到相同或不同簇的样本对，来衡量两个聚类之间的相似度。

原始的RI分数随后根据以下方案“经过偶然性调整”为ARI分数

ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)

因此，经过调整的Rand指数（ARI）在随机标签情况下（独立于簇的数量和样本数量）值接近0.0，当聚类完全相同（允许排列）时值为1.0。对于特别不一致的聚类，调整后的Rand指数的下限为-0.5。

ARI是一个对称度量

adjusted_rand_score(a, b) == adjusted_rand_score(b, a)

在用户指南中阅读更多。

参数:

labels_true形状为 (n_samples,) 且 dtype=int 的类数组: 用作参考的真实类别标签。
labels_pred形状为 (n_samples,) 且 dtype=int 的类数组: 要评估的聚类标签。

返回:

ARI浮点数: 介于 -0.5 和 1.0 之间的相似度得分。随机标签的 ARI 接近 0.0。1.0 表示完美匹配。

另请参阅

adjusted_mutual_info_score: 调整互信息。

参考文献

[Hubert1985]

L. Hubert 和 P. Arabie，《比较分区》，Journal of Classification 1985 https://link.springer.com/article/10.1007%2FBF01908075

[Steinley2004]

D. Steinley，《Hubert-Arabie调整Rand指数的性质》，Psychological Methods 2004

[wk]

https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index

[Chacon]

给定大小的两个聚类的最小调整Rand指数，2022，J. E. Chacón 和 A. I. Rastrojo

示例

完美匹配的标签得分仍为1

>>> from sklearn.metrics.cluster import adjusted_rand_score
>>> adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

将所有类别成员分配到相同簇的标签是完整的，但可能不总是纯净的，因此会受到惩罚

>>> adjusted_rand_score([0, 0, 1, 2], [0, 0, 1, 1])
0.57

ARI是对称的，因此具有纯净簇（成员来自同一类别但存在不必要的拆分）的标签会受到惩罚

>>> adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 2])
0.57

如果类别成员完全分散到不同的簇中，则分配是完全不完整的，因此ARI非常低

>>> adjusted_rand_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0

对于特别不一致的标签，ARI可能会取负值，这比随机标签的期望值更差

>>> adjusted_rand_score([0, 0, 1, 1], [0, 1, 0, 1])
-0.5

有关更详细的示例，请参阅聚类性能评估中的偶然性调整。

图库示例#

adjusted_rand_score#

图库示例#

本页