注意

跳转到末尾下载完整示例代码。或通过 JupyterLite 或 Binder 在浏览器中运行此示例

邻域成分分析演示#

本示例演示了一种学习到的距离度量，该度量可最大化最近邻分类的准确性。它提供了与原始点空间相比，此度量的可视化表示。请参阅用户指南获取更多信息。

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
from scipy.special import logsumexp

from sklearn.datasets import make_classification
from sklearn.neighbors import NeighborhoodComponentsAnalysis

原始点#

首先，我们从3个类别中创建9个样本的数据集，并在原始空间中绘制这些点。在本示例中，我们关注点编号3的分类。点编号3与另一点之间链接的粗细与它们之间的距离成比例。

X, y = make_classification(
    n_samples=9,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    class_sep=1.0,
    random_state=0,
)

plt.figure(1)
ax = plt.gca()
for i in range(X.shape[0]):
    ax.text(X[i, 0], X[i, 1], str(i), va="center", ha="center")
    ax.scatter(X[i, 0], X[i, 1], s=300, c=cm.Set1(y[[i]]), alpha=0.4)

ax.set_title("Original points")
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
ax.axis("equal")  # so that boundaries are displayed correctly as circles


def link_thickness_i(X, i):
    diff_embedded = X[i] - X
    dist_embedded = np.einsum("ij,ij->i", diff_embedded, diff_embedded)
    dist_embedded[i] = np.inf

    # compute exponentiated distances (use the log-sum-exp trick to
    # avoid numerical instabilities
    exp_dist_embedded = np.exp(-dist_embedded - logsumexp(-dist_embedded))
    return exp_dist_embedded


def relate_point(X, i, ax):
    pt_i = X[i]
    for j, pt_j in enumerate(X):
        thickness = link_thickness_i(X, i)
        if i != j:
            line = ([pt_i[0], pt_j[0]], [pt_i[1], pt_j[1]])
            ax.plot(*line, c=cm.Set1(y[j]), linewidth=5 * thickness[j])


i = 3
relate_point(X, i, ax)
plt.show()

Original points

学习嵌入#

我们使用 NeighborhoodComponentsAnalysis 学习一个嵌入并在转换后绘制点。然后我们获取嵌入并找到最近邻。

nca = NeighborhoodComponentsAnalysis(max_iter=30, random_state=0)
nca = nca.fit(X, y)

plt.figure(2)
ax2 = plt.gca()
X_embedded = nca.transform(X)
relate_point(X_embedded, i, ax2)

for i in range(len(X)):
    ax2.text(X_embedded[i, 0], X_embedded[i, 1], str(i), va="center", ha="center")
    ax2.scatter(X_embedded[i, 0], X_embedded[i, 1], s=300, c=cm.Set1(y[[i]]), alpha=0.4)

ax2.set_title("NCA embedding")
ax2.axes.get_xaxis().set_visible(False)
ax2.axes.get_yaxis().set_visible(False)
ax2.axis("equal")
plt.show()

NCA embedding

脚本总运行时间： (0 分钟 0.150 秒)

相关示例

比较带和不带邻域成分分析的最近邻

比较带和不带邻域成分分析的最近邻

使用邻域成分分析进行降维

使用邻域成分分析进行降维

手写数字上的流形学习：局部线性嵌入、Isomap…

手写数字上的流形学习：局部线性嵌入、Isomap...

变分贝叶斯高斯混合的集中先验类型分析

变分贝叶斯高斯混合的集中先验类型分析

由 Sphinx-Gallery 生成的画廊