LabelSpreading

class sklearn.semi_supervised.LabelSpreading(kernel='rbf', *, gamma=20, n_neighbors=7, alpha=0.2, max_iter=30, tol=0.001, n_jobs=None)

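LabelSpreading model for semi-supervised learning.

This model is similar to the basic Label Propagation algorithm, but it uses an affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.

Read more in the User Guide.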
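To make the two ideas above concrete, here is a minimal NumPy sketch of the update rule described by Zhou et al. (2004): the affinity matrix is symmetrically normalized by node degree, and each iteration mixes neighbor information with the initial labels. It is an illustration under stated assumptions, not scikit-learn's internal code. The function name `spread_labels` and the toy data are hypothetical, and the sketch assumes a dense, symmetric affinity matrix in which every node has at least one neighbor.

```python
import numpy as np


def spread_labels(W, Y, alpha=0.2, max_iter=30, tol=1e-3):
    """Illustrative label spreading on a dense affinity matrix.

    W: (n, n) symmetric, non-negative affinities.
    Y: (n, c) one-hot rows for labeled samples, all-zero rows for unlabeled.
    """
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)  # drop self-loops
    degree = W.sum(axis=1)  # assumed strictly positive for every node
    inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]

    Y = np.asarray(Y, dtype=float)
    F = Y.copy()
    for _ in range(max_iter):
        # Soft clamping: blend neighbor information with the initial labels
        F_next = alpha * (S @ F) + (1.0 - alpha) * Y
        converged = np.abs(F_next - F).sum() < tol
        F = F_next
        if converged:
            break
    return F


# Toy data: two well-separated 1-D clusters, one labeled point in each.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
W = np.exp(-1.0 * (X - X.T) ** 2)  # RBF affinities with gamma=1 (assumed)
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # sample 0 belongs to class 0
Y[5, 1] = 1.0  # sample 5 belongs to class 1

F = spread_labels(W, Y)
predicted = F.argmax(axis=1)
```

Because `alpha` is below 1, the initial labels always keep some weight in the update, which is what "soft clamping" refers to here.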

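Parameters:

kernel ({'knn', 'rbf'} or callable, default='rbf'): String identifier for the kernel function to use, or the kernel function itself. The only valid strings are 'rbf' and 'knn'. A callable should take two inputs, each of shape (n_samples, n_features), and return a weight matrix of shape (n_samples, n_samples).
gamma (float, default=20): Parameter for the rbf kernel.
n_neighbors (int, default=7): Parameter for the knn kernel. Must be a strictly positive integer.
alpha (float, default=0.2): Clamping factor. A value in the open interval (0, 1) that specifies how much an instance adopts information from its neighbors as opposed to its initial label. In the limit, alpha=0 would keep the initial label information and alpha=1 would replace all of it.
max_iter (int, default=30): Maximum number of iterations allowed.
tol (float, default=1e-3): Convergence tolerance: the threshold used to decide whether the system has reached a steady state.
n_jobs (int, default=None): The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.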
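As an illustration of the kernel parameter, the sketch below fits the model three ways: with the built-in 'rbf' and 'knn' kernels and with a callable. The helper `custom_kernel`, its gamma value, the random seed, and the 50% masking rate are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.semi_supervised import LabelSpreading


def custom_kernel(A, B):
    # Hypothetical callable kernel: takes two (n_samples, n_features) arrays
    # and returns a dense weight matrix with one row per sample in A.
    return rbf_kernel(A, B, gamma=0.5)


X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.5] = -1  # hide roughly half of the labels

models = {
    "rbf": LabelSpreading(kernel="rbf", gamma=20, alpha=0.2),
    "knn": LabelSpreading(kernel="knn", n_neighbors=7, alpha=0.2),
    "callable": LabelSpreading(kernel=custom_kernel, alpha=0.2),
}
for name, model in models.items():
    model.fit(X, y_partial)
    print(name, model.n_iter_)
```

A fit may emit a ConvergenceWarning if max_iter is reached before the change between iterations drops below tol; raising max_iter is the usual response.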

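Attributes:

X_ (ndarray of shape (n_samples, n_features)): Input array.
classes_ (ndarray of shape (n_classes,)): The distinct labels used in classifying instances.
label_distributions_ (ndarray of shape (n_samples, n_classes)): Categorical distribution for each item.
transduction_ (ndarray of shape (n_samples,)): Label assigned to each item during fit.
n_features_in_ (int): Number of features seen during fit. Added in version 0.24.
feature_names_in_ (ndarray of shape (n_features_in_,)): Names of features seen during fit. Defined only when X has feature names that are all strings. Added in version 1.0.
n_iter_ (int): Number of iterations run.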
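The fitted attributes listed above can be inspected directly. The following sketch fits the model on the iris data with default settings and reads each attribute. The seed and the masking rate are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(42)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.3] = -1  # mark about 30% of samples unlabeled

model = LabelSpreading().fit(X, y_partial)

print(model.X_.shape)                    # (n_samples, n_features)
print(model.classes_)                    # distinct labels, excluding -1
print(model.label_distributions_.shape)  # (n_samples, n_classes)
print(model.transduction_.shape)         # (n_samples,)
print(model.n_features_in_, model.n_iter_)

# transduction_ holds the most probable class for each sample.
most_probable = model.classes_[model.label_distributions_.argmax(axis=1)]
```

Since X is passed here as a plain NumPy array without feature names, feature_names_in_ is not set in this example.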

See also

LabelPropagation: Unregularized graph-based semi-supervised learning.

References

Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schoelkopf. Learning with local and global consistency (2004)

Examples

>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelSpreading
>>> label_prop_model = LabelSpreading()
>>> iris = datasets.load_iris()
>>> rng = np.random.RandomState(42)
>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
LabelSpreading(...)

fit(X, y)

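Fit a semi-supervised label propagation model to X.

The input samples (labeled and unlabeled) are provided by matrix X, and the target labels are provided by matrix y. By convention, unlabeled samples are given the label -1 in y for semi-supervised classification.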
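A short sketch of the -1 convention: some known labels are hidden before fitting, and the labels inferred for those samples are then compared with the held-back ground truth. The variable names, the seed, and the 30% masking rate are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y_true = load_iris(return_X_y=True)

rng = np.random.RandomState(42)
unlabeled = rng.rand(len(y_true)) < 0.3  # boolean mask of samples to hide

y_train = y_true.copy()
y_train[unlabeled] = -1  # -1 marks a sample as unlabeled

model = LabelSpreading()
fitted = model.fit(X, y_train)  # fit returns the fitted estimator

# Labels inferred during fit for the samples that were hidden.
inferred = fitted.transduction_[unlabeled]
accuracy = np.mean(inferred == y_true[unlabeled])
print(f"accuracy on hidden labels: {accuracy:.3f}")
```

The value -1 never appears in transduction_, since every sample receives one of the labels in classes_.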

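Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)): Training data, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,)): Target class values, with unlabeled points marked as -1. All unlabeled samples are transductively assigned labels internally, and these labels are stored in transduction_.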

Returns: