Nystroem#

class sklearn.kernel_approximation.Nystroem(kernel='rbf', *, gamma=None, coef0=None, degree=None, kernel_params=None, n_components=100, random_state=None, n_jobs=None)[source]#

Approximate a kernel map using a subset of the training data.

Constructs an approximate feature map for an arbitrary kernel using a subset of the data as basis.

Read more in the User Guide.

Added in version 0.13.

Parameters:

kernel : str or callable, default='rbf'

Kernel map to be approximated. A callable should accept two arguments and the keyword arguments passed to this object as kernel_params, and should return a floating point number.

gamma : float, default=None

Gamma parameter for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.

coef0 : float, default=None

Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.

degree : float, default=None

Degree of the polynomial kernel. Ignored by other kernels.

kernel_params : dict, default=None

Additional parameters (keyword arguments) for kernel function passed as callable object.

n_components : int, default=100

Number of features to construct, which is also the number of data points used to construct the mapping.

random_state : int, RandomState instance or None, default=None

Pseudo-random number generator to control the uniform sampling without replacement of n_components of the training data to construct the basis kernel. Pass an int for reproducible output across multiple function calls. See Glossary.

n_jobs : int, default=None

The number of jobs to use for the computation. This works by breaking down the kernel matrix into n_jobs even slices and computing them in parallel.

None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Added in version 0.24.
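
To illustrate the kernel and kernel_params parameters described above, here is a minimal sketch that passes a callable kernel. The function gaussian_kernel, its width argument, and the synthetic data are hypothetical names chosen for this example; gamma, coef0 and degree are left at None because they apply to the built-in kernels.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem


def gaussian_kernel(x, y, width=1.0):
    """Hypothetical custom kernel: takes two 1-D samples, returns a float."""
    return float(np.exp(-width * np.sum((x - y) ** 2)))


rng = np.random.RandomState(0)
X = rng.rand(20, 4)  # synthetic data, for illustration only

feature_map = Nystroem(
    kernel=gaussian_kernel,
    kernel_params={"width": 0.5},  # forwarded to gaussian_kernel as keyword arguments
    n_components=5,
    random_state=0,
)
X_features = feature_map.fit_transform(X)
print(X_features.shape)  # one row per sample, one column per component
```

A callable kernel is evaluated pair by pair in Python, so it is likely to be noticeably slower than a built-in kernel name such as 'rbf' on larger data.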

Attributes:

components_ : ndarray of shape (n_components, n_features). Subset of training points used to construct the feature map.
component_indices_ : ndarray of shape (n_components). Indices of components_ in the training set.
normalization_ : ndarray of shape (n_components, n_components). Normalization matrix needed for embedding. Square root of the kernel matrix on components_.
n_features_in_ : int. Number of features seen during fit.

Added in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,). Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

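See also

AdditiveChi2Sampler: Approximate feature map for the additive chi2 kernel.
PolynomialCountSketch: Polynomial kernel approximation via Tensor Sketch.
RBFSampler: Approximate an RBF kernel feature map using random Fourier features.
SkewedChi2Sampler: Approximate feature map for the "skewed chi-squared" kernel.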
sklearn.metrics.pairwise.kernel_metrics: List of built-in kernels.

References

Williams, C.K.I. and Seeger, M. “Using the Nystroem method to speed up kernel machines”, Advances in neural information processing systems 2001
T. Yang, Y. Li, M. Mahdavi, R. Jin and Z. Zhou “Nystroem Method vs Random Fourier Features: A Theoretical and Empirical Comparison”, Advances in Neural Information Processing Systems 2012

Examples

>>> from sklearn import datasets, svm
>>> from sklearn.kernel_approximation import Nystroem
>>> X, y = datasets.load_digits(n_class=9, return_X_y=True)
>>> data = X / 16.
>>> clf = svm.LinearSVC()
>>> feature_map_nystroem = Nystroem(gamma=.2,
...                                 random_state=1,
...                                 n_components=300)
>>> data_transformed = feature_map_nystroem.fit_transform(data)
>>> clf.fit(data_transformed, y)
LinearSVC()
>>> clf.score(data_transformed, y)
0.9987...

fit(X, y=None)[source]#

Fit estimator to data.

Samples a subset of training points, computes kernel on these and computes normalization matrix.

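Parameters:

X : array-like of shape (n_samples, n_features). Training data, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None. Target values (None for unsupervised transformations).

Returns:

self : object. Returns the instance itself.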
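
The three steps that fit performs (sample a subset, compute the kernel on it, compute a normalization matrix) can be sketched in plain NumPy. This is an illustrative sketch, not the library's source: the helper names, the seed argument, and the 1e-12 floor on singular values are assumptions, and the normalization is taken here as the inverse square root of the basis kernel, one standard formulation of the Nyström method.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def nystroem_fit_sketch(X, n_components, gamma, seed=0):
    """Hypothetical helper mirroring the steps described for fit."""
    rng = np.random.RandomState(seed)
    # 1. sample n_components training points uniformly without replacement
    indices = rng.permutation(X.shape[0])[:n_components]
    basis = X[indices]
    # 2. compute the kernel between the sampled points
    basis_kernel = rbf_kernel(basis, basis, gamma=gamma)
    # 3. normalization: inverse square root of the basis kernel via SVD
    U, S, Vt = np.linalg.svd(basis_kernel)
    S = np.maximum(S, 1e-12)  # guard against division by (near) zero
    normalization = (U / np.sqrt(S)) @ Vt
    return basis, indices, normalization


def nystroem_transform_sketch(X, basis, normalization, gamma):
    """Hypothetical helper: kernel against the basis, then normalize."""
    return rbf_kernel(X, basis, gamma=gamma) @ normalization.T


rng = np.random.RandomState(0)
X = rng.rand(40, 5)  # synthetic data, for illustration only
gamma = 2.0

basis, indices, normalization = nystroem_fit_sketch(X, n_components=8, gamma=gamma)
features = nystroem_transform_sketch(X, basis, normalization, gamma)

# Inner products of the features approximate the kernel; on the sampled
# basis points themselves the approximation is exact up to rounding.
approx = features @ features.T
exact = rbf_kernel(X, gamma=gamma)
block = np.ix_(indices, indices)
print(np.abs(approx[block] - exact[block]).max())
```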

fit_transform(X, y=None, **fit_params)[source]#

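Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : array-like of shape (n_samples, n_features). Input samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None. Target values (None for unsupervised transformations).
**fit_params : dict. Additional fit parameters. Pass only if the estimator accepts additional parameters in its fit method.

Returns:

X_new : ndarray of shape (n_samples, n_features_new). Transformed array.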

get_feature_names_out(input_features=None)[source]#

Get output feature names for transformation.

The output feature names will be prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the output feature names are: ["class_name0", "class_name1", "class_name2"].

Parameters:

input_features : array-like of str or None, default=None. Only used to validate feature names with the names seen in fit.

Returns:

feature_names_out : ndarray of str objects. Transformed feature names.

get_metadata_routing()[source]#

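Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest. A MetadataRequest encapsulating routing information.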

get_params(deep=True)[source]#

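Get parameters for this estimator.

Parameters:

deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict. Parameter names mapped to their values.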
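
A brief sketch of what get_params returns for this class: a dict keyed by the constructor argument names, holding the values passed in or their defaults. The particular argument values are chosen only for illustration.

```python
from sklearn.kernel_approximation import Nystroem

feature_map = Nystroem(gamma=0.2, n_components=300)
params = feature_map.get_params()

print(sorted(params))  # constructor argument names
print(params["kernel"], params["gamma"], params["n_components"])
```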

set_output(*, transform=None)[source]#

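Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: The "polars" option was added.

Returns:

self : estimator instance. Estimator instance.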

set_params(**params)[source]#

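Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:

**params : dict. Estimator parameters.

Returns:

self : estimator instance. Estimator instance.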
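
The <component>__<parameter> form can be sketched with a Pipeline that wraps a Nystroem step. The step names "feature_map" and "svm" are arbitrary labels chosen for this example.

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline(
    [
        ("feature_map", Nystroem(random_state=0)),  # step names are arbitrary
        ("svm", LinearSVC()),
    ]
)

# Update parameters of the nested Nystroem step through the pipeline
pipe.set_params(feature_map__n_components=50, feature_map__gamma=0.2)

nystroem_step = pipe.named_steps["feature_map"]
print(nystroem_step.n_components, nystroem_step.gamma)

# On the estimator itself, plain parameter names are used
nystroem_step.set_params(n_components=80)
print(nystroem_step.n_components)
```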

transform(X)[source]#

Apply feature map to X.

Computes an approximate feature map using the kernel between some training points and X.

Parameters:

X : array-like of shape (n_samples, n_features). Data to transform.

Returns:

X_transformed : ndarray of shape (n_samples, n_components). Transformed data.
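
One way to see what the transformed data represents is to compare inner products of the transformed rows with the exact kernel matrix. The sketch below does this for the RBF kernel on synthetic data; the data, the gamma value, and the two component counts are assumptions chosen for illustration, and the exact error values depend on them.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.rand(60, 5)  # synthetic data, for illustration only
gamma = 2.0
K_exact = rbf_kernel(X, gamma=gamma)

errors = {}
for n_components in (5, 60):
    feature_map = Nystroem(
        kernel="rbf", gamma=gamma, n_components=n_components, random_state=0
    )
    X_transformed = feature_map.fit(X).transform(X)
    # Inner products of the transformed rows approximate the kernel matrix
    K_approx = X_transformed @ X_transformed.T
    errors[n_components] = np.abs(K_exact - K_approx).max()
    print(n_components, X_transformed.shape, errors[n_components])
```

With only a few components the approximation is coarse; when every training point is used as a basis point, the kernel on the training set is recovered up to rounding error.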

Gallery examples#

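Time-related feature engineering

Plot classification probability

Visualizing the probabilistic predictions of a VotingClassifier

Imputing missing values with variants of IterativeImputer

One-Class SVM versus One-Class SVM using Stochastic Gradient Descent

Comparing anomaly detection algorithms for outlier detection on toy datasets

Explicit feature map approximation for RBF kernels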
