SelectFromModel

class sklearn.feature_selection.SelectFromModel(estimator, *, threshold=None, prefit=False, norm_order=1, max_features=None, importance_getter='auto')

Meta-transformer for selecting features based on importance weights.

Added in version 0.17.

Read more in the User Guide.

Parameters:
estimator : object

The base estimator from which the transformer is built. This can be either a fitted estimator (if prefit is set to True) or a non-fitted estimator. The estimator should have a feature_importances_ or coef_ attribute after fitting. Otherwise, the importance_getter parameter should be used.

threshold : str or float, default=None

The threshold value to use for feature selection. Features whose absolute importance value is greater than or equal to the threshold are kept, while the others are discarded. If "median" (resp. "mean"), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., "1.25*mean") may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, "mean" is used by default.
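The string forms of threshold can be compared side by side. The following is a minimal sketch, assuming a synthetic dataset from make_classification and a RandomForestClassifier as the base estimator (both chosen only for illustration); it records the numeric value that each string resolves to in threshold_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data, used only for illustration.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

resolved = {}
for threshold in ("mean", "median", "1.25*mean"):
    forest = RandomForestClassifier(n_estimators=50, random_state=0)
    selector = SelectFromModel(forest, threshold=threshold).fit(X, y)
    importances = selector.estimator_.feature_importances_
    # threshold_ stores the numeric value the string was resolved to.
    resolved[threshold] = (selector.threshold_, importances, selector.get_support())
    print(threshold, selector.threshold_, selector.get_support().sum())
```

A higher threshold can only keep the same number of features or fewer, so "1.25*mean" is never less strict than "mean" on the same importances.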

prefit : bool, default=False

Whether a prefit model is expected to be passed into the constructor directly or not. If True, estimator must be a fitted estimator. If False, estimator is fitted and updated by calling fit and partial_fit, respectively.

norm_order : non-zero int, inf, -inf, default=1

Order of the norm used to filter the vectors of coefficients below threshold in the case where the coef_ attribute of the estimator is of dimension 2.
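A two-dimensional coef_ arises, for example, with a multiclass linear classifier. The sketch below assumes the iris dataset and LogisticRegression, and it assumes that per-feature scores are the norm of each coef_ column taken across classes, which is how the description above reads; treat the manual recomputation as an illustration rather than a specification.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

selector = SelectFromModel(
    LogisticRegression(max_iter=1000), threshold="mean", norm_order=2
).fit(X, y)

coef = selector.estimator_.coef_  # shape (n_classes, n_features) = (3, 4)
# Assumed scoring rule: L2 norm of each feature's coefficients across classes.
scores = np.linalg.norm(coef, axis=0, ord=2)

print(scores)
print(selector.threshold_, selector.get_support())
```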

max_features : int or callable, default=None

The maximum number of features to select.

  • If an integer, then it specifies the maximum number of features to allow.

  • If a callable, then it specifies how to calculate the maximum number of features allowed. The callable will receive X as input: max_features(X).

  • If None, then all features are kept.

To only select based on max_features, set threshold=-np.inf.

Added in version 0.20.

Changed in version 1.1: max_features accepts a callable.

importance_getter : str or callable, default='auto'

If ‘auto’, uses the feature importance either through a coef_ attribute or feature_importances_ attribute of estimator.

Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in case of TransformedTargetRegressor or named_steps.clf.feature_importances_ in case of Pipeline with its last step named clf.

If callable, overrides the default feature importance getter. The callable is passed with the fitted estimator and it should return importance for each feature.

Added in version 0.24.
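Both the attribute-path form and the callable form can be sketched with a Pipeline whose last step is named clf, as in the description above. The pipeline contents (StandardScaler followed by DecisionTreeClassifier), the helper names make_pipe and tree_importances, and the synthetic data are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=150, n_features=6, n_informative=2, random_state=0
)


def make_pipe():
    # Hypothetical helper: a pipeline that keeps the number of features unchanged.
    return Pipeline(
        [("scale", StandardScaler()), ("clf", DecisionTreeClassifier(random_state=0))]
    )


# String form: an attribute path resolved on the fitted estimator.
by_path = SelectFromModel(
    make_pipe(), importance_getter="named_steps.clf.feature_importances_"
).fit(X, y)


def tree_importances(fitted_pipe):
    # Callable form: receives the fitted estimator, returns one value per feature.
    return fitted_pipe.named_steps["clf"].feature_importances_


by_callable = SelectFromModel(make_pipe(), importance_getter=tree_importances).fit(X, y)

print(by_path.get_support())
print(by_callable.get_support())
```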

Attributes:
estimator_ : estimator

The base estimator from which the transformer is built. This attribute exists only when fit has been called.

  • If prefit=True, it is a deep copy of estimator.

  • If prefit=False, it is a clone of estimator and fit on the data passed to fit or partial_fit.

n_features_in_ : int

Number of features seen during fit.

max_features_ : int

Maximum number of features calculated during fit. Only defined if max_features is not None.

  • If max_features is an int, then max_features_ = max_features.

  • If max_features is a callable, then max_features_ = max_features(X).

Added in version 1.1.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

threshold_ : float

Threshold value used for feature selection.

See also

RFE

Recursive feature elimination based on importance weights.

RFECV

Recursive feature elimination with built-in cross-validated selection of the best number of features.

SequentialFeatureSelector

Sequential cross-validation based feature selection. Does not rely on importance weights.

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression
>>> X = [[ 0.87, -1.34,  0.31 ],
...      [-2.79, -0.02, -0.85 ],
...      [-1.34, -0.48, -2.55 ],
...      [ 1.92,  1.48,  0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)
>>> selector.estimator_.coef_
array([[-0.3252,  0.8345,  0.4976]])
>>> selector.threshold_
np.float64(0.55249)
>>> selector.get_support()
array([False,  True, False])
>>> selector.transform(X)
array([[-1.34],
       [-0.02],
       [-0.48],
       [ 1.48]])

Using a callable to create a selector that can use no more than half of the input features.

>>> def half_callable(X):
...     return round(len(X[0]) / 2)
>>> half_selector = SelectFromModel(estimator=LogisticRegression(),
...                                 max_features=half_callable)
>>> _ = half_selector.fit(X, y)
>>> half_selector.max_features_
2

fit(X, y=None, **fit_params)

Fit the SelectFromModel meta-transformer.

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,), default=None

The target values (integers that correspond to classes in classification, real numbers in regression).

**fit_params : dict
  • If enable_metadata_routing=False (default): Parameters directly passed to the fit method of the sub-estimator. They are ignored if prefit=True.

  • If enable_metadata_routing=True: Parameters safely routed to the fit method of the sub-estimator. They are ignored if prefit=True.

Changed in version 1.4: See Metadata Routing User Guide for more details.
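With metadata routing left at its default (disabled), extra keyword arguments go straight to the sub-estimator's fit. The sketch below assumes LogisticRegression and randomly generated sample weights, chosen only to make the forwarding visible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=120, n_features=6, n_informative=2, random_state=0
)
rng = np.random.RandomState(0)
weights = rng.uniform(0.5, 2.0, size=len(y))  # illustrative sample weights

# sample_weight is forwarded to LogisticRegression.fit.
selector = SelectFromModel(LogisticRegression()).fit(X, y, sample_weight=weights)

# Fitting the same model directly with the same weights for comparison.
direct = LogisticRegression().fit(X, y, sample_weight=weights)

print(selector.estimator_.coef_)
print(direct.coef_)
```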

Returns:
self : object

Fitted estimator.

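fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters. Pass them only if the estimator accepts additional parameters in its fit method.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.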

get_feature_names_out(input_features=None)

Mask feature names according to selected features.

Parameters:
input_features : array-like of str or None, default=None

Input features.

  • If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:
feature_names_out : ndarray of str objects

Transformed feature names.

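get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Added in version 1.4.

Returns:
routing : MetadataRouter

A MetadataRouter encapsulating routing information.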

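get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.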

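get_support(indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : bool, default=False

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True if and only if its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.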
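The two return forms carry the same information. A minimal sketch, assuming a Ridge base estimator and synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Ridge

X, y = make_regression(
    n_samples=60, n_features=5, n_informative=2, random_state=0
)
selector = SelectFromModel(Ridge()).fit(X, y)

mask = selector.get_support()                 # boolean, one entry per input feature
indices = selector.get_support(indices=True)  # integer positions of kept features

print(mask)
print(indices)
```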

inverse_transform(X)

Reverse the transformation operation.

Parameters:
X : array of shape [n_samples, n_selected_features]

The input samples.

Returns:
X_original : array of shape [n_samples, n_original_features]

X with columns of zeros inserted where features would have been removed by transform.
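A round trip through transform and inverse_transform restores the original shape but not the discarded values. The sketch below assumes a Ridge base estimator and synthetic data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Ridge

X, y = make_regression(
    n_samples=40, n_features=6, n_informative=2, random_state=0
)
selector = SelectFromModel(Ridge(), threshold=-np.inf, max_features=2).fit(X, y)

X_selected = selector.transform(X)                    # shape (40, 2)
X_restored = selector.inverse_transform(X_selected)   # shape (40, 6)

support = selector.get_support()
# Kept columns are restored; removed columns come back as zeros.
print(X_restored[:2])
```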

partial_fit(X, y=None, **partial_fit_params)

Fit the SelectFromModel meta-transformer only once.

Parameters:
X : array-like of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,), default=None

The target values (integers that correspond to classes in classification, real numbers in regression).

**partial_fit_params : dict
  • If enable_metadata_routing=False (default): Parameters directly passed to the partial_fit method of the sub-estimator.

  • If enable_metadata_routing=True: Parameters passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.

Changed in version 1.4: **partial_fit_params are routed to the sub-estimator, if enable_metadata_routing=True is set via set_config, which allows for aliasing.

See Metadata Routing User Guide for more details.
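Incremental fitting over mini-batches might look like the sketch below. It assumes a scikit-learn version in which partial_fit accepts extra keyword arguments (1.4 or later, per the note above), metadata routing left disabled, and SGDClassifier as a base estimator that supports partial_fit; the batch size of 100 is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier

X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
classes = np.unique(y)

selector = SelectFromModel(SGDClassifier(random_state=0))
for start in range(0, len(X), 100):
    batch = slice(start, start + 100)
    # `classes` is forwarded to SGDClassifier.partial_fit.
    selector.partial_fit(X[batch], y[batch], classes=classes)

X_reduced = selector.transform(X)
print(X_reduced.shape)
```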

Returns:
self : object

Fitted estimator.

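set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:
self : estimator instance

Estimator instance.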

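set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.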
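Because SelectFromModel wraps another estimator, the <component>__<parameter> form applies to it as well, with estimator as the component name. A short sketch, assuming a LogisticRegression base estimator:

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

selector = SelectFromModel(LogisticRegression(C=1.0))

# Update a parameter of the selector and one of the wrapped estimator in one call.
selector.set_params(threshold="median", estimator__C=0.1)

print(selector.threshold, selector.estimator.C)
print(selector.get_params()["estimator__C"])
```

The same nested names are what a parameter grid would use when the selector is tuned as part of a larger search.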

transform(X)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.