SelectFromModel
- class sklearn.feature_selection.SelectFromModel(estimator, *, threshold=None, prefit=False, norm_order=1, max_features=None, importance_getter='auto')
Meta-transformer for selecting features based on importance weights.
Added in version 0.17.
Read more in the User Guide.
- Parameters:
- estimator : object
The base estimator from which the transformer is built. This can be either a fitted (if prefit is set to True) or a non-fitted estimator. The estimator should have a feature_importances_ or coef_ attribute after fitting. Otherwise, the importance_getter parameter should be used.
- threshold : str or float, default=None
The threshold value to use for feature selection. Features whose absolute importance value is greater than or equal to the threshold are kept, while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, “mean” is used by default.
- prefit : bool, default=False
Whether a prefit model is expected to be passed into the constructor directly or not. If True, estimator must be a fitted estimator. If False, estimator is fitted and updated by calling fit and partial_fit, respectively.
- norm_order : non-zero int, inf, -inf, default=1
Order of the norm used to filter the vectors of coefficients below threshold in the case where the coef_ attribute of the estimator is of dimension 2.
- max_features : int, callable, default=None
The maximum number of features to select.
If an integer, then it specifies the maximum number of features to allow.
If a callable, then it specifies how to calculate the maximum number of features allowed. The callable will receive X as input: max_features(X).
If None, then all features are kept.
To only select based on max_features, set threshold=-np.inf.
Added in version 0.20.
Changed in version 1.1: max_features accepts a callable.
- importance_getter : str or callable, default='auto'
If 'auto', uses the feature importance either through a coef_ attribute or feature_importances_ attribute of estimator.
Also accepts a string that specifies an attribute name/path for extracting feature importance (implemented with attrgetter). For example, give regressor_.coef_ in case of TransformedTargetRegressor or named_steps.clf.feature_importances_ in case of Pipeline with its last step named clf.
If callable, overrides the default feature importance getter. The callable is passed the fitted estimator and should return the importance of each feature.
Added in version 0.24.
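The following is a minimal sketch of how the threshold, max_features and importance_getter parameters described above might be combined. The synthetic dataset, the choice of RandomForestClassifier and the helper name clf_importance are illustrative assumptions, not part of the original documentation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative): 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# String threshold with a scaling factor: keep features whose importance
# is at least 1.25 times the mean importance.
scaled = SelectFromModel(forest, threshold="1.25*mean").fit(X, y)
importances = scaled.estimator_.feature_importances_
print(scaled.threshold_, 1.25 * importances.mean())  # the two values agree

# Disable the threshold so that only max_features limits the selection.
top3 = SelectFromModel(forest, threshold=-np.inf, max_features=3).fit(X, y)
print(top3.get_support(indices=True))

# importance_getter as an attribute path into a Pipeline whose last step
# is named "clf".
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
by_path = SelectFromModel(pipe, importance_getter="named_steps.clf.coef_").fit(X, y)


def clf_importance(fitted_pipe):
    # Hypothetical helper: absolute coefficients of the final pipeline step.
    return np.abs(fitted_pipe.named_steps["clf"].coef_).ravel()


# importance_getter as a callable that receives the fitted estimator.
by_callable = SelectFromModel(pipe, importance_getter=clf_importance).fit(X, y)
print(by_path.get_support())
print(by_callable.get_support())
```

In this sketch the attribute path and the callable read the same coefficients, so both selectors should keep the same features.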
- Attributes:
- estimator_ : estimator
The base estimator from which the transformer is built. This attribute exists only when fit has been called.
If prefit=True, it is a deep copy of estimator.
If prefit=False, it is a clone of estimator and fit on the data passed to fit or partial_fit.
- n_features_in_ : int
Number of features seen during fit.
- max_features_ : int
Maximum number of features calculated during fit. Only defined if max_features is not None.
If max_features is an int, then max_features_ = max_features.
If max_features is a callable, then max_features_ = max_features(X).
Added in version 1.1.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Added in version 1.0.
- threshold_ : float
Threshold value used for feature selection.
See also
RFE : Recursive feature elimination based on importance weights.
RFECV : Recursive feature elimination with built-in cross-validated selection of the best number of features.
SequentialFeatureSelector : Sequential cross-validation based feature selection. Does not rely on importance weights.
Notes
Allows NaN/Inf in the input if the underlying estimator does as well.
Examples
>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression
>>> X = [[ 0.87, -1.34,  0.31 ],
...      [-2.79, -0.02, -0.85 ],
...      [-1.34, -0.48, -2.55 ],
...      [ 1.92,  1.48,  0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)
>>> selector.estimator_.coef_
array([[-0.3252,  0.8345,  0.4976]])
>>> selector.threshold_
np.float64(0.55249)
>>> selector.get_support()
array([False,  True, False])
>>> selector.transform(X)
array([[-1.34],
       [-0.02],
       [-0.48],
       [ 1.48]])
Using a callable to create a selector that can use no more than half of the input features.
>>> def half_callable(X):
...     return round(len(X[0]) / 2)
>>> half_selector = SelectFromModel(estimator=LogisticRegression(),
...                                 max_features=half_callable)
>>> _ = half_selector.fit(X, y)
>>> half_selector.max_features_
2
- fit(X, y=None, **fit_params)
Fit the SelectFromModel meta-transformer.
- Parameters:
- X : array-like of shape (n_samples, n_features)
The training input samples.
- y : array-like of shape (n_samples,), default=None
The target values (integers that correspond to classes in classification, real numbers in regression).
- **fit_params : dict
If enable_metadata_routing=False (default): Parameters directly passed to the fit method of the sub-estimator. They are ignored if prefit=True.
If enable_metadata_routing=True: Parameters safely routed to the fit method of the sub-estimator. They are ignored if prefit=True.
Changed in version 1.4: See Metadata Routing User Guide for more details.
- Returns:
- self : object
Fitted estimator.
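A short sketch of passing extra fit parameters, assuming the default configuration (enable_metadata_routing=False), in which keyword arguments are forwarded directly to the sub-estimator's fit method. The weight values are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X = np.array([[0.87, -1.34, 0.31],
              [-2.79, -0.02, -0.85],
              [-1.34, -0.48, -2.55],
              [1.92, 1.48, 0.65]])
y = np.array([0, 1, 0, 1])

# Illustrative per-sample weights; LogisticRegression.fit accepts sample_weight.
weights = np.array([1.0, 2.0, 1.0, 2.0])

selector = SelectFromModel(LogisticRegression())
# sample_weight is handed through to LogisticRegression.fit.
fitted = selector.fit(X, y, sample_weight=weights)

print(fitted is selector)  # fit returns the selector itself
print(selector.get_support())
```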
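- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters. Passed only if the estimator accepts additional parameters in its fit method.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.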
- get_feature_names_out(input_features=None)
Mask feature names according to selected features.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Transformed feature names.
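- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
Added in version 1.4.
- Returns:
- routing : MetadataRouter
A MetadataRouter encapsulating routing information.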
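- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.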
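A small sketch of the difference between deep=True and deep=False. The value C=0.5 is an arbitrary choice; nested parameters of the wrapped estimator appear under the estimator__ prefix.

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

selector = SelectFromModel(LogisticRegression(C=0.5), threshold="median")

# deep=True also lists the wrapped estimator's parameters, prefixed with
# "estimator__".
deep_params = selector.get_params(deep=True)
print(deep_params["threshold"], deep_params["estimator__C"])

# deep=False returns only the selector's own constructor parameters.
shallow_params = selector.get_params(deep=False)
print(sorted(shallow_params))
```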
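- get_support(indices=False)
Get a mask, or integer index, of the features selected.
- Parameters:
- indices : bool, default=False
If True, the return value will be an array of integers, rather than a boolean mask.
- Returns:
- support : array
An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True if and only if its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.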
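To make the two return forms concrete, the sketch below compares the boolean mask with the integer indices on the small array from the class example. It only relies on the documented relationship between the two.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X = np.array([[0.87, -1.34, 0.31],
              [-2.79, -0.02, -0.85],
              [-1.34, -0.48, -2.55],
              [1.92, 1.48, 0.65]])
y = np.array([0, 1, 0, 1])

selector = SelectFromModel(LogisticRegression()).fit(X, y)

mask = selector.get_support()             # one boolean per input feature
idx = selector.get_support(indices=True)  # one integer per kept feature

print(mask)
print(idx)

# Either form can be used to index the columns of X.
same = np.array_equal(X[:, mask], X[:, idx])
print(same)
```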
- inverse_transform(X)
Reverse the transformation operation.
- Parameters:
- X : array of shape [n_samples, n_selected_features]
The input samples.
- Returns:
- X_original : array of shape [n_samples, n_original_features]
X with columns of zeros inserted where features would have been removed by transform.
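A brief round-trip sketch: after transform drops columns, inverse_transform restores the original width and fills the dropped columns with zeros, so the discarded values themselves are not recovered. The data is the small array from the class example.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X = np.array([[0.87, -1.34, 0.31],
              [-2.79, -0.02, -0.85],
              [-1.34, -0.48, -2.55],
              [1.92, 1.48, 0.65]])
y = np.array([0, 1, 0, 1])

selector = SelectFromModel(LogisticRegression()).fit(X, y)
mask = selector.get_support()

X_selected = selector.transform(X)                # fewer columns
X_restored = selector.inverse_transform(X_selected)  # original width again

print(X_selected.shape, X_restored.shape)
print(X_restored)
```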
- partial_fit(X, y=None, **partial_fit_params)
Fit the SelectFromModel meta-transformer only once.
- Parameters:
- X : array-like of shape (n_samples, n_features)
The training input samples.
- y : array-like of shape (n_samples,), default=None
The target values (integers that correspond to classes in classification, real numbers in regression).
- **partial_fit_params : dict
If enable_metadata_routing=False (default): Parameters directly passed to the partial_fit method of the sub-estimator.
If enable_metadata_routing=True: Parameters passed to the partial_fit method of the sub-estimator. They are ignored if prefit=True.
Changed in version 1.4: **partial_fit_params are routed to the sub-estimator, if enable_metadata_routing=True is set via set_config, which allows for aliasing.
See Metadata Routing User Guide for more details.
- Returns:
- self : object
Fitted estimator.
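A minimal sketch of incremental fitting, assuming an estimator that implements partial_fit (SGDClassifier is used here as one such choice) and the default configuration in which extra keyword arguments are forwarded directly. The batch size and synthetic data are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier

X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

selector = SelectFromModel(SGDClassifier(random_state=0), threshold="median")

# SGDClassifier.partial_fit needs the full set of class labels up front;
# it is forwarded as an extra partial_fit parameter.
classes = np.unique(y)
for start in range(0, len(X), 100):
    batch = slice(start, start + 100)
    selector.partial_fit(X[batch], y[batch], classes=classes)

# The same estimator_ is updated across calls; selection uses its
# current coefficients.
print(selector.get_support())
print(selector.transform(X).shape)
```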
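- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
- Returns:
- self : estimator instance
Estimator instance.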