BaggingRegressor#

class sklearn.ensemble.BaggingRegressor(estimator=None, n_estimators=10, *, max_samples=None, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]#

A Bagging regressor.

A Bagging regressor is an ensemble meta-estimator that fits base regressors each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].
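As a rough illustration of the four schemes above, the sketch below selects each one through constructor arguments. The dataset, the 0.5 fractions, and the dictionary names are assumptions chosen for the example, not values prescribed by the references.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=8, random_state=0)

# Each entry maps a scheme name to the constructor arguments that select it.
schemes = {
    # Pasting: random subsets of the samples, drawn without replacement.
    "pasting": dict(max_samples=0.5, bootstrap=False),
    # Bagging: random subsets of the samples, drawn with replacement.
    "bagging": dict(max_samples=0.5, bootstrap=True),
    # Random Subspaces: every sample, random subsets of the features.
    "random_subspaces": dict(max_samples=1.0, bootstrap=False, max_features=0.5),
    # Random Patches: random subsets of both samples and features.
    "random_patches": dict(max_samples=0.5, bootstrap=False, max_features=0.5),
}

models = {
    name: BaggingRegressor(n_estimators=10, random_state=0, **kwargs).fit(X, y)
    for name, kwargs in schemes.items()
}

for name, model in models.items():
    n_rows = len(model.estimators_samples_[0])
    n_cols = len(model.estimators_features_[0])
    print(f"{name}: {n_rows} samples, {n_cols} features per base estimator")
```

All four variants share the same fitting and prediction API; only the way each base estimator's training subset is drawn differs.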

Read more in the User Guide.

Added in version 0.15.

Parameters:

estimatorobject, default=None

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a DecisionTreeRegressor.

Added in version 1.2: base_estimator was renamed to estimator.

n_estimatorsint, default=10

The number of base estimators in the ensemble.

max_samplesint or float, default=None

The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details).

If None, then draw X.shape[0] samples irrespective of sample_weight.
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] unweighted samples or max_samples * sample_weight.sum() weighted samples.
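A small sketch of the int and float cases, assuming an unweighted fit on a 100-sample toy dataset (the dataset and variable names are illustrative). The number of indices stored in estimators_samples_ should reflect the requested draw size.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# int: draw exactly 30 samples for each base estimator.
regr_int = BaggingRegressor(max_samples=30, random_state=0).fit(X, y)

# float: draw 0.5 * X.shape[0] = 50 samples for each base estimator.
regr_float = BaggingRegressor(max_samples=0.5, random_state=0).fit(X, y)

# Collect the distinct draw sizes seen across the ensemble.
sizes_int = {len(idx) for idx in regr_int.estimators_samples_}
sizes_float = {len(idx) for idx in regr_float.estimators_samples_}
print(sizes_int, sizes_float)
```

Because bootstrap=True by default, the drawn indices may contain duplicates, so the number of distinct rows seen by a base estimator can be smaller than the draw size.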

max_featuresint or float, default=1.0

The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details).

If int, then draw max_features features.
If float, then draw max(1, int(max_features * n_features_in_)) features.
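The float rule can be checked against a fitted ensemble. This is a minimal sketch on an assumed 6-feature toy dataset; the helper name expected_n_features is hypothetical and simply restates the formula above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=6, random_state=0)


def expected_n_features(max_features, n_features_in):
    # Hypothetical helper restating the documented rule for float values.
    return max(1, int(max_features * n_features_in))


# 0.5 * 6 = 3 features per base estimator.
half = BaggingRegressor(max_features=0.5, random_state=0).fit(X, y)

# 0.1 * 6 = 0.6, truncated to 0, then raised to the minimum of 1 feature.
tiny = BaggingRegressor(max_features=0.1, random_state=0).fit(X, y)

counts_half = {len(f) for f in half.estimators_features_}
counts_tiny = {len(f) for f in tiny.estimators_features_}
print(counts_half, expected_n_features(0.5, X.shape[1]))
print(counts_tiny, expected_n_features(0.1, X.shape[1]))
```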

bootstrapbool, default=True

Whether samples are drawn with replacement. If False, sampling without replacement is performed. If fitting with sample_weight, it is strongly recommended to choose True, as only drawing with replacement will ensure the expected frequency semantics of sample_weight.

bootstrap_featuresbool, default=False

Whether features are drawn with replacement.

oob_scorebool, default=False

Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. See the Glossary.
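A minimal sketch of growing an ensemble with warm_start, assuming the usual pattern of raising n_estimators between calls to fit. The estimator counts 5 and 8 are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

regr = BaggingRegressor(n_estimators=5, warm_start=True, random_state=0)
regr.fit(X, y)
n_before = len(regr.estimators_)
first_estimator = regr.estimators_[0]

# Raise the target size, then fit again: only the missing estimators are trained.
regr.set_params(n_estimators=8)
regr.fit(X, y)
n_after = len(regr.estimators_)

# The previously fitted estimators are kept rather than refitted.
kept_previous = regr.estimators_[0] is first_estimator
print(n_before, n_after, kept_previous)
```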

n_jobsint, default=None

The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint, RandomState instance or None, default=None

Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See Glossary.
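The sketch below illustrates both points under the default base estimator (a decision tree, which accepts random_state): an int seed makes repeated fits agree, and each fitted base estimator receives its own seed. The seed 42 and the toy data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Two independent fits with the same integer seed.
pred_a = BaggingRegressor(random_state=42).fit(X, y).predict(X)
pred_b = BaggingRegressor(random_state=42).fit(X, y).predict(X)
same_seed_matches = np.allclose(pred_a, pred_b)

# Seeds handed to the individual base estimators of one fitted ensemble.
regr = BaggingRegressor(random_state=42).fit(X, y)
base_seeds = [est.random_state for est in regr.estimators_]

print(same_seed_matches, len(set(base_seeds)))
```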

verboseint, default=0

Controls the verbosity when fitting and predicting.

Attributes:

estimator_estimator: The base estimator from which the ensemble is grown.

Added in version 1.2: base_estimator_ was renamed to estimator_.
n_features_in_int: Number of features seen during fit.

Added in version 0.24.
feature_names_in_ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.
estimators_list of estimators: The collection of fitted sub-estimators.
estimators_samples_list of arrays: The subset of drawn samples for each base estimator.
estimators_features_list of arrays: The subset of drawn features for each base estimator.
oob_score_float: Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.
oob_prediction_ndarray of shape (n_samples,): Prediction computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_prediction_ might contain NaN. This attribute exists only when oob_score is True.
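A short sketch of reading these attributes after fitting with oob_score=True. The choice of 50 estimators is an assumption made so that every training point is very likely left out of at least one bootstrap draw.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

regr = BaggingRegressor(n_estimators=50, oob_score=True, random_state=0).fit(X, y)

# Out-of-bag R^2 and per-sample out-of-bag predictions on the training set.
oob_r2 = float(regr.oob_score_)
oob_pred_shape = regr.oob_prediction_.shape

# One entry per base estimator in each of these lists.
n_fitted = len(regr.estimators_)
n_sample_sets = len(regr.estimators_samples_)
n_feature_sets = len(regr.estimators_features_)

print(oob_r2, oob_pred_shape, regr.n_features_in_)
```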

See also

BaggingClassifier: A Bagging classifier.

References

[1]

L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.

[2]

L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.

[3]

T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

[4]

G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Examples

>>> from sklearn.svm import SVR
>>> from sklearn.ensemble import BaggingRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=4,
...                        n_informative=2, n_targets=1,
...                        random_state=0, shuffle=False)
>>> regr = BaggingRegressor(estimator=SVR(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> regr.predict([[0, 0, 0, 0]])
array([-2.8720])

fit(X, y, sample_weight=None, **fit_params)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:

X{array-like, sparse matrix} of shape (n_samples, n_features): The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
yarray-like of shape (n_samples,): The target values (class labels in classification, real numbers in regression).
sample_weightarray-like of shape (n_samples,), default=None: Sample weights. If None, then samples are equally weighted. Used as probabilities to sample the training set. Note that the expected frequency semantics for the sample_weight parameter are only fulfilled when sampling with replacement (bootstrap=True) and using a float or integer max_samples (instead of the default max_samples=None).
**fit_paramsdict: Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
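A minimal sketch of calling fit with sample_weight, following the recommendation to sample with replacement and to set max_samples explicitly. The weights are randomly generated for illustration only, and how they are used internally may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Illustrative positive weights, one per training sample.
rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.1, 1.0, size=X.shape[0])

# bootstrap=True and an explicit max_samples, as suggested for weighted fits.
regr = BaggingRegressor(max_samples=0.8, bootstrap=True, random_state=0)
regr.fit(X, y, sample_weight=sample_weight)

predictions = regr.predict(X)
print(predictions.shape)
```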

Returns:

selfobject: Fitted estimator.

get_metadata_routing()[source]#

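Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Added in version 1.5.

Returns:

routingMetadataRouter: A MetadataRouter encapsulating routing information.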

get_params(deep=True)[source]#

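Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.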

predict(X, **params)[source]#

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters:

X{array-like, sparse matrix} of shape (n_samples, n_features): The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
**paramsdict: Parameters routed to the predict method of the sub-estimators via the metadata routing API.

Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.

Returns:

yndarray of shape (n_samples,): The predicted values.
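The averaging rule can be reproduced by hand from the fitted sub-estimators. This sketch assumes a dense X and uses estimators_features_ to give each sub-estimator the feature columns it was trained on.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

regr = BaggingRegressor(n_estimators=10, max_features=0.5, random_state=0).fit(X, y)

# Predict with each sub-estimator on its own feature subset.
per_estimator = [
    est.predict(X[:, feats])
    for est, feats in zip(regr.estimators_, regr.estimators_features_)
]

# The ensemble prediction is the mean over the sub-estimators.
manual_mean = np.mean(per_estimator, axis=0)
matches_predict = np.allclose(manual_mean, regr.predict(X))
print(matches_predict)
```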

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination on test data.

The coefficient of determination, $R^2$, is defined as $(1 - \frac{u}{v})$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

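Parameters:

Xarray-like of shape (n_samples, n_features): Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
yarray-like of shape (n_samples,) or (n_samples, n_outputs): True values for X.
sample_weightarray-like of shape (n_samples,), default=None: Sample weights.

Returns:

scorefloat: $R^2$ of self.predict(X) with respect to y.

Notes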

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
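As a worked check of the $R^2$ definition used by score, the sketch below computes $u$ and $v$ directly on a held-out split and compares the result with the method's output. The train/test split and the dataset are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regr = BaggingRegressor(random_state=0).fit(X_train, y_train)
y_pred = regr.predict(X_test)

# u: residual sum of squares, v: total sum of squares.
u = ((y_test - y_pred) ** 2).sum()
v = ((y_test - y_test.mean()) ** 2).sum()
r2_manual = 1 - u / v

r2_method = regr.score(X_test, y_test)
print(np.isclose(r2_manual, r2_method))
```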

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaggingRegressor[source]#

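Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for the sample_weight parameter in fit.

Returns:

selfobject: The updated object.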

set_params(**params)[source]#

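Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.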

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaggingRegressor[source]#

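Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for the sample_weight parameter in score.

Returns:

selfobject: The updated object.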

Gallery examples#

Single estimator versus bagging: bias-variance decomposition
