BaggingRegressor
- class sklearn.ensemble.BaggingRegressor(estimator=None, n_estimators=10, *, max_samples=None, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)
A Bagging regressor.
A Bagging regressor is an ensemble meta-estimator that fits base regressors, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].
Read more in the User Guide.
Added in version 0.15.
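The four sampling schemes named above can be selected through the constructor parameters. The following is a minimal sketch, not a recommended setup: the dataset, the 0.5 fractions, and the dictionary keys are illustrative assumptions, and the base estimator is left at its default.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic data used only for illustration.
X, y = make_regression(n_samples=200, n_features=8, random_state=0)

# One hypothetical configuration per sampling scheme.
schemes = {
    # Pasting: subsets of samples, drawn without replacement.
    "pasting": dict(max_samples=0.5, bootstrap=False),
    # Bagging: samples drawn with replacement.
    "bagging": dict(max_samples=0.5, bootstrap=True),
    # Random Subspaces: all samples, subsets of features.
    "random_subspaces": dict(max_features=0.5, bootstrap=False),
    # Random Patches: subsets of both samples and features.
    "random_patches": dict(max_samples=0.5, max_features=0.5, bootstrap=False),
}

fitted = {
    name: BaggingRegressor(n_estimators=5, random_state=0, **kwargs).fit(X, y)
    for name, kwargs in schemes.items()
}

for name, model in fitted.items():
    n_rows = len(model.estimators_samples_[0])
    n_cols = len(model.estimators_features_[0])
    print(f"{name}: {n_rows} sampled rows, {n_cols} features per estimator")
```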
- Parameters:
- estimator : object, default=None
The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a DecisionTreeRegressor.
Changed in version 1.2: base_estimator was renamed to estimator.
- n_estimators : int, default=10
The number of base estimators in the ensemble.
- max_samples : int or float, default=None
The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details).
If None, then draw X.shape[0] samples irrespective of sample_weight.
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] unweighted samples or max_samples * sample_weight.sum() weighted samples.
- max_features : int or float, default=1.0
The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details).
If int, then draw max_features features.
If float, then draw max(1, int(max_features * n_features_in_)) features.
- bootstrap : bool, default=True
Whether samples are drawn with replacement. If False, sampling without replacement is performed. If fitting with sample_weight, it is strongly recommended to choose True, as only drawing with replacement will ensure the expected frequency semantics of sample_weight.
- bootstrap_features : bool, default=False
Whether features are drawn with replacement.
- oob_score : bool, default=False
Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.
- warm_start : bool, default=False
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new ensemble. See the Glossary.
- n_jobs : int, default=None
The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
- random_state : int, RandomState instance or None, default=None
Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See Glossary.
- verbose : int, default=0
Controls the verbosity when fitting and predicting.
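As a rough illustration of the warm_start parameter described above, the sketch below grows an ensemble from 5 to 10 estimators across two calls to fit. The dataset and the estimator counts are arbitrary assumptions chosen for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic data used only for illustration.
X, y = make_regression(n_samples=100, n_features=4, random_state=0)

regr = BaggingRegressor(n_estimators=5, warm_start=True, random_state=0)
regr.fit(X, y)
# Copy the list of fitted estimators so it can be compared after the second fit.
first_batch = list(regr.estimators_)

# Raise the target size and fit again: only the missing estimators are added.
regr.set_params(n_estimators=10)
regr.fit(X, y)
print(len(first_batch), len(regr.estimators_))
```

With warm_start=False, the second call to fit would instead build a whole new ensemble.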
- Attributes:
- estimator_ : estimator
The base estimator from which the ensemble is grown.
Added in version 1.2: base_estimator_ was renamed to estimator_.
- n_features_in_ : int
Number of features seen during fit.
Added in version 0.24.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Added in version 1.0.
- estimators_ : list of estimators
The collection of fitted sub-estimators.
- estimators_samples_ : list of arrays
The subset of drawn samples for each base estimator.
- estimators_features_ : list of arrays
The subset of drawn features for each base estimator.
- oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.
- oob_prediction_ : ndarray of shape (n_samples,)
Prediction computed with out-of-bag estimate on the training set. If n_estimators is small, it might be possible that a data point was never left out during the bootstrap. In this case, oob_prediction_ might contain NaN. This attribute exists only when oob_score is True.
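The fitted attributes listed above can be inspected after a call to fit. This is a small sketch under assumed settings (synthetic data, 50 estimators, half of the features per estimator); oob_score=True is set so that the out-of-bag attributes exist.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic data used only for illustration.
X, y = make_regression(n_samples=200, n_features=6, noise=1.0, random_state=0)

regr = BaggingRegressor(
    n_estimators=50,
    max_features=0.5,
    oob_score=True,  # required for oob_score_ and oob_prediction_
    random_state=0,
).fit(X, y)

print(regr.n_features_in_)           # number of features seen during fit
print(len(regr.estimators_))         # number of fitted sub-estimators
print(regr.estimators_features_[0])  # feature indices used by the first one
print(regr.oob_score_)               # out-of-bag score on the training set
print(regr.oob_prediction_.shape)    # one out-of-bag prediction per sample
```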
See also
BaggingClassifier : A Bagging classifier.
References
[1] L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.
[2] L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.
[3] T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.
[4] G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.
Examples
>>> from sklearn.svm import SVR
>>> from sklearn.ensemble import BaggingRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=4,
...                        n_informative=2, n_targets=1,
...                        random_state=0, shuffle=False)
>>> regr = BaggingRegressor(estimator=SVR(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> regr.predict([[0, 0, 0, 0]])
array([-2.8720])
- fit(X, y, sample_weight=None, **fit_params)
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- y : array-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Used as probabilities to sample the training set. Note that the expected frequency semantics for the sample_weight parameter are only fulfilled when sampling with replacement (bootstrap=True) and using a float or integer max_samples (instead of the default max_samples=None).
- **fit_params : dict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
- self : object
Fitted estimator.
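A minimal sketch of calling fit with sample_weight, following the recommendation above to combine bootstrap=True with an explicit max_samples. The weights are made up for illustration, and the default base estimator is assumed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic data used only for illustration.
X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Hypothetical weights: each sample counts either half or fully.
rng = np.random.RandomState(0)
sample_weight = rng.choice([0.5, 1.0], size=X.shape[0])

regr = BaggingRegressor(
    n_estimators=10,
    max_samples=0.5,  # explicit float rather than the default None
    bootstrap=True,   # sampling with replacement
    random_state=0,
)
# fit returns the fitted estimator itself, so calls can be chained.
returned = regr.fit(X, y, sample_weight=sample_weight)
predictions = regr.predict(X)
```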
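- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
Added in version 1.5.
- Returns:
- routing : MetadataRouter
A MetadataRouter encapsulating routing information.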
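- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.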
- predict(X, **params)
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean of the predicted regression targets of the estimators in the ensemble.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **params : dict
Parameters routed to the predict method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
- y : ndarray of shape (n_samples,)
The predicted values.
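The averaging rule stated above can be checked by hand. In this sketch (synthetic data and max_features=0.5 are illustrative assumptions), each sub-estimator is asked to predict on the feature columns it was trained on, and the mean of those predictions is compared with the output of predict.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic data used only for illustration.
X, y = make_regression(n_samples=100, n_features=6, random_state=0)
regr = BaggingRegressor(n_estimators=10, max_features=0.5, random_state=0).fit(X, y)

# Each sub-estimator only sees its own subset of feature columns.
per_estimator = [
    est.predict(X[:, features])
    for est, features in zip(regr.estimators_, regr.estimators_features_)
]
manual_mean = np.mean(per_estimator, axis=0)
ensemble_pred = regr.predict(X)

print(np.allclose(manual_mean, ensemble_pred))
```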
- score(X, y, sample_weight=None)
Return the coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
\(R^2\) of self.predict(X) with respect to y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
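As a worked check of the definition above, the following sketch computes \(u\), \(v\), and \(1 - u/v\) by hand on a held-out split and compares the result with score. The data, the noise level, and the split are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split

# Synthetic data used only for illustration.
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regr = BaggingRegressor(n_estimators=10, random_state=0).fit(X_train, y_train)

y_pred = regr.predict(X_test)
u = ((y_test - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()  # total sum of squares
manual_r2 = 1 - u / v

print(np.isclose(manual_r2, regr.score(X_test, y_test)))
```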
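- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> BaggingRegressor
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in fit.
- Returns:
- self : object
The updated object.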
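- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.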
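- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> BaggingRegressor
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
- Returns:
- self : object
The updated object.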