fetch_covtype

sklearn.datasets.fetch_covtype(*, data_home=None, download_if_missing=True, random_state=None, shuffle=False, return_X_y=False, as_frame=False, n_retries=3, delay=1.0)

Load the covertype dataset (classification).

Download it if necessary.

Classes           7
Samples total     581012
Dimensionality    54
Features          int

Read more in the User Guide.

Parameters:
data_home : str or path-like, default=None

Specify another download and cache folder for the datasets. By default, all scikit-learn data is stored in '~/scikit_learn_data' subfolders.

download_if_missing : bool, default=True

If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.

random_state : int, RandomState instance or None, default=None

Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls. See Glossary.

shuffle : bool, default=False

Whether to shuffle the dataset.

return_X_y : bool, default=False

If True, returns (data.data, data.target) instead of a Bunch object.

Added in version 0.20.

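as_frame : bool, default=False

If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below.

Added in version 0.24.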

n_retries : int, default=3

Number of retries when HTTP errors are encountered.

Added in version 1.5.

delay : float, default=1.0

Number of seconds between retries.

Added in version 1.5.
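How shuffle and random_state interact can be illustrated on a small stand-in array, without downloading anything. This is a minimal sketch that assumes shuffling means applying one row permutation to both data and target, drawn from a generator seeded by random_state. The helper shuffle_rows and the toy arrays are hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.utils import check_random_state


def shuffle_rows(X, y, random_state=None):
    """Permute the rows of X and y together (hypothetical helper)."""
    rng = check_random_state(random_state)
    order = rng.permutation(X.shape[0])
    return X[order], y[order]


# Toy stand-in for the real (581012, 54) data and (581012,) target.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10) % 7 + 1

X_a, y_a = shuffle_rows(X_toy, y_toy, random_state=42)
X_b, y_b = shuffle_rows(X_toy, y_toy, random_state=42)

# Passing the same int gives the same order on every call.
same_order = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)
# Rows and labels stay aligned: row i of X_toy is [2i, 2i + 1].
still_aligned = np.array_equal(y_a, (X_a[:, 0] // 2) % 7 + 1)
```

With the real loader, the corresponding call would be fetch_covtype(shuffle=True, random_state=42). Note that random_state has no visible effect while shuffle is left at its default of False.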

Returns:
dataset : Bunch

Dictionary-like object, with the following attributes.

data : ndarray of shape (581012, 54)

Each row corresponds to the 54 features in the dataset.

target : ndarray of shape (581012,)

Each value corresponds to one of the 7 forest covertypes, with values ranging from 1 to 7.

frame : dataframe of shape (581012, 55)

Only present when as_frame=True. Contains data and target.

DESCR : str

Description of the forest covertype dataset.

feature_names : list

The names of the dataset columns.

target_names : list

The names of the target columns.

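(data, target) : tuple if return_X_y is True

A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an ndarray of shape (n_samples,) containing the target values.

Added in version 0.20.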
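Because the target labels run from 1 to 7 rather than from 0, some downstream code may want zero-based labels, and the (data, target) tuple is a convenient input for a train/test split. The following is a sketch on toy arrays with the same layout as the real return value. The helper prepare_split is hypothetical, and the choice of a stratified split is an assumption made for illustration, not something this function does for you.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def prepare_split(X, y, test_size=0.2, random_state=0):
    """Shift labels 1..7 to 0..6, then split (hypothetical helper)."""
    y_zero_based = np.asarray(y) - 1
    return train_test_split(
        X,
        y_zero_based,
        test_size=test_size,
        stratify=y_zero_based,  # keep class proportions in both parts
        random_state=random_state,
    )


# Toy stand-in for (data, target): 70 rows, 54 columns, labels 1..7.
rng = np.random.RandomState(0)
X_toy = rng.randint(0, 100, size=(70, 54))
y_toy = np.repeat(np.arange(1, 8), 10)

# An int test_size is an absolute number of rows.
X_train, X_test, y_train, y_test = prepare_split(X_toy, y_toy, test_size=14)
```

With the real data, the inputs would come from X, y = fetch_covtype(return_X_y=True).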

Examples

>>> from sklearn.datasets import fetch_covtype
>>> cov_type = fetch_covtype()
>>> cov_type.data.shape
(581012, 54)
>>> cov_type.target.shape
(581012,)
>>> # Let's check the first 4 feature names
>>> cov_type.feature_names[:4]
['Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology']
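The example above downloads the data on first use. In environments without network access, it can be useful to load only from the local cache by passing download_if_missing=False and handling the resulting OSError. This is a minimal sketch; the wrapper load_covtype_from_cache is a hypothetical name, not part of scikit-learn.

```python
import tempfile

from sklearn.datasets import fetch_covtype


def load_covtype_from_cache(data_home=None):
    """Return the cached dataset, or None if it is not on disk."""
    try:
        return fetch_covtype(data_home=data_home, download_if_missing=False)
    except OSError:
        # Raised when the files are missing and downloading is disabled.
        return None


# An empty temporary folder holds no cached copy, so nothing is loaded
# and no download is attempted.
with tempfile.TemporaryDirectory() as empty_home:
    result = load_covtype_from_cache(data_home=empty_home)
```

Calling load_covtype_from_cache() with no argument checks the default scikit-learn data folder instead.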