fetch_california_housing

sklearn.datasets.fetch_california_housing(*, data_home=None, download_if_missing=True, return_X_y=False, as_frame=False, n_retries=3, delay=1.0)

Load the California housing dataset (regression).

Samples total	20640
Dimensionality	8
Features	real
Target	real, 0.15 - 5.0
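The table can be turned into a quick sanity check on arrays loaded with `return_X_y=True`. This is a minimal sketch: the helper `matches_documented_characteristics` and the tolerance `tol` are hypothetical choices for illustration, not part of scikit-learn. The tolerance is there because the table rounds the target range, so stored values may sit marginally outside the printed bounds.

```python
import numpy as np

# Documented characteristics, copied from the table above.
N_SAMPLES = 20640
N_FEATURES = 8
TARGET_RANGE = (0.15, 5.0)


def matches_documented_characteristics(X, y, tol=1e-3):
    """Hypothetical helper: True if (X, y) agree with the documented sample
    count, dimensionality, real-valued features and rounded target range."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Samples total and dimensionality.
    if X.shape != (N_SAMPLES, N_FEATURES) or y.shape != (N_SAMPLES,):
        return False
    # "Features: real" and "Target: real" -> floating-point dtypes.
    if not (np.issubdtype(X.dtype, np.floating) and np.issubdtype(y.dtype, np.floating)):
        return False
    # Target range, widened by tol because the table rounds the bounds.
    low, high = TARGET_RANGE
    return bool(y.min() >= low - tol and y.max() <= high + tol)
```

On real data this would be called as `matches_documented_characteristics(*fetch_california_housing(return_X_y=True))`, which downloads the dataset on first use.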

Read more in the User Guide.

Parameters:

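data_home : str or path-like, default=None
Specify another download and cache folder for the datasets. By default, all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
download_if_missing : bool, default=True
If False, raise an OSError if the data is not locally available instead of trying to download it from the source site.
return_X_y : bool, default=False
If True, return (data.data, data.target) instead of a Bunch object.

Added in version 0.20.
as_frame : bool, default=False
If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric, string or categorical). The target is a pandas DataFrame or Series depending on the number of target columns (this dataset has a single target column, so it is a Series).

Added in version 0.23.
n_retries : int, default=3
Number of retries when HTTP errors are encountered.

Added in version 1.5.
delay : float, default=1.0
Number of seconds between retries.

Added in version 1.5.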
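A common pattern combines `data_home` and `download_if_missing`: keep the dataset in a project-specific folder and load it only if it is already cached, without touching the network. A minimal sketch, assuming a hypothetical helper `load_cached_housing`; `get_data_home` is the scikit-learn function that resolves (and creates) the cache folder.

```python
import os
import tempfile

from sklearn.datasets import fetch_california_housing, get_data_home


def load_cached_housing(data_home=None):
    """Hypothetical helper: return the dataset Bunch if it is cached, else None.

    With download_if_missing=False no download is attempted; a missing local
    copy raises OSError, which is turned into None here.
    """
    try:
        return fetch_california_housing(data_home=data_home, download_if_missing=False)
    except OSError:
        return None


# A throwaway folder standing in for a project-specific cache location.
cache_dir = os.path.join(tempfile.mkdtemp(), "sklearn_cache")

# get_data_home creates the folder if needed and returns its path.
resolved = get_data_home(data_home=cache_dir)

# The folder is new and empty, so nothing is loaded and nothing is downloaded.
print(load_cached_housing(data_home=resolved))  # None
```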
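Because `n_retries` and `delay` only exist from version 1.5, code that must also run on older releases can pass them conditionally. A minimal sketch; the wrapper `fetch_with_retries` and its default values are illustrative choices, not scikit-learn recommendations.

```python
import inspect
import tempfile

from sklearn.datasets import fetch_california_housing


def fetch_with_retries(n_retries=5, delay=2.0, **kwargs):
    """Hypothetical wrapper: forward retry settings only if the installed
    scikit-learn accepts them (n_retries/delay were added in 1.5)."""
    params = inspect.signature(fetch_california_housing).parameters
    if "n_retries" in params:
        kwargs.update(n_retries=n_retries, delay=delay)
    return fetch_california_housing(**kwargs)


# Offline demonstration: with an empty cache and download_if_missing=False,
# the call fails fast with OSError instead of attempting any download.
with tempfile.TemporaryDirectory() as empty_dir:
    try:
        fetch_with_retries(data_home=empty_dir, download_if_missing=False)
    except OSError as exc:
        print("not cached:", exc)
```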

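Returns:

dataset : Bunch

Dictionary-like object, with the following attributes.

data : ndarray of shape (20640, 8)
Each row contains the 8 feature values in order. If as_frame is True, data is a pandas object.
target : ndarray of shape (20640,)
Each value is the median house value, in units of $100,000. If as_frame is True, target is a pandas object.
feature_names : list of length 8
Ordered list of the feature names used in the dataset.
DESCR : str
Description of the California housing dataset.
frame : pandas DataFrame
Only present when as_frame=True. DataFrame with data and target.

Added in version 0.23.

(data, target) : tuple if return_X_y is True

A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features) in which each row is one sample and each column is one feature. The second, of shape (n_samples,), contains the target values.

Added in version 0.20.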
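Because `dataset` is a Bunch, each field can be read either as an attribute or as a dictionary key. A minimal sketch; `describe` is a hypothetical helper, and the toy object built with `sklearn.utils.Bunch` only mimics the documented attributes so the example runs without downloading anything.

```python
import numpy as np
from sklearn.utils import Bunch


def describe(dataset):
    """Hypothetical helper: summarise a dataset Bunch."""
    return {
        "n_samples": dataset.data.shape[0],            # attribute access
        "n_features": len(dataset.feature_names),
        "target_mean": float(np.mean(dataset["target"])),  # key access, same object
        # frame holds a DataFrame only when as_frame=True; otherwise it is
        # missing or None, so check its value rather than key membership.
        "has_frame": dataset.get("frame") is not None,
    }


# Tiny synthetic Bunch mimicking the documented attributes.
toy = Bunch(
    data=np.zeros((3, 2)),
    target=np.array([1.0, 2.0, 3.0]),
    feature_names=["MedInc", "HouseAge"],
    DESCR="toy stand-in",
)
print(describe(toy))
```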

Notes

This dataset consists of 20,640 samples and 9 columns: the 8 features plus the target (median house value).
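With `as_frame=True`, the `frame` attribute holds these 9 columns together, and it can be split back into features and target by column name. A minimal sketch, assuming the target column is named `MedHouseVal` (the name recent scikit-learn versions use; `housing.target.name` shows it on your installation); `split_frame` is a hypothetical helper and the small DataFrame is a synthetic stand-in for `housing.frame`.

```python
import pandas as pd


def split_frame(frame, target_col="MedHouseVal"):
    """Hypothetical helper: split a combined DataFrame into a feature
    DataFrame and a target Series. target_col is an assumed column name;
    pass housing.target.name if your version differs."""
    X = frame.drop(columns=target_col)
    y = frame[target_col]
    return X, y


# Synthetic stand-in for housing.frame (two rows, three of the eight features).
frame = pd.DataFrame(
    {
        "MedInc": [1.0, 2.0],
        "HouseAge": [10.0, 20.0],
        "AveRooms": [3.0, 4.0],
        "MedHouseVal": [0.5, 1.5],
    }
)
X, y = split_frame(frame)
print(X.columns.tolist(), y.name)
```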

Examples

>>> from sklearn.datasets import fetch_california_housing
>>> housing = fetch_california_housing()
>>> print(housing.data.shape, housing.target.shape)
(20640, 8) (20640,)
>>> print(housing.feature_names[0:6])
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup']
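Continuing the example, `return_X_y=True` yields the `(data, target)` tuple, which can be passed straight to an estimator. A minimal sketch: `holdout_r2` is a hypothetical helper, the linear baseline is an arbitrary choice, and synthetic arrays with the same 8-column layout stand in for the downloaded data so the snippet runs offline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def holdout_r2(X, y, test_size=0.2, random_state=0):
    """Hypothetical helper: fit a linear baseline on a random training split
    and return the R^2 score on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    return model.score(X_test, y_test)  # coefficient of determination (R^2)


# With the real data this would be:
#   X, y = fetch_california_housing(return_X_y=True)
# Synthetic stand-in with the same 8-column layout, so the snippet runs offline.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = X @ np.arange(1.0, 9.0) + 0.01 * rng.normal(size=500)
print(f"held-out R^2: {holdout_r2(X, y):.3f}")
```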