fetch_lfw_people#

sklearn.datasets.fetch_lfw_people(*, data_home=None, funneled=True, resize=0.5, min_faces_per_person=0, color=False, slice_=(slice(70, 195, None), slice(78, 172, None)), download_if_missing=True, return_X_y=False, n_retries=3, delay=1.0)[source]#

加载 Labeled Faces in the Wild (LFW) 人物数据集(分类)。

Download it if necessary.

类别数

5749

样本总数

13233

维度

5828

特征值范围

实数,介于 0 和 255 之间

For a usage example of this dataset, see Faces recognition example using eigenfaces and SVMs.

Read more in the User Guide.

参数:
data_homestr or path-like, default=None

为数据集指定另一个下载和缓存文件夹。默认情况下,所有 scikit-learn 数据都存储在 ‘~/scikit_learn_data’ 子文件夹中。

funneledbool, default=True

Download and use the funneled variant of the dataset.

resizefloat or None, default=0.5

Ratio used to resize the each face picture. If None, no resizing is performed.

min_faces_per_personint, default=None

The extracted dataset will only retain pictures of people that have at least min_faces_per_person different pictures.

colorbool, default=False

Keep the 3 RGB channels instead of averaging them to a single gray level channel. If color is True the shape of the data has one more dimension than the shape with color = False.

slice_tuple of slice, default=(slice(70, 195), slice(78, 172))

Provide a custom 2D slice (height, width) to extract the ‘interesting’ part of the jpeg files and avoid use statistical correlation from the background.

download_if_missingbool, default=True

If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.

return_X_ybool, default=False

If True, returns (dataset.data, dataset.target) instead of a Bunch object. See below for more information about the dataset.data and dataset.target object.

0.20 版本新增。

n_retriesint, default=3

Number of retries when HTTP errors are encountered.

1.5 版本新增。

delayfloat, default=1.0

Number of seconds between retries.

1.5 版本新增。

返回:
datasetBunch

Dictionary-like object, with the following attributes.

datanumpy array of shape (13233, 2914)

Each row corresponds to a ravelled face image of original size 62 x 47 pixels. Changing the slice_ or resize parameters will change the shape of the output.

imagesnumpy array of shape (13233, 62, 47)

Each row is a face image corresponding to one of the 5749 people in the dataset. Changing the slice_ or resize parameters will change the shape of the output.

targetnumpy array of shape (13233,)

Labels associated to each face image. Those labels range from 0-5748 and correspond to the person IDs.

target_namesnumpy array of shape (5749,)

Names of all persons in the dataset. Position in array corresponds to the person ID in the target array.

DESCRstr

Description of the Labeled Faces in the Wild (LFW) dataset.

(data, target)tuple if return_X_y is True

包含两个 ndarray 的元组。第一个包含一个形状为 (n_samples, n_features) 的二维数组,其中每一行代表一个样本,每一列代表特征。第二个 ndarray 的形状为 (n_samples,),包含目标样本。

0.20 版本新增。

示例

>>> from sklearn.datasets import fetch_lfw_people
>>> lfw_people = fetch_lfw_people()
>>> lfw_people.data.shape
(13233, 2914)
>>> lfw_people.target.shape
(13233,)
>>> for name in lfw_people.target_names[:5]:
...    print(name)
AJ Cook
AJ Lamas
Aaron Eckhart
Aaron Guiel
Aaron Patterson