compute_sample_weight

sklearn.utils.class_weight.compute_sample_weight(class_weight, y, *, indices=None)

Estimate sample weights by class for unbalanced datasets.

Parameters:
class_weight : dict, list of dicts, "balanced", or None

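Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.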
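A minimal sketch of the single-output dict form, assuming scikit-learn is installed; the labels and weight values are illustrative only. Each sample receives the weight of its own class, and class_weight=None gives every sample weight one:

```python
from sklearn.utils.class_weight import compute_sample_weight

y = [0, 1, 1, 2]

# Explicit per-class weights: each sample gets the weight of its own label.
custom = compute_sample_weight(class_weight={0: 2.0, 1: 1.0, 2: 0.5}, y=y)

# class_weight=None: every class, and therefore every sample, has weight one.
uniform = compute_sample_weight(class_weight=None, y=y)

print(custom)   # values 2.0, 1.0, 1.0, 0.5
print(uniform)  # values 1.0, 1.0, 1.0, 1.0
```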

Note that for multioutput (including multilabel), weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification, weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
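The per-column dict layout can be sketched on a small two-column multilabel indicator matrix (the data and weights are illustrative assumptions). Each dict lists both classes, 0 and 1, of its column:

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Two binary label columns (a small multilabel indicator matrix).
y = np.array([[0, 1],
              [1, 0],
              [1, 1]])

# One dict per column, each covering both classes (0 and 1) of that column.
# Column 0 is unweighted; in column 1, label 1 is up-weighted by 5.
class_weight = [{0: 1, 1: 1}, {0: 1, 1: 5}]

weights = compute_sample_weight(class_weight=class_weight, y=y)
print(weights)  # values 5.0, 1.0, 5.0
```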

The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data: n_samples / (n_classes * np.bincount(y)).
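As a sketch of the "balanced" formula, the per-class weights can be computed by hand with NumPy and then expanded to one weight per sample. This assumes the labels are the consecutive integers 0..n_classes-1, so that np.bincount(y) yields exactly one count per class; for arbitrary labels the classes would first need to be encoded.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([1, 1, 1, 1, 0, 0])

# Per-class "balanced" weights: n_samples / (n_classes * np.bincount(y)).
# Assumes labels are 0..n_classes-1 so bincount has one entry per class.
n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]
class_weights = n_samples / (n_classes * np.bincount(y))  # class 0 -> 1.5, class 1 -> 0.75

# Each sample receives the weight of its own class.
manual = class_weights[y]

auto = compute_sample_weight(class_weight="balanced", y=y)
print(manual)
print(np.allclose(manual, auto))  # True
```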

For multi-output, the weights of each column of y will be multiplied.
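The multiplication across columns can be checked with a short sketch (the two-column y is an illustrative assumption): computing "balanced" weights for each column separately and taking their element-wise product should match a single call on the full y.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([[0, 0],
              [0, 1],
              [0, 1],
              [1, 1]])

# One call on the full multi-output target.
combined = compute_sample_weight(class_weight="balanced", y=y)

# Per-column "balanced" weights, multiplied element-wise.
per_column = [
    compute_sample_weight(class_weight="balanced", y=y[:, k])
    for k in range(y.shape[1])
]
product = np.prod(per_column, axis=0)

print(combined)
print(np.allclose(combined, product))  # True
```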

y : {array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)

Array of original class labels per sample.

indices : array-like of shape (n_subsample,), default=None

Array of indices to be used in a subsample. Can be of length less than n_samples in the case of a subsample, or equal to n_samples in the case of a bootstrap subsample with repeated indices. If None, the sample weight will be calculated over the full sample. Only "balanced" is supported for class_weight if this is provided.
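A sketch of the indices parameter with a bootstrap-style subsample; the data and indices are illustrative. Class frequencies are taken from y[indices], but the returned array still has one entry per sample of the full y. In recent scikit-learn versions, as far as we understand the implementation, samples of a class that never appears in the subsample receive weight 0. Passing any class_weight other than "balanced" together with indices raises a ValueError.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([0, 0, 1, 1, 2, 2])

# Bootstrap-style subsample: index 2 is drawn three times, class 2 never.
indices = np.array([0, 1, 2, 2, 2, 3])

weights = compute_sample_weight(class_weight="balanced", y=y, indices=indices)
# Frequencies come from y[indices] = [0, 0, 1, 1, 1, 1]; the result is
# indexed over the full y, so it has length 6 (n_samples), not n_subsample.
print(weights)

# Only "balanced" is accepted when indices is given.
try:
    compute_sample_weight(class_weight={0: 1, 1: 1, 2: 1}, y=y, indices=indices)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```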

Returns:
sample_weight_vect : ndarray of shape (n_samples,)

Array with sample weights as applied to the original y.

Examples

>>> from sklearn.utils.class_weight import compute_sample_weight
>>> y = [1, 1, 1, 1, 0, 0]
>>> compute_sample_weight(class_weight="balanced", y=y)
array([0.75, 0.75, 0.75, 0.75, 1.5 , 1.5 ])
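A typical use of the returned array is to pass it as the sample_weight argument of an estimator's fit method. The sketch below assumes LogisticRegression as the estimator and synthetic data with 90 samples of class 0 and 10 of class 1. With "balanced", each class ends up contributing the same total weight, n_samples / n_classes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)

# Synthetic imbalanced data: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = rng.normal(size=(100, 3)) + y[:, np.newaxis]  # shift class 1 so it is learnable

sample_weight = compute_sample_weight(class_weight="balanced", y=y)

# Each class contributes the same total weight: 100 / 2 = 50.
print(sample_weight[y == 0].sum(), sample_weight[y == 1].sum())

clf = LogisticRegression().fit(X, y, sample_weight=sample_weight)
print(clf.classes_)
```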