compute_sample_weight
- sklearn.utils.class_weight.compute_sample_weight(class_weight, y, *, indices=None)
Estimate sample weights by class for unbalanced datasets.
- Parameters:
- class_weight : dict, list of dicts, "balanced", or None
Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
Note that for multioutput (including multilabel), weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification, weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data: n_samples / (n_classes * np.bincount(y)).
For multi-output, the weights of each column of y will be multiplied.
- y : {array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)
Array of original class labels per sample.
- indices : array-like of shape (n_subsample,), default=None
Array of indices to be used in a subsample. Can be of length less than n_samples in the case of a subsample, or equal to n_samples in the case of a bootstrap subsample with repeated indices. If None, the sample weight will be calculated over the full sample. Only "balanced" is supported for class_weight if this is provided.
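As a rough illustration of the "balanced" formula quoted for class_weight, the sketch below recomputes the weights by hand with NumPy and compares them with the function's result. The variable names (counts, per_class, manual) are illustrative only, and the sketch assumes non-negative integer labels, which np.bincount requires.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([1, 1, 1, 1, 0, 0])

# Class frequencies: class 0 appears twice, class 1 four times.
counts = np.bincount(y)

# n_samples / (n_classes * np.bincount(y)) gives one weight per class.
per_class = len(y) / (len(counts) * counts)

# Look up each sample's class weight to obtain per-sample weights.
manual = per_class[y]

auto = compute_sample_weight(class_weight="balanced", y=y)
print(np.allclose(manual, auto))
```

The rarer class 0 ends up with the larger weight, which is the intent of the "balanced" mode.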
- Returns:
- sample_weight_vect : ndarray of shape (n_samples,)
Array with sample weights as applied to the original y.
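The following sketch, using a hypothetical set of subsample indices, suggests how indices interacts with the return value: class frequencies are taken from the subsample, yet the result still holds one weight per entry of the original y. The name subsample is an assumption made for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([1, 1, 1, 1, 0, 0])

# Hypothetical subsample: three samples of class 1 and one of class 0.
subsample = [0, 1, 2, 4]

weights = compute_sample_weight(class_weight="balanced", y=y, indices=subsample)

# One weight per original sample, not per subsample index.
print(weights.shape)
print(weights)
```

Because class 0 is rarer in this subsample than in the full y, it should receive a larger weight than it would without indices.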
Examples
>>> from sklearn.utils.class_weight import compute_sample_weight
>>> y = [1, 1, 1, 1, 0, 0]
>>> compute_sample_weight(class_weight="balanced", y=y)
array([0.75, 0.75, 0.75, 0.75, 1.5 , 1.5 ])
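For the multi-output case, a minimal sketch with made-up labels and weights may help: each column of y gets its own dict, and the per-column weights are multiplied to give one weight per sample. The data and the names y_multi and per_output are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Two outputs (columns), each a binary label.
y_multi = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
])

# One dict per column, each covering every class of that column.
per_output = [
    {0: 1, 1: 2},  # weights for column 0
    {0: 1, 1: 5},  # weights for column 1
]

weights = compute_sample_weight(class_weight=per_output, y=y_multi)

# Row-wise products: 1 * 5, 2 * 1, 2 * 5.
print(weights)
```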