compute_sample_weight

sklearn.utils.class_weight.compute_sample_weight(class_weight, y, *, indices=None)

Estimate sample weights by class for unbalanced datasets.

Parameters:

class_weight : dict, list of dicts, "balanced", or None

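Weights associated with classes, in the form {class_label: weight}. If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided, in the same order as the columns of y.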

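Note that for multi-output (including multilabel) problems, the weights for every class of each column must be defined in that column's own dict. For example, for four-class multilabel classification the weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].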
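A minimal NumPy sketch of how one dict per column could be applied to a multilabel y: each column's labels are looked up in that column's dict and the per-column weights are multiplied. The helper `sample_weight_from_dicts`, the toy y, and the choice to give labels missing from a dict a weight of 1.0 are assumptions for illustration; this is not the library's implementation.

```python
import numpy as np

def sample_weight_from_dicts(class_weight, y):
    """Hypothetical helper: per-sample weights from one dict per output column.

    Each column's labels are mapped through that column's dict (labels missing
    from a dict default to 1.0 in this sketch), and the per-column weights are
    multiplied together.
    """
    y = np.asarray(y)
    weights = np.ones(y.shape[0])
    for k, weight_dict in enumerate(class_weight):
        column_labels = y[:, k].tolist()  # plain Python ints for dict lookup
        weights *= np.array([weight_dict.get(label, 1.0) for label in column_labels])
    return weights

# Four-class multilabel target: one binary indicator column per class.
y = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
])
# Upweight positive examples of the second class (column 1) by a factor of 5.
class_weight = [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]
print(sample_weight_from_dicts(class_weight, y))  # rows 1 and 2 get weight 5, the others 1
```

The wrong form [{1:1}, {2:5}, ...] would fail here because column 1 only contains the labels 0 and 1, so the key 2 is never matched.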

The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to the class frequencies in the input data: n_samples / (n_classes * np.bincount(y)).
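The "balanced" formula can be sketched in NumPy for a single-output y as follows. The helper name `balanced_sample_weight` is hypothetical, and encoding labels with `np.unique` (so that non-integer labels can be counted) is a choice of this sketch rather than a documented detail of the library.

```python
import numpy as np

def balanced_sample_weight(y):
    """Sketch of the "balanced" heuristic for a single-output y.

    Class weight = n_samples / (n_classes * count_of_class); each sample then
    receives the weight of its own class.
    """
    y = np.asarray(y)
    # Encode arbitrary labels as 0..n_classes-1 and count each class,
    # which plays the role of np.bincount(y) in the formula above.
    classes, encoded, counts = np.unique(y, return_inverse=True, return_counts=True)
    class_weight = y.shape[0] / (len(classes) * counts)
    return class_weight[encoded]

y = [1, 1, 1, 1, 0, 0]
print(balanced_sample_weight(y))  # 0.75 for each class-1 sample, 1.5 for each class-0 sample
```

A useful consequence of the formula: the weights always sum to n_samples, because each class contributes count * n_samples / (n_classes * count) = n_samples / n_classes.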

For multi-output, the weights computed for each column of y are multiplied together.
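To make the multiplication concrete, here is a small NumPy sketch that computes "balanced" weights column by column and takes their product. Both helper names are hypothetical and the two-column y is made-up data; it only illustrates the rule stated above.

```python
import numpy as np

def balanced_weight_column(column):
    """Balanced weights for one output column: n_samples / (n_classes * count)."""
    classes, encoded, counts = np.unique(column, return_inverse=True, return_counts=True)
    return (column.shape[0] / (len(classes) * counts))[encoded]

def balanced_multioutput_weight(y):
    """Sketch: per-column balanced weights multiplied across the columns of y."""
    y = np.asarray(y)
    weights = np.ones(y.shape[0])
    for k in range(y.shape[1]):
        weights *= balanced_weight_column(y[:, k])
    return weights

y = np.array([
    [0, 0],
    [0, 1],
    [0, 1],
    [1, 1],
])
# Column 0: class 0 -> 4/6, class 1 -> 2.  Column 1: class 0 -> 2, class 1 -> 4/6.
print(balanced_multioutput_weight(y))  # products: 4/3, 4/9, 4/9, 4/3
```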

y : {array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)

Array of original class labels per sample.

indices : array-like of shape (n_subsample,), default=None

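Array of indices to be used in a subsample. Its length can be less than n_samples in the case of a subsample, or equal to n_samples in the case of a bootstrap subsample with repeated indices. If None, the sample weights are calculated over the full sample. If provided, only "balanced" is supported for class_weight.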
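A sketch of the indices case, assuming "balanced" class frequencies are estimated on y[indices] while a weight is still returned for every sample of the full y. The helper name is hypothetical, and giving weight 0 to samples whose class does not occur in the subsample is a loudly flagged assumption of this sketch; we believe the library behaves this way, but check the source before relying on it.

```python
import numpy as np

def balanced_weight_with_indices(y, indices):
    """Hypothetical sketch: "balanced" weights estimated on y[indices],
    returned for every sample of the full y.

    ASSUMPTION: samples whose class never occurs in the subsample get weight 0.
    """
    y = np.asarray(y)
    y_sub = y[np.asarray(indices)]
    classes_sub, counts_sub = np.unique(y_sub, return_counts=True)
    # Balanced weights from the subsample's own size and class frequencies.
    weight_sub = y_sub.shape[0] / (len(classes_sub) * counts_sub)
    lookup = dict(zip(classes_sub.tolist(), weight_sub.tolist()))
    # Output has length n_samples (the full y), not n_subsample.
    return np.array([lookup.get(label, 0.0) for label in y.tolist()])

y = [0, 0, 1, 1, 2]
indices = [0, 1, 1, 2]  # bootstrap-style draw with a repeated index; class 2 is absent
print(balanced_weight_with_indices(y, indices))  # about 0.667, 0.667, 2, 2, 0
```

Here the subsample [0, 0, 0, 1] has 4 samples and 2 classes, so class 0 gets 4 / (2 * 3) and class 1 gets 4 / (2 * 1).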

Returns:

sample_weight_vect : ndarray of shape (n_samples,)

Array of sample weights, as applied to the original y.

Examples

>>> from sklearn.utils.class_weight import compute_sample_weight
>>> y = [1, 1, 1, 1, 0, 0]
>>> compute_sample_weight(class_weight="balanced", y=y)
array([0.75, 0.75, 0.75, 0.75, 1.5 , 1.5 ])
