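Version 0.23#

For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 0.23.

Legend for changelogs

Major Feature: something big that you couldn't do before.
Feature: something that you couldn't do before.
Efficiency: an existing feature now may not require as much computation or memory.
Enhancement: a miscellaneous minor improvement.
Fix: something that previously didn't work as documented, or according to reasonable expectations, should now work.
API Change: you will need to change your code to have the same effect in the future; or a feature will be removed in the future.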

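Version 0.23.2#

Changed models#

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

Fix inertia_ attribute of cluster.KMeans and cluster.MiniBatchKMeans.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog#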

`sklearn.cluster`#

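Fix Fixed a bug in cluster.KMeans where rounding errors could prevent convergence from being declared when tol=0. #17959 by Jérémie du Boisberranger.
Fix Fixed a bug in cluster.KMeans and cluster.MiniBatchKMeans where the reported inertia was incorrectly weighted by the sample weights. #17848 by Jérémie du Boisberranger.
Fix Fixed a bug in cluster.MeanShift with bin_seeding=True. When the estimated bandwidth is 0, the behavior is equivalent to bin_seeding=False. #17742 by Jeremie du Boisberranger.
Fix Fixed a bug in cluster.AffinityPropagation that gave incorrect clusters when the array dtype is float32. #17995 by Thomaz Santana and Amanda Dsouza.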

`sklearn.decomposition`#

Fix Fixed a bug in decomposition.MiniBatchDictionaryLearning.partial_fit, which should update the dictionary by iterating only once over a mini-batch. #17433 by Chiara Marmo.
Fix Avoid overflows on Windows in decomposition.IncrementalPCA.partial_fit for large batch_size and n_samples values. #17985 by Alan Butler and Amanda Dsouza.

`sklearn.ensemble`#

Fix Fixed a bug in ensemble.MultinomialDeviance where the average of logloss was incorrectly calculated as the sum of logloss. #17694 by Markus Rempfler and Tsutomu Kusanagi.
Fix Fixed the compatibility of ensemble.StackingClassifier and ensemble.StackingRegressor with estimators that do not define n_features_in_. #17357 by Thomas Fan.

`sklearn.feature_extraction`#

Fix Fixed a bug in feature_extraction.text.CountVectorizer where sample order invariance was broken when max_features was set and features had the same count. #18016 by Thomas Fan, Roman Yurchak, and Joel Nothman.

`sklearn.linear_model`#

Fix linear_model.lars_path does not overwrite X when X_copy=True and Gram='auto'. #17914 by Thomas Fan.

`sklearn.manifold`#

Fix Fixed a bug where metrics.pairwise_distances would raise an error if metric='seuclidean' and X is not of type np.float64. #15730 by Forrest Koch.

`sklearn.metrics`#

Fix Fixed a bug in metrics.mean_squared_error where the average of multiple RMSE values was incorrectly calculated as the root of the average of multiple MSE values. #17309 by Swier Heeres.
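The difference between the two quantities in the entry above can be checked by hand. The following is a minimal sketch on made-up arrays; it deliberately uses only `multioutput="raw_values"` and NumPy so that the arithmetic stays visible, and it does not reproduce the library's internal code.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Two outputs with very different error magnitudes (made-up data).
y_true = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
y_pred = np.array([[1.0, 3.0], [1.0, 3.0], [1.0, 3.0]])

# Per-output MSE: one value per column.
mse_per_output = mean_squared_error(y_true, y_pred, multioutput="raw_values")

# Average of the RMSE values: take the root first, then average.
mean_of_rmse = np.sqrt(mse_per_output).mean()

# What the bug produced: average the MSE values, then take the root.
root_of_mean_mse = np.sqrt(mse_per_output.mean())

print(mean_of_rmse, root_of_mean_mse)
```

Here the per-output MSE values are 1 and 9, so the average RMSE is 2, whereas the root of the average MSE is the square root of 5.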

`sklearn.pipeline`#

Fix pipeline.FeatureUnion raises a deprecation warning when None is included in transformer_list. #17360 by Thomas Fan.

`sklearn.utils`#

Fix Fixed utils.estimator_checks.check_estimator so that all test cases support the binary_only estimator tag. #17812 by Bruno Charron.

Version 0.23.1#

May 18, 2020

Changelog#

`sklearn.cluster`#

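Efficiency cluster.KMeans efficiency has been improved for very small datasets. In particular, it no longer spawns idle threads. #17210 and #17235 by Jeremie du Boisberranger.
Fix Fixed a bug in cluster.KMeans where the sample weights provided by the user were modified in place. #17204 by Jeremie du Boisberranger.

Miscellaneous#

Fix Fixed a bug in the repr of third-party estimators that use a **kwargs parameter in their constructor, when changed_only is True (which is now the default). #17205 by Nicolas Hug.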

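Version 0.23.0#

May 12, 2020

Enforcing keyword-only arguments#

In an effort to promote clear and unambiguous use of the library, most constructor and function parameters are now expected to be passed as keyword arguments (i.e. using the param=value syntax) instead of positionally. To ease the transition, a FutureWarning is raised if a keyword-only parameter is used as a positional argument. In version 1.0 (the renaming of 0.25), these parameters will be strictly keyword-only, and a TypeError will be raised. #15005 by Joel Nothman, Adrin Jalali, Thomas Fan, and Nicolas Hug. See SLEP009 for more details.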
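As a small illustration of the keyword style described above, the sketch below passes every hyperparameter by name. The choice of `SVC` and the generated data are assumptions made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=50, n_features=4, random_state=0)

# Preferred: hyperparameters passed by keyword.
clf = SVC(C=1.0, kernel="linear")
clf.fit(X, y)

# Discouraged: SVC(1.0, "linear") relies on parameter order. Per the text
# above, this raises a FutureWarning in 0.23 and a TypeError from 1.0 on.
print(clf.get_params()["kernel"])
```

Data arguments such as `X` and `y` in `fit` can still be passed positionally.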

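Changed models#

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.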

Fix ensemble.BaggingClassifier, ensemble.BaggingRegressor, and ensemble.IsolationForest.
Fix cluster.KMeans with algorithm="elkan" and algorithm="full".
Fix cluster.Birch
Fix compose.ColumnTransformer.get_feature_names
Fix compose.ColumnTransformer.fit
Fix datasets.make_multilabel_classification
Fix decomposition.PCA with n_components='mle'
Enhancement decomposition.NMF and decomposition.non_negative_factorization with float32 dtype input.
Fix decomposition.KernelPCA.inverse_transform
API Change ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor
Fix estimators_samples_ in ensemble.BaggingClassifier, ensemble.BaggingRegressor and ensemble.IsolationForest
Fix ensemble.StackingClassifier and ensemble.StackingRegressor with sample_weight
Fix gaussian_process.GaussianProcessRegressor
Fix linear_model.RANSACRegressor with sample_weight.
Fix linear_model.RidgeClassifierCV
Fix metrics.mean_squared_error with squared and multioutput='raw_values'.
Fix metrics.mutual_info_score with negative scores.
Fix metrics.confusion_matrix with zero length y_true and y_pred
Fix neural_network.MLPClassifier
Fix preprocessing.StandardScaler with partial_fit and sparse input.
Fix preprocessing.Normalizer with norm='max'
Fix Any model using the svm.libsvm or the svm.liblinear solver, including svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, svm.OneClassSVM, svm.SVC, svm.SVR, linear_model.LogisticRegression.
Fix tree.DecisionTreeClassifier, tree.ExtraTreeClassifier and ensemble.GradientBoostingClassifier as well as predict method of tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, and ensemble.GradientBoostingRegressor and read-only float32 input in predict, decision_path and predict_proba.

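Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog#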

`sklearn.cluster`#

Efficiency The cluster.Birch implementation of the predict method avoids a high memory footprint by computing the distance matrix with a chunked scheme. #16149 by Jeremie du Boisberranger and Alex Shacked.
Efficiency Major Feature The critical parts of cluster.KMeans have a more optimized implementation. Parallelism is now over the data instead of over initializations, allowing better scalability. #11950 by Jeremie du Boisberranger.
Enhancement cluster.KMeans now supports sparse data when algorithm="elkan". #11950 by Jeremie du Boisberranger.
Enhancement cluster.AgglomerativeClustering has a faster and more memory-efficient implementation of single linkage clustering. #11514 by Leland McInnes.
Fix cluster.KMeans with algorithm="elkan" now converges with tol=0 as with the default algorithm="full". #16075 by Erich Schubert.
Fix Fixed a bug in cluster.Birch where the n_clusters parameter could not be a np.int64. #16484 by Jeremie du Boisberranger.
Fix cluster.AgglomerativeClustering now raises a specific error when the distance matrix is not square and affinity=precomputed. #16257 by Simona Maggio.
API Change The n_jobs parameter of cluster.KMeans, cluster.SpectralCoclustering and cluster.SpectralBiclustering is deprecated. They now use OpenMP-based parallelism. For more details on how to control the number of threads, please refer to our Parallelism notes. #11950 by Jeremie du Boisberranger.
API Change The precompute_distances parameter of cluster.KMeans is deprecated. It has no effect. #11950 by Jeremie du Boisberranger.
API Change The random_state parameter has been added to cluster.AffinityPropagation. #16801 by @rcwoolston and Chiara Marmo.

`sklearn.compose`#

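Efficiency compose.ColumnTransformer is now faster when working with dataframes and strings are used to specify subsets of data for transformers. #16431 by Thomas Fan.
Enhancement The compose.ColumnTransformer method get_feature_names now supports 'passthrough' columns, with the feature name being either the column name for a dataframe, or 'xi' for column index i. #14048 by Lewis Ball.
Fix The compose.ColumnTransformer method get_feature_names now returns correct results when one of the transformer steps is applied to an empty list of columns. #15963 by Roman Yurchak.
Fix compose.ColumnTransformer.fit will raise an error when selecting a column name that is not unique in the dataframe. #16431 by Thomas Fan.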

`sklearn.datasets`#

Efficiency datasets.fetch_openml has reduced memory usage because it no longer stores the full dataset text stream in memory. #16084 by Joel Nothman.
Feature datasets.fetch_california_housing now supports heterogeneous data using pandas by setting as_frame=True. #15950 by Stephanie Andrews and Reshama Shaikh.
Feature The embedded dataset loaders datasets.load_breast_cancer, datasets.load_diabetes, datasets.load_digits, datasets.load_iris, datasets.load_linnerud and datasets.load_wine now support loading as a pandas DataFrame by setting as_frame=True. #15980 by @wconnell and Reshama Shaikh.
Enhancement Added the return_centers parameter to datasets.make_blobs, which can be used to return the centers of each cluster. #15709 by @shivamgargsya and Venkatachalam N.
Enhancement Functions datasets.make_circles and datasets.make_moons now accept a two-element tuple. #15707 by Maciej J Mikulski.
Fix datasets.make_multilabel_classification now raises ValueError for arguments n_classes < 1 OR length < 1. #16006 by Rushabh Vasani.
API Change The StreamHandler was removed from sklearn.logger to avoid double logging of messages in common cases where a handler is attached to the root logger, and to follow the Python logging documentation recommendation for libraries to leave the log message handling to users and application code. #16451 by Christoph Deil.
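The two generator enhancements listed above can be tried with a short sketch. It assumes that the two-element tuple is passed through the `n_samples` argument of `make_moons`, which the entry does not state explicitly; all sizes are arbitrary.

```python
from sklearn.datasets import make_blobs, make_moons

# return_centers=True adds the cluster centers as a third return value.
X, y, centers = make_blobs(
    n_samples=60, n_features=2, centers=3, random_state=0, return_centers=True
)

# A two-element tuple sets the number of points in each of the two moons.
X_moons, y_moons = make_moons(n_samples=(30, 20), random_state=0)

print(X.shape, centers.shape, X_moons.shape)
```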

`sklearn.decomposition`#

Enhancement decomposition.NMF and decomposition.non_negative_factorization now preserve float32 dtype. #16280 by Jeremie du Boisberranger.
Enhancement decomposition.TruncatedSVD.transform is now faster on sparse csc matrices. #16837 by @wornbb.
Fix decomposition.PCA with a float n_components parameter will exclusively choose the components that explain a variance greater than n_components. #15669 by Krishna Chaitanya.
Fix decomposition.PCA with n_components='mle' now correctly handles small eigenvalues, and does not infer 0 as the correct number of components. #16224 by Lisa Schwetlick, Gelavizh Ahmadi and Marija Vlajic Wheeler, and #16841 by Nicolas Hug.
Fix The decomposition.KernelPCA method inverse_transform now applies the correct inverse transform to the transformed data. #16655 by Lewis Ball.
Fix Fixed a bug that was causing decomposition.KernelPCA to sometimes raise invalid value encountered in multiply during fit. #16718 by Gui Miotto.
Feature Added the n_components_ attribute to decomposition.SparsePCA and decomposition.MiniBatchSparsePCA. #16981 by Mateusz Górski.
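For the float n_components entry above, the sketch below shows the selection rule on synthetic data: the retained components together explain strictly more than the requested fraction of variance. The data and the 0.9 threshold are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Made-up data with decreasing spread along 5 independent directions.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# A float in (0, 1) asks for enough components to exceed that variance share.
pca = PCA(n_components=0.9, svd_solver="full").fit(X)
explained = pca.explained_variance_ratio_.sum()

print(pca.n_components_, explained)
```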

`sklearn.ensemble`#

Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support sample_weight. #14696 by Adrin Jalali and Nicolas Hug.
Feature Early stopping in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor is now determined with a new early_stopping parameter instead of n_iter_no_change. The default value is 'auto', which enables early stopping if there are at least 10,000 samples in the training set. #14516 by Johann Faouzi.
Major Feature ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor now support monotonic constraints, useful when features are supposed to have a positive/negative effect on the target. #15582 by Nicolas Hug.
API Change Added a boolean verbose flag to classes ensemble.VotingClassifier and ensemble.VotingRegressor. #16069 by Sam Bail, Hanna Bruce MacDonald, Reshama Shaikh, and Chiara Marmo.
API Change Fixed a bug in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor that would not respect the max_leaf_nodes parameter if that criterion was reached at the same time as the max_depth criterion. #16183 by Nicolas Hug.
Fix Changed the convention for the max_depth parameter of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. The depth now corresponds to the number of edges to go from the root to the deepest leaf. Stumps (trees with one split) are now allowed. #16182 by Santhosh B.
Fix Fixed a bug in ensemble.BaggingClassifier, ensemble.BaggingRegressor and ensemble.IsolationForest where the attribute estimators_samples_ did not generate the proper indices used during fit. #16437 by Jin-Hwan CHO.
Fix Fixed a bug in ensemble.StackingClassifier and ensemble.StackingRegressor where the sample_weight argument was not being passed to cross_val_predict when evaluating the base estimators on cross-validation folds to obtain the input to the meta estimator. #16539 by Bill DeRose.
Feature Added the additional option loss="poisson" to ensemble.HistGradientBoostingRegressor, which adds Poisson deviance with log-link, useful for modeling count data. #16692 by Christian Lorentzen.
Fix Fixed a bug where ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier would fail with multiple calls to fit when warm_start=True, early_stopping=True, and there is no validation set. #16663 by Thomas Fan.
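Several of the histogram gradient boosting entries above (sample_weight, early_stopping, monotonic constraints) can be combined in one sketch. The data is synthetic and the hyperparameter values are arbitrary assumptions. The import fallback is there because this estimator was still experimental in 0.23 and needed an opt-in import.

```python
import numpy as np

try:
    from sklearn.ensemble import HistGradientBoostingRegressor
except ImportError:
    # Releases where the estimator was still experimental (including 0.23)
    # require this opt-in import first.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
    from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.3, size=500)
weights = rng.uniform(0.5, 1.5, size=500)

model = HistGradientBoostingRegressor(
    monotonic_cst=[1, 0],  # predictions must not decrease with feature 0
    early_stopping=False,  # explicit choice instead of the 'auto' default
    max_iter=50,
    random_state=0,
)
model.fit(X, y, sample_weight=weights)

# Sweep feature 0 while holding feature 1 fixed.
grid = np.column_stack([np.linspace(0, 1, 25), np.full(25, 0.5)])
pred = model.predict(grid)
print(np.all(np.diff(pred) >= 0))
```

With the constraint in place, the predictions along the sweep should never decrease, even though the training targets are noisy.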

`sklearn.feature_extraction`#

Efficiency feature_extraction.text.CountVectorizer now sorts features after pruning them by document frequency. This improves performance for datasets with large vocabularies combined with min_df or max_df. #15834 by Santiago M. Mola.

`sklearn.feature_selection`#

Enhancement Added support for multioutput data in feature_selection.RFE and feature_selection.RFECV. #16103 by Divyaprabha M.
API Change Adds feature_selection.SelectorMixin back to the public API. #16132 by @trimeta.

`sklearn.gaussian_process`#

Enhancement gaussian_process.kernels.Matern returns the RBF kernel when nu=np.inf. #15503 by Sam Dixon.
Fix Fixed a bug in gaussian_process.GaussianProcessRegressor that caused predicted standard deviations to only be between 0 and 1 when WhiteKernel is not used. #15782 by @plgreenLIRU.
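The Matern entry above can be verified numerically. This is a minimal sketch in which the inputs and the length scale are arbitrary.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Matern

X = np.linspace(0, 5, 10).reshape(-1, 1)

# With nu=np.inf the Matern kernel should coincide with the RBF kernel.
K_matern = Matern(length_scale=1.5, nu=np.inf)(X)
K_rbf = RBF(length_scale=1.5)(X)

print(np.allclose(K_matern, K_rbf))
```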

`sklearn.impute`#

Enhancement impute.IterativeImputer accepts both scalar and array-like inputs for max_value and min_value. Array-like inputs allow a different max and min to be specified for each feature. #16403 by Narendra Mukherjee.
Enhancement impute.SimpleImputer, impute.KNNImputer, and impute.IterativeImputer accept pandas' nullable integer dtype with missing values. #16508 by Thomas Fan.
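A sketch of per-feature bounds in IterativeImputer, as described in the first entry above. The data and the bounds are made up, and the experimental opt-in import is required because this estimator is experimental.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Second column is ten times the first; two entries are missing.
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, np.nan],
    [np.nan, 40.0],
    [5.0, 50.0],
])

# One (min, max) pair per feature: imputed values are clipped to these ranges.
imputer = IterativeImputer(
    min_value=[0.0, 0.0], max_value=[4.5, 25.0], random_state=0
)
X_imputed = imputer.fit_transform(X)

print(X_imputed)
```

Only imputed entries are constrained by the bounds; observed values such as 50.0 are left as they are.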

`sklearn.inspection`#

Feature inspection.partial_dependence and inspection.plot_partial_dependence now support the fast 'recursion' method for ensemble.RandomForestRegressor and tree.DecisionTreeRegressor. #15864 by Nicolas Hug.

`sklearn.linear_model`#

Major Feature Added generalized linear models (GLM) with non-normal error distributions, including linear_model.PoissonRegressor, linear_model.GammaRegressor and linear_model.TweedieRegressor, which use Poisson, Gamma and Tweedie distributions respectively. #14300 by Christian Lorentzen, Roman Yurchak, and Olivier Grisel.
Major Feature Support of sample_weight in linear_model.ElasticNet and linear_model.Lasso for dense feature matrix X. #15436 by Christian Lorentzen.
Efficiency linear_model.RidgeCV and linear_model.RidgeClassifierCV no longer allocate a potentially large array to store dual coefficients for all hyperparameters during fit, nor an array to store all error or LOO predictions unless store_cv_values is True. #15652 by Jérôme Dockès.
Enhancement linear_model.LassoLars and linear_model.Lars now support a jitter parameter that adds random noise to the target. This might help with stability in some edge cases. #15179 by @angelaambroz.
Fix Fixed a bug where, if a sample_weight parameter was passed to the fit method of linear_model.RANSACRegressor, it would not be passed to the wrapped base_estimator during the fitting of the final model. #15773 by Jeremy Alexandre.
Fix Added the best_score_ attribute to linear_model.RidgeCV and linear_model.RidgeClassifierCV. #15655 by Jérôme Dockès.
Fix Fixed a bug in linear_model.RidgeClassifierCV when passing a specific scoring strategy. Previously, the internal estimator output scores instead of predictions. #14848 by Venkatachalam N.
Fix linear_model.LogisticRegression now avoids an unnecessary iteration when solver='newton-cg' by checking for less than or equal, instead of strictly less than, when comparing the maximum of absgrad with tol in utils.optimize._newton_cg. #16266 by Rushabh Vasani.
API Change Deprecated the public attributes standard_coef_, standard_intercept_, average_coef_, and average_intercept_ in linear_model.SGDClassifier, linear_model.SGDRegressor, linear_model.PassiveAggressiveClassifier, linear_model.PassiveAggressiveRegressor. #16261 by Carlos Brandt.
Fix Efficiency linear_model.ARDRegression is more stable and much faster when n_samples > n_features. It can now scale to hundreds of thousands of samples. The stability fix might imply changes in the number of non-zero coefficients and in the predicted output. #16849 by Nicolas Hug.
Fix Fixed a bug in linear_model.ElasticNetCV, linear_model.MultiTaskElasticNetCV, linear_model.LassoCV and linear_model.MultiTaskLassoCV where fitting would fail when using the joblib loky backend. #14264 by Jérémie du Boisberranger.
Efficiency Sped up linear_model.MultiTaskLasso, linear_model.MultiTaskLassoCV, linear_model.MultiTaskElasticNet, linear_model.MultiTaskElasticNetCV by avoiding slower BLAS Level 2 calls on small arrays. #17021 by Alex Gramfort and Mathurin Massias.
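As an illustration of the new generalized linear models listed at the top of this section, the sketch below fits a PoissonRegressor to synthetic count data. The data-generating coefficients and the alpha value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(300, 2))
# Counts whose log-mean is linear in the features (made-up data).
y = rng.poisson(lam=np.exp(0.5 + 1.0 * X[:, 0] - 0.5 * X[:, 1]))

# The log link keeps every predicted mean strictly positive.
glm = PoissonRegressor(alpha=1e-3, max_iter=300).fit(X, y)
pred = glm.predict(X)

print(glm.coef_, pred.min())
```

linear_model.GammaRegressor and linear_model.TweedieRegressor follow the same fit/predict pattern for strictly positive or Tweedie-distributed targets.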

`sklearn.metrics`#

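Enhancement metrics.pairwise_distances_chunked now allows its reduce_func to have no return value, enabling in-place operations. #16397 by Joel Nothman.
Fix Fixed a bug in metrics.mean_squared_error where the squared argument was ignored when multioutput='raw_values'. #16323 by Rushabh Vasani.
Fix Fixed a bug in metrics.mutual_info_score where negative scores could be returned. #16362 by Thomas Fan.
Fix Fixed a bug in metrics.confusion_matrix that would raise an error when y_true and y_pred were length zero and labels was not None. In addition, an error is now raised when an empty list is given to the labels parameter. #16442 by Kyle Parsons.
API Change Changed the formatting of values in metrics.ConfusionMatrixDisplay.plot and metrics.plot_confusion_matrix to pick the shorter format (either '2g' or 'd'). #16159 by Rick Mackenbach and Thomas Fan.
API Change From version 0.25, metrics.pairwise_distances will no longer automatically compute the VI parameter for Mahalanobis distance and the V parameter for seuclidean distance if Y is passed. The user will be expected to compute this parameter on the training data of their choice and pass it to pairwise_distances. #16993 by Joel Nothman.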

`sklearn.model_selection`#

Enhancement model_selection.GridSearchCV and model_selection.RandomizedSearchCV now include stack trace information in fit-failed warning messages, in addition to the previously emitted type and details. #15622 by Gregory Morse.
Fix model_selection.cross_val_predict supports method="predict_proba" when y=None. #15918 by Luca Kubin.
Fix model_selection.fit_grid_point is deprecated in 0.23 and will be removed in 0.25. #16401 by Arie Pratama Sutiono.

`sklearn.multioutput`#

Feature multioutput.MultiOutputRegressor.fit and multioutput.MultiOutputClassifier.fit can now accept fit_params to pass to the estimator.fit method of each step. #15953 #15959 by Ke Huang.
Enhancement multioutput.RegressorChain now supports fit_params for base_estimator during fit. #16111 by Venkatachalam N.

`sklearn.naive_bayes`#

Fix A correctly formatted error message is shown in naive_bayes.CategoricalNB when the number of features in the input differs between predict and fit. #16090 by Madhura Jayaratne.

`sklearn.neural_network`#

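Efficiency neural_network.MLPClassifier and neural_network.MLPRegressor have a reduced memory footprint when using stochastic solvers ('sgd' or 'adam') and shuffle=True. #14075 by @meyer89.
Fix Increased the numerical stability of the logistic loss function in neural_network.MLPClassifier by clipping the probabilities. #16117 by Thomas Fan.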

`sklearn.inspection`#

Enhancement inspection.PartialDependenceDisplay now exposes the decile lines as attributes so they can be hidden or customized. #15785 by Nicolas Hug.

`sklearn.preprocessing`#

Feature The drop argument of preprocessing.OneHotEncoder now accepts the value 'if_binary', which drops the first category of each feature with two categories. #16245 by Rushabh Vasani.
Enhancement The drop_idx_ ndarray of preprocessing.OneHotEncoder can now contain None, where drop_idx_[i] = None means that no category is dropped for index i. #16585 by Chiara Marmo.
Enhancement preprocessing.MaxAbsScaler, preprocessing.MinMaxScaler, preprocessing.StandardScaler, preprocessing.PowerTransformer, preprocessing.QuantileTransformer, preprocessing.RobustScaler now support pandas' nullable integer dtype with missing values. #16508 by Thomas Fan.
Efficiency preprocessing.OneHotEncoder is now faster at transforming. #15762 by Thomas Fan.
Fix Fixed a bug in preprocessing.StandardScaler which was incorrectly computing statistics when calling partial_fit on sparse inputs. #16466 by Guillaume Lemaitre.
Fix Fixed a bug in preprocessing.Normalizer with norm='max', which was not taking the absolute value of the maximum values before normalizing the vectors. #16632 by Maura Pintor and Battista Biggio.
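The two OneHotEncoder entries at the top of this section can be seen together in a small sketch. The toy categories are assumptions, and the sparse result is converted with `.toarray()` only to make it easy to print.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Feature 0 has two categories (binary); feature 1 has three.
X = np.array([["a", "x"], ["b", "y"], ["a", "z"]], dtype=object)

enc = OneHotEncoder(drop="if_binary")
encoded = enc.fit_transform(X).toarray()

print(encoded)        # 1 column for feature 0, 3 columns for feature 1
print(enc.drop_idx_)  # index of the dropped category per feature, or None
```

The binary feature keeps a single indicator column for "b", while the three-category feature keeps all of its columns, so its entry in drop_idx_ is None.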

`sklearn.semi_supervised`#

Fix semi_supervised.LabelSpreading and semi_supervised.LabelPropagation avoid divide-by-zero warnings when normalizing label_distributions_. #15946 by @ngshya.

`sklearn.svm`#

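Fix Efficiency Improved the libsvm and liblinear random number generators used to randomly select coordinates in the coordinate descent algorithms. The platform-dependent C rand() was used previously, which is only able to generate numbers up to 32767 on the Windows platform (see this blog post) and also has poor randomization power, as suggested by this presentation. It was replaced with C++11 mt19937, a Mersenne Twister that correctly generates 31-bit/63-bit random numbers on all platforms. In addition, the crude "modulo" postprocessor used to get a random number in a bounded interval was replaced by the tweaked Lemire method, as suggested by this blog post. Any model using the svm.libsvm or the svm.liblinear solver, including svm.LinearSVC, svm.LinearSVR, svm.NuSVC, svm.NuSVR, svm.OneClassSVM, svm.SVC, svm.SVR, linear_model.LogisticRegression, is affected. In particular, users can expect better convergence when the number of samples (LibSVM) or the number of features (LibLinear) is large. #13511 by Sylvain Marié.
Fix Fixed the use of custom kernels that do not take float entries, such as string kernels, in svm.SVC and svm.SVR. Note that custom kernels are now expected to validate their input, whereas they previously received valid numeric arrays. #11296 by Alexandre Gramfort and Georgi Peev.
API Change The attributes probA_ and probB_ of svm.SVR and svm.OneClassSVM are now deprecated as they were not useful. #15558 by Thomas Fan.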

`sklearn.tree`#

Fix The rotate parameter of tree.plot_tree was unused and has been deprecated. #15806 by Chiara Marmo.
Fix Fixed support of read-only float32 array input in the predict, decision_path and predict_proba methods of tree.DecisionTreeClassifier, tree.ExtraTreeClassifier and ensemble.GradientBoostingClassifier, as well as the predict method of tree.DecisionTreeRegressor, tree.ExtraTreeRegressor, and ensemble.GradientBoostingRegressor. #16331 by Alexandre Batisse.

`sklearn.utils`#

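Major Feature Estimators can now be displayed with a rich HTML representation. This can be enabled in Jupyter notebooks by setting display='diagram' in set_config. The raw HTML can be returned by using utils.estimator_html_repr. #14180 by Thomas Fan.
Enhancement Improved the error message in utils.validation.column_or_1d. #15926 by Loïc Estève.
Enhancement Added a warning in utils.check_array for pandas sparse DataFrame. #16021 by Rushabh Vasani.
Enhancement utils.check_array now constructs a sparse matrix from a pandas DataFrame that contains only SparseArray columns. #16728 by Thomas Fan.
Enhancement utils.check_array supports pandas' nullable integer dtype with missing values when force_all_finite is set to False or 'allow-nan', in which case the data is converted to floating point values where pd.NA values are replaced by np.nan. As a consequence, all sklearn.preprocessing transformers that accept numeric inputs with missing values represented as np.nan now also accept being directly fed pandas dataframes with pd.Int* or pd.UInt* typed columns that use pd.NA as a missing value marker. #16508 by Thomas Fan.
API Change Passing classes to utils.estimator_checks.check_estimator and utils.estimator_checks.parametrize_with_checks is now deprecated, and support for classes will be removed in 0.24. Pass instances instead. #17032 by Nicolas Hug.
API Change The private utility _safe_tags in utils.estimator_checks was removed, hence all tags should be obtained through estimator._get_tags(). Note that Mixins like RegressorMixin must come before base classes in the MRO for _get_tags() to work properly. #16950 by Nicolas Hug.
Fix utils.all_estimators now only returns public estimators. #15380 by Thomas Fan.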
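For the HTML representation entry at the top of this section, the raw markup can be obtained outside a notebook. This is a minimal sketch, and the pipeline is an arbitrary example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Returns the HTML string that a notebook would render as a diagram.
html = estimator_html_repr(pipe)

print(len(html))
```

In a Jupyter notebook, calling `sklearn.set_config(display='diagram')` makes the same diagram appear when an estimator is the last expression in a cell.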

Miscellaneous#

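Major Feature Adds an HTML representation of estimators to be shown in a Jupyter notebook or lab. This visualization is activated by setting the display option in sklearn.set_config. #14180 by Thomas Fan.
Enhancement scikit-learn now works with mypy without errors. #16726 by Roman Yurchak.
API Change Most estimators now expose an n_features_in_ attribute. This attribute is equal to the number of features passed to the fit method. See SLEP010 for details. #16112 by Nicolas Hug.
API Change Estimators now have a requires_y tag which is False by default, except for estimators that inherit from sklearn.base.RegressorMixin or sklearn.base.ClassifierMixin. This tag is used to ensure that a proper error message is raised when y was expected but None was passed. #16622 by Nicolas Hug.
API Change The default setting print_changed_only has been changed from False to True. This means that the repr of estimators is now more concise and only shows the parameters whose default value has been changed when printing an estimator. You can restore the previous behaviour by using sklearn.set_config(print_changed_only=False). Also, note that it is always possible to quickly inspect the parameters of any estimator using est.get_params(deep=False). #17061 by Nicolas Hug.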
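The print_changed_only and n_features_in_ entries above can be observed with a short sketch. It uses `sklearn.config_context` to switch the setting temporarily instead of calling `set_config` globally; the estimator and data are arbitrary.

```python
import numpy as np
from sklearn import config_context
from sklearn.linear_model import LogisticRegression

est = LogisticRegression(C=10.0)

# New default: only parameters that differ from their defaults are shown.
short_repr = repr(est)

# Previous behaviour: every parameter is listed.
with config_context(print_changed_only=False):
    long_repr = repr(est)

X = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [0.0, 2.0, 1.0]])
y = np.array([0, 1, 0, 1])
est.fit(X, y)

print(short_repr)
print(est.n_features_in_)  # number of columns seen during fit
```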

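Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.22, including: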

Abbie Popa, Adrin Jalali, Aleksandra Kocot, Alexandre Batisse, Alexandre Gramfort, Alex Henrie, Alex Itkes, Alex Liang, alexshacked, Alonso Silva Allende, Ana Casado, Andreas Mueller, Angela Ambroz, Ankit810, Arie Pratama Sutiono, Arunav Konwar, Baptiste Maingret, Benjamin Beier Liu, bernie gray, Bharathi Srinivasan, Bharat Raghunathan, Bibhash Chandra Mitra, Brian Wignall, brigi, Brigitta Sipőcz, Carlos H Brandt, CastaChick, castor, cgsavard, Chiara Marmo, Chris Gregory, Christian Kastner, Christian Lorentzen, Corrie Bartelheimer, Daniël van Gelder, Daphne, David Breuer, david-cortes, dbauer9, Divyaprabha M, Edward Qian, Ekaterina Borovikova, ELNS, Emily Taylor, Erich Schubert, Eric Leung, Evgeni Chasnovski, Fabiana, Facundo Ferrín, Fan, Franziska Boenisch, Gael Varoquaux, Gaurav Sharma, Geoffrey Bolmier, Georgi Peev, gholdman1, Gonthier Nicolas, Gregory Morse, Gregory R. Lee, Guillaume Lemaitre, Gui Miotto, Hailey Nguyen, Hanmin Qin, Hao Chun Chang, HaoYin, Hélion du Mas des Bourboux, Himanshu Garg, Hirofumi Suzuki, huangk10, Hugo van Kemenade, Hye Sung Jung, indecisiveuser, inderjeet, J-A16, Jérémie du Boisberranger, Jin-Hwan CHO, JJmistry, Joel Nothman, Johann Faouzi, Jon Haitz Legarreta Gorroño, Juan Carlos Alfaro Jiménez, judithabk6, jumon, Kathryn Poole, Katrina Ni, Kesshi Jordan, Kevin Loftis, Kevin Markham, krishnachaitanya9, Lam Gia Thuan, Leland McInnes, Lisa Schwetlick, lkubin, Loic Esteve, lopusz, lrjball, lucgiffon, lucyleeow, Lucy Liu, Lukas Kemkes, Maciej J Mikulski, Madhura Jayaratne, Magda Zielinska, maikia, Mandy Gu, Manimaran, Manish Aradwad, Maren Westermann, Maria, Mariana Meireles, Marie Douriez, Marielle, Mateusz Górski, mathurinm, Matt Hall, Maura Pintor, mc4229, meyer89, m.fab, Michael Shoemaker, Michał Słapek, Mina Naghshhnejad, mo, Mohamed Maskani, Mojca Bertoncelj, narendramukherjee, ngshya, Nicholas Won, Nicolas Hug, nicolasservel, Niklas, @nkish, Noa Tamir, Oleksandr Pavlyk, olicairns, Oliver Urs Lenz, Olivier Grisel, parsons-kyle-89, 
Paula, Pete Green, Pierre Delanoue, pspachtholz, Pulkit Mehta, Qizhi Jiang, Quang Nguyen, rachelcjordan, raduspaimoc, Reshama Shaikh, Riccardo Folloni, Rick Mackenbach, Ritchie Ng, Roman Feldbauer, Roman Yurchak, Rory Hartong-Redden, Rüdiger Busche, Rushabh Vasani, Sambhav Kothari, Samesh Lakhotia, Samuel Duan, SanthoshBala18, Santiago M. Mola, Sarat Addepalli, scibol, Sebastian Kießling, SergioDSR, Sergul Aydore, Shiki-H, shivamgargsya, SHUBH CHATTERJEE, Siddharth Gupta, simonamaggio, smarie, Snowhite, stareh, Stephen Blystone, Stephen Marsh, Sunmi Yoon, SylvainLan, talgatomarov, tamirlan1, th0rwas, theoptips, Thomas J Fan, Thomas Li, Thomas Schmitt, Tim Nonner, Tim Vink, Tiphaine Viard, Tirth Patel, Titus Christian, Tom Dupré la Tour, trimeta, Vachan D A, Vandana Iyer, Venkatachalam N, waelbenamara, wconnell, wderose, wenliwyan, Windber, wornbb, Yu-Hang “Maxin” Tang
