
Displaying Pipelines#

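The default configuration for displaying a pipeline in a Jupyter Notebook is 'diagram', i.e. set_config(display='diagram'). To deactivate the HTML representation, use set_config(display='text').

To see more details in the visualization of the pipeline, click on a step in the diagram to expand it and show its parameters.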
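The display setting can also be changed temporarily rather than globally. The sketch below assumes scikit-learn 1.1 or later, where 'diagram' is the default. It uses sklearn.config_context to switch to the text representation only inside a with-block. It also uses sklearn.utils.estimator_html_repr to obtain the HTML diagram as a string, which is useful outside a notebook.

```python
from sklearn import config_context, get_config
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

pipe = make_pipeline(StandardScaler(), LogisticRegression())

display_before = get_config()["display"]

# Inside the with-block, notebooks would show the plain-text representation
with config_context(display="text"):
    display_inside = get_config()["display"]

# Leaving the block restores the previous global setting
display_after = get_config()["display"]

# The HTML diagram as a string, generated regardless of the display setting;
# it could, for instance, be written to an .html file and opened in a browser
html = estimator_html_repr(pipe)
print(display_before, display_inside, display_after, len(html) > 0)
```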

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

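Displaying a Pipeline with a Preprocessing Step and Classifier#

This section constructs a Pipeline with a preprocessing step, StandardScaler, and a classifier, LogisticRegression, and displays its visual representation.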

from sklearn import set_config
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

steps = [
    ("preprocessing", StandardScaler()),
    ("classifier", LogisticRegression()),
]
pipe = Pipeline(steps)
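Once a Pipeline is built, its steps can be retrieved in two ways: by name through named_steps, or by position with indexing. Slicing returns a sub-pipeline. A minimal sketch, reusing the step names from above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(
    [
        ("preprocessing", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)

# Access a step by the name given in its (name, estimator) tuple
scaler = pipe.named_steps["preprocessing"]

# Access a step by position; -1 is the final estimator
classifier = pipe[-1]

# Slicing returns a new Pipeline holding only the selected steps
preprocessing_only = pipe[:1]
print(type(scaler).__name__, type(classifier).__name__, preprocessing_only)
```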

To visualize the pipeline as a diagram, use display='diagram', which is the default.

set_config(display="diagram")
pipe  # click on the diagram below to see the details of each step

Pipeline(steps=[('preprocessing', StandardScaler()),
                ('classifier', LogisticRegression())])


[Interactive HTML diagram: Pipeline (not fitted) with the steps StandardScaler and LogisticRegression; clicking a step expands its parameters.]
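In recent scikit-learn versions, the HTML diagram also reports whether an estimator has been fitted. The sketch below fits the same pipeline, using the iris dataset purely as example data (an assumption, not part of the original example). It then uses check_is_fitted on the final step to confirm the change of state. The helper is_fitted is hypothetical, defined here only for illustration. Displaying pipe again in a notebook after fit should show it as fitted.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if check_is_fitted accepts the estimator."""
    try:
        check_is_fitted(estimator)
    except NotFittedError:
        return False
    return True


pipe = Pipeline(
    [
        ("preprocessing", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)
fitted_before = is_fitted(pipe[-1])

X, y = load_iris(return_X_y=True)
pipe.fit(X, y)  # fits the scaler, then the classifier on the scaled data
fitted_after = is_fitted(pipe[-1])

print(fitted_before, fitted_after, round(pipe.score(X, y), 3))
```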

To view the pipeline as text, change to display='text'.

set_config(display="text")
pipe

Pipeline(steps=[('preprocessing', StandardScaler()),
                ('classifier', LogisticRegression())])
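The text representation lists only the parameters that differ from their defaults. This behavior is controlled by the print_changed_only option of set_config and config_context, which is True by default. The sketch below compares both settings on a single estimator.

```python
from sklearn import config_context
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=2.0)

# Default: only the non-default parameter C is shown
short_repr = repr(clf)

# print_changed_only=False lists every constructor parameter
with config_context(print_changed_only=False):
    full_repr = repr(clf)

print(short_repr)
print(full_repr)
```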

Restore the default display.

set_config(display="diagram")

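Displaying a Pipeline Chaining Multiple Preprocessing Steps & Classifier#

This section constructs a Pipeline with two preprocessing steps, StandardScaler followed by PolynomialFeatures, and a classifier step, LogisticRegression, and displays its visual representation.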

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

steps = [
    ("standard_scaler", StandardScaler()),
    ("polynomial", PolynomialFeatures(degree=3)),
    ("classifier", LogisticRegression(C=2.0)),
]
pipe = Pipeline(steps)
pipe  # click on the diagram below to see the details of each step

Pipeline(steps=[('standard_scaler', StandardScaler()),
                ('polynomial', PolynomialFeatures(degree=3)),
                ('classifier', LogisticRegression(C=2.0))])


[Interactive HTML diagram: Pipeline (not fitted) with the steps StandardScaler, PolynomialFeatures (degree=3) and LogisticRegression (C=2.0).]

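Displaying a Pipeline with Dimensionality Reduction and a Classifier#

This section constructs a Pipeline with a dimensionality reduction step, PCA, and a classifier, SVC, and displays its visual representation.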

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

steps = [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
pipe = Pipeline(steps)
pipe  # click on the diagram below to see the details of each step

Pipeline(steps=[('reduce_dim', PCA(n_components=4)),
                ('classifier', SVC(kernel='linear'))])


[Interactive HTML diagram: Pipeline (not fitted) with the steps PCA (n_components=4) and SVC (kernel='linear').]
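The effect of the reduce_dim step can be checked by fitting the pipeline and transforming data with every step except the classifier. The sketch below uses the digits dataset (an assumption; any dataset with more than four features would do). PCA(n_components=4) reduces its 64 pixel features to 4 components.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# 8x8 digit images flattened into 64 features
X, y = load_digits(return_X_y=True)

pipe = Pipeline(
    [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
)
pipe.fit(X, y)

# Output of the PCA step alone: one column per retained component
X_reduced = pipe[:-1].transform(X)
explained = pipe.named_steps["reduce_dim"].explained_variance_ratio_

print(X.shape, "->", X_reduced.shape)
print("variance explained by 4 components:", explained.sum().round(3))
```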

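Displaying a Complex Pipeline Chaining a Column Transformer#

This section constructs a complex Pipeline with a ColumnTransformer, which applies separate preprocessing to the categorical and numerical columns, and a classifier, LogisticRegression, and displays its visual representation.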

import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
pipe  # click on the diagram below to see the details of each step

Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('categorical',
                                                  Pipeline(steps=[('imputation_constant',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('onehot',
                                                                   OneHotEncoder(handle_unknown='ignore'))]),
                                                  ['state', 'gender']),
                                                 ('numerical',
                                                  Pipeline(steps=[('imputation_mean',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  ['age', 'weight'])])),
                ('logisticregression', LogisticRegression(max_iter=500))])


[Interactive HTML diagram: Pipeline (not fitted) whose first step, columntransformer: ColumnTransformer, has two branches. The categorical branch on ['state', 'gender'] applies SimpleImputer (strategy='constant') and then OneHotEncoder. The numerical branch on ['age', 'weight'] applies SimpleImputer (strategy='mean') and then StandardScaler. A LogisticRegression (max_iter=500) follows.]


