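Displaying Pipelines

In a Jupyter Notebook, pipelines are displayed as a diagram by default, which corresponds to set_config(display='diagram'). To turn off the HTML representation, use set_config(display='text').

To see more detail for a step in the pipeline visualization, click on that step.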
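The display option can also be read back and switched temporarily. The following is a minimal sketch, assuming a scikit-learn version whose get_config and config_context helpers accept the display key; neither helper is used in the original example.

```python
from sklearn import config_context, get_config, set_config

# Set the starting point explicitly instead of relying on the installed default.
set_config(display="diagram")
before = get_config()["display"]

# config_context changes the setting only inside the with-block.
with config_context(display="text"):
    inside = get_config()["display"]

# Outside the block the previous value is active again.
after = get_config()["display"]
print(before, inside, after)
```

A context manager avoids having to remember to restore the global setting afterwards.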

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

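Displaying a Pipeline with a Preprocessing Step and Classifier

This section constructs a Pipeline with a preprocessing step, StandardScaler, and a classifier, LogisticRegression, and displays its visual representation.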

from sklearn import set_config
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

steps = [
    ("preprocessing", StandardScaler()),
    ("classifier", LogisticRegression()),
]
pipe = Pipeline(steps)

To visualize the pipeline as a diagram, use display='diagram', which is the default.

set_config(display="diagram")
pipe  # click on the diagram below to see the details of each step
Pipeline(steps=[('preprocessing', StandardScaler()),
                ('classifier', LogisticRegression())])
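Outside a notebook, the same diagram can be obtained as an HTML string. The sketch below assumes that estimator_html_repr is importable from sklearn.utils in the installed version; it is not part of the original example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

pipe = Pipeline(
    [
        ("preprocessing", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)

# Render the diagram as a standalone HTML string (markup plus styling).
html = estimator_html_repr(pipe)
print(len(html))
```

The resulting string can be written to an .html file and opened in a browser.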


To view the pipeline as text, change to display='text'.

set_config(display="text")
pipe
Pipeline(steps=[('preprocessing', StandardScaler()),
                ('classifier', LogisticRegression())])
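The text output above lists only the parameters that differ from their defaults. As a hedged illustration, the print_changed_only configuration key (an addition not used in the original example) controls this behaviour:

```python
from sklearn import config_context
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(
    [
        ("preprocessing", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)

# Only parameters that differ from their defaults are shown.
with config_context(print_changed_only=True):
    short_repr = repr(pipe)

# Every parameter of every step is shown.
with config_context(print_changed_only=False):
    full_repr = repr(pipe)

print(short_repr)
print(full_repr)
```

The full form is longer but useful when recording exactly which settings were in effect.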

Restore the default display.

set_config(display="diagram")

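Displaying a Pipeline Chaining Multiple Preprocessing Steps and a Classifier

This section constructs a Pipeline with multiple preprocessing steps, StandardScaler and PolynomialFeatures, and a classifier step, LogisticRegression, and displays its visual representation.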

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

steps = [
    ("standard_scaler", StandardScaler()),
    ("polynomial", PolynomialFeatures(degree=3)),
    ("classifier", LogisticRegression(C=2.0)),
]
pipe = Pipeline(steps)
pipe  # click on the diagram below to see the details of each step
Pipeline(steps=[('standard_scaler', StandardScaler()),
                ('polynomial', PolynomialFeatures(degree=3)),
                ('classifier', LogisticRegression(C=2.0))])
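The details revealed by clicking a step in the diagram can also be reached programmatically. A minimal sketch, using the same step names as above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

pipe = Pipeline(
    [
        ("standard_scaler", StandardScaler()),
        ("polynomial", PolynomialFeatures(degree=3)),
        ("classifier", LogisticRegression(C=2.0)),
    ]
)

# Look up a single step by its name.
poly = pipe.named_steps["polynomial"]

# Slicing returns a new Pipeline; here, everything except the classifier.
preprocessing_only = pipe[:-1]

# Nested parameters use the "<step name>__<parameter>" convention.
c_value = pipe.get_params()["classifier__C"]

print(poly.degree, len(preprocessing_only.steps), c_value)
```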


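Displaying a Pipeline with Dimensionality Reduction and a Classifier

This section constructs a Pipeline with a dimensionality reduction step, PCA, and a classifier, SVC, and displays its visual representation.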

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

steps = [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
pipe = Pipeline(steps)
pipe  # click on the diagram below to see the details of each step
Pipeline(steps=[('reduce_dim', PCA(n_components=4)),
                ('classifier', SVC(kernel='linear'))])
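The pipeline above is only constructed, not fitted. As an illustration, the sketch below fits it on the iris dataset, which is an assumption made here because it has exactly four features, so PCA(n_components=4) is valid; the original example uses no data.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Iris is chosen only for illustration: 150 samples, 4 features.
X, y = load_iris(return_X_y=True)

pipe = Pipeline(
    [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
)
pipe.fit(X, y)

# Fitted attributes end with an underscore and live on the individual steps.
n_kept = pipe.named_steps["reduce_dim"].n_components_
train_accuracy = pipe.score(X, y)
print(n_kept, train_accuracy)
```

Note that n_components must not exceed the number of features (or samples) in the data passed to fit.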


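Displaying a Complex Pipeline Chaining a Column Transformer

This section constructs a complex Pipeline with a ColumnTransformer and a classifier, LogisticRegression, and displays its visual representation.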

import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
pipe  # click on the diagram below to see the details of each step
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('categorical',
                                                  Pipeline(steps=[('imputation_constant',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('onehot',
                                                                   OneHotEncoder(handle_unknown='ignore'))]),
                                                  ['state', 'gender']),
                                                 ('numerical',
                                                  Pipeline(steps=[('imputation_mean',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  ['age', 'weight'])])),
                ('logisticregression', LogisticRegression(max_iter=500))])
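Because the ColumnTransformer selects columns by name, the pipeline expects a dataframe with the columns state, gender, age and weight. The sketch below fits it on a tiny pandas DataFrame; the data values and labels are invented purely for illustration, and pandas is an assumed dependency not used in the original example.

```python
import numpy as np
import pandas as pd

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented toy data with a few missing values.
X = pd.DataFrame(
    {
        "state": ["CA", "NY", np.nan, "CA", "TX", "NY"],
        "gender": ["F", "M", "F", "M", "F", "M"],
        "age": [25.0, 32.0, np.nan, 41.0, 29.0, 55.0],
        "weight": [60.0, 82.0, 70.0, np.nan, 65.0, 90.0],
    }
)
y = np.array([0, 1, 0, 1, 0, 1])  # invented labels

numeric_preprocessor = Pipeline(
    [("imputation_mean", SimpleImputer(strategy="mean")), ("scaler", StandardScaler())]
)
categorical_preprocessor = Pipeline(
    [
        ("imputation_constant", SimpleImputer(fill_value="missing", strategy="constant")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)
preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
pipe.fit(X, y)
predictions = pipe.predict(X)
print(predictions)
```

Each sub-pipeline sees only its own columns, so the categorical imputer never touches the numeric data and vice versa.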


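Displaying a Grid Search over a Pipeline with a Classifier

This section constructs a GridSearchCV over a Pipeline with a RandomForestClassifier and displays its visual representation.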

import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

pipe = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier())]
)

param_grid = {
    "classifier__n_estimators": [200, 500],
    "classifier__max_features": ["auto", "sqrt", "log2"],
    "classifier__max_depth": [4, 5, 6, 7, 8],
    "classifier__criterion": ["gini", "entropy"],
}

grid_search = GridSearchCV(pipe, param_grid=param_grid, n_jobs=1)
grid_search  # click on the diagram below to see the details of each step
GridSearchCV(estimator=Pipeline(steps=[('preprocessor',
                                        ColumnTransformer(transformers=[('categorical',
                                                                         Pipeline(steps=[('imputation_constant',
                                                                                          SimpleImputer(fill_value='missing',
                                                                                                        strategy='constant')),
                                                                                         ('onehot',
                                                                                          OneHotEncoder(handle_unknown='ignore'))]),
                                                                         ['state',
                                                                          'gender']),
                                                                        ('numerical',
                                                                         Pipeline(steps=[('imputation_mean',
                                                                                          SimpleImputer()),
                                                                                         ('scaler',
                                                                                          StandardScaler())]),
                                                                         ['age',
                                                                          'weight'])])),
                                       ('classifier',
                                        RandomForestClassifier())]),
             n_jobs=1,
             param_grid={'classifier__criterion': ['gini', 'entropy'],
                         'classifier__max_depth': [4, 5, 6, 7, 8],
                         'classifier__max_features': ['auto', 'sqrt', 'log2'],
                         'classifier__n_estimators': [200, 500]})
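The grid search above is displayed but never fitted. The following sketch runs a much smaller search on the iris dataset to show how the "classifier__" prefix routes parameters to the pipeline step. The dataset, the reduced grid, the simpler preprocessing and random_state are all assumptions made for a fast, repeatable illustration. The "auto" option is left out because, as far as we know, recent scikit-learn releases no longer accept it for max_features in RandomForestClassifier.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("classifier", RandomForestClassifier(random_state=0)),
    ]
)

# Keys follow "<step name>__<parameter>"; the grid is kept small on purpose.
param_grid = {
    "classifier__n_estimators": [10, 20],
    "classifier__max_features": ["sqrt", "log2"],
}

grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=3, n_jobs=1)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(grid_search.best_score_)
```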


Total running time of the script: (0 minutes 0.096 seconds)
