Using FunctionTransformer in scikit-learn Pipeline (Python)

Pipeline in scikit-learn simplifies the machine learning workflow by chaining multiple processing steps together. Every step except the last must be a transformer (which preprocesses the data); the final step can be a transformer or an estimator (which makes predictions). The output of each step is passed as input to the next step. One commonly used transformer in scikit-learn is FunctionTransformer, which wraps an ordinary Python function so that it can be used as a pipeline step. In this tutorial, we will discuss how to use FunctionTransformer in a scikit-learn pipeline.
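To make the "output of one step feeds the next" idea concrete, here is a minimal sketch that chains two FunctionTransformer steps. The step names "square" and "add_one" and the lambda are chosen only for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Two stateless steps: the output of "square" becomes the input of "add_one".
pipe = Pipeline([
    ("square", FunctionTransformer(np.square)),
    ("add_one", FunctionTransformer(lambda X: X + 1)),
])

X = np.array([[1], [2], [3]])
result = pipe.fit_transform(X)
print(result.ravel().tolist())  # [2, 5, 10]
```

Each value is squared first and then incremented, so the order of the steps in the list matters.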

Step 1: Import necessary libraries

First, import the necessary libraries: Pipeline from sklearn.pipeline, FunctionTransformer from sklearn.preprocessing, and NumPy, which is used to build the example data in Step 5.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

Step 2: Define the preprocessing function

Next, define the function that you want to apply to the data. This function will be passed as an argument to the FunctionTransformer, which calls it on the entire input array X and uses its return value as the transformed data.

def custom_function(X):
    # Square every element of the input array
    return X ** 2

In this example, we have defined a simple function that squares every element of the input data.
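Because FunctionTransformer hands the whole 2-D array to the function in one call (not row by row), the function can use vectorized NumPy operations and combine columns. The sketch below uses a hypothetical helper, add_product_feature, that appends the product of the first two columns as a new feature; it is an illustration, not part of the original tutorial.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def add_product_feature(X):
    """Hypothetical helper: append the product of the first two columns."""
    X = np.asarray(X)
    product = X[:, 0] * X[:, 1]
    # column_stack turns the 1-D product into a column and appends it to X
    return np.column_stack([X, product])

transformer = FunctionTransformer(add_product_feature)
X = np.array([[1, 2], [3, 4]])
X_new = transformer.fit_transform(X)
print(X_new.tolist())  # [[1, 2, 2], [3, 4, 12]]
```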

Step 3: Create the FunctionTransformer

Now, create the FunctionTransformer object by passing the preprocessing function as its func argument.

function_transformer = FunctionTransformer(func=custom_function)
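If the function needs extra parameters, FunctionTransformer can forward them through its kw_args argument. The sketch below assumes a hypothetical generalization of custom_function, called power, with an exponent parameter.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def power(X, exponent=2):
    """Hypothetical generalization of custom_function with a tunable exponent."""
    return X ** exponent

# kw_args is passed to func as keyword arguments on every transform call.
cube_transformer = FunctionTransformer(func=power, kw_args={"exponent": 3})

X = np.array([[1], [2], [3]])
print(cube_transformer.fit_transform(X).ravel().tolist())  # [1, 8, 27]
```

Keeping the parameter in kw_args rather than hard-coding it in the function means it is stored on the transformer, so it can be inspected or changed like any other estimator parameter.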

Step 4: Create the pipeline

Next, create a Pipeline object. Its steps are given as a list of (name, transformer) tuples; here the FunctionTransformer is registered under the name 'function_transformer'.

pipeline = Pipeline([
    ('function_transformer', function_transformer)
])

In this example, we have only included the FunctionTransformer step in the pipeline. In a real machine learning workflow, you would typically include other preprocessing steps and the final estimator in the pipeline as well.
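As a rough sketch of such a workflow, the example below adds a StandardScaler and a LinearRegression estimator after the FunctionTransformer. The toy data is made up so that the target is exactly 2 * x**2 + 1, which the squared feature can fit perfectly; the choice of scaler and regressor is only an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def custom_function(X):
    return X ** 2

# Made-up toy data: the target is exactly 2 * x**2 + 1.
X = np.arange(1, 6, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + 1

pipeline = Pipeline([
    ("function_transformer", FunctionTransformer(func=custom_function)),
    ("scaler", StandardScaler()),       # intermediate steps must be transformers
    ("regressor", LinearRegression()),  # the last step may be a plain estimator
])

pipeline.fit(X, y)
prediction = pipeline.predict(np.array([[6.0]]))
print(round(float(prediction[0]), 6))  # 73.0
```

Calling fit on the pipeline squares the input, fits the scaler on the squared values, and then fits the regressor on the scaled result; predict replays the same transformations before the final prediction.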

Step 5: Fit and transform the data

Finally, fit the pipeline to your data and then use it to transform the data.

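# scikit-learn convention: a 2-D array of shape (n_samples, n_features)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
pipeline.fit(X)
transformed_X = pipeline.transform(X)
# transformed_X is array([[1], [4], [9], [16], [25]])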

In this example, we have applied the custom_function to the input data using the FunctionTransformer in the pipeline.
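Since FunctionTransformer applies func the same way regardless of the data seen in fit, the two calls in Step 5 can be collapsed into fit_transform, and the result matches calling the function directly. A minimal sketch under the same setup as above:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def custom_function(X):
    return X ** 2

pipeline = Pipeline([("function_transformer", FunctionTransformer(func=custom_function))])
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)

# fit_transform combines pipeline.fit(X) and pipeline.transform(X) into one call.
transformed_X = pipeline.fit_transform(X)

# The output is the same as applying custom_function to X yourself.
assert np.array_equal(transformed_X, custom_function(X))
print(transformed_X.ravel().tolist())  # [1, 4, 9, 16, 25]
```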

By using the FunctionTransformer in a scikit-learn pipeline, you can turn custom preprocessing functions into ordinary pipeline steps that are fitted and applied together with the rest of the workflow. This can be useful for feature engineering or data transformation steps in your machine learning workflow. I hope this tutorial was helpful in understanding how to use FunctionTransformer in a scikit-learn pipeline.