Tutorial on sklearn.preprocessing: Polynomial Features and Custom Transformers in Scikit-learn

In machine learning, feature engineering plays a crucial role in improving model performance. One common technique is generating polynomial features. In this tutorial, we will explore how to create polynomial features using the PolynomialFeatures transformer from the sklearn.preprocessing module.

We will also see how to create custom transformers using the FunctionTransformer class from the same module. Let’s begin by importing the necessary libraries:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer

1. Polynomial Features

Polynomial features are created by taking all possible combinations of the input features up to a certain degree. For example, if we have two input features x1 and x2 and we want polynomial features up to degree 2, the resulting features are [1, x1, x2, x1^2, x1*x2, x2^2], where the leading 1 is the bias term (it can be omitted by passing include_bias=False). This can help capture non-linear relationships in the data.

Let’s create a simple dataset to demonstrate how to use the PolynomialFeatures transformer:

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])

Now, we will create polynomial features up to degree 2 using the PolynomialFeatures transformer:

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly)

The output will be:

[[ 1.  2.  1.  2.  4.]
 [ 3.  4.  9. 12. 16.]
 [ 5.  6. 25. 30. 36.]]

As you can see, each row now contains the features [x1, x2, x1^2, x1*x2, x2^2]; the bias column is absent because we passed include_bias=False. You can change the degree parameter to create polynomial features of higher degrees.

2. Custom Transformers

Sometimes, you may need to apply custom transformations to the data before feeding it into the model. This can be achieved by creating custom transformers using the FunctionTransformer class.

Let’s create a custom transformer that takes the square root of each feature in the dataset:

def sqrt_transform(X):
    return np.sqrt(X)

sqrt_transformer = FunctionTransformer(sqrt_transform)
X_sqrt = sqrt_transformer.fit_transform(X)
print(X_sqrt)

The output will be:

[[1.         1.41421356]
 [1.73205081 2.        ]
 [2.23606798 2.44948974]]

In this example, we defined a custom function sqrt_transform that takes the square root of each feature in the dataset. We then created a FunctionTransformer object with this function and applied it to the dataset to get the transformed features.

You can define any custom function and create transformers based on your specific requirements.
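FunctionTransformer also accepts an inverse_func, which is useful when you want to undo a transformation later (for example, to report predictions on the original scale). A minimal sketch using NumPy's log1p/expm1 pair, which are exact inverses of each other:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# log1p compresses large values; expm1 undoes it exactly
log_transformer = FunctionTransformer(np.log1p, inverse_func=np.expm1)

X_log = log_transformer.fit_transform(X)
X_back = log_transformer.inverse_transform(X_log)

print(np.allclose(X_back, X))  # True: the round trip recovers the data
```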

Conclusion

In this tutorial, we learned how to use the PolynomialFeatures transformer to create polynomial features and the FunctionTransformer class to create custom transformers in scikit-learn. Feature engineering is a crucial step in building machine learning models, and these techniques can help capture more complex relationships in the data. Experiment with different transformations and see how they impact your model's performance.
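In practice, these transformers are usually combined with an estimator in a pipeline. As a closing sketch (the toy data and the choice of LinearRegression here are illustrative, not from the tutorial above), here is polynomial regression on data generated from y = x^2, a relationship a plain linear model cannot fit:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data following y = x^2 exactly
X = np.arange(-5, 6).reshape(-1, 1).astype(float)
y = (X ** 2).ravel()

# PolynomialFeatures expands the input, LinearRegression fits the expansion
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)

print(model.predict([[7.0]]))  # close to 49, i.e. 7^2
```

Because the pipeline applies the same transformation at fit and predict time, you avoid the common mistake of transforming training and test data inconsistently.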
