Scikit-Learn Python for Machine Learning: Mastering Feature Scaling

Machine learning is an important part of data science that involves building and training algorithms that learn from data. One essential step in building machine learning models is feature scaling: normalizing or standardizing the range of the independent variables (features) so that they all share a common scale. Without it, features with large numeric ranges can dominate distance calculations and slow down gradient-based optimization.

In this tutorial, we will explore the concept of feature scaling and how to apply it using the Scikit-Learn library in Python. Scikit-Learn is a powerful library that provides a wide range of tools for machine learning tasks, including feature scaling.

Feature Scaling Techniques

There are several techniques for feature scaling, but two of the most commonly used methods are normalization and standardization.

Normalization:
Normalization, also known as Min-Max scaling, is the process of scaling the features to be within a specific range, usually between 0 and 1. This can be achieved using the formula:

\[ X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \]

where \(X\) is the original feature vector, \(X_{\min}\) is the minimum value of the feature, and \(X_{\max}\) is the maximum value of the feature.
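As a quick sanity check on the formula, the sketch below applies it column by column with NumPy to the same 4×2 sample data used later in this tutorial and compares the result with MinMaxScaler. The variable names (X_min, X_max, X_manual, X_sklearn) are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same sample data as used later in this tutorial, as a float array
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)

# Apply X' = (X - X_min) / (X_max - X_min) to each column (axis=0)
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_manual = (X - X_min) / (X_max - X_min)

# Compare with Scikit-Learn's MinMaxScaler (default range 0 to 1)
X_sklearn = MinMaxScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
```

One case the manual version does not handle: if a feature is constant, X_max - X_min is zero and the formula divides by zero, whereas MinMaxScaler guards against a zero range.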

Standardization:
Standardization, on the other hand, is the process of transforming the features so that they have a mean of 0 and a standard deviation of 1. This can be achieved using the formula:

\[ X' = \frac{X - \mu}{\sigma} \]

where \(X\) is the original feature vector, \(\mu\) is the mean of the feature, and \(\sigma\) is the standard deviation of the feature.
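The same check works for standardization. In this sketch (again on the tutorial's 4×2 sample data, with illustrative variable names), the formula is applied with NumPy and compared against StandardScaler. One detail worth knowing: StandardScaler divides by the population standard deviation (ddof=0), which is also NumPy's default for np.std, so dividing by the sample standard deviation (ddof=1) gives different numbers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same sample data as used later in this tutorial, as a float array
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)

# Apply X' = (X - mu) / sigma to each column (axis=0)
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population standard deviation (ddof=0), matching StandardScaler
X_manual = (X - mu) / sigma

X_sklearn = StandardScaler().fit_transform(X)

print(mu)                                # [4. 5.]
print(np.allclose(X_manual, X_sklearn))  # True

# Using the sample standard deviation (ddof=1) instead does not match StandardScaler
X_sample = (X - mu) / X.std(axis=0, ddof=1)
print(np.allclose(X_sample, X_sklearn))  # False
```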

Now, let’s see how to apply feature scaling using Scikit-Learn in Python.

Importing the necessary libraries:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

Generating sample data:

# Generating a sample dataset
data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
df = pd.DataFrame(data, columns=['Feature1', 'Feature2'])
print(df)

Output:

   Feature1  Feature2
0         1         2
1         3         4
2         5         6
3         7         8

Applying Min-Max scaling:

# Applying Min-Max scaling
scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(df_normalized)

Output:

   Feature1  Feature2
0  0.000000  0.000000
1  0.333333  0.333333
2  0.666667  0.666667
3  1.000000  1.000000
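The target interval of MinMaxScaler is controlled by its feature_range parameter, which defaults to (0, 1). The sketch below uses the same sample values as a NumPy array, rescales them to (-1, 1), prints the per-feature minimum and maximum the scaler learned during fit (data_min_ and data_max_), and uses inverse_transform to map the scaled values back to the original units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)

# Scale each feature into the interval [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

# Per-feature minimum and maximum learned during fit
print(scaler.data_min_)  # [1. 2.]
print(scaler.data_max_)  # [7. 8.]

# inverse_transform undoes the scaling
X_restored = scaler.inverse_transform(X_scaled)
print(np.allclose(X_restored, X))  # True
```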

Applying Standardization:

# Applying Standardization
scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(df_standardized)

Output:

   Feature1  Feature2
0 -1.341641 -1.341641
1 -0.447214 -0.447214
2  0.447214  0.447214
3  1.341641  1.341641
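To confirm the result, the fitted StandardScaler exposes the statistics it learned as mean_ and scale_, and the standardized columns can be checked directly. This sketch rebuilds the tutorial's DataFrame so it runs on its own. Note a common pitfall when checking: pandas' DataFrame.std() defaults to the sample standard deviation (ddof=1), so it reports about 1.1547 here rather than 1; passing ddof=0 matches what StandardScaler uses.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=['Feature1', 'Feature2'])

scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Per-feature statistics learned during fit
print(scaler.mean_)   # [4. 5.]
print(scaler.scale_)  # population standard deviation of each column, sqrt(5) here

# Mean 0 and standard deviation 1 (ddof=0 to match StandardScaler)
print(np.allclose(df_standardized.mean(), 0))       # True
print(np.allclose(df_standardized.std(ddof=0), 1))  # True
```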

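As we can see from the outputs, Min-Max scaling maps each feature into the range 0 to 1, while standardization rescales each feature to a mean of 0 and a standard deviation of 1. Unlike Min-Max scaling, standardization does not bound values to a fixed range: even in this small example the extreme values reach about ±1.34, and values far from the mean can lie well beyond that.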
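To illustrate that standardized values have no fixed bounds, the sketch below scales a small hypothetical feature containing one outlier (the values are made up for illustration). StandardScaler places the outlier roughly 2 standard deviations above the mean, outside [-1, 1], while MinMaxScaler keeps every value in [0, 1] but squeezes the four ordinary values close to 0.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature with one outlier (100.0); values are illustrative
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

standardized = StandardScaler().fit_transform(x)
normalized = MinMaxScaler().fit_transform(x)

# The outlier ends up about 2 standard deviations above the mean
print(standardized.ravel().round(2))

# Min-Max output stays within [0, 1], but 1.0 to 4.0 all land near 0
print(normalized.ravel().round(3))
```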

Conclusion

In this tutorial, we have learned why feature scaling matters in machine learning and how to apply it using the Scikit-Learn library in Python. Putting all features on a common scale can improve the performance of models that are sensitive to feature magnitudes, such as k-nearest neighbors, support vector machines, and models trained with gradient descent; tree-based models such as decision trees and random forests are largely unaffected by it. I hope this tutorial has helped you understand feature scaling and how to implement it in Python. Happy coding!
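When scaling is used ahead of a model, the usual practice is to fit the scaler on the training data only and then apply the same learned statistics to test or new data with transform(), so that information from the test set does not leak into preprocessing. A minimal sketch, assuming a hypothetical 20×2 feature matrix and using train_test_split from sklearn.model_selection:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 20 samples, 2 features
X = np.arange(40, dtype=float).reshape(20, 2)

# Hold out 25% of the rows (5 samples) as a test set
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training rows only
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics, no refitting

print(X_train_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_test_scaled.shape)          # (5, 2)
```

Scikit-Learn's Pipeline can bundle a scaler with a model so that this fit-on-train, transform-on-test pattern is applied automatically.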

8 Comments
@adityakishan589
2 months ago

I believe there is one small correction needed. The output of standard scaler is not necessarily between -1 to 1. Kindly check.

Thanks for the explanation btw!!

@kartikpandya2193
2 months ago

Hello, It is really a nice video. Can we get its jupyter notebook?

@Damien-cb4iz
2 months ago

Hi, I love your videos. This one doesn't have auto-subtitles, would you be able to activate it?

@shaktijain8560
2 months ago

Could you provide the exact path of the dataset?

@arijitRC473
2 months ago

Keep it up…

@xritzx
2 months ago

Loved it🤘👌

@motivation9718
2 months ago

Brother, Sujan….. It turned out really well….

Jai Mahakal 🙏

@benai172
2 months ago

i have actually seen all your tutorials. thank you for amazing videos. truly appreciated.