Machine learning is an important aspect of data science that involves building and training algorithms that can learn from data. One essential step in the process of building machine learning models is feature scaling. Feature scaling is the process of normalizing or standardizing the range of independent variables or features of data so that they can all be compared on a common scale.
In this tutorial, we will explore the concept of feature scaling and how to apply it using the Scikit-Learn library in Python. Scikit-Learn is a powerful library that provides a wide range of tools for machine learning tasks, including feature scaling.
Feature Scaling Techniques
There are several techniques for feature scaling, but two of the most commonly used methods are normalization and standardization.
Normalization:
Normalization, also known as Min-Max scaling, is the process of scaling the features to be within a specific range, usually between 0 and 1. This can be achieved using the formula:
\[X' = \frac{X - X_{min}}{X_{max} - X_{min}}\]
where \(X\) is the original feature vector, \(X_{min}\) is the minimum value of the feature, and \(X_{max}\) is the maximum value of the feature.
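As a quick sanity check, the formula can be applied directly with NumPy to a small hypothetical feature vector (the values here are illustrative, not from any particular dataset):

```python
import numpy as np

# Hypothetical 1-D feature vector
X = np.array([1.0, 3.0, 5.0, 7.0])

# Min-Max scaling: (X - X_min) / (X_max - X_min)
X_scaled = (X - X.min()) / (X.max() - X.min())
print(X_scaled)
```

The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.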
Standardization:
Standardization, on the other hand, is the process of transforming the features so that they have a mean of 0 and a standard deviation of 1. This can be achieved using the formula:
\[X' = \frac{X - \mu}{\sigma}\]
where \(X\) is the original feature vector, \(\mu\) is the mean of the feature, and \(\sigma\) is the standard deviation of the feature.
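The same formula can be verified by hand with NumPy on a small hypothetical vector. Note that NumPy's default `std` is the population standard deviation (`ddof=0`), which is also what Scikit-Learn's `StandardScaler` uses:

```python
import numpy as np

# Hypothetical 1-D feature vector
X = np.array([1.0, 3.0, 5.0, 7.0])

# Standardization: (X - mean) / std, using the population std (ddof=0)
X_std = (X - X.mean()) / X.std()
print(X_std)
```

After the transformation the values have a mean of 0 and a standard deviation of 1.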
Now, let’s see how to apply feature scaling using Scikit-Learn in Python.
Importing the necessary libraries:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
Generating sample data:
# Generating a sample dataset
data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
df = pd.DataFrame(data, columns=['Feature1', 'Feature2'])
print(df)
Output:
Feature1 Feature2
0 1 2
1 3 4
2 5 6
3 7 8
Applying Min-Max scaling:
# Applying Min-Max scaling
scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(df_normalized)
Output:
Feature1 Feature2
0 0.0 0.0
1 0.333333 0.333333
2 0.666667 0.666667
3 1.0 1.0
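Min-Max scaling is not limited to the default [0, 1] interval: `MinMaxScaler` accepts a `feature_range` parameter, and the fitted scaler can undo the transformation with `inverse_transform`. A short sketch using the same sample data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# Scale to [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data)
print(scaled)

# Recover the original values from the scaled ones
restored = scaler.inverse_transform(scaled)
print(restored)
```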
Applying Standardization:
# Applying Standardization
scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(df_standardized)
Output:
Feature1 Feature2
0 -1.341641 -1.341641
1 -0.447214 -0.447214
2 0.447214 0.447214
3 1.341641 1.341641
As we can see from the outputs, Min-Max scaling transforms the features into a range between 0 and 1, while standardization transforms the features so that they have a mean of 0 and a standard deviation of 1. Note that, unlike Min-Max scaling, standardization does not bound the values to any fixed interval: with more extreme values in the data, standardized features can fall well outside [-1, 1].
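One practical point worth keeping in mind: when a dataset is split into training and test sets, the scaler should be fitted on the training data only, and the same fitted parameters applied to the test data; otherwise information from the test set leaks into the preprocessing. A minimal sketch with a hypothetical split:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test split of a single feature
X_train = np.array([[1.0], [3.0], [5.0], [7.0]])
X_test = np.array([[2.0], [9.0]])

scaler = StandardScaler()
scaler.fit(X_train)                   # learn mean and std from the training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)   # reuse the same parameters for the test data
print(X_test_s)
```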
Conclusion
In this tutorial, we have learned about the importance of feature scaling in machine learning and how to apply feature scaling using the Scikit-Learn library in Python. By scaling features, we can ensure that all the features are on a common scale, which can help improve the performance of machine learning models. I hope this tutorial has been helpful in understanding the concept of feature scaling and how to implement it in Python. Happy coding!