Linear regression is a fundamental machine learning algorithm for predicting continuous numerical values. In this tutorial, I will walk you through performing linear regression with Scikit-learn, a popular Python machine learning library.
Linear regression finds the straight line that best fits the data points, where "best" means the line that minimizes the sum of squared differences between the actual values and the predicted values (ordinary least squares). With a single input feature, this line is written as Y = mX + b, where Y is the predicted value, X is the input feature, m is the slope of the line, and b is the intercept.
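To make the least-squares idea concrete, here is a minimal sketch that computes m and b by hand with NumPy for the single-feature case. The toy data (five points lying exactly on y = 2x + 1) is made up for illustration and is not part of the tutorial's dataset.

```python
import numpy as np

# Hypothetical toy data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Closed-form least-squares estimates for a single feature:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   b = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

print("slope m:", m)      # 2.0
print("intercept b:", b)  # 1.0
```

Because the toy points have no noise, the recovered slope and intercept match the generating line exactly. With noisy data, the same formulas give the line with the smallest sum of squared errors.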
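Let’s start by installing Scikit-learn, along with Matplotlib for the plot at the end, if you haven’t already (NumPy is installed automatically as a Scikit-learn dependency):
pip install scikit-learn matplotlib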
Next, we need to import the necessary libraries:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
Now, let’s generate some sample data to work with:
# Generate random data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
In this example, the data follows y = 4 + 3X plus standard Gaussian noise, so the true intercept is 4 and the true slope is 3. Next, we will split the data into training and testing sets, holding out 20% of the samples for testing:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
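As a quick sanity check on the split, the sketch below regenerates the same data and prints the shapes of the four arrays. With 100 samples and test_size=0.2, 80 rows should land in the training set and 20 in the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same synthetic data as in the tutorial
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)
print(y_train.shape, y_test.shape)  # (80, 1) (20, 1)
```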
Now, we can create a LinearRegression object and fit it to the training data:
model = LinearRegression()
model.fit(X_train, y_train)
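Once the model is fitted, it can be useful to look at the learned slope and intercept and compare them with the values used to generate the data (3 and 4). The sketch below repeats the setup so it runs on its own. The variable names slope and intercept are just illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the tutorial
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Because y is a 2-D column here, coef_ has shape (1, 1)
# and intercept_ has shape (1,)
slope = model.coef_[0, 0]
intercept = model.intercept_[0]

print("Estimated slope:", slope)          # should be close to 3
print("Estimated intercept:", intercept)  # should be close to 4
```

The estimates will not equal 3 and 4 exactly, since the data contains random noise and only 80 samples are used for fitting.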
We can now make predictions using the trained model:
y_pred = model.predict(X_test)
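The same predict method also works on inputs the model has never seen. One detail that often trips people up is that predict expects a 2-D array of shape (n_samples, n_features), even for a single feature. The sketch below assumes a few hand-picked inputs (X_new is a made-up name and not part of the tutorial's data).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the tutorial
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Hypothetical new inputs, one row per sample and one column per feature
X_new = np.array([[0.0], [1.0], [2.0]])
y_new = model.predict(X_new)

print(y_new.shape)  # (3, 1)
print(y_new)
```

The prediction at X = 0 is simply the fitted intercept, and each step of 1 in X raises the prediction by the fitted slope.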
To evaluate the model, we can calculate the mean squared error:
mse = np.mean((y_pred - y_test) ** 2)
print("Mean Squared Error:", mse)
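Scikit-learn also ships ready-made metric functions, so the manual calculation can be cross-checked with mean_squared_error from sklearn.metrics. The sketch below additionally reports R² via r2_score, which is not used elsewhere in this tutorial and is included only as a common companion metric.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic data, split, and model as in the tutorial
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Manual MSE and the library version should agree
mse_manual = np.mean((y_pred - y_test) ** 2)
mse = mean_squared_error(y_test, y_pred)

# R^2: fraction of the variance in y_test explained by the model (1.0 is perfect)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (manual): ", mse_manual)
print("Mean Squared Error (sklearn):", mse)
print("R^2 score:", r2)
```

Since the noise added to the data has variance 1, an MSE in the neighborhood of 1 is about as good as any model can do on this dataset.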
We can also plot the test data points along with the line predicted by the model:
import matplotlib.pyplot as plt
plt.scatter(X_test, y_test, color='blue')
plt.plot(X_test, y_pred, color='red')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression')
plt.show()
This will give you a visual representation of how well the model fits the data.
In conclusion, linear regression is a simple yet powerful algorithm that can be used for predicting continuous values. By using Scikit-learn, you can easily implement linear regression in Python. I hope this tutorial has helped you understand the basics of linear regression and how to apply it using Scikit-learn. Happy coding!