In this tutorial, we explore supervised learning with Scikit-Learn, focusing on regression analysis and two metrics commonly used to evaluate regression models: R-squared and Mean Squared Error (MSE).
Before we dive into the code, let’s first understand the key terms:
- Supervised Learning: Supervised learning is a type of machine learning in which a model is trained on a labeled dataset, meaning each example has input features and a corresponding output label. The goal is to learn a mapping from input features to output labels that also generalizes to new, unseen inputs.
- Regression Analysis: Regression analysis is a statistical technique for modeling the relationship between a dependent variable (the output) and one or more independent variables (the input features). In supervised learning, regression means predicting a continuous output, and it is commonly used in prediction and forecasting tasks.
- R-squared: R-squared, or the coefficient of determination, is the proportion of the variance in the dependent variable that is predictable from the independent variables. A value of 1 indicates a perfect fit, and a value of 0 means the model does no better than always predicting the mean of the target. R-squared can also be negative, for example on a test set, when the model performs worse than that constant baseline.
- Mean Squared Error (MSE): Mean Squared Error is the average of the squared differences between the actual and predicted values. Because the errors are squared, MSE is expressed in the squared units of the target and penalizes large errors more heavily than small ones. Lower MSE values indicate better performance, and an MSE of 0 means every prediction is exact.
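To make the two definitions concrete: MSE is the mean of (actual - predicted)², and R-squared is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below computes both by hand with NumPy and checks them against Scikit-Learn's r2_score and mean_squared_error. The helper names manual_mse and manual_r2 and the four-point arrays are illustrative assumptions, not part of the tutorial's model.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def manual_mse(y_true, y_pred):
    """Hypothetical helper: mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


def manual_r2(y_true, y_pred):
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
good_pred = [2.5, 0.0, 2.0, 8.0]   # close to the actual values
bad_pred = [7.0, 2.0, -0.5, 3.0]   # same values reversed: worse than predicting the mean

# Print the hand-computed metrics next to Scikit-Learn's versions
for name, preds in [("good", good_pred), ("bad", bad_pred)]:
    print(f"{name}: MSE={manual_mse(y_true, preds):.4f} "
          f"(sklearn {mean_squared_error(y_true, preds):.4f}), "
          f"R2={manual_r2(y_true, preds):.4f} "
          f"(sklearn {r2_score(y_true, preds):.4f})")
```

The good predictions give an MSE of 0.375 and an R-squared of about 0.9486. The reversed predictions give an MSE of 11.125 and a negative R-squared (about -0.5246), because they are further from the actual values than the mean is.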
Now, let’s create a simple regression model using Scikit-Learn and calculate R-squared and MSE:
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# Create a synthetic dataset: 100 samples, 2 input features, and a noisy continuous target
X, y = make_regression(n_samples=100, n_features=2, noise=10.0, random_state=42)
# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a Linear Regression model
model = LinearRegression()
# Fit the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = model.predict(X_test)
# Calculate R-squared and MSE on the test set
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
# Display the results
print(f"R-squared: {r2:.4f}")
print(f"Mean Squared Error: {mse:.4f}")
In this code snippet, we first import the necessary functions and classes from Scikit-Learn: make_regression, LinearRegression, r2_score, mean_squared_error, and train_test_split. We then create a synthetic dataset with input features (X) and output labels (y). The dataset needs enough samples for the test set to hold several points: R-squared is not well-defined for fewer than two samples, so a tiny dataset with a 20% test split would not produce a meaningful score.
Next, we split the dataset into training and testing sets using train_test_split, holding out 20% of the samples (20 of 100) for testing. We initialize a Linear Regression model and fit it on the training data. We then make predictions on the test data and calculate R-squared and MSE using the r2_score and mean_squared_error functions.
Finally, we print both metrics. Because the target contains random noise, expect an R-squared below 1 and an MSE above 0.
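A fitted Scikit-Learn regressor also exposes what it learned. The sketch below assumes synthetic data from make_regression (100 samples, two features, noise=10.0; these parameters are illustrative choices). It prints coef_ and intercept_, which hold the fitted linear mapping, and shows that for regressors the score method returns R-squared on the data it is given, so it matches r2_score on the test set.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data (assumed parameters): 100 samples, 2 features, noisy target
X, y = make_regression(n_samples=100, n_features=2, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# fit() returns the fitted estimator, so construction and fitting can be chained
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# The learned linear mapping: one weight per input feature plus an intercept
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# For regressors, score() returns R-squared on the data passed to it
print("Test R-squared via score():", model.score(X_test, y_test))
print("Test R-squared via r2_score():", r2_score(y_test, y_pred))

# Training R-squared, for comparison against the test score
print("Training R-squared:", model.score(X_train, y_train))
```

Comparing training and test R-squared is a quick check for overfitting: a training score far above the test score suggests the model fits noise in the training data rather than the underlying relationship.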
This is a basic example of implementing supervised learning with Scikit-Learn for regression analysis. You can further explore different regression models, feature engineering techniques, and hyperparameter tuning to improve the performance of your models.
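As one possible next step, the sketch below compares LinearRegression with Ridge (a regularized linear model) using 5-fold cross-validated R-squared, then tunes Ridge's alpha with GridSearchCV. The dataset, the choice of Ridge, and the alpha grid are illustrative assumptions. Scikit-Learn's scoring API always maximizes, so MSE is requested as "neg_mean_squared_error" and negated back for display.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

# Illustrative synthetic data (assumed parameters): 200 samples, 5 features, noisy target
X, y = make_regression(n_samples=200, n_features=5, noise=15.0, random_state=0)

# Compare two regression models by their mean 5-fold cross-validated R-squared
candidates = [("LinearRegression", LinearRegression()), ("Ridge(alpha=1.0)", Ridge(alpha=1.0))]
cv_results = {}
for name, estimator in candidates:
    scores = cross_val_score(estimator, X, y, cv=5, scoring="r2")
    cv_results[name] = scores
    print(f"{name}: mean R-squared = {scores.mean():.4f}")

# Tune Ridge's regularization strength alpha with a grid search.
# Scikit-Learn scorers are maximized, so MSE is requested as its negative.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
search = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("Best alpha:", search.best_params_["alpha"])
print(f"Best cross-validated MSE: {-search.best_score_:.4f}")
```

Cross-validation averages each metric over several train/test splits, which gives a more stable estimate than a single split.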
I hope this tutorial was helpful in understanding the concepts of supervised learning, regression analysis, R-squared, and Mean Squared Error with Scikit-Learn. Happy learning!