Supervised Learning with Scikit-Learn: Regression, R-squared, and Mean Squared Error

In this tutorial, we will explore supervised learning with Scikit-Learn, focusing on regression analysis and two metrics commonly used to evaluate regression models: R-squared and Mean Squared Error (MSE).

Before we dive into the code, let’s first understand the key terms:

  • Supervised Learning: Supervised learning is a type of machine learning where the model is trained on a labeled dataset, meaning the dataset has both input features and a corresponding output label. The goal of supervised learning is to learn a mapping between input features and output labels.

  • Regression Analysis: Regression analysis is a statistical technique used to model the relationship between a dependent variable (output) and one or more independent variables (input features). It is commonly used in prediction and forecasting tasks.

  • R-squared: R-squared, or the coefficient of determination, is a statistical measure that indicates the proportion of the variance in the dependent variable that is predictable from the independent variables. It typically ranges from 0 to 1, with 1 indicating a perfect fit; on held-out data it can even be negative when a model fits worse than simply predicting the mean.

  • Mean Squared Error (MSE): Mean Squared Error is the average squared difference between the actual and predicted values in a regression model. It is used to evaluate the performance of regression models, with lower MSE values indicating better performance (a short hand computation of both metrics follows this list).
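
To make the two definitions concrete, here is a minimal sketch that computes MSE and R-squared by hand with plain NumPy. The actual and predicted values are made up purely for illustration:

import numpy as np

# Hypothetical actual and predicted values, purely for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# MSE: average of the squared differences between actual and predicted values
mse = np.mean((y_true - y_pred) ** 2)

# R-squared: 1 minus the ratio of residual variance to total variance
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse}")        # 0.375
print(f"R-squared: {r2}")   # 0.925

These are exactly the quantities that Scikit-Learn's mean_squared_error and r2_score functions compute for us in the example below.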

Now, let’s create a simple regression model in Python using Scikit-Learn and calculate R-squared and MSE:

# Import what we need from Scikit-Learn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Create a sample dataset with two input features per row and one output label
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [2, 3, 4, 5, 6]

# Split the dataset into training and testing sets
# (test_size=0.4 keeps two test samples, the minimum needed for R-squared)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Initialize a Linear Regression model
model = LinearRegression()

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate R-squared and MSE
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

# Display the results
print(f"R-squared: {r2}")
print(f"Mean Squared Error: {mse}")

In this code snippet, we first import what we need from Scikit-Learn: LinearRegression, train_test_split, r2_score, and mean_squared_error. We then create a sample dataset with input features (X) and output labels (y).

Next, we split the dataset into training and testing sets using train_test_split. We initialize a Linear Regression model and fit it on the training data. We then make predictions on the test data and calculate R-squared and MSE using the r2_score and mean_squared_error functions.

Finally, we print the R-squared and MSE values to the console.
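
Beyond these two metrics, it can also help to look at what the model actually learned. A fitted LinearRegression object exposes its learned weights through the coef_ and intercept_ attributes; the short sketch below reuses the model variable from the example above:

# Inspect the fitted parameters of the linear model
# (assumes `model` has already been fit, as in the example above)
print("Coefficients:", model.coef_)    # one weight per input feature
print("Intercept:", model.intercept_)  # the bias term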

This is a basic example of implementing supervised learning with Scikit-Learn for regression analysis. You can further explore different regression models, feature engineering techniques, and hyperparameter tuning to improve the performance of your models.
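
As one possible next step, here is a minimal sketch of how you might combine feature scaling with a regularized regression model (Ridge) and tune its regularization strength with a cross-validated grid search. It uses Scikit-Learn's built-in diabetes dataset so it is self-contained, and the candidate alpha values are arbitrary choices for illustration:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A built-in regression dataset, used here so the sketch is self-contained
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Pipeline: standardize the features, then fit a Ridge regression model
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("ridge", Ridge()),
])

# Candidate regularization strengths (arbitrary values, for illustration)
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validated grid search, scored by R-squared
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

print("Best alpha:", search.best_params_["ridge__alpha"])
print("Test R-squared:", search.best_estimator_.score(X_test, y_test))

The same pattern works with other regressors and larger parameter grids; only the pipeline steps and the keys in param_grid change.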

I hope this tutorial was helpful in understanding the concepts of supervised learning, regression analysis, R-squared, and Mean Squared Error with Scikit-Learn. Happy learning!