Exploring Machine Learning in Python with Scikit-Learn: Understanding RMSE, MAE, RMSLE, Adjusted R2, and Beyond

Introduction:

Machine learning is a branch of artificial intelligence concerned with algorithms and models that let computers learn from data without being explicitly programmed. Scikit-Learn is a popular Python machine learning library that provides a wide range of tools for building such models. In this tutorial, we will cover some of the key metrics used to evaluate regression models, including RMSE, MAE, RMSLE, and adjusted R2.

  1. Importing the necessary libraries:

Before we start building our machine learning models, we need to import the necessary libraries. Scikit-Learn provides a number of different modules for various machine learning tasks, such as regression, classification, clustering, etc. In this tutorial, we will focus on regression models. Here is how you can import the required libraries:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_squared_log_error, r2_score

  2. Loading and Preprocessing the data:

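Next, we need to load the dataset that we will use to build our machine learning model. The Boston Housing dataset (load_boston) was removed in Scikit-Learn 1.2, so in this tutorial we will use the California Housing dataset, which Scikit-Learn downloads and caches on first use. Passing as_frame=True returns the features as a pandas DataFrame and the target as a pandas Series. Here is how you can load the dataset:

from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing(as_frame=True)
X = housing.data    # DataFrame with one column per feature
y = housing.target  # Series holding the median house value per district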
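
The loading step above does little actual preprocessing, so here is a minimal sketch of two checks that are often worth running before modelling. The DataFrame, its column names, and the median fill are all made up for illustration and are not part of the tutorial's dataset. In a real project the fill values would normally be computed from the training split only.

```python
import pandas as pd

# Hypothetical toy data standing in for a real feature table and target.
X_demo = pd.DataFrame({
    "rooms": [5.0, 6.5, None, 7.1],   # one missing value on purpose
    "age": [30.0, 12.0, 45.0, 8.0],
})
y_demo = pd.Series([21.0, 33.5, 18.2, 40.1], name="price")

# Count missing values per feature column.
missing_per_column = X_demo.isna().sum()
print(missing_per_column)

# One simple option: fill each gap with that column's median.
X_filled = X_demo.fillna(X_demo.median())

# RMSLE requires non-negative targets, so check this up front.
target_is_non_negative = bool((y_demo >= 0).all())
print("Target non-negative:", target_is_non_negative)
```

Median filling is only one of several strategies; dropping incomplete rows is another reasonable choice for small gaps.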

  3. Splitting the data into training and testing sets:

Before we build our machine learning model, we need to split the data into training and testing sets. This is done to train the model on a subset of the data and then evaluate its performance on unseen data. Here is how you can split the data into training and testing sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
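
As a rough illustration of what the split does, the sketch below runs train_test_split on a small synthetic array. X_demo and y_demo are made-up names and data chosen so that each feature row can be matched back to its target. It checks that 20% of the rows land in the test set, that features and targets stay aligned, and that a fixed random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: row i of X_demo is [2*i, 2*i + 1] and its target is i.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print("Train shape:", X_tr.shape, "Test shape:", X_te.shape)

# Each feature row still sits next to its own target after shuffling.
rows_aligned = np.array_equal(X_te[:, 0], 2 * y_te)

# No sample appears in both the training and the test set.
no_overlap = set(y_tr.tolist()).isdisjoint(y_te.tolist())

# Repeating the call with the same random_state gives the same test rows.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = np.array_equal(X_te, X_te_again)
print(rows_aligned, no_overlap, same_split)
```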

  4. Building a Linear Regression model:

Now that we have preprocessed the data and split it into training and testing sets, we can build a simple Linear Regression model using the Scikit-Learn library. Here is how you can do this:

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
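
To see what fit and predict are doing, here is a small sketch on synthetic data where the true relationship is known in advance. The data, the coefficients 3 and -2, and the intercept 5 are assumptions made up for this example. Because the targets contain no noise, the fitted model should recover those values almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 3*x1 - 2*x2 + 5.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0

model = LinearRegression()
model.fit(X_demo, y_demo)

# The learned parameters should be close to the ones used to build y_demo.
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Predict for a new point (1, 1); the true value is 3 - 2 + 5 = 6.
new_prediction = model.predict(np.array([[1.0, 1.0]]))[0]
print("Prediction for (1, 1):", new_prediction)
```

On real data the recovered coefficients will not match any "true" values this cleanly, which is why the evaluation metrics in the next step matter.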

  5. Evaluating the model:

Once we have built the model and made predictions on the test set, we can evaluate its performance using various metrics. Here are some of the key metrics that are commonly used to evaluate regression models:

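  • Root Mean Squared Error (RMSE): RMSE is the square root of the mean of the squared differences between the predicted values and the actual values. Because errors are squared before averaging, a few large errors weigh more heavily than many small ones. RMSE is expressed in the same units as the target, and lower values indicate better model performance.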
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE: {:.2f}".format(rmse))
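  • Mean Absolute Error (MAE): MAE is the average absolute difference between the predicted values and the actual values. It is expressed in the same units as the target and is less sensitive to outliers than RMSE, since every error counts in proportion to its size. Lower MAE values indicate better model performance.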
mae = mean_absolute_error(y_test, y_pred)
print("MAE: {:.2f}".format(mae))
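  • Root Mean Squared Log Error (RMSLE): RMSLE is the RMSE computed on log(1 + y) instead of on y itself, so it measures relative rather than absolute error and penalizes underestimates more than overestimates. It is often used when the target variable has a wide range of values. Both the actual and the predicted values must be non-negative. A linear model can predict negative values, in which case mean_squared_log_error raises a ValueError, so the predictions are clipped at zero here. Lower RMSLE values indicate better model performance.
# Clip negative predictions to 0 so that log(1 + y) is defined.
y_pred_clipped = np.clip(y_pred, 0, None)
rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred_clipped))
print("RMSLE: {:.2f}".format(rmsle))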
  • Adjusted R squared (adj R2): R-squared is the proportion of the variance in the dependent variable that is predictable from the independent variables. Adjusted R squared corrects it for the number of independent variables and penalizes the addition of variables that do not improve the fit: adj R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of samples and p is the number of features. Higher values indicate better model performance.
r2 = r2_score(y_test, y_pred)
n = len(X_test)
p = X_test.shape[1]
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print("Adjusted R2: {:.2f}".format(adj_r2))
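
The metrics above can also be written out by hand, which makes the definitions concrete. The sketch below uses only the Python standard library. The function names and the tiny data set are made up for illustration and are not Scikit-Learn's API. The two prediction lists have the same total absolute error, but one spreads it evenly and the other concentrates it in a single prediction.

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

def rmsle(y_true, y_pred):
    """RMSE on log(1 + y); only defined for non-negative values."""
    if min(y_true) < 0 or min(y_pred) < 0:
        raise ValueError("RMSLE needs non-negative actual and predicted values")
    return rmse([math.log1p(t) for t in y_true],
                [math.log1p(p) for p in y_pred])

def r2(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r2(r2_value, n_samples, n_features):
    """R2 corrected for the number of features in the model."""
    return 1 - (1 - r2_value) * (n_samples - 1) / (n_samples - n_features - 1)

# Made-up example values.
y_true = [10.0, 20.0, 30.0, 40.0]
even_errors = [12.0, 18.0, 32.0, 38.0]    # every prediction is off by 2
one_big_error = [10.0, 20.0, 30.0, 48.0]  # a single prediction is off by 8

print("MAE  :", mae(y_true, even_errors), mae(y_true, one_big_error))
print("RMSE :", rmse(y_true, even_errors), rmse(y_true, one_big_error))
print("RMSLE:", rmsle(y_true, even_errors), rmsle(y_true, one_big_error))
print("Adjusted R2, 1 feature:",
      adjusted_r2(r2(y_true, even_errors), n_samples=4, n_features=1))
```

Both prediction sets have an MAE of 2.0, but the RMSE rises from 2.0 to 4.0 when the error is concentrated in one prediction. That difference is the main reason to prefer RMSE when large misses are especially costly and MAE when all errors should count equally.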

Conclusion:

In this tutorial, we have covered some of the key metrics used to evaluate regression models, including RMSE, MAE, RMSLE, and adjusted R2. These metrics are essential for assessing the accuracy and reliability of a model and can help you make informed decisions when building and optimizing it. By following the steps outlined in this tutorial and experimenting with different models and hyperparameters, you can improve your models and make more accurate predictions on unseen data.

9 Comments
@anjanaalex8988
1 month ago

where is numpy ?

@ARR686
1 month ago

I have seen 5 videos of yours today, I am able to understand them clearly, thank you very much, keep doing the great work!

@aminahmadisharaf8707
1 month ago

Thanks for this video; that was clear and helpful.
I got this message : ValueError: Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Do you have any idea how I can fix it?

@joshuapatterson7281
1 month ago

Thank you so much! I have been looking for something like this for a long time!

@sandysameh7195
1 month ago

OLS?

@AlacarteTV
1 month ago

Hi, if i predict some sales data and have MAE 700.00 +. and RMSE around 1000, which evaluation should i use?

@gadicherlapadmasri2276
1 month ago

When to use MAE, MSE, RMSE, R2 and adj R2? When to use MAE, when to use RMSE and so on?

@abrarahmed9239
1 month ago

hello i have a doubt sir

@omkardeshpande7482
1 month ago

Thank you!