Save a classifier to disk in scikit-learn
Scikit-learn is a popular Python machine learning library that provides a wide range of tools for building and training models. This article shows how to save a trained scikit-learn classifier to disk and load it back using joblib, a separate library that is installed as a scikit-learn dependency.
Step 1: Train a Classifier
First, you need to train a classifier using scikit-learn. This can be done by importing the necessary modules and dataset, splitting the data into training and testing sets, and fitting a classifier to the training data.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
# Load the Iris dataset and hold out 20% of the samples for testing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
# Fit a random forest on the training split
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
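Before saving a model, it can be worth checking that it performs acceptably on the held-out data. The following is a minimal sketch, assuming the same Iris split as above; the `accuracy` variable and the `random_state=42` passed to the classifier are illustrative choices, not part of the original steps.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same dataset and split as in Step 1
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Assumption: fix the seed so repeated runs build the same forest
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given data and labels
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```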
Step 2: Save the Classifier to Disk
Once the classifier is trained, you can save it to disk using joblib’s dump function. This function takes the trained classifier and a file path as input and saves the classifier to the specified file.
from joblib import dump
file_path = 'classifier.joblib'
dump(clf, file_path)
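joblib's dump also accepts a `compress` argument and returns the list of file names it wrote. The sketch below is one possible way to compare a plain and a compressed save; the temporary directory, the file names, and the compression level of 3 are assumptions made for illustration. Compression typically gives a smaller file at the cost of slower saving and loading.

```python
import os
import tempfile

from joblib import dump
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small model to have something to save
iris = load_iris()
clf = RandomForestClassifier(random_state=42).fit(iris.data, iris.target)

# Assumption: write into a temporary directory to avoid cluttering the project
out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, "classifier.joblib")
compressed_path = os.path.join(out_dir, "classifier_compressed.joblib")

# dump() returns the list of file names the data was stored in
written = dump(clf, plain_path)
dump(clf, compressed_path, compress=3)

print(written)
print(os.path.getsize(plain_path), os.path.getsize(compressed_path))
```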
Step 3: Load the Classifier from Disk
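To use the saved classifier later, load it from disk with joblib’s load function, which takes the file path of the saved classifier and returns the deserialized classifier. Because joblib files are pickle-based, only load files from sources you trust, and where possible load them with the same scikit-learn version that saved them.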
from joblib import load
loaded_clf = load(file_path)
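A quick way to confirm the round trip worked is to compare the predictions of the original and the loaded classifier. This is a minimal sketch under stated assumptions: the temporary file location and the `same_predictions` name are illustrative, and the model is retrained here only so the example runs on its own.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Assumption: save to a temporary directory for the purpose of this check
file_path = os.path.join(tempfile.mkdtemp(), "classifier.joblib")
dump(clf, file_path)
loaded_clf = load(file_path)

# The loaded model should reproduce the original model's predictions
same_predictions = np.array_equal(
    clf.predict(X_test), loaded_clf.predict(X_test)
)
print(same_predictions)
```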
Conclusion
Saving a classifier to disk takes a single call to joblib’s dump function, and restoring it takes a single call to load. This lets you reuse a trained classifier later without retraining it, which matters most when training is slow or the training data is no longer at hand.