Saving a classifier to disk in scikit-learn #quicktip

Scikit-learn is a popular Python machine learning library that provides a wide range of tools for building and training models. This article shows how to save a trained classifier to disk, and load it back, using joblib, a separate library that scikit-learn installs as one of its dependencies.

Step 1: Train a Classifier

First, train a classifier with scikit-learn: load a dataset, split it into training and test sets, and fit a classifier to the training data.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Load the Iris dataset and hold out 20% of it for testing
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42
    )

    # Fit a random forest on the training split (random_state makes runs reproducible)
    clf = RandomForestClassifier(random_state=42)
    clf.fit(X_train, y_train)

Step 2: Save the Classifier to Disk

Once the classifier is trained, save it to disk with joblib’s dump function. It takes the object to save (here, the trained classifier) and a file path, and writes the serialized object to that file.

    from joblib import dump

    # Serialize the fitted classifier to a file in the current working directory
    file_path = 'classifier.joblib'
    dump(clf, file_path)
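
Two details of dump that the snippet above does not show: it returns a list of the file names it wrote, and its compress parameter (an integer from 0 to 9) trades save and load speed for a smaller file. The sketch below is self-contained, so it trains its own model; it writes into a temporary directory rather than the current one, and the file names are arbitrary choices for illustration.

```python
import os
import tempfile

from joblib import dump
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small model so the example runs on its own
iris = load_iris()
clf = RandomForestClassifier(random_state=42)
clf.fit(iris.data, iris.target)

# Use a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "classifier.joblib")
    compressed_path = os.path.join(tmp_dir, "classifier_compressed.joblib")

    # dump() returns the list of file names it wrote
    written = dump(clf, plain_path)
    # compress=3 applies zlib compression at level 3
    dump(clf, compressed_path, compress=3)

    plain_size = os.path.getsize(plain_path)
    compressed_size = os.path.getsize(compressed_path)

print(f"uncompressed: {plain_size} bytes, compressed: {compressed_size} bytes")
```

A moderate level such as 3 is a common middle ground; higher levels shrink the file further at the cost of slower saving.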

Step 3: Load the Classifier from Disk

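To reuse the saved classifier later, for example in a different Python session, load it with joblib’s load function. It takes the path of the saved file and returns the reconstructed classifier, which you can use exactly like the original.

    from joblib import load

    # Restore the classifier from disk and use it to make predictions
    loaded_clf = load(file_path)
    predictions = loaded_clf.predict(X_test)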
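
A quick way to confirm that the round trip worked is to check that the original and reloaded models make identical predictions on the same test data. This self-contained sketch repeats the training from Step 1, saves to a temporary directory instead of the current one, and uses NumPy’s array_equal for the comparison.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Save and immediately reload; the loaded object lives in memory after the directory is removed
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "classifier.joblib")
    dump(clf, file_path)
    loaded_clf = load(file_path)

# The reloaded model should reproduce the original model's predictions exactly
original_pred = clf.predict(X_test)
loaded_pred = loaded_clf.predict(X_test)
predictions_match = np.array_equal(original_pred, loaded_pred)
print("Predictions match:", predictions_match)
print("Test accuracy of reloaded model:", loaded_clf.score(X_test, y_test))
```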

Conclusion

Saving a scikit-learn classifier to disk takes a single call to joblib’s dump function, and a single call to load brings it back, so a trained model can be reused later without retraining it. Because joblib files are built on Python’s pickle format, only load files from sources you trust, and load them with the same scikit-learn version that saved them; loading with a different version is not supported and may fail or behave differently.
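
One way to put this reuse into practice is a small helper that loads a saved classifier when the file already exists and otherwise trains and saves a new one. The name load_or_train is hypothetical, used here for illustration only (it is not part of scikit-learn or joblib), and the Iris random forest stands in for whatever model you actually train.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def load_or_train(file_path):
    """Hypothetical helper: load a saved classifier if present, else train and save one.

    Returns a tuple (classifier, was_loaded_from_disk).
    """
    if os.path.exists(file_path):
        return load(file_path), True
    # No saved model yet: train one and persist it for next time
    iris = load_iris()
    clf = RandomForestClassifier(random_state=42)
    clf.fit(iris.data, iris.target)
    dump(clf, file_path)
    return clf, False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "classifier.joblib")
    first_clf, first_loaded = load_or_train(path)    # trains and saves
    second_clf, second_loaded = load_or_train(path)  # loads the saved file
    print(first_loaded, second_loaded)  # False True
```

Checking for the file first keeps the expensive training step out of every run after the first; deleting the file forces a retrain.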