Model serving in PyTorch refers to the process of deploying a trained PyTorch model to an application or system where it can be used to make predictions on new data. Model serving is a crucial step in the machine learning pipeline as it allows the model to be used in real-world applications to generate insights and make decisions.
In this tutorial, we will walk through the process of serving a PyTorch model using the FastAPI framework and Docker. FastAPI is a modern web framework for building APIs with Python, and Docker is a containerization platform that allows you to package and deploy applications with all their dependencies.
Step 1: Train a PyTorch model
Before we can serve a PyTorch model, we first need to train it on some data. For the purpose of this tutorial, let’s train a simple convolutional neural network (CNN) on the MNIST dataset, which consists of handwritten digits.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Load the MNIST dataset
train_dataset = datasets.MNIST(root="data/", train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root="data/", train=False, transform=transforms.ToTensor(), download=True)

# Wrap the datasets in DataLoaders so the model receives batched tensors
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Define the CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        # After two 3x3 convolutions and two 2x2 max-pools,
        # a 28x28 input becomes 64 feature maps of size 5x5
        self.fc1 = nn.Linear(5 * 5 * 64, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))  # 28x28 -> 26x26
        x = torch.max_pool2d(x, 2)     # 26x26 -> 13x13
        x = torch.relu(self.conv2(x))  # 13x13 -> 11x11
        x = torch.max_pool2d(x, 2)     # 11x11 -> 5x5
        x = x.view(-1, 5 * 5 * 64)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the model, optimizer, and loss function
model = SimpleCNN()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Train the model
for epoch in range(5):
    for i, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print(f"Epoch {epoch}, Iteration {i}, Loss: {loss.item():.4f}")

# Save the trained model
torch.save(model.state_dict(), "mnist_cnn.pt")
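Before moving on to serving, it is worth checking how the model performs on the held-out test set we loaded above. Here is a minimal sketch that reuses the test_loader defined earlier:

# Evaluate accuracy on the test set (a quick sanity check before serving)
model.eval()
correct = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
print(f"Test accuracy: {correct / len(test_dataset):.4f}")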
Step 2: Create a FastAPI app to serve the model
Next, we will create a FastAPI app that serves the trained PyTorch model. We will define an API endpoint that accepts an image of a handwritten digit, preprocesses it, passes it through the model, and returns the predicted digit. Save the app as main.py (the Dockerfile in Step 3 expects that module name), and make sure the SimpleCNN class from Step 1 is importable, for example from a model.py module.
from fastapi import FastAPI, File, UploadFile
import io
import torch
import torchvision.transforms as transforms
from PIL import Image

from model import SimpleCNN  # the model class from Step 1 (assumed to live in model.py)

app = FastAPI()

# Load the trained model and switch to inference mode
model = SimpleCNN()
model.load_state_dict(torch.load("mnist_cnn.pt"))
model.eval()

# Define the preprocessing function
def preprocess_image(image_bytes):
    # Convert to grayscale and resize to 28x28 to match the MNIST training data
    image = Image.open(io.BytesIO(image_bytes)).convert("L").resize((28, 28))
    image = transforms.ToTensor()(image).unsqueeze(0)  # add a batch dimension
    return image

# Define the prediction endpoint
@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    image = preprocess_image(await file.read())
    with torch.no_grad():  # no gradients needed for inference
        prediction = model(image)
    predicted_class = torch.argmax(prediction, dim=1).item()
    return {"predicted_digit": predicted_class}
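Before containerizing, you can sanity-check the app locally. Assuming the file is saved as main.py, run:

uvicorn main:app --reload --port 8000

Then open http://localhost:8000/docs to try the endpoint from FastAPI's built-in interactive documentation.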
Step 3: Create a Dockerfile to containerize the app
We will now create a Dockerfile that will containerize the FastAPI app along with all its dependencies. This will make it easier to deploy and run the app on different platforms.
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8
COPY . /app
WORKDIR /app
RUN pip install torch torchvision aiofiles pillow python-multipart
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
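Note that python-multipart is required by FastAPI to parse file uploads. Also, since COPY . /app copies the entire build context, you may want a .dockerignore file so the downloaded MNIST data is not baked into the image; a minimal example:

data/
__pycache__/
*.pyc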
Step 4: Build and run the Docker container
Finally, we will build the Docker container using the Dockerfile and run the FastAPI app inside the container.
docker build -t mnist-fastapi .
docker run -d -p 8000:8000 mnist-fastapi
That’s it! You now have a PyTorch model serving app running inside a Docker container using FastAPI. You can send a POST request to the /predict endpoint with an image of a handwritten digit, and the app will return the predicted digit.
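For example, assuming you have a digit image saved as digit.png (the filename here is just an illustration), you can test the endpoint with curl:

curl -X POST -F "file=@digit.png" http://localhost:8000/predict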
In this tutorial, we learned how to serve a PyTorch model using FastAPI and Docker. Model serving is an essential part of the machine learning pipeline, as it allows us to deploy our trained models to real-world applications and systems. I hope this tutorial was helpful in understanding the process of model serving in PyTorch. Thank you for reading!