Creating a Containerized Transcription API using the Whisper Model and FastAPI

Transcription is a critical part of many applications, from voice assistants to call centers to video captioning. In this article, we’ll walk through how to build a containerized transcription API using the Whisper model and FastAPI.

What is the Whisper Model?

Whisper is an open-source speech recognition model developed by OpenAI. It was trained on a large, multilingual collection of audio and is released in several sizes, from tiny to large, that trade speed for accuracy, making it a good fit for transcription tasks.

What is FastAPI?

FastAPI is a modern, high-performance web framework for building APIs with Python 3.7+, based on standard Python type hints. It is easy to use, validates request data from those type hints, and automatically generates interactive API documentation (Swagger UI).

Prerequisites

Before we get started, make sure you have Docker installed on your machine. You’ll also need to have a basic understanding of Python and web development. If you’re new to Docker, you can check out the official documentation for installation instructions.

Steps

1. First, let’s create a new directory for our project and navigate into it.


mkdir transcription-api
cd transcription-api

2. Next, we’ll create a new virtual environment and activate it.


python3 -m venv venv
source venv/bin/activate

3. Now, let’s install FastAPI and Uvicorn, which is a lightning-fast ASGI server.


pip install fastapi uvicorn

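4. We’ll also need a library that can load and run the Whisper model. Hugging Face Transformers provides this, with PyTorch as its backend. The model weights themselves are downloaded the first time the application loads them. Install both packages by running the following command:


pip install transformers torch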
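
Before moving on, it can help to confirm that the packages are importable from the active virtual environment. The following is a small sketch using only the standard library. The helper name `missing_packages` and the package list are assumptions made for illustration; `torch` is listed because Transformers needs a backend such as PyTorch to run models.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical package list for this tutorial; adjust it to your setup.
    required = ["fastapi", "uvicorn", "transformers", "torch"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

If anything is reported as missing, install it with pip inside the virtual environment before continuing.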

5. Once we have everything set up, we can begin writing our FastAPI application. Create a new file called main.py and add the following code:


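from fastapi import FastAPI, Request
from transformers import WhisperForConditionalGeneration, WhisperProcessor

app = FastAPI()

# Load the Whisper checkpoint once at startup, not on every request.
processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")


@app.post("/transcribe")
async def transcribe_audio(request: Request):
    # Read the uploaded audio from the raw request body.
    audio_data = await request.body()

    # Placeholder: decoding audio_data and running the model is not implemented yet.
    transcribed_text = "Transcription not implemented yet"

    return {"transcription": transcribed_text}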
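
The endpoint above still returns a placeholder. As a rough sketch of how the missing step could look, the code below decodes the uploaded bytes and passes them to the model. It rests on several assumptions: the upload is a mono, 16-bit PCM WAV file sampled at 16 kHz, and `processor` and `model` are a `WhisperProcessor` and a `WhisperForConditionalGeneration` loaded from a Whisper checkpoint such as `openai/whisper-base`. The helper names `decode_wav` and `transcribe_samples` are hypothetical. Whisper works on 30-second windows, so longer clips would need chunking, which this sketch does not attempt.

```python
import io
import sys
import wave
from array import array


def decode_wav(audio_data):
    """Decode mono 16-bit PCM WAV bytes into (samples, sampling_rate).

    Samples are floats in the range [-1.0, 1.0).
    """
    with wave.open(io.BytesIO(audio_data), "rb") as wav_file:
        if wav_file.getnchannels() != 1 or wav_file.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV audio")
        sampling_rate = wav_file.getframerate()
        frames = wav_file.readframes(wav_file.getnframes())
    pcm = array("h")
    pcm.frombytes(frames)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [value / 32768.0 for value in pcm], sampling_rate


def transcribe_samples(samples, sampling_rate, processor, model):
    """Run a Whisper processor/model pair on decoded samples and return the text."""
    if sampling_rate != 16000:
        raise ValueError("Whisper expects 16 kHz audio; resample first")

    import numpy as np  # imported lazily; installed alongside transformers

    inputs = processor(
        np.asarray(samples, dtype=np.float32),
        sampling_rate=sampling_rate,
        return_tensors="pt",
    )
    predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Inside the endpoint, the placeholder assignment could then become `transcribed_text = transcribe_samples(*decode_wav(audio_data), processor, model)`. Other audio formats (MP3, stereo, different sample rates) would need an extra conversion step first.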

6. Finally, let’s create a Dockerfile to containerize our application. Create a new file called Dockerfile in the root of your project directory and add the following code:


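FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8

# The base image includes FastAPI but not the model libraries.
RUN pip install --no-cache-dir transformers torch
# main.py sits in the project root; the image expects it at /app/main.py.
COPY ./main.py /app/main.py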

7. Now, we can build and run our containerized application. Run the following commands in your terminal:


docker build -t transcription-api .
docker run -d -p 8000:80 transcription-api
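
Because the container starts in the background (the `-d` flag), the API may take a moment before it responds. One way to wait for it is to poll an HTTP endpoint until it answers. The sketch below uses only the standard library; the helper name `wait_for_http` and the timeout values are assumptions chosen for illustration.

```python
import http.client
import time
import urllib.request


def wait_for_http(url, timeout=60.0, interval=1.0):
    """Poll url until it answers with HTTP 200; return False if the timeout passes first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # server not reachable or not ready yet
        time.sleep(interval)
    return False
```

For example, `wait_for_http("http://localhost:8000/docs")` polls the interactive documentation page that FastAPI serves by default, through the 8000:80 port mapping used in the `docker run` command.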

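That’s it! You now have a containerized FastAPI service with a /transcribe endpoint for the Whisper model. You can test it out by sending a POST request with audio data to http://localhost:8000/transcribe. Until the transcription step is implemented, the response contains the placeholder text.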
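
To try the endpoint from Python, a small client along these lines could be used. It is a sketch that assumes the endpoint reads the audio from the raw request body; an endpoint declared as a form upload would need a multipart request instead. The helper names and the `audio/wav` content type are assumptions made for illustration.

```python
import json
import urllib.request


def build_transcribe_request(audio_data, url="http://localhost:8000/transcribe"):
    """Build a POST request carrying the audio bytes as the raw request body."""
    return urllib.request.Request(
        url,
        data=audio_data,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def transcribe_file(path, url="http://localhost:8000/transcribe"):
    """Send the file at path to the API and return the parsed JSON response."""
    with open(path, "rb") as audio_file:
        request = build_transcribe_request(audio_file.read(), url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `transcribe_file("sample.wav")` (where `sample.wav` is a hypothetical local recording) would return the JSON body of the response as a dictionary with a `transcription` key.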

Conclusion

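In this article, we walked through the process of building a containerized transcription API using the Whisper model and FastAPI: setting up the project, installing the dependencies, defining the /transcribe endpoint, and packaging the application in a Docker image. These steps give you a starting point that you can extend into a transcription service for your applications.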

7 Comments
@MrZelektronz
6 months ago

Solely judging from the title, this is exactly what I need. I hope it works as I expect 😀 Gonna keep watching.

@rois8888
6 months ago

When I run in Postman in headers I put Content-Type: multipart/form-data and in the Body I put Key as "files" and for Value I upload the .wav file. For some reason I get files: undefined

Maybe on Mac I'm supposed to do something different?

@datasciencetoday7127
6 months ago

hero

@kshitizkhandelwal879
6 months ago

You are incredible. Can we get more of end to end projects involving Docker

@concaption
6 months ago

The requirements file is incomplete. It is not working with the whisper library that I am using from PyPI.

@shivamroy1775
6 months ago

Absolute quality content. So informative and I love how every step is explained in great detail.

@harshkadam3702
6 months ago

Hey, you created a video on the text-to-image API in the past, so would we be able to create an API that can use checkpoints from civitai, like multiple checkpoints and models, and then call that API? Is it possible?