Build a Text to Speech API with FastAPI and Polly
Text to speech technology has revolutionized the way we interact with computers and devices. It allows users to convert written text into natural-sounding speech, which can be useful for a variety of applications such as accessibility, language learning, and even virtual assistants.
In this article, we will explore how to build a Text to Speech API using FastAPI, a modern, fast (high-performance) web framework for building APIs with Python, and Polly, an Amazon Web Services (AWS) service that turns text into lifelike speech.
Setting Up FastAPI
First, let's install FastAPI, Uvicorn (a lightning-fast ASGI server that runs FastAPI apps), and boto3 (the AWS SDK for Python, which we will use to call Polly):
```shell
pip install fastapi uvicorn boto3
```
Now, let's create a new file called app.py and import FastAPI:
```python
from fastapi import FastAPI
```
We can then create a new instance of the FastAPI class:
```python
app = FastAPI()
```
Implementing the Text to Speech API
Next, let's import boto3, the AWS SDK for Python, which provides the client for Polly:
```python
import boto3
```
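We will need to set up AWS credentials to call the Polly service. You can provide them through environment variables or in the ~/.aws/credentials file. Polly also needs an AWS region, which boto3 reads from the AWS_DEFAULT_REGION environment variable or from the ~/.aws/config file.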
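If the Polly call later fails with a credentials error, it helps to check which of the two sources above actually holds your keys. The sketch below is a hypothetical pre-flight helper, find_credential_source, that looks only at the AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables and the INI-style ~/.aws/credentials file; boto3's real lookup chain covers more sources (SSO, container and instance-metadata credentials), so treat a None result as "not found here", not "not configured anywhere".

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def find_credential_source(
    env: Mapping[str, str] = os.environ,
    credentials_path: Path = Path.home() / ".aws" / "credentials",
    profile: str = "default",
) -> Optional[str]:
    """Hypothetical helper: report where static AWS keys were found, or None.

    Only the two sources mentioned in the article are checked; boto3 itself
    also consults SSO, container and instance-metadata credentials.
    """
    # Environment variables are consulted before the shared file by boto3.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # The shared credentials file is an INI file with one section per profile.
    parser = configparser.ConfigParser()
    if parser.read(credentials_path) and parser.has_section(profile):
        section = parser[profile]
        if section.get("aws_access_key_id") and section.get("aws_secret_access_key"):
            return "shared-file"
    return None
```

Calling find_credential_source() with no arguments checks the current environment and your home directory; passing env and credentials_path explicitly makes it easy to exercise without touching real keys.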
Now, we can create a new endpoint at the /text-to-speech path that generates speech from text:
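```python
from fastapi.responses import Response

@app.get("/text-to-speech")
def text_to_speech(text: str):
    # A plain `def` lets FastAPI run this blocking boto3 call in a worker thread
    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Engine="standard",
        LanguageCode="en-US",
        OutputFormat="mp3",  # required parameter: mp3, ogg_vorbis, pcm, or json
        Text=text,
        VoiceId="Joanna",
    )
    # Read the streaming body and wrap it in a Response with an audio MIME type;
    # returning the raw bytes would make FastAPI try to serialize them as JSON
    audio = response["AudioStream"].read()
    return Response(content=audio, media_type="audio/mpeg")
```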
In this example, we use the synthesize_speech method of the Polly client to generate speech from the provided text with the US English voice "Joanna". The audio stream is then returned as the response.
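Because the endpoint talks to Polly directly, it cannot be tested without AWS access. One option, sketched below under the assumption that you are free to restructure app.py, is to move the Polly call into a plain function that accepts any object with a synthesize_speech method. The names synthesize_mp3 and FakePolly are hypothetical; the keyword arguments match the ones used in the endpoint above.

```python
import io
from typing import Any, Dict, List, Tuple


def synthesize_mp3(polly: Any, text: str, voice_id: str = "Joanna") -> Tuple[bytes, str]:
    """Hypothetical helper: call synthesize_speech and return (audio_bytes, content_type).

    `polly` can be the real boto3 Polly client or any object with a compatible
    synthesize_speech method, which makes the logic testable without AWS.
    """
    response = polly.synthesize_speech(
        Engine="standard",
        OutputFormat="mp3",  # required by the API
        Text=text,
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read() loads the whole clip into memory.
    audio = response["AudioStream"].read()
    # Polly reports the MIME type of the audio it produced (audio/mpeg for mp3).
    return audio, response.get("ContentType", "audio/mpeg")


class FakePolly:
    """Hypothetical test double that records calls instead of contacting AWS."""

    def __init__(self, audio: bytes) -> None:
        self.audio = audio
        self.calls: List[Dict[str, Any]] = []

    def synthesize_speech(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        # io.BytesIO offers the same read() method the helper relies on.
        return {"AudioStream": io.BytesIO(self.audio), "ContentType": "audio/mpeg"}
```

Inside the endpoint, the real client plugs in the same way: audio, media_type = synthesize_mp3(boto3.client("polly"), text), followed by return Response(content=audio, media_type=media_type). If you would rather keep the real client in tests, botocore also ships a Stubber class in botocore.stub.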
Running the API
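Finally, we can run the FastAPI app using Uvicorn. The --reload flag restarts the server whenever the code changes, which is handy during development:
```shell
uvicorn app:app --reload
```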
Now you can open http://127.0.0.1:8000/text-to-speech?text=Hello,%20world! (the space must be URL-encoded as %20), and the API will return the audio stream of the synthesized speech.
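To check the endpoint from a script instead of a browser, here is a minimal client sketch that uses only the standard library. It assumes the server is running locally on Uvicorn's default address (127.0.0.1:8000) and that the endpoint returns MP3 data; build_tts_url and download_speech are hypothetical helper names.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed local address: Uvicorn's default host and port.
BASE_URL = "http://127.0.0.1:8000/text-to-speech"


def build_tts_url(text: str, base_url: str = BASE_URL) -> str:
    """Percent-encode the text so spaces, commas and '!' survive the query string."""
    return f"{base_url}?{urllib.parse.urlencode({'text': text})}"


def download_speech(text: str, out_path: str = "speech.mp3") -> Path:
    """Fetch the synthesized audio and write the raw bytes to a file."""
    with urllib.request.urlopen(build_tts_url(text)) as resp:
        audio = resp.read()
    path = Path(out_path)
    path.write_bytes(audio)
    return path
```

With the server running, download_speech("Hello, world!") should leave a playable speech.mp3 in the working directory. In a web page (for example a Jinja2 template), an HTML audio element whose src points at the same URL should play the clip directly, provided the endpoint responds with an audio media type such as audio/mpeg.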
With just a few lines of code, we have built a simple Text to Speech API using FastAPI and Polly. You can extend it further with input validation, error handling, and support for multiple languages and voices.
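As a starting point for validation and multi-voice support, the sketch below checks the incoming text and maps a language code to a voice. DEFAULT_VOICES, MAX_TEXT_LENGTH and validate_request are hypothetical names. The voice choices are illustrative standard voices, so confirm them against the Polly voice list for your region and engine. Polly also caps the input of a single synthesize_speech request (3,000 billed characters according to its documented quotas at the time of writing), so MAX_TEXT_LENGTH = 3000 is an assumption to verify.

```python
from typing import Dict, Tuple

# Hypothetical allow-list: one voice per language code (verify availability).
DEFAULT_VOICES: Dict[str, str] = {
    "en-US": "Joanna",
    "en-GB": "Amy",
    "es-ES": "Lucia",
    "fr-FR": "Celine",
    "de-DE": "Vicki",
}

# Assumed per-request cap, approximating Polly's billed-character quota.
MAX_TEXT_LENGTH = 3000


def validate_request(text: str, language: str = "en-US") -> Tuple[str, str]:
    """Return (cleaned_text, voice_id) or raise ValueError with a user-facing message."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is longer than {MAX_TEXT_LENGTH} characters")
    voice_id = DEFAULT_VOICES.get(language)
    if voice_id is None:
        supported = ", ".join(sorted(DEFAULT_VOICES))
        raise ValueError(f"unsupported language {language!r}; choose one of: {supported}")
    return cleaned, voice_id
```

In the endpoint, you could catch the ValueError and raise fastapi.HTTPException(status_code=400, detail=str(exc)) so the client gets a clear error instead of a server failure; errors returned by Polly itself surface in boto3 as botocore.exceptions.ClientError and can be handled the same way.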
Text to Speech technology opens up many possibilities for developers. By combining FastAPI with a managed service like Polly, you can add speech output to your applications and make them more accessible to your users.