Creating a Text to Speech API using FastAPI and Polly

Build a Text to Speech API with FastAPI and Polly

Text-to-speech technology converts written text into natural-sounding speech, which is useful for applications such as accessibility tools, language learning, and virtual assistants.

In this article, we will build a Text to Speech API using FastAPI, a modern, high-performance web framework for building APIs with Python, and Polly, an Amazon Web Services (AWS) service that turns text into lifelike speech.

Setting Up FastAPI

First, let’s install FastAPI and Uvicorn, a fast ASGI server used to run FastAPI applications:

```bash
pip install fastapi
pip install uvicorn
```

Now, let’s create a new file called app.py and import FastAPI:

```python
from fastapi import FastAPI
```

We can then create a new instance of the FastAPI class:

```python
app = FastAPI()
```

Implementing the Text to Speech API

Next, let’s install boto3, the AWS SDK for Python, with pip install boto3, and import it so we can interact with Polly:

```python
import boto3
```

We will need AWS credentials and a region to interact with the Polly service. boto3 reads them from environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION) or from the ~/.aws/credentials and ~/.aws/config files.
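As a quick sanity check before starting the server, the environment variables can be inspected with a few lines of Python. This is only a sketch: the helper name `missing_aws_env_vars` is made up for this article, and a missing variable is not necessarily an error, because boto3 may still find the value in the ~/.aws files.

```python
import os

# Environment variables that boto3 reads for credentials and region.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env=None):
    """Return the names from AWS_ENV_VARS that are unset or empty in `env`."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in AWS_ENV_VARS if not env.get(name)]
```

Calling `missing_aws_env_vars()` with no arguments checks the current process environment and returns an empty list when all three variables are set.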

Now, we can create a new endpoint to generate speech from text using the /text-to-speech path:

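```python
from fastapi import Response

@app.get("/text-to-speech")
def text_to_speech(text: str):
    # boto3 calls are blocking, so this is a plain `def` endpoint,
    # which FastAPI runs in a worker thread instead of on the event loop.
    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Engine="standard",
        LanguageCode="en-US",
        OutputFormat="mp3",  # required by synthesize_speech
        Text=text,
        VoiceId="Joanna",
    )
    # Return the raw MP3 bytes with an audio media type;
    # returning bytes directly would make FastAPI try to encode them as JSON.
    return Response(content=response["AudioStream"].read(), media_type="audio/mpeg")
```

In this example, the synthesize_speech method of the Polly client generates speech from the provided text using the voice “Joanna” in US English. OutputFormat is a required parameter and is set to MP3 here. The audio bytes are wrapped in a Response with the audio/mpeg media type so that browsers and HTTP clients treat the body as audio rather than as JSON.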
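The endpoint reads the whole audio stream into memory before returning it. For longer texts, one option is to forward the audio in chunks instead. The sketch below shows only the chunking step; `iter_audio_chunks` is a hypothetical helper written for this article, and it assumes the stream object supports `read(size)`, as the `AudioStream` returned by boto3 does.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield chunks of at most `chunk_size` bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for response["AudioStream"]: any object with a read(size) method works.
fake_audio = io.BytesIO(b"abcdefgh")
chunks = list(iter_audio_chunks(fake_audio, chunk_size=3))  # [b"abc", b"def", b"gh"]
```

FastAPI's StreamingResponse accepts an iterator of bytes, so the endpoint could return `StreamingResponse(iter_audio_chunks(response["AudioStream"]), media_type="audio/mpeg")` rather than reading everything at once. Whether this is worthwhile depends on how long the synthesized audio is.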

Running the API

Finally, we can run the FastAPI app using Uvicorn:

```bash
uvicorn app:app --reload
```

Now you can access the Text to Speech API at http://127.0.0.1:8000/text-to-speech?text=Hello%2C%20world%21 (the URL-encoded form of “Hello, world!”), and it will return the audio stream of the synthesized speech.
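The API can also be called from a script. The following is a minimal client sketch using only the standard library; the helper names `build_tts_url` and `save_speech` and the output file name `speech.mp3` are assumptions made for illustration, and the file name presumes the endpoint returns MP3 audio.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL: the default address Uvicorn serves on.
BASE_URL = "http://127.0.0.1:8000/text-to-speech"


def build_tts_url(text: str, base_url: str = BASE_URL) -> str:
    """Return the endpoint URL with `text` encoded as a query parameter."""
    return f"{base_url}?{urlencode({'text': text})}"


def save_speech(text: str, path: str = "speech.mp3") -> None:
    """Request speech for `text` and write the response body to `path`."""
    with urlopen(build_tts_url(text)) as response, open(path, "wb") as out:
        out.write(response.read())
```

With the server running, calling `save_speech("Hello, world!")` writes the returned audio to speech.mp3. Using `urlencode` avoids problems with spaces, commas, and other characters that are not safe to place in a URL unescaped.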

With just a few lines of code, we have built a simple Text to Speech API using FastAPI and Polly. You can extend it by adding input validation, error handling, and additional features such as multiple languages and voices.
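As one possible starting point for validation and voice selection, here is a sketch of a helper that checks the input and builds the arguments for synthesize_speech. Everything in it is an assumption for illustration: the function name `validate_request`, the small `VOICES` table (a subset of Polly's voices), and the 3000-character limit, which should be checked against the current Polly documentation.

```python
# Illustrative subset of Polly voices and their language codes (an assumption
# for this sketch; the full list is in the Polly documentation).
VOICES = {
    "Joanna": "en-US",
    "Matthew": "en-US",
    "Amy": "en-GB",
    "Hans": "de-DE",
}

# Assumed limit on input length for a single synthesize_speech request.
MAX_TEXT_LENGTH = 3000


def validate_request(text: str, voice: str = "Joanna") -> dict:
    """Return keyword arguments for synthesize_speech, or raise ValueError."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > MAX_TEXT_LENGTH:
        raise ValueError(f"text must be at most {MAX_TEXT_LENGTH} characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return {"Text": cleaned, "VoiceId": voice, "LanguageCode": VOICES[voice]}
```

In the endpoint, the returned dictionary could be passed to `polly.synthesize_speech(**kwargs)` together with Engine and OutputFormat, and a ValueError could be converted into an HTTP error by raising FastAPI's HTTPException with a 4xx status code.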

Text to Speech technology has opened up a world of possibilities for developers to create innovative applications that can benefit users in various ways. By leveraging the power of FastAPI and services like Polly, you can easily incorporate Text to Speech capabilities into your applications and bring a whole new level of accessibility and functionality to your users.

3 Comments
@MsKakashi2012
10 months ago

very helpful video thanks man!

@user-wp1zc3jo1b
10 months ago

I used your code for my own fast API it is really helpful
now I am building a user interface with jinja2 using a HTML code but anytime I try to generate the audio output I usually get the response body from fast API instead of the audible audio
can you prescribe a solution I can use to solve this

@petersamuel6269
10 months ago

Thorough and comprehensive. Thank you 🔥