Text to Speech using Amazon Polly with React JS & Python

Published April 24, 2020, updated May 18, 2020. Categories: AI, AWS, Blogs, Chatbot

Written by Ishita Saha, Software Engineer, Powerupcloud Technologies

In this blog, we will discuss how to integrate AWS Polly into a chatbot application using Python and React JS.

Use Case

We are developing a chatbot framework that uses AWS Polly to give our users a natural, lively voice experience.

Problem Statement

We want to show how to integrate AWS Polly voice services with our existing chatbot application built on React JS and Python.

What is AWS Polly?

Amazon Polly is a service that turns text into lifelike speech. Amazon Polly enables existing applications to speak as a first-class feature and creates the opportunity for entirely new categories of speech-enabled products, from mobile apps and cars to devices and appliances. Amazon Polly includes dozens of lifelike voices and support for multiple languages, so you can select the ideal voice and distribute your speech-enabled applications in many geographies. Amazon Polly is easy to use – you just send the text you want converted into speech to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to your application so you can play it directly or store it in a standard audio file format, such as MP3.

Getting started only requires an AWS account, and we can test Polly directly from the AWS Console.

Go to:

https://console.aws.amazon.com/polly/home/SynthesizeSpeech

There, we can select a voice from different languages and regions.

Why Amazon Polly?

You can use Amazon Polly to power your application with high-quality spoken output. This cost-effective service has very low response times, and is available for virtually any use case, with no restrictions on storing and reusing generated speech.

Implementation

The user sends a message to the chatbot. This input goes to our React JS frontend, which calls a Python application on the backend. The Python application calls AWS Polly and sends the synthesized speech back to the React app, which plays the audio stream as MP3.

React JS

In this implementation, we are using the Audio() constructor.

The Audio() constructor creates and returns a new HTMLAudioElement which can be either attached to a document for the user to interact with and/or listen to, or can be used offscreen to manage and play audio.

Syntax:

audio = new Audio(url);

Methods:

play – Make the media object play or resume after pausing.
pause – Pause the media object.
load – Reload the media object.
canPlayType – Determine if a media type can be played.
 
In our implementation, we only use the play() and pause() methods.

Step 1: Initialize the audio, language and voice fields in the component state.

this.state = {
  audio: null, // current Audio object; null until the bot speaks for the first time
  languageName: "",
  voiceName: ""
};

Step 2: Clean up the bot's reply before sending it for synthesis: replace forward slashes with spaces and strip line breaks.

response = response.replace(/\//g, " ");
response = response.replace(/(\r\n|\n|\r)/gm, "");
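
If we also want this normalization on the Python side (for example, before the text reaches Polly), the two replace() calls above can be mirrored in a few lines of Python. This is a minimal sketch; the function name clean_reply is our own, hypothetical choice:

```python
import re


def clean_reply(text: str) -> str:
    """Python counterpart of the two JavaScript replace() calls in Step 2."""
    # Replace every forward slash with a space, like /\//g -> " "
    text = text.replace("/", " ")
    # Remove CRLF, LF and CR line breaks entirely, like /(\r\n|\n|\r)/gm -> ""
    return re.sub(r"\r\n|\n|\r", "", text)


print(clean_reply("Yes/No\nPlease choose"))  # prints: Yes NoPlease choose
```

Because line breaks are removed rather than replaced, "No" and "Please" are glued together in the output; replacing line breaks with a space instead may sound more natural when the reply is spoken.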

Step 3: If a previous reply from the bot is still playing, stop it.

if (this.state.audio) {
  this.state.audio.pause();
}

Step 4: The handleSpeaker() method interacts with our Python application. It creates a new Audio() object whose source URL sends a request to the Python backend, passing the following parameters dynamically:

  • languageName
  • voiceName
  • inputText
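handleSpeaker = inputText => {
  // Build the backend URL; the route path must match the Flask route (/textToSpeech)
  // and encodeURIComponent keeps spaces, "&" and "?" in the values intact
  const url =
    POLLY_API +
    "/textToSpeech?LanguageCode=" +
    encodeURIComponent(this.state.languageName) +
    "&VoiceId=" +
    encodeURIComponent(this.state.voiceName) +
    "&OutputFormat=mp3" +
    "&Text=" +
    encodeURIComponent(inputText);
  // Store a new Audio object that points at the synthesized mp3
  this.setState({ audio: new Audio(url) });
};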
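
To exercise the backend outside the browser (for example from a test script), the same kind of request URL can be built in Python. This is a sketch under our own assumptions: build_tts_url is a hypothetical helper and http://localhost:5000 is just an example base URL for the Flask app.

```python
from urllib.parse import quote, urlencode


def build_tts_url(base_url: str, language_code: str, voice_id: str, text: str) -> str:
    """Build a /textToSpeech URL with the query parameters our Flask route reads."""
    params = {
        "LanguageCode": language_code,
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "Text": text,
    }
    # quote_via=quote encodes spaces as %20, as encodeURIComponent does
    return base_url + "/textToSpeech?" + urlencode(params, quote_via=quote)


print(build_tts_url("http://localhost:5000", "en-US", "Joanna", "Hi & bye"))
# prints: http://localhost:5000/textToSpeech?LanguageCode=en-US&VoiceId=Joanna&OutputFormat=mp3&Text=Hi%20%26%20bye
```

Without encoding, the "&" in "Hi & bye" would start a new query parameter and the backend would only receive "Hi ".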

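Step 5: Once the new Audio object is in the state, we play the mp3 returned by our POLLY_API Python app. Since setState() is asynchronous, this call belongs where the update has already been applied (for example, in a setState() callback); otherwise it may act on the previous Audio object.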

this.state.audio.play();

Python

The Python application communicates with AWS Polly using the AWS SDK for Python, boto3.

Step 1: Configure the AWS credentials (access key ID and secret access key) and the Region used to access AWS Polly.

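import boto3

def connectToPolly(aws_access_key, aws_secret_key, region):
    # Create a boto3 session with explicit credentials and return a Polly client
    polly_client = boto3.Session(
        aws_access_key_id=aws_access_key,
        aws_secret_access_key=aws_secret_key,
        region_name=region).client('polly')

    return polly_client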

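Here, we create a Polly client to access the AWS Polly service. Passing keys explicitly keeps the example simple; in production, boto3 can instead resolve credentials from environment variables, the shared credentials file, or an IAM role.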

Step 2: Call synthesize_speech() to get the audio stream.

Request syntax:

response = client.synthesize_speech(
    Engine='standard'|'neural',
    LanguageCode='arb'|'cmn-CN'|'cy-GB'|'da-DK'|'de-DE'|'en-AU'|'en-GB'|'en-GB-WLS'|'en-IN'|'en-US'|'es-ES'|'es-MX'|'es-US'|'fr-CA'|'fr-FR'|'is-IS'|'it-IT'|'ja-JP'|'hi-IN'|'ko-KR'|'nb-NO'|'nl-NL'|'pl-PL'|'pt-BR'|'pt-PT'|'ro-RO'|'ru-RU'|'sv-SE'|'tr-TR',
    OutputFormat='json'|'mp3'|'ogg_vorbis'|'pcm',
    Text='string',
    TextType='ssml'|'text',
    VoiceId='Aditi'|'Amy'|'Astrid'|'Bianca'|'Brian'|'Camila'|'Carla'|'Carmen'|'Celine'|'Chantal'|'Conchita'|'Cristiano'|'Dora'|'Emma'|'Enrique'|'Ewa'|'Filiz'|'Geraint'|'Giorgio'|'Gwyneth'|'Hans'|'Ines'|'Ivy'|'Jacek'|'Jan'|'Joanna'|'Joey'|'Justin'|'Karl'|'Kendra'|'Kimberly'|'Lea'|'Liv'|'Lotte'|'Lucia'|'Lupe'|'Mads'|'Maja'|'Marlene'|'Mathieu'|'Matthew'|'Maxim'|'Mia'|'Miguel'|'Mizuki'|'Naja'|'Nicole'|'Penelope'|'Raveena'|'Ricardo'|'Ruben'|'Russell'|'Salli'|'Seoyeon'|'Takumi'|'Tatyana'|'Vicki'|'Vitoria'|'Zeina'|'Zhiyu'
)

Response syntax:

{
    'AudioStream': StreamingBody(),
    'ContentType': 'string',
    'RequestCharacters': 123
}
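
AudioStream is a botocore StreamingBody, a file-like object with read() and close(). When we want to keep a reply instead of streaming it straight to the browser, a minimal sketch like the following could write it to disk. save_audio is a hypothetical helper, and any file-like object (such as io.BytesIO) can stand in for the real stream when trying it out:

```python
import shutil


def save_audio(response: dict, path: str) -> int:
    """Write the AudioStream of a synthesize_speech() response to path.

    Returns the number of bytes written.
    """
    stream = response["AudioStream"]
    try:
        with open(path, "wb") as out:
            # Copy in chunks rather than reading the whole stream at once
            shutil.copyfileobj(stream, out)
            written = out.tell()
    finally:
        # Release the underlying connection (or the file-like stand-in)
        stream.close()
    return written
```

For example, save_audio(polly_client.synthesize_speech(Text="Hello", VoiceId="Joanna", OutputFormat="mp3"), "hello.mp3") would store the reply using the client from Step 1. With mp3 output, the response's ContentType is audio/mpeg.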

The textToSpeech Flask endpoint accepts the parameters sent by React and calls AWS Polly with them. The synthesized audio is returned to React as an MP3 file, which the React application then plays for the user.

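import io
from xml.sax.saxutils import escape

from flask import Flask, request, send_file
import credentials  # our module that defines connectToPolly()

app = Flask(__name__)

@app.route('/textToSpeech', methods=['GET'])
def textToSpeech():
    languageCode = request.args.get('LanguageCode')
    voiceId = request.args.get('VoiceId')
    outputFormat = request.args.get('OutputFormat', 'mp3')
    text = request.args.get('Text', '')
    # aws_access_key, aws_secret_key and region are assumed to come from our app configuration
    polly_client = credentials.connectToPolly(aws_access_key, aws_secret_key, region)
    # Escape &, < and > so user text cannot break the SSML markup
    response = polly_client.synthesize_speech(
        Text="<speak>" + escape(text) + "</speak>",
        LanguageCode=languageCode,
        VoiceId=voiceId,
        OutputFormat=outputFormat,
        TextType='ssml')
    # Buffer the StreamingBody and return it with Polly's ContentType (audio/mpeg for mp3)
    return send_file(io.BytesIO(response['AudioStream'].read()),
                     mimetype=response['ContentType'])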
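
The parameter handling in textToSpeech can also be pulled out into a plain function, which makes it testable without Flask or AWS. This is a minimal sketch under our own assumptions: build_polly_request is a hypothetical name, Text and VoiceId are treated as required, and "json" (speech marks) is rejected because the route is meant to return audio.

```python
from xml.sax.saxutils import escape

# Audio formats this sketch accepts; "json" speech marks are deliberately excluded
ALLOWED_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_polly_request(args) -> dict:
    """Turn query parameters into keyword arguments for synthesize_speech()."""
    text = (args.get("Text") or "").strip()
    if not text:
        raise ValueError("Text parameter is required")
    voice_id = args.get("VoiceId")
    if not voice_id:
        raise ValueError("VoiceId parameter is required")
    output_format = args.get("OutputFormat", "mp3")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError("Unsupported OutputFormat: " + str(output_format))
    params = {
        # escape() turns &, < and > into entities so the SSML stays well-formed
        "Text": "<speak>" + escape(text) + "</speak>",
        "TextType": "ssml",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }
    # LanguageCode is optional, so only pass it when the client sent one
    language_code = args.get("LanguageCode")
    if language_code:
        params["LanguageCode"] = language_code
    return params
```

Inside the route, the call would then become polly_client.synthesize_speech(**build_polly_request(request.args)), since Flask's request.args supports .get() with a default like a dict.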

Conclusion

This blog showed a simple way to integrate a React JS frontend with a Python backend to use AWS Polly. It can serve as a reference for similar chatbot use cases.
