forked from open-runtimes/examples
Feature #4155 Text To Speech in Python (#36)
Co-authored-by: rubynguyen1510 <[email protected]>
1 parent 92ea32b, commit 12462c1
Showing 5 changed files with 685 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# 🗣️ Text To Speech with Google, Azure and AWS API | ||
|
||
A Python cloud function for text to speech synthesis using [Google](https://cloud.google.com/text-to-speech), [Azure](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) and [AWS](https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html). | ||
|
||
### Supported Providers and Language Codes | ||
| Providers | Language Code (BCP-47) | | ||
| ----------- | ----------- | | ||
| Google |[Google Language Code](https://cloud.google.com/text-to-speech/docs/voices) | | ||
| Azure |[Azure Language Code](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt) | | ||
| AWS |[AWS Language Code](https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html) | | ||
|
||
### Example input: | ||
```json | ||
{ | ||
"provider":"<YOUR_PROVIDER_HERE>", | ||
"language":"<YOUR_LANGUAGE_CODE>", | ||
"text":"Hello world!" | ||
} | ||
``` | ||
### Example output: | ||
```json | ||
{ | ||
"success":true, | ||
"audio_bytes":"iVBORw0KGgoAAAANSUhE...o6Ie+UAAAAASU5CYII=" | ||
} | ||
``` | ||
### Example error output: | ||
```json | ||
{ | ||
"success":false, | ||
"error":"Missing API_KEY." | ||
} | ||
``` | ||
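A caller has to base64-decode `audio_bytes` before the audio can be played or stored. The sketch below assumes the response body has already been parsed into a dictionary shaped like the examples above; the helper names `extract_audio` and `save_audio` are hypothetical and not part of this function.

```python
import base64
from pathlib import Path


def extract_audio(response: dict) -> bytes:
    """Return the decoded audio, or raise if the function reported an error."""
    if not response.get("success"):
        raise RuntimeError(response.get("error", "Unknown error"))
    return base64.b64decode(response["audio_bytes"])


def save_audio(response: dict, path: str) -> int:
    """Write the decoded audio to `path` and return the number of bytes written."""
    return Path(path).write_bytes(extract_audio(response))
```

Raising on `"success": false` keeps the error message from the function visible to the caller instead of silently writing an empty file.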
|
||
## 📝 Environment Variables | ||
List of environment variables used by this cloud function: | ||
- **API_KEY** - Required by all providers. For AWS, this is the access key ID; for Azure, the subscription key. | ||
- **PROJECT_ID** - Required by Google only. | ||
- **SECRET_API_KEY** - Required by AWS only. This is the AWS secret access key. | ||
|
||
| **Google** | **AWS** | **Azure** | | ||
| ---------- | -------------- | --------- | | ||
| API_KEY | API_KEY | API_KEY | | ||
| PROJECT_ID | SECRET_API_KEY | | | ||
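The table above can be expressed as a lookup that reports which variables a request is still missing. This is a minimal sketch for illustration: `REQUIRED_VARIABLES` and `missing_variables` are hypothetical names, and the function itself performs these checks inside each provider class rather than through a shared table.

```python
# Variables each provider needs, mirroring the table above.
REQUIRED_VARIABLES = {
    "google": ("API_KEY", "PROJECT_ID"),
    "azure": ("API_KEY",),
    "aws": ("API_KEY", "SECRET_API_KEY"),
}


def missing_variables(provider: str, variables: dict) -> list:
    """Return the names of required variables that are absent or empty."""
    required = REQUIRED_VARIABLES[provider.lower()]
    return [name for name in required if not variables.get(name)]
```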
|
||
|
||
## 🚀 Deployment | ||
|
||
1. Clone this repository, and enter this function folder: | ||
|
||
```bash | ||
git clone https://github.com/open-runtimes/examples.git && cd examples | ||
cd python/text-to-speech | ||
``` | ||
|
||
2. Build the code: | ||
```bash | ||
docker run --rm --interactive --tty --volume $PWD:/usr/code openruntimes/python:v2-3.10 sh /usr/local/src/build.sh | ||
``` | ||
As a result, a `code.tar.gz` file will be generated. | ||
|
||
3. Start the Open Runtime: | ||
```bash | ||
docker run -p 3000:3000 -e INTERNAL_RUNTIME_KEY=secret-key -e INTERNAL_RUNTIME_ENTRYPOINT=main.py --rm --interactive --tty --volume $PWD/code.tar.gz:/tmp/code.tar.gz:ro openruntimes/python:v2-3.10 sh /usr/local/src/start.sh | ||
``` | ||
|
||
> Make sure to replace `<YOUR_API_KEY>` and the other placeholders in the requests in step 4 with your own credentials. | ||
Your function is now listening on port `3000`, and you can execute it by sending a `POST` request with the appropriate authorization headers. To learn more about the runtime, see the Python runtime [README](https://github.com/open-runtimes/open-runtimes/tree/main/openruntimes/python:v2-3.10). | ||
|
||
4. Send a request with cURL. | ||
> Google cURL example (requires `API_KEY` and `PROJECT_ID` in the environment variables) | ||
```bash | ||
curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "google", "language": "en-US", "text": "Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>", "PROJECT_ID": "<YOUR_PROJECT_ID>"}}' | ||
``` | ||
> Azure cURL example (requires `API_KEY` in the environment variables) | ||
```bash | ||
curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "azure", "language":"en-US", "text": "Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>"}}' | ||
``` | ||
> AWS cURL example (requires `API_KEY` and `SECRET_API_KEY` in the environment variables) | ||
```bash | ||
curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "aws", "language":"en-US", "text":"Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>", "SECRET_API_KEY": "<YOUR_SECRET_API_KEY>"}}' | ||
``` | ||
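The same request can be built from Python with only the standard library. This sketch mirrors the cURL examples above; the function name `build_request` and its default arguments are assumptions for illustration, and the runtime key must match the `INTERNAL_RUNTIME_KEY` passed in step 3.

```python
import json
import urllib.request


def build_request(provider, language, text, variables,
                  url="http://localhost:3000/", runtime_key="secret-key"):
    """Build a POST request with the same headers and body as the cURL examples."""
    body = {
        "payload": {"provider": provider, "language": language, "text": text},
        "variables": variables,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Internal-Challenge": runtime_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it to the running function, for example `urllib.request.urlopen(build_request("azure", "en-US", "Hello World!", {"API_KEY": "<YOUR_API_KEY>"}))`.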
## 📝 Notes | ||
- This function is designed for use with Appwrite Cloud Functions. You can learn more about it in [Appwrite docs](https://appwrite.io/docs/functions). | ||
- This example is compatible with Python 3.10. Other versions have not been tested and may not work. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,294 @@ | ||
"""Synthesize text to speech using Google, Azure and AWS API.""" | ||
# Standard library | ||
import abc | ||
import base64 | ||
|
||
# Third party | ||
import boto3 | ||
import requests | ||
from google.cloud import texttospeech | ||
|
||
|
||
class TextToSpeech(abc.ABC): | ||
    """Base class for Text to Speech.""" | ||
|
||
    def __init__(self, req: requests) -> None: | ||
        """Validate the request and store the provider credentials.""" | ||
        self.validate_request(req) | ||
|
||
    @abc.abstractmethod | ||
    def validate_request(self, req: requests) -> None: | ||
        """Abstract validate request method for providers.""" | ||
|
||
    @abc.abstractmethod | ||
    def speech(self, text: str, language: str) -> bytes: | ||
        """Abstract speech method for providers.""" | ||
|
||
|
||
class Google(TextToSpeech): | ||
    """Represent the implementation of Google text to speech.""" | ||
|
||
    def validate_request(self, req: requests) -> None: | ||
        """ | ||
        Validate the request data for Google text to speech. | ||
        Input: | ||
            req (request): The request provided by the user. | ||
        Raises: | ||
            ValueError: If any required value is missing or invalid. | ||
        """ | ||
        if not req.variables.get("API_KEY"): | ||
            raise ValueError("Missing API_KEY.") | ||
        if not req.variables.get("PROJECT_ID"): | ||
            raise ValueError("Missing PROJECT_ID.") | ||
        self.api_key = req.variables.get("API_KEY") | ||
        self.project_id = req.variables.get("PROJECT_ID") | ||
|
||
    def speech(self, text: str, language: str) -> bytes: | ||
        """ | ||
        Convert the given text into speech with the Google text to speech API. | ||
        Input: | ||
            text: The text to be converted into speech. | ||
            language: The language code (BCP-47 format). | ||
        Returns: | ||
            bytes: The synthesized speech in bytes. | ||
        """ | ||
        # Instantiate a client. | ||
        client = texttospeech.TextToSpeechClient( | ||
            client_options={ | ||
                "api_key": self.api_key, | ||
                "quota_project_id": self.project_id, | ||
            } | ||
        ) | ||
        # Set the text input to be synthesized. | ||
        synthesis_input = texttospeech.SynthesisInput(text=text) | ||
        # Build the voice request: use the requested language code | ||
        # (for example "en-US") and a neutral SSML voice gender. | ||
        voice = texttospeech.VoiceSelectionParams( | ||
            language_code=language, | ||
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL, | ||
        ) | ||
        # Select the type of audio file you want returned. | ||
        audio_config = texttospeech.AudioConfig( | ||
            audio_encoding=texttospeech.AudioEncoding.MP3, | ||
        ) | ||
        # Perform the text-to-speech request on the text input | ||
        # with the selected voice parameters and audio file type. | ||
        response = client.synthesize_speech( | ||
            input=synthesis_input, | ||
            voice=voice, | ||
            audio_config=audio_config, | ||
        ) | ||
        return response.audio_content | ||
|
||
|
||
class Azure(TextToSpeech): | ||
    """Represent the implementation of Azure text to speech.""" | ||
|
||
    VOICE = "en-US-ChristopherNeural" | ||
    GENDER = "Male" | ||
    REGION = "westus" | ||
    FETCH_TOKEN_URL = ( | ||
        "https://westus.api.cognitive.microsoft.com/sts/v1.0/issuetoken" | ||
    ) | ||
|
||
    def validate_request(self, req: requests) -> None: | ||
        """ | ||
        Validate the request data for Azure text to speech. | ||
        Input: | ||
            req (request): The request provided by the user. | ||
        Raises: | ||
            ValueError: If any required value is missing or invalid. | ||
        """ | ||
        if not req.variables.get("API_KEY"): | ||
            raise ValueError("Missing API_KEY.") | ||
        self.api_key = req.variables.get("API_KEY") | ||
|
||
    def get_token(self, subscription_key: str) -> str: | ||
        """Return an Azure token for a given subscription key.""" | ||
        headers = { | ||
            "Ocp-Apim-Subscription-Key": subscription_key | ||
        } | ||
        # Send request with subscription key. | ||
        response = requests.post( | ||
            self.FETCH_TOKEN_URL, | ||
            headers=headers, | ||
            timeout=10, | ||
        ) | ||
        # Grab access token valid for 10 minutes. | ||
        response.raise_for_status() | ||
        return response.text | ||
|
||
    def speech(self, text: str, language: str) -> bytes: | ||
        """ | ||
        Convert the given text into speech with the Azure text to speech API. | ||
        Input: | ||
            text: The text to be converted into speech. | ||
            language: The language code (BCP-47 format). | ||
        Returns: | ||
            bytes: The synthesized speech in bytes. | ||
        """ | ||
        # Endpoint for cognitive services speech api | ||
        url = ( | ||
            f"https://{self.REGION}.tts." | ||
            "speech.microsoft.com/cognitiveservices/v1" | ||
        ) | ||
        # Headers and auth for request. | ||
        headers_azure = { | ||
            "Content-type": "application/ssml+xml", | ||
            "Authorization": "Bearer " + self.get_token(self.api_key), | ||
            "X-Microsoft-OutputFormat": "audio-16khz-32kbitrate-mono-mp3", | ||
        } | ||
        data_azure = ( | ||
            f"<speak version='1.0' xml:lang='{language}'><voice " | ||
            f"xml:lang='{language}' xml:gender='{self.GENDER}' " | ||
            f"name='{self.VOICE}'>{text}</voice></speak>" | ||
        ) | ||
        response = requests.post( | ||
            url, | ||
            headers=headers_azure, | ||
            # Encode explicitly so non-Latin-1 text is sent as UTF-8. | ||
            data=data_azure.encode("utf-8"), | ||
            timeout=10, | ||
        ) | ||
        response.raise_for_status() | ||
        return response.content | ||
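Review note: `Azure.speech` interpolates `text` directly into the SSML body, so input containing `&`, `<` or quote characters would produce malformed XML. One way to guard against that is sketched below with the standard library's `xml.sax.saxutils`. The helper name `build_ssml` is hypothetical and not part of this commit.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, language: str, voice: str, gender: str) -> str:
    """Build an SSML document with the text and attribute values escaped."""
    # quoteattr() escapes the value and wraps it in quotes, so no literal
    # quotes are written around the attribute values here.
    return (
        f"<speak version='1.0' xml:lang={quoteattr(language)}>"
        f"<voice xml:lang={quoteattr(language)} "
        f"xml:gender={quoteattr(gender)} name={quoteattr(voice)}>"
        f"{escape(text)}</voice></speak>"
    )
```

With a helper like this, `data_azure` could be produced by `build_ssml(text, language, self.VOICE, self.GENDER)`.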
|
||
|
||
class AWS(TextToSpeech): | ||
    """Represent the implementation of AWS text to speech.""" | ||
|
||
    VOICE_ID = "Joanna" | ||
    REGION = "us-west-2" | ||
|
||
    def validate_request(self, req: requests) -> None: | ||
        """ | ||
        Validate the request data for AWS text to speech. | ||
        Input: | ||
            req (request): The request provided by the user. | ||
        Raises: | ||
            ValueError: If any required value is missing or invalid. | ||
        """ | ||
        if not req.variables.get("API_KEY"): | ||
            raise ValueError("Missing API_KEY.") | ||
        if not req.variables.get("SECRET_API_KEY"): | ||
            raise ValueError("Missing SECRET_API_KEY.") | ||
        self.api_key = req.variables.get("API_KEY") | ||
        self.secret_api_key = req.variables.get("SECRET_API_KEY") | ||
|
||
    def speech(self, text: str, language: str) -> bytes: | ||
        """ | ||
        Convert the given text into speech with the AWS text to speech API. | ||
        Input: | ||
            text: The text to be converted into speech. | ||
            language: The language code (BCP-47 format). | ||
        Returns: | ||
            bytes: The synthesized speech in bytes. | ||
        """ | ||
        # Create a Polly client from a boto3 session. | ||
        polly_client = boto3.Session( | ||
            aws_access_key_id=self.api_key, | ||
            aws_secret_access_key=self.secret_api_key, | ||
            region_name=self.REGION, | ||
        ).client("polly") | ||
|
||
        # Get response from polly client. | ||
        response = polly_client.synthesize_speech( | ||
            VoiceId=AWS.VOICE_ID, | ||
            OutputFormat="mp3", | ||
            Text=text, | ||
            LanguageCode=language, | ||
        ) | ||
        return response["AudioStream"].read() | ||
|
||
|
||
def validate_common(req: requests) -> tuple[str, str, str]: | ||
    """ | ||
    Validate common fields in request. | ||
    Input: | ||
        req (request): The request provided by the user. | ||
    Returns: | ||
        (tuple): A tuple containing the lowercased provider, the text | ||
            and the language from the request. | ||
    Raises: | ||
        ValueError: If the payload, the variables, or any of the common | ||
            fields (provider, text, language) are missing. | ||
    """ | ||
    # Check if the payload is empty. | ||
    if not req.payload: | ||
        raise ValueError("Missing Payload.") | ||
|
||
    # Check if variables is empty. | ||
    if not req.variables: | ||
        raise ValueError("Missing Variables.") | ||
|
||
    # Check if provider is empty. | ||
    if not req.payload.get("provider"): | ||
        raise ValueError("Missing Provider.") | ||
|
||
    # Check if text is empty. | ||
    if not req.payload.get("text"): | ||
        raise ValueError("Missing Text.") | ||
|
||
    # Check if language is empty. | ||
    if not req.payload.get("language"): | ||
        raise ValueError("Missing Language.") | ||
|
||
    # Return the provider, text and language. | ||
    return ( | ||
        req.payload.get("provider").lower(), | ||
        req.payload.get("text"), | ||
        req.payload.get("language"), | ||
    ) | ||
|
||
|
||
def main(req: requests, res: str) -> str: | ||
    """ | ||
    Main Function for Text to Speech. | ||
    Input: | ||
        req(request): The request from the user. | ||
        res(json): The response for the user. | ||
    Returns: | ||
        (json): JSON representing the success value of the text to speech api | ||
            containing the synthesized audio in base64 encoded format. | ||
    """ | ||
    try: | ||
        provider, text, language = validate_common(req) | ||
        if provider == "google": | ||
            provider_class = Google(req) | ||
        elif provider == "azure": | ||
            provider_class = Azure(req) | ||
        elif provider == "aws": | ||
            provider_class = AWS(req) | ||
        else: | ||
            raise ValueError("Invalid Provider.") | ||
    except ValueError as value_error: | ||
        return res.json({ | ||
            "success": False, | ||
            "error": str(value_error), | ||
        }) | ||
    try: | ||
        audio_bytes = provider_class.speech(text, language) | ||
    except Exception as error: | ||
        return res.json({ | ||
            "success": False, | ||
            "error": f"{type(error).__name__}: {error}", | ||
        }) | ||
    return res.json({ | ||
        "success": True, | ||
        "audio_bytes": base64.b64encode(audio_bytes).decode(), | ||
    }) |
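Review note: `main` only touches `req.payload`, `req.variables` and `res.json`, so it can be exercised locally with small stand-ins for the runtime objects. The sketch below is an assumption about how such a harness might look: `FakeRequest`, `FakeResponse` and `stub_handler` are hypothetical names, and `stub_handler` merely stands in for `main` so the block runs without provider credentials.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Stand-in for the runtime request: only payload and variables."""
    payload: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)


class FakeResponse:
    """Stand-in for the runtime response: json() returns the serialized body."""

    def json(self, body: dict) -> str:
        return json.dumps(body)


def stub_handler(req, res):
    """Placeholder for main: mirrors its first two validation checks only."""
    if not req.payload:
        return res.json({"success": False, "error": "Missing Payload."})
    if not req.variables:
        return res.json({"success": False, "error": "Missing Variables."})
    return res.json({"success": True})
```

In a real test, `stub_handler` would be replaced by `main` imported from the function's module, with the provider SDK calls mocked.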
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
boto3==1.28.9 | ||
botocore==1.31.9 | ||
google-cloud-texttospeech==2.14.1 | ||
parameterized==0.9.0 | ||
requests==2.31.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
//NExAASoAEkAUAAAXwGQ/p/gDhGY/4fADM/x/ADAzMfwfAM6/o/AyB+Y/N4A7pkc+A7AyOeH/AMwZD/0+A4P8/4DgDo/4fADo/w/gBgZ4f4/AR63R+AhfA81AUf47wD//NExAgUiyowAZNoAIHidDXtgtAXBH/EmB2CZkr/4WsZZff/+tOHPLw8P/8S8Yc3RN3Hp//+XEEXQNzM3////qNHMzcwLiCac0/////zM3bWncwNC+boKhAsSKDAV90A//NExAgS4cKcAYNQAM+vABFdTDxmRCqimDwaEZaIkBAUgNs/NLnbGEhMNPzrVIFPvJm/ONfYwkE+Ri5l/+Y11WLnAAXV+n6HjBQmc//+tLaagQESJOt3hyTZiQnjMvw6//NExA8TMWKoAc9IAFYEPV3dawoSzDBkasnJWCooQpTjVuz/P9/zz/3/fWQ/j3LQPH1nZe+++0EQcPlKrkoFWGBCMftpbLOTvFRagKXtjORQvVLmEjQcfnk70D7pU44b//NExBUWuTagAMvMcDtwm6jSR/sJNlQTAxWVzG+QdAJrAomaWUlU/yzNX+T3z//9m1medMBsUYkgeGBEYgFAbSFGO0OOAuQiizvfUOD0uoe6fx3u7JlQ3nZYLMs3UOao//NExA0VgWawAMYMlKw4YVjA12JJyT0fBI0uKzxl83MlCXgQieqwO2nyudv6pLP4Um/zt4973h+2fx+9/T03XRCSZPHJp1AIMjsoo0EPtWz/2F3AQwrvLoMXLesIVzrT//NExAoU8W60AMYQlCR77CH4FFF5QIEOLk6ALq4t1JeV0sKALV+zDpq9luT6/3VHl9Bz/sfzfp8L9+8XBrYdg8DUkB5lUJpkSKowQzVc3A1nctiP/xwx39CitfwVQwyz//NExAkSEVq8AMPMlGLWpsVa02Kdgz46LEOgRyRwm07wNAY0rsvMVTqaNSaNaBLqk1fdtfX/////d23VmoGgthZ1zc1CLJHC5Iwd+lXFogOmaGSeL2E8uLOd1kyECQp+//NExBMRIVK8AHvMlQ+zheivslKjEZU+DDLC2nWswY8a943vjWr43vb43b//vrM9MTpIBgqBA2uf6V1A6h+l3xIBQ8u7JU/BYnzwQGKPgGCQmpwHEcgkLWpzrH6vjRP4//NExCESeWa4AHvKlIeQc2jrfKlxfzTyRZtbrjOldd1/qlHFhpzCYsLsGi1GMPMokZxSeu/S3QHENVuJbVvNaxfC+qgvg1lYykKqXQlx/F4LkujlMU41Cca2Jqc4mA8q//NExCoQqS64AHvOcXi00oapjnJtRf/3Z0UdID4lF2jRXMIZNY3dcKxY4X0aFLncnn0n3LFmGErC9H2iDBLYrRNR/k9KxHHyrTdblCaBtgNCwYDxoh5Io7nL6O/+iW9R//NExDoSESasAMPOcKoPBwiJnnR4YDa///9iVbijphYgkTSaw1ltIChmtbtuzeBsrOSMQDMEwSHw4TiQwJKIqCMTAyENltitougYm95rITet3eZC3DO8SlmflZhSkg+c//NExEQSKVqkAMMElP9P//yq0quA2ohqww/Sw/UlkPxyOUTAkojm5Hgotq1QQFREOmA2uC/BAyCQPl0IJhsL6jyfheXFO5y8v7r7J02CGmoQLHQ2FHtO0////JC9kpDN//NExE4SqRakAMPScCRKAx25MZxaO1owbRoARBXA0NBUDMCC2kWrKyUXIVSDBKAoH4FC70CiFfHJt1BiN05fARmQVcPdKkv////91iPR/qXoEElSkhA0OySG4jFK0yWH//NExFYRYPagAMJScMyF0JMIDZKabcwpclYV40vrriFJiUA6fKsmuIzlQulz27tnbT7qFaOoHLmUBUjQKX////zPbSrEypUxIgujlFb0Tuy9/C+lWcqAYWnLdWNI/kta//NExGMRmQqcAMJYcLnXtaUtjdQNLSbQWGdFmSN5azCHw74RtZxtSY4yJlAYIgdH////Q7///0V+DkRh
YkxqkghyLSmH5lyn/hxaroW4NooZzz1+HbM/s8/8bV+1eCpd//NExG8RqRKYAMPScOeDQBSwJiwJAqPhVEySSHCAUpszVSn2xcMjCRUBHTVn////pvX29P+UPPDM4HNw/TEZVLxQGpgXdRNjmgcfaU02He+SJPJVIcJhz54mlxax8Lmp//NExHsUQR6QAVhIACczCuCXAKIZYXQTwE/AgKzIyWcF8DOBGi6J6MMLQXygkPUexk7VLMiEamwww9lj2JUpGxK0tfmrLrSqrQuzf/91oqN0klHupa0Eta//1MnP9kTY//NExH0hEqJ4AZpoAdb/bIFpG9ZVgkPqlevVgL5Om9klcaBqFjLNI5RxmxKalnDHdnn/tMvtZtMc5Nevu7ekBeYHYRUIxFqupOqVicgAsbol8L5rXO29ad0wct7aU91A//NExEsbMWoUAdhgAGtcIFUxp9qDKHvUTbKkhEQJE6mXVi1Jo8oDXLJJUKKYp61BEa1KIwLTSqjjMMxqmrTCrDVCpUKoHway4xry8rjHfHNj8lKcSElSCwqRKCoVPQoc//NExDEYwUYAAMJScJaWBIEg0TJiklrylcY+TSjxUioGn15INPLA1iIeCrhMDT+uIpGdeInnmDXM5Iqtj6gaCYhmCuKPnnWWtyq4hGg+XJD7LTKIhLHF2NyeSiqtdXJV//NExCESkbWMAGJElCTXhrOxioh2//oq/KiqjtuUyKn/3KGBgwTMmQEK/xUW/+Kijf6haoWFVUxBTTXYUNNntBQ28i7Tc/GCN14Jo3UyUTcUPONmtTU0xF9lFfBVi5jj//NExCkAAANIAAAAAF5UP5IJoGgaCohx+SxSiw7F3fQXPuixe0kFAeGFnkGIji9wn/yWLDv6Abn5/oiIiIn//T4iJXcO9dw4GLeaF13d3Qrz+Cw8PH+8/4AH8AQnAABH//NExHwAAANIAAAAAIAYe072jOCP8zxCBSiPGaUSFnSoXydOnhkDLKzVx9ejtbWzWtGjkgVKZkCDo6C/j/4q5LBRR++nDZYbjvm2VZFRfG7hdWSy/Oacyi/v//lRJuX+//NExMwH4CQAAPe8AEuqxZZf4+F5V3m/Kb/6/GoAqLBUhFduI/76O27EYo8cL+NGlM7PUnCRQo8y4djSij4dpOEgQEfDtqiL/9nKinb5TBlI7OVP/+YoYGDIfULC4rFR//NExP8Y6f3sAHoGmVEj//rFcVFjVbP4qKcWTEFNRTMuMTAwqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq//NExO4V0LIAAHpMTaqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq//NExOkUQbmYAMGElKqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq |