Feature #4155 Text To Speech in Python (#36)

Co-authored-by: rubynguyen1510 <[email protected]>
Mushmou · Aug 3, 2023 · 12462c1
1 parent 92ea32b
commit 12462c1
Showing 5 changed files with 685 additions and 0 deletions.
diff --git a/python/text-to-speech/README.md b/python/text-to-speech/README.md
@@ -0,0 +1,85 @@
+# 🗣️ Text To Speech with Google, Azure and AWS API
+
+A Python cloud function for text to speech synthesis using [Google](https://cloud.google.com/text-to-speech), [Azure](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) and [AWS](https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html).
+
+### Supported Providers and Language Codes
+| Providers | Language Code (BCP-47) |
+| ----------- | ----------- |
+| Google      |[Google Language Code](https://cloud.google.com/text-to-speech/docs/voices) |
+| Azure       |[Azure Language Code](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt)  |
+| AWS         |[AWS Language Code](https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html) |
+
+### Example input:
+```json
+{
+    "provider":"<YOUR_PROVIDER_HERE>",
+    "language":"<YOUR_LANGUAGE_CODE>",
+    "text":"Hello world!"
+}
+```
+### Example output:
+```json
+{
+    "success":true,
+    "audio_bytes":"//NExAASoAEkAUAAAXwGQ/p/gDhGY..."
+}
+```
+### Example error output:
+```json
+{
+    "success":false,
+    "error":"Missing API_KEY."
+}
+```
+
+## 📝 Environment Variables
+Environment variables required by this cloud function, per provider:
+- **API_KEY** - Required for Google, Azure, and AWS. For AWS, this is the access key ID.
+- **PROJECT_ID** - Required for Google.
+- **SECRET_API_KEY** - Required for AWS. This is the secret access key.
+
+| **Google** | **AWS** | **Azure** |
+| ---------- | ------- | --------- |
+| API_KEY | API_KEY | API_KEY |
+| PROJECT_ID | SECRET_API_KEY | |
+
+
+## 🚀 Deployment
+
+1. Clone this repository, and enter this function folder:
+
+```bash
+git clone https://github.com/open-runtimes/examples.git && cd examples
+cd python/text-to-speech
+```
+
+2. Build the code:
+```bash
+docker run --rm --interactive --tty --volume $PWD:/usr/code openruntimes/python:v2-3.10 sh /usr/local/src/build.sh
+```
+As a result, a `code.tar.gz` file will be generated.
+
+3. Start the Open Runtime:
+```bash
+docker run -p 3000:3000 -e INTERNAL_RUNTIME_KEY=secret-key -e INTERNAL_RUNTIME_ENTRYPOINT=main.py --rm --interactive --tty --volume $PWD/code.tar.gz:/tmp/code.tar.gz:ro openruntimes/python:v2-3.10 sh /usr/local/src/start.sh
+```
+
+Your function is now listening on port `3000`, and you can execute it by sending a `POST` request with the appropriate authorization headers. To learn more about the runtime, see the Python runtime [README](https://github.com/open-runtimes/open-runtimes/tree/main/openruntimes/python:v2-3.10).
+
+4. Send a request with cURL, replacing placeholders such as `<YOUR_API_KEY>` with your own values:
+> Google cURL example (requires `API_KEY` and `PROJECT_ID` in `variables`)
+```bash
+curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "google", "language": "en-US", "text": "Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>", "PROJECT_ID": "<YOUR_PROJECT_ID>"}}'
+```
+> Azure cURL example (requires `API_KEY` in `variables`)
+```bash
+curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "azure", "language":"en-US", "text": "Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>"}}'
+```
+> AWS cURL example (requires `API_KEY` and `SECRET_API_KEY` in `variables`)
+```bash
+curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "aws", "language":"en-US", "text":"Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>", "SECRET_API_KEY": "<YOUR_SECRET_API_KEY>"}}'
+```
+## 📝 Notes
+- This function is designed for use with Appwrite Cloud Functions. You can learn more about it in [Appwrite docs](https://appwrite.io/docs/functions).
+- This example is compatible with Python 3.10. Other versions may work but have not been tested.
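
Review note: the cURL examples in the README could also be driven from Python. The following is a minimal sketch using only the standard library. The helper names `build_request` and `decode_audio` are hypothetical, the URL and challenge header are taken from the README examples, and the response is assumed to have exactly the shape shown under "Example output"; whether the runtime wraps that JSON in another envelope is not confirmed here.

```python
import base64
import json
import urllib.request


def build_request(provider, language, text, variables,
                  url="http://localhost:3000/", challenge="secret-key"):
    """Build a POST request mirroring the README cURL examples."""
    body = {
        "payload": {"provider": provider, "language": language, "text": text},
        "variables": variables,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Internal-Challenge": challenge,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_audio(result: dict) -> bytes:
    """Return raw audio bytes from a result shaped like the README output."""
    if not result.get("success"):
        # Error responses carry a human-readable "error" field.
        raise RuntimeError(result.get("error", "Unknown error"))
    return base64.b64decode(result["audio_bytes"])
```

With the runtime from step 3 running, `urllib.request.urlopen(build_request(...))` would send the request, and the decoded bytes could be written to a file opened in binary mode.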
diff --git a/python/text-to-speech/main.py b/python/text-to-speech/main.py
@@ -0,0 +1,294 @@
+"""Synthesize text to speech using Google, Azure and AWS API."""
+# Standard library
+import abc
+import base64
+
+# Third party
+import boto3
+import requests
+from google.cloud import texttospeech
+
+
+class TextToSpeech(abc.ABC):
+    """Base class for Text to Speech."""
+
+    def __init__(self, req: requests) -> None:
+        """Initialize class method."""
+        self.validate_request(req)
+
+    @abc.abstractmethod
+    def validate_request(self, req: requests) -> None:
+        """Abstract validate request method for providers."""
+
+    @abc.abstractmethod
+    def speech(self, text: str, language: str) -> bytes:
+        """Abstract speech method for providers."""
+
+
+class Google(TextToSpeech):
+    """Represent the implementation of Google text to speech."""
+
+    def validate_request(self, req: requests) -> None:
+        """
+        Validate the request data for Google text to speech.
+
+        Input:
+            req (request): The request provided by the user.
+
+        Raises:
+            ValueError: If any required value is missing or invalid.
+        """
+        if not req.variables.get("API_KEY"):
+            raise ValueError("Missing API_KEY.")
+        if not req.variables.get("PROJECT_ID"):
+            raise ValueError("Missing PROJECT_ID.")
+        self.api_key = req.variables.get("API_KEY")
+        self.project_id = req.variables.get("PROJECT_ID")
+
+    def speech(self, text: str, language: str) -> bytes:
+        """
+        Convert the given text into speech with the Google text to speech API.
+
+        Input:
+            text: The text to be converted into speech.
+            language: The language code (BCP-47 format).
+
+        Returns:
+            bytes: The synthesized speech in bytes.
+        """
+        # Instantiate a client.
+        client = texttospeech.TextToSpeechClient(
+            client_options={
+                "api_key": self.api_key,
+                "quota_project_id": self.project_id,
+            }
+        )
+        # Set the text input to be synthesized.
+        synthesis_input = texttospeech.SynthesisInput(text=text)
+        # Build the voice request, select the language code ("en-US")
+        # and the ssml voice gender is neutral.
+        voice = texttospeech.VoiceSelectionParams(
+            language_code=language,
+            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
+        )
+        # Select the type of audio file you want returned.
+        audio_config = texttospeech.AudioConfig(
+            audio_encoding=texttospeech.AudioEncoding.MP3,
+        )
+        # Perform the text-to-speech request on the text input
+        # with the selected voice parameters and audio file type.
+        response = client.synthesize_speech(
+            input=synthesis_input,
+            voice=voice,
+            audio_config=audio_config,
+        )
+        return response.audio_content
+
+
+class Azure(TextToSpeech):
+    """Represent the implementation of Azure text to speech."""
+
+    VOICE = "en-US-ChristopherNeural"
+    GENDER = "Male"
+    REGION = "westus"
+    FETCH_TOKEN_URL = (
+            "https://westus.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
+    )
+
+    def validate_request(self, req: requests) -> None:
+        """
+        Validate the request data for Azure text to speech.
+
+        Input:
+            req (request): The request provided by the user.
+        Raises:
+            ValueError: If any required value is missing or invalid.
+        """
+        if not req.variables.get("API_KEY"):
+            raise ValueError("Missing API_KEY.")
+        self.api_key = req.variables.get("API_KEY")
+
+    def get_token(self, subscription_key: str) -> str:
+        """Return an Azure token for a given subscription key."""
+        headers = {
+            "Ocp-Apim-Subscription-Key": subscription_key
+        }
+        # Send request with subscription key.
+        response = requests.post(
+            self.FETCH_TOKEN_URL,
+            headers=headers,
+            timeout=10,
+        )
+        # Fail on HTTP errors; the returned token is valid for 10 minutes.
+        response.raise_for_status()
+        return response.text
+
+    def speech(self, text: str, language: str) -> bytes:
+        """
+        Convert the given text into speech with the Azure text to speech API.
+
+        Input:
+            text: The text to be converted into speech.
+            language: The language code (BCP-47 format).
+
+        Returns:
+            bytes: The synthesized speech in bytes.
+        """
+        # Endpoint for cognitive services speech api
+        url = (
+            f"https://{self.REGION}.tts."
+            "speech.microsoft.com/cognitiveservices/v1"
+        )
+        # Headers and auth for request.
+        headers_azure = {
+            "Content-type": "application/ssml+xml",
+            "Authorization": "Bearer " + self.get_token(self.api_key),
+            "X-Microsoft-OutputFormat": "audio-16khz-32kbitrate-mono-mp3",
+        }
+        data_azure = (
+            f"<speak version='1.0' xml:lang='{language}'><voice "
+            f"xml:lang='{language}' xml:gender='{self.GENDER}' "
+            f"name='{self.VOICE}'>{text}</voice></speak>"
+        )
+        response = requests.request(
+            "POST",
+            url,
+            headers=headers_azure,
+            data=data_azure,
+            timeout=10,
+        )
+        response.raise_for_status()
+        return response.content
+
+
+class AWS(TextToSpeech):
+    """Represent the implementation of AWS text to speech."""
+
+    VOICE_ID = "Joanna"
+    REGION = "us-west-2"
+
+    def validate_request(self, req: requests) -> None:
+        """
+        Validate the request data for AWS text to speech.
+
+        Input:
+            req (request): The request provided by the user.
+        Raises:
+            ValueError: If any required value is missing or invalid.
+        """
+        if not req.variables.get("API_KEY"):
+            raise ValueError("Missing API_KEY.")
+        if not req.variables.get("SECRET_API_KEY"):
+            raise ValueError("Missing SECRET_API_KEY.")
+        self.api_key = req.variables.get("API_KEY")
+        self.secret_api_key = req.variables.get("SECRET_API_KEY")
+
+    def speech(self, text: str, language: str) -> bytes:
+        """
+        Convert the given text into speech with the AWS text to speech API.
+
+        Input:
+            text: The text to be converted into speech.
+            language: The language code (BCP-47 format).
+
+        Returns:
+            bytes: The synthesized speech in bytes.
+        """
+        # Call polly client using boto3.session.
+        polly_client = boto3.Session(
+            aws_access_key_id=self.api_key,
+            aws_secret_access_key=self.secret_api_key,
+            region_name=self.REGION,
+        ).client("polly")
+
+        # Get response from polly client.
+        response = polly_client.synthesize_speech(
+            VoiceId=AWS.VOICE_ID,
+            OutputFormat="mp3",
+            Text=text,
+            LanguageCode=language,
+        )
+        return response["AudioStream"].read()
+
+
+def validate_common(req: requests) -> tuple[str, str, str]:
+    """
+    Validate common fields in request.
+
+    Input:
+        req (request): The request provided by the user.
+
+    Returns:
+        (tuple): A tuple containing the provider (lowercased), text, and
+        language from the request.
+
+    Raises:
+        ValueError: If any of the common fields (provider, text, language)
+        are missing in the request payload.
+    """
+    # Check if the payload is empty.
+    if not req.payload:
+        raise ValueError("Missing Payload.")
+
+    # Check if variables is empty.
+    if not req.variables:
+        raise ValueError("Missing Variables.")
+
+    # Check if provider is empty.
+    if not req.payload.get("provider"):
+        raise ValueError("Missing Provider.")
+
+    # Check if text is empty.
+    if not req.payload.get("text"):
+        raise ValueError("Missing Text.")
+
+    # Check if language is empty.
+    if not req.payload.get("language"):
+        raise ValueError("Missing Language.")
+
+    # Return the provider, text, and language.
+    return (
+        req.payload.get("provider").lower(),
+        req.payload.get("text"),
+        req.payload.get("language"),
+    )
+
+
+def main(req: requests, res: str) -> str:
+    """
+    Main function for Text to Speech.
+
+    Input:
+        req(request): The request from the user.
+        res(json): The response for the user.
+
+    Returns:
+        (json): JSON representing the success value of the text to speech api
+        containing the synthesized audio in base64 encoded format.
+    """
+    try:
+        provider, text, language = validate_common(req)
+        if provider == "google":
+            provider_class = Google(req)
+        elif provider == "azure":
+            provider_class = Azure(req)
+        elif provider == "aws":
+            provider_class = AWS(req)
+        else:
+            raise ValueError("Invalid Provider.")
+    except ValueError as value_error:
+        return res.json({
+            "success": False,
+            "error": str(value_error),
+        })
+    try:
+        audio_bytes = provider_class.speech(text, language)
+    except Exception as error:
+        return res.json({
+            "success": False,
+            "error": f"{type(error).__name__}: {error}",
+        })
+    return res.json({
+        "success": True,
+        "audio_bytes": base64.b64encode(audio_bytes).decode(),
+    })
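
Review note: in `Azure.speech`, `text` and `language` are interpolated into the SSML body as-is, so input containing `&` or `<` would likely produce malformed XML. One possible way to guard against that is sketched below with the standard library's `xml.sax.saxutils`. The helper name `build_ssml` is hypothetical and not part of this commit; the default voice and gender are copied from the `Azure` class constants.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, language: str,
               voice: str = "en-US-ChristopherNeural",
               gender: str = "Male") -> str:
    """Build an SSML document with user input escaped for XML."""
    # quoteattr() escapes the value and wraps it in quotes.
    lang_attr = quoteattr(language)
    return (
        f"<speak version='1.0' xml:lang={lang_attr}>"
        f"<voice xml:lang={lang_attr} xml:gender={quoteattr(gender)} "
        f"name={quoteattr(voice)}>"
        # escape() replaces &, < and > in the spoken text.
        f"{escape(text)}</voice></speak>"
    )
```

The result could be passed as `data=` to the existing `requests` call in place of `data_azure`; this is a suggestion under the assumptions above, not a tested change to the function.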
diff --git a/python/text-to-speech/requirements.txt b/python/text-to-speech/requirements.txt
@@ -0,0 +1,5 @@
+boto3==1.28.9
+botocore==1.31.9
+google-cloud-texttospeech==2.14.1
+parameterized==0.9.0
+requests==2.31.0
diff --git a/python/text-to-speech/results.txt b/python/text-to-speech/results.txt
@@ -0,0 +1 @@
+//NExAASoAEkAUAAAXwGQ/p/gDhGY/4fADM/x/ADAzMfwfAM6/o/AyB+Y/N4A7pkc+A7AyOeH/AMwZD/0+A4P8/4DgDo/4fADo/w/gBgZ4f4/AR63R+AhfA81AUf47wD//NExAgUiyowAZNoAIHidDXtgtAXBH/EmB2CZkr/4WsZZff/+tOHPLw8P/8S8Yc3RN3Hp//+XEEXQNzM3////qNHMzcwLiCac0/////zM3bWncwNC+boKhAsSKDAV90A//NExAgS4cKcAYNQAM+vABFdTDxmRCqimDwaEZaIkBAUgNs/NLnbGEhMNPzrVIFPvJm/ONfYwkE+Ri5l/+Y11WLnAAXV+n6HjBQmc//+tLaagQESJOt3hyTZiQnjMvw6//NExA8TMWKoAc9IAFYEPV3dawoSzDBkasnJWCooQpTjVuz/P9/zz/3/fWQ/j3LQPH1nZe+++0EQcPlKrkoFWGBCMftpbLOTvFRagKXtjORQvVLmEjQcfnk70D7pU44b//NExBUWuTagAMvMcDtwm6jSR/sJNlQTAxWVzG+QdAJrAomaWUlU/yzNX+T3z//9m1medMBsUYkgeGBEYgFAbSFGO0OOAuQiizvfUOD0uoe6fx3u7JlQ3nZYLMs3UOao//NExA0VgWawAMYMlKw4YVjA12JJyT0fBI0uKzxl83MlCXgQieqwO2nyudv6pLP4Um/zt4973h+2fx+9/T03XRCSZPHJp1AIMjsoo0EPtWz/2F3AQwrvLoMXLesIVzrT//NExAoU8W60AMYQlCR77CH4FFF5QIEOLk6ALq4t1JeV0sKALV+zDpq9luT6/3VHl9Bz/sfzfp8L9+8XBrYdg8DUkB5lUJpkSKowQzVc3A1nctiP/xwx39CitfwVQwyz//NExAkSEVq8AMPMlGLWpsVa02Kdgz46LEOgRyRwm07wNAY0rsvMVTqaNSaNaBLqk1fdtfX/////d23VmoGgthZ1zc1CLJHC5Iwd+lXFogOmaGSeL2E8uLOd1kyECQp+//NExBMRIVK8AHvMlQ+zheivslKjEZU+DDLC2nWswY8a943vjWr43vb43b//vrM9MTpIBgqBA2uf6V1A6h+l3xIBQ8u7JU/BYnzwQGKPgGCQmpwHEcgkLWpzrH6vjRP4//NExCESeWa4AHvKlIeQc2jrfKlxfzTyRZtbrjOldd1/qlHFhpzCYsLsGi1GMPMokZxSeu/S3QHENVuJbVvNaxfC+qgvg1lYykKqXQlx/F4LkujlMU41Cca2Jqc4mA8q//NExCoQqS64AHvOcXi00oapjnJtRf/3Z0UdID4lF2jRXMIZNY3dcKxY4X0aFLncnn0n3LFmGErC9H2iDBLYrRNR/k9KxHHyrTdblCaBtgNCwYDxoh5Io7nL6O/+iW9R//NExDoSESasAMPOcKoPBwiJnnR4YDa///9iVbijphYgkTSaw1ltIChmtbtuzeBsrOSMQDMEwSHw4TiQwJKIqCMTAyENltitougYm95rITet3eZC3DO8SlmflZhSkg+c//NExEQSKVqkAMMElP9P//yq0quA2ohqww/Sw/UlkPxyOUTAkojm5Hgotq1QQFREOmA2uC/BAyCQPl0IJhsL6jyfheXFO5y8v7r7J02CGmoQLHQ2FHtO0////JC9kpDN//NExE4SqRakAMPScCRKAx25MZxaO1owbRoARBXA0NBUDMCC2kWrKyUXIVSDBKAoH4FC70CiFfHJt1BiN05fARmQVcPdKkv////91iPR/qXoEElSkhA0OySG4jFK0yWH//NExFYRYPagAMJScMyF0JMIDZKabcwpclYV40vrriFJiUA6fKsmuIzlQulz27tnbT7qFaOoHLmUBUjQKX////zPbSrEypUxIgujlFb0Tuy9/C+lWcqAYWnLdWNI/kta//NExGMRmQqcAMJYcLnXtaUtjdQNLSbQWGdFmSN5azCHw74RtZxtSY4yJlAYIgdH////Q7///0V+DkR
hYkxqkghyLSmH5lyn/hxaroW4NooZzz1+HbM/s8/8bV+1eCpd//NExG8RqRKYAMPScOeDQBSwJiwJAqPhVEySSHCAUpszVSn2xcMjCRUBHTVn////pvX29P+UPPDM4HNw/TEZVLxQGpgXdRNjmgcfaU02He+SJPJVIcJhz54mlxax8Lmp//NExHsUQR6QAVhIACczCuCXAKIZYXQTwE/AgKzIyWcF8DOBGi6J6MMLQXygkPUexk7VLMiEamwww9lj2JUpGxK0tfmrLrSqrQuzf/91oqN0klHupa0Eta//1MnP9kTY//NExH0hEqJ4AZpoAdb/bIFpG9ZVgkPqlevVgL5Om9klcaBqFjLNI5RxmxKalnDHdnn/tMvtZtMc5Nevu7ekBeYHYRUIxFqupOqVicgAsbol8L5rXO29ad0wct7aU91A//NExEsbMWoUAdhgAGtcIFUxp9qDKHvUTbKkhEQJE6mXVi1Jo8oDXLJJUKKYp61BEa1KIwLTSqjjMMxqmrTCrDVCpUKoHway4xry8rjHfHNj8lKcSElSCwqRKCoVPQoc//NExDEYwUYAAMJScJaWBIEg0TJiklrylcY+TSjxUioGn15INPLA1iIeCrhMDT+uIpGdeInnmDXM5Iqtj6gaCYhmCuKPnnWWtyq4hGg+XJD7LTKIhLHF2NyeSiqtdXJV//NExCESkbWMAGJElCTXhrOxioh2//oq/KiqjtuUyKn/3KGBgwTMmQEK/xUW/+Kijf6haoWFVUxBTTXYUNNntBQ28i7Tc/GCN14Jo3UyUTcUPONmtTU0xF9lFfBVi5jj//NExCkAAANIAAAAAF5UP5IJoGgaCohx+SxSiw7F3fQXPuixe0kFAeGFnkGIji9wn/yWLDv6Abn5/oiIiIn//T4iJXcO9dw4GLeaF13d3Qrz+Cw8PH+8/4AH8AQnAABH//NExHwAAANIAAAAAIAYe072jOCP8zxCBSiPGaUSFnSoXydOnhkDLKzVx9ejtbWzWtGjkgVKZkCDo6C/j/4q5LBRR++nDZYbjvm2VZFRfG7hdWSy/Oacyi/v//lRJuX+//NExMwH4CQAAPe8AEuqxZZf4+F5V3m/Kb/6/GoAqLBUhFduI/76O27EYo8cL+NGlM7PUnCRQo8y4djSij4dpOEgQEfDtqiL/9nKinb5TBlI7OVP/+YoYGDIfULC4rFR//NExP8Y6f3sAHoGmVEj//rFcVFjVbP4qKcWTEFNRTMuMTAwqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq//NExO4V0LIAAHpMTaqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq//NExOkUQbmYAMGElKqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
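
Review note: `results.txt` holds base64 text, presumably the expected `audio_bytes` for a test. A quick sanity check that such a string decodes to MP3 data can be sketched as follows. The helper names are hypothetical, and the check is only a heuristic: it looks for an ID3 tag or an MPEG frame sync (eleven set bits) at the start of the data, which is not a full validation of the file.

```python
import base64


def decode_sample(encoded: str) -> bytes:
    """Decode base64 text, ignoring any whitespace or line breaks."""
    return base64.b64decode("".join(encoded.split()))


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: ID3 tag or MPEG frame sync at the start of the data."""
    if data[:3] == b"ID3":
        return True
    # An MPEG audio frame header starts with 11 bits set to 1.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
```

Applied to the first characters of `results.txt` (`//NExAAS...`), the decoded data starts with the bytes `0xFF 0xF3`, which passes this check.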