Adds Speech-to-Speech Translation Sample. #777
Merged: beccasaurus merged 11 commits into GoogleCloudPlatform:master from ricalo:feature/speech on Nov 2, 2018
Commits:
b18aad5 Adds Speech-to-Speech Translation Sample. (ricalo)
de51aba Merge branch 'master' into feature/speech
fe59bf9 Merge branch 'master' into feature/speech (fhinkel)
054765b Replaces Ava.js with the Mocha test framework. (ricalo)
1b20d67 Merge branch 'feature/speech' of github.com:ricalo/nodejs-docs-sample… (ricalo)
04cd760 Merge branch 'master' into feature/speech (beccasaurus)
e8b99e5 Test creates GCS bucket if it doesn't exist. (ricalo)
99b6b0f Provides a random bucket name for tests. (ricalo)
2293dba Merge branch 'feature/speech' of github.com:ricalo/nodejs-docs-sample… (ricalo)
3a196b8 Test goes back to create and delete random bucket. (ricalo)
e9ef4ed Implements test recommendation about findOrCreate. (ricalo)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| # Format: //devtools/kokoro/config/proto/build.proto | ||
|
|
||
| # Set the folder in which the tests are run | ||
| env_vars: { | ||
| key: "PROJECT" | ||
| value: "functions/speech-to-speech" | ||
| } | ||
|
|
||
| # Tell the trampoline which build file to use. | ||
| env_vars: { | ||
| key: "TRAMPOLINE_BUILD_FILE" | ||
| value: "github/nodejs-docs-samples/.kokoro/build.sh" | ||
| } | ||
|
|
||
| # Environment values for tests that Kokoro doesn't provide natively | ||
| env_vars: { | ||
| key: "OUTPUT_BUCKET" | ||
| value: "6fa8d42c-a0f5-474e-a52b-687eb54c3f01" | ||
| } | ||
|
|
||
| env_vars: { | ||
| key: "SUPPORTED_LANGUAGE_CODES" | ||
| value: "en,es,fr" | ||
| } | ||
|
|
||
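The `SUPPORTED_LANGUAGE_CODES` value in the config above is a comma-separated string that the function splits when it loads. As a minimal sketch, assuming a hypothetical helper named `parseLanguageCodes` (it is not part of this pull request), the value could be parsed defensively so that an unset variable or stray whitespace does not break startup:

```javascript
'use strict';

// Hypothetical helper, not part of the sample: turns a comma-separated
// environment value such as "en,es,fr" into a clean array of codes.
function parseLanguageCodes (rawValue) {
  if (typeof rawValue !== 'string') {
    // The variable is unset; return an empty list instead of throwing.
    return [];
  }
  return rawValue
    .split(',')
    .map(code => code.trim())
    .filter(code => code.length > 0);
}

// Reads the same variable that the Kokoro config sets for the tests.
const parsedCodes = parseLanguageCodes(process.env.SUPPORTED_LANGUAGE_CODES);
console.log(parsedCodes);
```

Returning an empty list for a missing variable is one possible design choice; failing fast with a clear error message would be an equally reasonable one.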
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| # This file specifies files that are *not* uploaded to Google Cloud Platform | ||
| # using gcloud. It follows the same syntax as .gitignore, with the addition of | ||
| # "#!include" directives (which insert the entries of the given .gitignore-style | ||
| # file at that point). | ||
| # | ||
| # For more information, run: | ||
| # $ gcloud topic gcloudignore | ||
| # | ||
| .gcloudignore | ||
| # If you would like to upload your .git directory, .gitignore file or files | ||
| # from your .gitignore file, remove the corresponding line | ||
| # below: | ||
| .git | ||
| .gitignore | ||
|
|
||
| node_modules |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| v6.14.4 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,127 @@ | ||
| # Speech-to-Speech Translation Sample | ||
|
|
||
| The Speech-to-Speech Translation sample uses the [Speech-to-Text][1], | ||
| [Translation][2], and [Text-to-Speech][3] APIs to translate an audio message to | ||
| another language. The sample uses [Google Cloud Functions][4] to wrap up the | ||
| calls to the APIs to show how you can incrementally add features to your | ||
| existing apps, whether they're hosted on Google Cloud Platform or not. | ||
| The sample receives the input audio message as base64-encoded text and writes the | ||
| translated audio messages to [Google Cloud Storage][5] where existing apps can | ||
| retrieve them. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| Before using the sample app, make sure that you have the following | ||
| prerequisites: | ||
|
|
||
| * A [Google Cloud Platform][0] (GCP) account with the following APIs enabled: | ||
| * Cloud Speech API | ||
| * Cloud Text-to-Speech API | ||
| * Cloud Translation API | ||
| * An API key file for a service account that has permissions to use the APIs | ||
| mentioned in the previous prerequisite. For more information, see [Using API | ||
| Keys][8]. | ||
| * [Node Version Manager][6] (NVM) | ||
|
|
||
| ## Configuring the sample | ||
|
|
||
| To configure the sample you must declare the required environment variables, set | ||
| up NVM, and install the [Cloud Functions Node.js emulator][7]. | ||
|
|
||
| The sample requires the following environment variables: | ||
|
|
||
| * `GCF_REGION`: The region where your Cloud Function is deployed. For available | ||
| regions, see [Cloud Functions Locations][11] in the Functions documentation. | ||
| * `GOOGLE_CLOUD_PROJECT`: The project id of your GCP project. | ||
| * `BASE_URL`: The URL that the Cloud Functions emulator uses to serve requests. | ||
| * `OUTPUT_BUCKET`: A bucket that the sample uses to drop translated files. The | ||
| test script creates this bucket if it doesn't exist. | ||
| * `GOOGLE_APPLICATION_CREDENTIALS`: The path to your API key file. | ||
| * `SUPPORTED_LANGUAGE_CODES`: Comma-separated list of languages that the sample | ||
| translates messages to. | ||
|
|
||
| Use the following commands to declare the required environment variables: | ||
|
|
||
| ``` | ||
| export GCF_REGION=us-central1 | ||
| export GOOGLE_CLOUD_PROJECT=[your-GCP-project-id] | ||
| export BASE_URL=http://localhost:8010/$GOOGLE_CLOUD_PROJECT/$GCF_REGION | ||
| export OUTPUT_BUCKET=[your-Google-Cloud-Storage-bucket] | ||
| export GOOGLE_APPLICATION_CREDENTIALS=[path-to-your-API-key-file] | ||
| export SUPPORTED_LANGUAGE_CODES=en,es,fr | ||
| ``` | ||
|
|
||
| The sample includes an `.nvmrc` file that declares the version of Node.js that | ||
| you should use to run the app. | ||
| Run the following commands to set up NVM to work with the Node.js version | ||
| declared in the `.nvmrc` file: | ||
|
|
||
| ``` | ||
| nvm install && nvm use | ||
| ``` | ||
|
|
||
| Run the following commands to install and start the Cloud Functions emulator: | ||
|
|
||
| ``` | ||
| npm install -g @google-cloud/functions-emulator | ||
| functions-emulator start | ||
| ``` | ||
|
|
||
| ## Running the tests | ||
|
|
||
| The test script performs the following tasks: | ||
|
|
||
| 1. Runs the linter. | ||
| 1. Deploys the function to the emulator. | ||
| 1. Runs tests that don't perform any calls to the Google Cloud APIs. | ||
| 1. Creates the output bucket if it doesn't exist. | ||
| 1. Runs tests that perform calls to the Google Cloud APIs and drop the | ||
| translated messages to the bucket. | ||
| 1. Deletes the files created during the tests. | ||
|
|
||
| To run the tests, use the following commands from the | ||
| `functions/speech-to-speech` folder: | ||
|
|
||
| ``` | ||
| npm install && npm test | ||
| ``` | ||
|
|
||
| ## Sending a request to the emulator | ||
|
|
||
| Once the tests have run, you can send a request to the emulator using an HTTP | ||
| tool, such as [curl][10]. Before sending a request, make sure that the | ||
| `OUTPUT_BUCKET` environment variable points to an existing bucket. If you update | ||
| the environment variables, you must restart the emulator to apply the new | ||
| values. Use the following command to restart the emulator: | ||
|
|
||
| ``` | ||
| functions-emulator restart | ||
| ``` | ||
|
|
||
| The sample includes a `test/request-body.json` file that contains a JSON object | ||
| representing the body of a valid request, including the base64-encoded audio | ||
| message. Run the following command to send a request to the emulator: | ||
|
|
||
| ``` | ||
| curl --request POST --header "Content-Type:application/json" \ | ||
| --data @test/request-body.json $BASE_URL/speechTranslate | ||
| ``` | ||
|
|
||
| The command returns a JSON object with information about the translated message. | ||
| You can also see the logs using the following command: | ||
|
|
||
| ``` | ||
| functions-emulator logs read | ||
| ``` | ||
|
|
||
| [0]: https://cloud.google.com | ||
| [1]: https://cloud.google.com/speech-to-text/ | ||
| [2]: https://cloud.google.com/translate/ | ||
| [3]: https://cloud.google.com/text-to-speech/ | ||
| [4]: https://cloud.google.com/functions/ | ||
| [5]: https://cloud.google.com/storage/ | ||
| [6]: https://github.com/creationix/nvm | ||
| [7]: https://cloud.google.com/functions/docs/emulator | ||
| [8]: https://cloud.google.com/docs/authentication/api-keys | ||
| [10]: https://curl.haxx.se/ | ||
| [11]: https://cloud.google.com/functions/docs/locations |
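The README describes the request body as a JSON object that carries the base64-encoded audio message, and the function in this pull request reads the `encoding`, `sampleRateHertz`, `languageCode`, and `audioContent` fields from it. The following is a minimal sketch of how such a body might be assembled in Node.js. The helper name `buildRequestBody` and the example values (`LINEAR16`, `16000`, `en`) are assumptions for illustration and are not taken from `test/request-body.json`:

```javascript
'use strict';

// Hypothetical helper: builds a JSON body with the four fields that the
// function validates (encoding, sampleRateHertz, languageCode, audioContent).
function buildRequestBody (audioBuffer, encoding, sampleRateHertz, languageCode) {
  return {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
    // The audio travels as base64-encoded text inside the JSON body.
    audioContent: audioBuffer.toString('base64')
  };
}

// Placeholder bytes stand in for real audio; the encoding and sample rate
// are assumed values and must match the actual recording.
const exampleBody = buildRequestBody(Buffer.from('abc'), 'LINEAR16', 16000, 'en');
console.log(JSON.stringify(exampleBody));
```

In practice the buffer would come from reading an audio file, and the serialized object could be saved to a file and passed to `curl` with `--data @file`.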
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,177 @@ | ||
| /** | ||
| * Copyright 2018, Google, LLC | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| 'use strict'; | ||
|
|
||
| // This sample uses the UUID library to generate the output filename. | ||
| const uuid = require('uuid/v4'); | ||
|
|
||
| const googleCloudProject = process.env.GOOGLE_CLOUD_PROJECT; | ||
| const supportedLanguageCodes = process.env.SUPPORTED_LANGUAGE_CODES.split(','); | ||
| const outputBucket = process.env.OUTPUT_BUCKET; | ||
| const outputAudioEncoding = 'MP3'; | ||
| const voiceSsmlGender = 'NEUTRAL'; | ||
| // Declare the API clients as global variables to allow them to initialize at cold start. | ||
| const speechToTextClient = getSpeechToTextClient(); | ||
| const textTranslationClient = getTextTranslationClient(); | ||
| const textToSpeechClient = getTextToSpeechClient(); | ||
| const storageClient = getStorageClient(); | ||
|
|
||
| exports.speechTranslate = (request, response) => { | ||
| let responseBody = {}; | ||
|
|
||
| validateRequest(request).then(() => { | ||
| const inputEncoding = request.body.encoding; | ||
| const inputSampleRateHertz = request.body.sampleRateHertz; | ||
| const inputLanguageCode = request.body.languageCode; | ||
| const inputAudioContent = request.body.audioContent; | ||
|
|
||
| console.log(`Input encoding: ${inputEncoding}`); | ||
| console.log(`Input sample rate hertz: ${inputSampleRateHertz}`); | ||
| console.log(`Input language code: ${inputLanguageCode}`); | ||
|
|
||
| return callSpeechToText( | ||
| inputAudioContent, | ||
| inputEncoding, | ||
| inputSampleRateHertz, | ||
| inputLanguageCode | ||
| ); | ||
| }).then(data => { | ||
| const sttResponse = data[0]; | ||
| // The data object contains one or more recognition alternatives ordered by accuracy. | ||
| const transcription = sttResponse.results | ||
| .map(result => result.alternatives[0].transcript) | ||
| .join('\n'); | ||
| responseBody.transcription = transcription; | ||
| responseBody.gcsBucket = outputBucket; | ||
|
|
||
| let translations = []; | ||
| supportedLanguageCodes.forEach(languageCode => { | ||
| let translation = { languageCode: languageCode }; | ||
| const filenameUUID = uuid(); | ||
| const filename = filenameUUID + '.' + outputAudioEncoding.toLowerCase(); | ||
| callTextTranslation(languageCode, transcription).then(data => { | ||
| const textTranslation = data[0]; | ||
| translation.text = textTranslation; | ||
| return callTextToSpeech(languageCode, textTranslation); | ||
| }).then(data => { | ||
| const path = languageCode + '/' + filename; | ||
| return uploadToCloudStorage(path, data[0].audioContent); | ||
| }).then(() => { | ||
| console.log(`Successfully translated input to ${languageCode}.`); | ||
| translation.gcsPath = languageCode + '/' + filename; | ||
| translations.push(translation); | ||
| if (translations.length === supportedLanguageCodes.length) { | ||
| responseBody.translations = translations; | ||
| console.log(`Response: ${JSON.stringify(responseBody)}`); | ||
| response.status(200).send(responseBody); | ||
| } | ||
| }).catch(error => { | ||
| console.error(`Partial error in translation to ${languageCode}: ${error}`); | ||
| translation.error = error.message; | ||
| translations.push(translation); | ||
| if (translations.length === supportedLanguageCodes.length) { | ||
| responseBody.translations = translations; | ||
| console.log(`Response: ${JSON.stringify(responseBody)}`); | ||
| response.status(200).send(responseBody); | ||
| } | ||
| }); | ||
| }); | ||
| }).catch(error => { | ||
| console.error(error); | ||
| response.status(400).send(error.message); | ||
| }); | ||
| }; | ||
|
|
||
| function callSpeechToText (audioContent, encoding, sampleRateHertz, languageCode) { | ||
| console.log(`Processing speech from audio content in ${languageCode}.`); | ||
|
|
||
| const request = { | ||
| config: { | ||
| encoding: encoding, | ||
| sampleRateHertz: sampleRateHertz, | ||
| languageCode: languageCode | ||
| }, | ||
| audio: { content: audioContent } | ||
| }; | ||
|
|
||
| return speechToTextClient.recognize(request); | ||
| } | ||
|
|
||
| function callTextTranslation (targetLangCode, data) { | ||
| console.log(`Translating text to ${targetLangCode}: ${data}`); | ||
|
|
||
| return textTranslationClient.translate(data, targetLangCode); | ||
| } | ||
|
|
||
| function callTextToSpeech (targetLocale, data) { | ||
| console.log(`Converting to speech in ${targetLocale}: ${data}`); | ||
|
|
||
| const request = { | ||
| input: { text: data }, | ||
| voice: { languageCode: targetLocale, ssmlGender: voiceSsmlGender }, | ||
| audioConfig: { audioEncoding: outputAudioEncoding } | ||
| }; | ||
|
|
||
| return textToSpeechClient.synthesizeSpeech(request); | ||
| } | ||
|
|
||
| function uploadToCloudStorage (path, contents) { | ||
| console.log(`Uploading audio file to ${path}`); | ||
|
|
||
| return storageClient | ||
| .bucket(outputBucket) | ||
| .file(path) | ||
| .save(contents); | ||
| } | ||
|
|
||
| function validateRequest (request) { | ||
| return new Promise(function (resolve, reject) { | ||
| if (!request.body.encoding) { | ||
| return reject(new Error('Invalid encoding.')); | ||
| } | ||
| if (!request.body.sampleRateHertz || isNaN(request.body.sampleRateHertz)) { | ||
| return reject(new Error('Sample rate hertz must be numeric.')); | ||
| } | ||
| if (!request.body.languageCode) { | ||
| return reject(new Error('Invalid language code.')); | ||
| } | ||
| if (!request.body.audioContent) { | ||
| return reject(new Error('Invalid audio content.')); | ||
| } | ||
|
|
||
| resolve(); | ||
| }); | ||
| } | ||
|
|
||
| function getSpeechToTextClient () { | ||
| const speech = require('@google-cloud/speech'); | ||
| return new speech.SpeechClient(); | ||
| } | ||
|
|
||
| function getTextTranslationClient () { | ||
| const { Translate } = require('@google-cloud/translate'); | ||
| return new Translate({ projectId: googleCloudProject }); | ||
| } | ||
|
|
||
| function getTextToSpeechClient () { | ||
| const textToSpeech = require('@google-cloud/text-to-speech'); | ||
| return new textToSpeech.TextToSpeechClient(); | ||
| } | ||
|
|
||
| function getStorageClient () { | ||
| const { Storage } = require('@google-cloud/storage'); | ||
| return new Storage({ projectId: googleCloudProject }); | ||
| } |
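The handler in this file starts one translation chain per language and counts finished entries in both the success and the error branch to decide when to send the response. One possible alternative, sketched here under assumptions, is to collect the per-language promises with `Promise.all` and convert each failure into an error entry, so the response logic lives in a single place. The names `translateAll` and `translateOne` are hypothetical and do not appear in the pull request:

```javascript
'use strict';

// Hypothetical helper: runs translateOne for every language code and resolves
// once all of them have finished. A failure for one language becomes an
// { languageCode, error } entry instead of rejecting the whole batch.
function translateAll (languageCodes, translateOne) {
  const tasks = languageCodes.map(languageCode =>
    Promise.resolve()
      .then(() => translateOne(languageCode))
      .then(result => Object.assign({ languageCode: languageCode }, result))
      .catch(error => ({ languageCode: languageCode, error: error.message }))
  );
  // Results keep the same order as languageCodes.
  return Promise.all(tasks);
}

// Usage with a stand-in for the translate, synthesize, and upload chain.
function fakeTranslateOne (languageCode) {
  if (languageCode === 'fr') {
    return Promise.reject(new Error('Simulated failure.'));
  }
  return Promise.resolve({ text: `text in ${languageCode}` });
}

translateAll(['en', 'es', 'fr'], fakeTranslateOne).then(translations => {
  console.log(JSON.stringify({ translations: translations }));
});
```

Because every task catches its own error, the combined promise always resolves, which matches the existing behavior of returning partial results with a 200 status.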