[docs] Add tutorial + example app for server-side whisper #147

Merged · 7 commits · Jun 20, 2023
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -17,6 +17,8 @@
     title: Building an Electron Application
   - local: tutorials/node
     title: Server-side Inference in Node.js
title: Building an Electron Application
- local: tutorials/node
title: Server-side Inference in Node.js
+  - local: tutorials/node-audio-processing
+    title: Server-side Audio Processing in Node.js
   title: Tutorials
 - sections:
   - local: api/transformers
102 changes: 102 additions & 0 deletions docs/source/tutorials/node-audio-processing.mdx
@@ -0,0 +1,102 @@

# Server-side Audio Processing in Node.js
A major benefit of writing code for the web is that you can access the multitude of APIs that are available in modern browsers. Unfortunately, when writing server-side code, we are not afforded such luxury, so we have to find another way. In this tutorial, we will design a simple Node.js application that uses Transformers.js for speech recognition with [Whisper](https://huggingface.co/Xenova/whisper-tiny.en), and in the process, learn how to process audio on the server.

The main problem we need to solve is that the [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API) is not available in Node.js, meaning we can't use the [`AudioContext`](https://developer.mozilla.org/en-US/docs/Web/API/AudioContext) class to process audio. So, we will need to install third-party libraries to obtain the raw audio data. For this example, we will only consider `.wav` files, but the same principles apply to other audio formats.

<Tip>

This tutorial will be written as an ES module, but you can easily adapt it to use CommonJS instead (a minimal sketch is shown after this tip). For more information, see the [Node.js tutorial](https://huggingface.co/docs/transformers.js/tutorials/node).

</Tip>
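
If you do stick with CommonJS, keep in mind that `@xenova/transformers` is distributed as an ES module and that CommonJS files cannot use top-level `await`, so you would typically load the library with a dynamic `import()` inside an async function. A minimal sketch (the file name and `main` wrapper are illustrative):

```js
// index.cjs — a minimal CommonJS sketch (file name and structure are illustrative).
// `@xenova/transformers` is an ES module, so in CommonJS we load it with a
// dynamic import() instead of require().
async function main() {
    const { pipeline } = await import('@xenova/transformers');
    const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
    // ... load audio and run the transcriber as shown below ...
}

main();
```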


**Useful links:**
- [Source code](https://github.com/xenova/transformers.js/tree/main/examples/node-audio-processing)
- [Documentation](https://huggingface.co/docs/transformers.js)


## Prerequisites

- [Node.js](https://nodejs.org/en/) version 16+
- [npm](https://www.npmjs.com/) version 7+



## Getting started

Let's start by creating a new Node.js project and installing Transformers.js via [NPM](https://www.npmjs.com/package/@xenova/transformers):

```bash
npm init -y
npm i @xenova/transformers
```

<Tip>

Remember to add `"type": "module"` to your `package.json` to indicate that your project uses ECMAScript modules.

</Tip>


Next, let's install the [`wavefile`](https://www.npmjs.com/package/wavefile) package, which we will use for loading `.wav` files:

```bash
npm i wavefile
```


## Creating the application

Start by creating a new file called `index.js`, which will be the entry point for our application. Let's also import the necessary modules:

```js
import { pipeline } from '@xenova/transformers';
import wavefile from 'wavefile';
```

For this tutorial, we will use the `Xenova/whisper-tiny.en` model, but feel free to choose one of the other Whisper models from the [Hugging Face Hub](https://huggingface.co/models?library=transformers.js&search=whisper). Let's create our pipeline with:
```js
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
```

Next, let's load an audio file and convert it to the format required by Transformers.js:
```js
// Load audio data
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let buffer = Buffer.from(await fetch(url).then(x => x.arrayBuffer()))

// Read .wav file and convert it to required format
let wav = new wavefile.WaveFile(buffer);
wav.toBitDepth('32f'); // Pipeline expects input as a Float32Array
wav.toSampleRate(16000); // Whisper expects audio with a sampling rate of 16000
let audioData = wav.getSamples();
if (Array.isArray(audioData)) {
    // For this demo, if there are multiple channels for the audio file, we just select the first one.
    // In practice, you'd probably want to convert all channels to a single channel (e.g., stereo -> mono).
    audioData = audioData[0];
}
```
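
If you would rather merge the channels than keep only the first one, a simple approach is to average the samples. A minimal sketch, replacing the `if` block above and assuming a two-channel (stereo) file; other conventions (e.g., scaling to preserve perceived loudness) also exist:

```js
// Merge stereo to mono by averaging the two channels (assumes exactly 2 channels).
if (Array.isArray(audioData) && audioData.length === 2) {
    const [left, right] = audioData;
    const mono = new Float32Array(left.length);
    for (let i = 0; i < left.length; ++i) {
        mono[i] = (left[i] + right[i]) / 2;
    }
    audioData = mono;
}
```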

Finally, let's run the model and measure execution duration.
```js
let start = performance.now();
let output = await transcriber(audioData);
let end = performance.now();
console.log(`Execution duration: ${(end - start) / 1000} seconds`);
console.log(output);
```

You can now run the application with `node index.js`. Note that when running the script for the first time, it may take a while to download and cache the model. Subsequent requests will use the cached model, and model loading will be much faster.
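
For example, from the directory containing `index.js`:

```bash
node index.js
```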

You should see output similar to:
```
Execution duration: 0.6460317999720574 seconds
{
text: ' And so my fellow Americans ask not what your country can do for you. Ask what you can do for your country.'
}
```
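
For longer recordings, you will likely want to process the audio in chunks and perhaps keep timestamps. A sketch of what this could look like, replacing the plain `transcriber(audioData)` call above and assuming the `chunk_length_s`, `stride_length_s`, and `return_timestamps` pipeline options (check the Transformers.js documentation for the options supported by your version):

```js
// Sketch: transcribe longer audio in overlapping 30-second windows and keep timestamps.
// The option names below are assumptions; consult the pipeline documentation.
let output = await transcriber(audioData, {
    chunk_length_s: 30,      // process the audio in 30-second windows
    stride_length_s: 5,      // overlap consecutive windows by 5 seconds
    return_timestamps: true, // also return start/end times for the transcribed text
});
console.log(output);
```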


That's it! You've successfully created a Node.js application that uses Transformers.js for speech recognition with Whisper. You can now use this as a starting point for your own applications.

docs/source/tutorials/node.mdx
@@ -45,7 +45,7 @@

### ECMAScript modules (ESM)

-To indicate that your project uses ECMAScript modules, you need to add `type: "module"` to your `package.json`:
+To indicate that your project uses ECMAScript modules, you need to add `"type": "module"` to your `package.json`:

```json
{
28 changes: 28 additions & 0 deletions examples/node-audio-processing/index.js
@@ -0,0 +1,28 @@
import { pipeline } from '@xenova/transformers';
import wavefile from 'wavefile';

// Load model
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');

// Load audio data
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let buffer = Buffer.from(await fetch(url).then(x => x.arrayBuffer()))

// Read .wav file and convert it to required format
let wav = new wavefile.WaveFile(buffer);
wav.toBitDepth('32f'); // Pipeline expects input as a Float32Array
wav.toSampleRate(16000); // Whisper expects audio with a sampling rate of 16000
let audioData = wav.getSamples();
if (Array.isArray(audioData)) {
    // For this demo, if there are multiple channels for the audio file, we just select the first one.
    // In practice, you'd probably want to convert all channels to a single channel (e.g., stereo -> mono).
    audioData = audioData[0];
}

// Run model
let start = performance.now();
let output = await transcriber(audioData);
let end = performance.now();
console.log(`Execution duration: ${(end - start) / 1000} seconds`);
console.log(output);
// { text: ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.' }
17 changes: 17 additions & 0 deletions examples/node-audio-processing/package.json
@@ -0,0 +1,17 @@
{
  "name": "audio-processing",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@xenova/transformers": "^2.2.0",
    "wavefile": "^11.0.0"
  }
}
6 changes: 3 additions & 3 deletions src/utils/audio.js
@@ -23,14 +23,14 @@ export async function read_audio(url, sampling_rate) {
// Running in node or an environment without AudioContext
throw Error(
"Unable to load audio from path/URL since `AudioContext` is not available in your environment. " +
"As a result, audio data must be passed directly to the processor. " +
"If you are running in node.js, you can use an external library (e.g., https://github.com/audiojs/web-audio-api) to do this."
"Instead, audio data should be passed directly to the pipeline/processor. " +
"For more information and some example code, see https://huggingface.co/docs/transformers.js/tutorials/node-audio-processing."
)
}

const response = await (await getFile(url)).arrayBuffer();
const audioCTX = new AudioContext({ sampleRate: sampling_rate });
-if(typeof sampling_rate === 'undefined') {
+if (typeof sampling_rate === 'undefined') {
console.warn(`No sampling rate provided, using default of ${audioCTX.sampleRate}Hz.`)
}
const decoded = await audioCTX.decodeAudioData(response);