[docs] Add tutorial + example app for server-side whisper #147

Merged · 7 commits · Jun 20, 2023
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -17,6 +17,8 @@
     title: Building an Electron Application
   - local: tutorials/node
     title: Server-side Inference in Node.js
title: Building an Electron Application
- local: tutorials/node
title: Server-side Inference in Node.js
+  - local: tutorials/node-audio-processing
+    title: Server-side Audio Processing in Node.js
   title: Tutorials
 - sections:
   - local: api/transformers
102 changes: 102 additions & 0 deletions docs/source/tutorials/node-audio-processing.mdx
@@ -0,0 +1,102 @@

# Server-side Audio Processing in Node.js
A major benefit of writing code for the web is that you can access the multitude of APIs that are available in modern browsers. Unfortunately, when writing server-side code, we are not afforded such luxury, so we have to find another way. In this tutorial, we will design a simple Node.js application that uses Transformers.js for speech recognition with [Whisper](https://huggingface.co/Xenova/whisper-tiny.en), and in the process, learn how to process audio on the server.

The main problem we need to solve is that the [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API) is not available in Node.js, meaning we can't use the [`AudioContext`](https://developer.mozilla.org/en-US/docs/Web/API/AudioContext) class to process audio. So, we will need to install third-party libraries to obtain the raw audio data. For this example, we will only consider `.wav` files, but the same principles apply to other audio formats.

<Tip>

This tutorial will be written as an ES module, but you can easily adapt it to use CommonJS instead (a minimal sketch is shown after this tip). For more information, see the [Node.js tutorial](https://huggingface.co/docs/transformers.js/tutorials/node).

</Tip>
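
If you do stick with CommonJS, keep in mind that `@xenova/transformers` is distributed as an ES module and that CommonJS files cannot use top-level `await`, so you would typically load the library with a dynamic `import()` inside an async function. A minimal sketch (the file name and `main` wrapper are illustrative):

```js
// index.cjs — a minimal CommonJS sketch (file name and structure are illustrative).
// `@xenova/transformers` is an ES module, so in CommonJS we load it with a
// dynamic import() instead of require().
async function main() {
    const { pipeline } = await import('@xenova/transformers');
    const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
    // ... load audio and run the transcriber as shown below ...
}

main();
```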


**Useful links:**
- [Source code](https://github.com/xenova/transformers.js/tree/main/examples/node-audio-processing)
- [Documentation](https://huggingface.co/docs/transformers.js)


## Prerequisites

- [Node.js](https://nodejs.org/en/) version 16+
- [npm](https://www.npmjs.com/) version 7+



## Getting started

Let's start by creating a new Node.js project and installing Transformers.js via [NPM](https://www.npmjs.com/package/@xenova/transformers):

```bash
npm init -y
npm i @xenova/transformers
```

<Tip>

Remember to add `"type": "module"` to your `package.json` to indicate that your project uses ECMAScript modules.

</Tip>


Next, let's install the [`wavefile`](https://www.npmjs.com/package/wavefile) package, which we will use for loading `.wav` files:

```bash
npm i wavefile
```


## Creating the application

Start by creating a new file called `index.js`, which will be the entry point for our application. Let's also import the necessary modules:

```js
import { pipeline } from '@xenova/transformers';
import wavefile from 'wavefile';
```

For this tutorial, we will use the `Xenova/whisper-tiny.en` model, but feel free to choose one of the other Whisper models from the [Hugging Face Hub](https://huggingface.co/models?library=transformers.js&search=whisper). Let's create our pipeline with:
```js
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
```

Next, let's load an audio file and convert it to the format required by Transformers.js:
```js
// Load audio data
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let buffer = Buffer.from(await fetch(url).then(x => x.arrayBuffer()))

// Read .wav file and convert it to required format
let wav = new wavefile.WaveFile(buffer);
wav.toBitDepth('32f'); // Pipeline expects input as a Float32Array
wav.toSampleRate(16000); // Whisper expects audio with a sampling rate of 16000
let audioData = wav.getSamples();
if (Array.isArray(audioData)) {
    // For this demo, if there are multiple channels for the audio file, we just select the first one.
    // In practice, you'd probably want to convert all channels to a single channel (e.g., stereo -> mono).
    audioData = audioData[0];
}
```
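
If you would rather merge the channels than keep only the first one, a simple approach is to average the samples. A minimal sketch, replacing the `if` block above and assuming a two-channel (stereo) file; other conventions (e.g., scaling to preserve perceived loudness) also exist:

```js
// Merge stereo to mono by averaging the two channels (assumes exactly 2 channels).
if (Array.isArray(audioData) && audioData.length === 2) {
    const [left, right] = audioData;
    const mono = new Float32Array(left.length);
    for (let i = 0; i < left.length; ++i) {
        mono[i] = (left[i] + right[i]) / 2;
    }
    audioData = mono;
}
```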

Finally, let's run the model and measure execution duration.
```js
let start = performance.now();
let output = await transcriber(audioData);
let end = performance.now();
console.log(`Execution duration: ${(end - start) / 1000} seconds`);
console.log(output);
```

You can now run the application with `node index.js`. Note that when running the script for the first time, it may take a while to download and cache the model. Subsequent requests will use the cached model, and model loading will be much faster.
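
For example, from the directory containing `index.js`:

```bash
node index.js
```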

You should see output similar to:
```
Execution duration: 0.6460317999720574 seconds
{
text: ' And so my fellow Americans ask not what your country can do for you. Ask what you can do for your country.'
}
```
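
For longer recordings, you will likely want to process the audio in chunks and perhaps keep timestamps. A sketch of what this could look like, replacing the plain `transcriber(audioData)` call above and assuming the `chunk_length_s`, `stride_length_s`, and `return_timestamps` pipeline options (check the Transformers.js documentation for the options supported by your version):

```js
// Sketch: transcribe longer audio in overlapping 30-second windows and keep timestamps.
// The option names below are assumptions; consult the pipeline documentation.
let output = await transcriber(audioData, {
    chunk_length_s: 30,      // process the audio in 30-second windows
    stride_length_s: 5,      // overlap consecutive windows by 5 seconds
    return_timestamps: true, // also return start/end times for the transcribed text
});
console.log(output);
```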


That's it! You've successfully created a Node.js application that uses Transformers.js for speech recognition with Whisper. You can now use this as a starting point for your own applications.

docs/source/tutorials/node.mdx
@@ -45,7 +45,7 @@

### ECMAScript modules (ESM)

-To indicate that your project uses ECMAScript modules, you need to add `type: "module"` to your `package.json`:
+To indicate that your project uses ECMAScript modules, you need to add `"type": "module"` to your `package.json`:

```json
{
28 changes: 28 additions & 0 deletions examples/node-audio-processing/index.js
@@ -0,0 +1,28 @@
import { pipeline } from '@xenova/transformers';
import wavefile from 'wavefile';

// Load model
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');

// Load audio data
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let buffer = Buffer.from(await fetch(url).then(x => x.arrayBuffer()))

// Read .wav file and convert it to required format
let wav = new wavefile.WaveFile(buffer);
wav.toBitDepth('32f'); // Pipeline expects input as a Float32Array
wav.toSampleRate(16000); // Whisper expects audio with a sampling rate of 16000
let audioData = wav.getSamples();
if (Array.isArray(audioData)) {
    // For this demo, if there are multiple channels for the audio file, we just select the first one.
    // In practice, you'd probably want to convert all channels to a single channel (e.g., stereo -> mono).
    audioData = audioData[0];
}

// Run model
let start = performance.now();
let output = await transcriber(audioData);
let end = performance.now();
console.log(`Execution duration: ${(end - start) / 1000} seconds`);
console.log(output);
// { text: ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.' }
17 changes: 17 additions & 0 deletions examples/node-audio-processing/package.json
@@ -0,0 +1,17 @@
{
  "name": "audio-processing",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@xenova/transformers": "^2.2.0",
    "wavefile": "^11.0.0"
  }
}
6 changes: 3 additions & 3 deletions src/utils/audio.js
@@ -23,14 +23,14 @@ export async function read_audio(url, sampling_rate) {
// Running in node or an environment without AudioContext
throw Error(
"Unable to load audio from path/URL since `AudioContext` is not available in your environment. " +
"As a result, audio data must be passed directly to the processor. " +
"If you are running in node.js, you can use an external library (e.g., https://github.com/audiojs/web-audio-api) to do this."
"Instead, audio data should be passed directly to the pipeline/processor. " +
"For more information and some example code, see https://huggingface.co/docs/transformers.js/tutorials/node-audio-processing."
)
}

const response = await (await getFile(url)).arrayBuffer();
const audioCTX = new AudioContext({ sampleRate: sampling_rate });
-if(typeof sampling_rate === 'undefined') {
+if (typeof sampling_rate === 'undefined') {
console.warn(`No sampling rate provided, using default of ${audioCTX.sampleRate}Hz.`)
}
const decoded = await audioCTX.decodeAudioData(response);