
Add AudioInput #4048

Open
MarcSkovMadsen opened this issue Oct 23, 2022 · 9 comments
Labels
type: feature A major new feature
Comments

@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Oct 23, 2022

Request

Add AudioInput widget for working with streaming audio

Motivation

Looking at the awesome results of the VideoStream example PR, it is clear that Panel can do something truly amazing for streaming data sources.

Looking at the reference gallery, I notice that we do not offer users a way to work with an audio stream.

Adding this would provide something unique to our users.

For inspiration, see how Gradio supports it: https://gradio.app/real_time_speech_recognition/.

@MarcSkovMadsen MarcSkovMadsen added the type: feature A major new feature label Oct 23, 2022
@MarcSkovMadsen MarcSkovMadsen added this to the Wishlist milestone Oct 23, 2022
@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Jul 27, 2024

+1. Working with speech-to-text, also in Pyodide via transformers.js.py, is becoming more and more realistic.

For example, it is also implicitly requested in #7021.

Gradio: https://www.gradio.app/docs/gradio/audio

@MarcSkovMadsen MarcSkovMadsen changed the title Add AudioStream Add AudioInput Jul 28, 2024
@MarcSkovMadsen
Collaborator Author

If we want to make the AudioInput or Audio pane more engaging, we can use https://github.com/katspaugh/wavesurfer.js. This is what the Gradio Audio component is built on top of.

For recording they use the record plugin. See https://wavesurfer.xyz/examples/?record.js.

@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Jul 28, 2024

Here is a very rough and basic implementation.

script.js

const startRecording = `Start Recording`
const stopRecording = `Stop Recording`

class AudioStreamWidget {
    constructor(model) {
        this.audioContext = new (window.AudioContext || window.webkitAudioContext)();
        this.stream = null;
        this.source = null;
        this.mediaRecorder = null;
        this.chunks = [];
        this.model = model
    }

    async start() {
        try {
            this.stream = await navigator.mediaDevices.getUserMedia({ audio: true });
            this.source = this.audioContext.createMediaStreamSource(this.stream);

            this.mediaRecorder = new MediaRecorder(this.stream);
            this.mediaRecorder.ondataavailable = (event) => {
                if (event.data.size > 0) {
                    this.chunks.push(event.data);
                }
            };
            this.mediaRecorder.onstop = this.onStopRecording.bind(this);

            this.mediaRecorder.start();
            console.log('Audio stream started and recording');
        } catch (err) {
            console.error('Error accessing audio stream', err);
        }
    }

    stop() {
        if (this.mediaRecorder && this.mediaRecorder.state !== 'inactive') {
            this.mediaRecorder.stop();
        }

        if (this.stream) {
            this.stream.getTracks().forEach(track => track.stop());
            this.stream = null;
            console.log('Audio stream stopped');
        }
    }

    onStopRecording() {
        const blob = new Blob(this.chunks, { type: 'audio/webm' });
        this.chunks = [];

        // Encode the recording as a base64 data URL and sync it to the backend
        this.blobToBase64(blob).then(base64 => {
            this.sendToBackend(base64);
            console.log('Recording sent to server');
        });
    }

    blobToBase64(blob) {
        return new Promise((resolve, reject) => {
            const reader = new FileReader();
            reader.readAsDataURL(blob);
            reader.onloadend = () => {
                resolve(reader.result);
            };
            reader.onerror = error => reject(error);
        });
    }

    sendToBackend(base64) {
        this.model._data_url = base64
    }
}

export function render({ model }) {
  let audio = new AudioStreamWidget(model);
  let state = "start"
  let btn = document.createElement("button");
  btn.innerHTML = startRecording;
  btn.addEventListener("click", () => {
    console.log(btn.innerHTML)
    if (state == "start") {
        audio.start();
        btn.innerHTML = stopRecording;
        state = "stop"
    } else {
        audio.stop();
        btn.innerHTML = startRecording;
        state = "start"
    }
  });
  return btn
}

script.py

import tempfile
from base64 import b64decode

import panel as pn
import param
from panel.custom import JSComponent

pn.extension()

class AudioInput(JSComponent):
    value = param.Parameter()
    
    format = param.Selector(default='webm', objects=['webm'], doc="The name of the audio format to provide the value in.")
    
    _data_url = param.Parameter()

    _esm = 'script.js'

    @param.depends("_data_url", watch=True)
    def _update_value(self):
        data_url = self._data_url
        if not data_url:
            return
        # Strip the 'data:audio/webm;base64,' header and decode the payload
        self.value = b64decode(data_url.split(",")[1])

audio_input = AudioInput()


def download_webm_file(value):
    if not value:
        return "No audio available"

    return f'<a id="download-link" href="{value}" download="sound.webm">Download File</a>'

def audio_value(value):
    if not value:
        return None
    with tempfile.NamedTemporaryFile(delete=False, suffix=".webm") as temp_file:
        temp_file.write(value)
        return temp_file.name

audio = pn.pane.Audio(pn.bind(audio_value, audio_input.param.value), loop=True, width=300, height=50)

pn.Column(
    audio_input,
    pn.bind(download_webm_file, audio_input.param._data_url),
    audio,
).servable()

Notes

  • An AudioInput widget should probably align with FileInput, and the value parameter should be a Bytes parameter holding the raw value. It should work from 1.5.0 due to Ensure Bytes default is deserialized correctly #7032.
  • In the browser you get audio in webm format. The user would probably like another format. If we are to convert it for the user, we will probably have to use ffmpeg on the server or in the browser. Both can be hard to get installed or working.
  • It's not easy to play the recorded audio in the Audio pane. See Make it easy to play audio input in Audio pane #7035. I believe the media widgets and panes could use an overhaul inspired by Gradio. Their input/output formats are aligned, and they are in formats people are using for ML or DL.
  • I transfer the recording as a _data_url and then decode it into value. That value is then sent back to the browser. How to avoid that round trip is not documented; it should be.
  • We also need to figure out whether this should work like an audio recorder (above) or whether it should stream data, enabling "live" transformations and updates of the input. Gradio seems to be able to do that for Video.
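The _data_url handling above boils down to splitting the data URL header off the base64 payload. A minimal sketch of that parsing, independent of Panel (the parse_data_url helper name is mine, not part of any library):

```python
from base64 import b64decode, b64encode


def parse_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a data URL like 'data:audio/webm;base64,<payload>'
    into its MIME type and the decoded raw bytes."""
    header, payload = data_url.split(",", 1)
    # header looks like 'data:audio/webm;base64'
    mime = header.removeprefix("data:").split(";")[0]
    return mime, b64decode(payload)


# Round trip with a fake payload, mimicking what FileReader.readAsDataURL produces
raw = b"fake-webm-bytes"
url = "data:audio/webm;base64," + b64encode(raw).decode()
mime, data = parse_data_url(url)
```

The same split-and-decode is what _update_value in script.py does implicitly; pulling it into a helper would also make the MIME type available for the format parameter.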

@ahuang11
Contributor

Just to clarify, this is separate from SpeechToText, i.e. AudioInput is agnostic to speech, and it records any sound?

@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Jul 30, 2024

Yes. It should be able to record audio in some audio format like webm, wav, or mp3, or as a NumPy array, torch tensor, or similar.

Use Python to analyse or transform it.

Use Panel to display the transformed result, whether it's audio or something else.
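The "analyse in Python" step is straightforward once the bytes are in a standard container like wav; the stdlib wave module plus NumPy is enough for basic work. A sketch assuming wav input (browser-recorded webm would first need conversion, e.g. via ffmpeg; the wav_bytes_to_array name is mine):

```python
import io
import wave

import numpy as np


def wav_bytes_to_array(wav_bytes: bytes) -> tuple[np.ndarray, int]:
    """Decode 16-bit PCM wav bytes into a float32 array in [-1, 1]
    plus the sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0
    return samples, rate


# Build one second of a 440 Hz sine wave as wav bytes to demo the round trip
rate = 8000
t = np.arange(rate) / rate
pcm = (np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)   # mono
    wf.setsampwidth(2)   # 16-bit samples
    wf.setframerate(rate)
    wf.writeframes(pcm.tobytes())

samples, sr = wav_bytes_to_array(buf.getvalue())
```

A float array like this is also the kind of input most ML speech models (e.g. in transformers) expect, which is why aligning the widget's value format with that matters.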

@MarcSkovMadsen
Collaborator Author

Whether it should record a full file, stream intermediate chunks, or be able to do both is not clear to me.

@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Jul 30, 2024

We need similar functionality for Video. The current VideoStream takes snapshots, which is something else.

@MarcSkovMadsen
Collaborator Author

I believe we will see more and more audio and speech use cases. For example, OpenAI recently released https://openai.com/index/introducing-the-realtime-api/.
