[Feature request] Real time whisper transcription #405

vjeux · 2023-11-19T22:46:54Z

Real time whisper transcription

Right now the demo works on a finished recording, but it transcribes it in one shot. I'd love to be able to transcribe as I speak. Sadly, the interface seems to accept only a Float32Array (or an array of them), with no way to keep feeding it Float32Arrays as we receive them from the audio source.

Would be great to be able to do it in a streaming fashion.
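One way to approximate this with a one-shot interface is to accumulate incoming chunks and re-submit the most recent window each time. The sketch below is only an illustration: `RollingAudioBuffer` is a hypothetical helper, not part of Transformers.js, and the 16 kHz / 30-second cap is an assumption based on Whisper's usual input format.

```typescript
// Hypothetical helper (not a Transformers.js API): collects audio chunks as
// they arrive and exposes the most recent window as one Float32Array.
const SAMPLE_RATE = 16_000; // assumed: Whisper models expect 16 kHz mono audio
const MAX_SECONDS = 30;     // assumed cap: one Whisper input window

class RollingAudioBuffer {
  private chunks: Float32Array[] = [];
  private length = 0;
  private readonly maxSamples: number;

  constructor(maxSamples: number = SAMPLE_RATE * MAX_SECONDS) {
    this.maxSamples = maxSamples;
  }

  /** Append a chunk as it arrives from the audio source. */
  push(chunk: Float32Array): void {
    this.chunks.push(chunk);
    this.length += chunk.length;
  }

  /** Return the most recent `maxSamples` samples as one contiguous array. */
  snapshot(): Float32Array {
    const merged = new Float32Array(this.length);
    let offset = 0;
    for (const chunk of this.chunks) {
      merged.set(chunk, offset);
      offset += chunk.length;
    }
    const start = Math.max(0, merged.length - this.maxSamples);
    const tail = merged.slice(start);
    // Keep only the tail so memory does not grow without bound.
    this.chunks = [tail];
    this.length = tail.length;
    return tail;
  }
}
```

On each new chunk, the caller would `push` it and then hand `snapshot()` to the existing one-shot transcription call. Every update re-processes the whole window, so this trades compute for simplicity.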

Reason for request

I want to build a tool to help with recording voice-overs, and I'd like a real-time transcription to overlay on top of the existing one to get a sense of progress.

Thanks <3

xenova · 2023-11-23T20:49:41Z

Real-time transcription will hopefully be possible once WebGPU support is added, and we'll definitely revisit this (and update the demo) once it is. If someone in the community would like to try modifying the whisper-web source code (or provide a basic streaming implementation) that could be adapted once WebGPU is supported, that would be great! 😇

vjeux · 2023-11-23T20:53:14Z

Curious why this is waiting for WebGPU. At least on my pre-M1 MacBook Pro, decoding takes less time than the length of the recording. What would be needed is a way to feed audio frames asynchronously instead of all at once.
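The comparison being made here is often expressed as a real-time factor: processing time divided by audio duration, where a value below 1 means faster than real time. A minimal sketch follows; the function name is hypothetical and the numbers are illustrative, not measurements from this thread.

```typescript
/**
 * Real-time factor: processing time divided by audio duration.
 * A value below 1 means the audio is processed faster than it plays.
 */
function realTimeFactor(processingSeconds: number, audioSeconds: number): number {
  if (audioSeconds <= 0) {
    throw new RangeError("audioSeconds must be positive");
  }
  return processingSeconds / audioSeconds;
}

// Illustrative values only: 12 s of processing for a 30 s recording.
const rtf = realTimeFactor(12, 30);
console.log(rtf < 1 ? "faster than real time" : "slower than real time");
```

A factor below 1 for a whole recording does not by itself guarantee low latency per update when streaming, since each update still needs its own pass over the audio.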

xenova · 2023-11-23T21:04:11Z

The major bottleneck at the moment is the encoder, which can take a few seconds to process ~30 seconds of audio. Ideally, shorter audio sequences would take proportionally less time, but this is a hard constraint of the architecture: the initial transformation into log-mel spectrogram space produces 30-second chunks that are fed into the encoder. See here for more discussion on this.
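To make the constraint concrete: Whisper works on 16 kHz audio in 30-second windows, so shorter clips are padded up to the full window before the spectrogram is computed. The sketch below illustrates that padding step only; `padOrTrim` is a hypothetical helper modeled on the idea, and the actual preprocessing inside Transformers.js may differ.

```typescript
const SAMPLE_RATE = 16_000;                          // samples per second
const CHUNK_SECONDS = 30;                            // fixed window length
const WINDOW_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS;  // 480,000 samples
const HOP_LENGTH = 160;                              // samples between spectrogram frames
const WINDOW_FRAMES = WINDOW_SAMPLES / HOP_LENGTH;   // 3,000 frames per window

/** Zero-pad or truncate audio to exactly `length` samples (hypothetical helper). */
function padOrTrim(audio: Float32Array, length: number = WINDOW_SAMPLES): Float32Array {
  const out = new Float32Array(length); // starts zero-filled
  out.set(audio.subarray(0, length));   // copy at most `length` samples
  return out;
}

// A 2-second clip still becomes a full 30-second window.
const twoSeconds = new Float32Array(2 * SAMPLE_RATE).fill(0.5);
const padded = padOrTrim(twoSeconds);
console.log(padded.length, WINDOW_FRAMES);
```

Because the encoder always sees a full window, a 2-second clip costs roughly as much to encode as a 30-second one, which is why shorter inputs do not shorten the encoder step.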

vjeux · 2023-11-28T21:25:05Z

Sorry for the super late reply. That makes sense. Thanks for the link to the discussions. Let me bring more visibility to this issue and see if someone is interested in contributing.

luwes · 2023-12-07T18:39:24Z

It's not real time, but it might give someone some inspiration for chunked processing.
I created this custom video element that automatically generates captions from the source (MP4 only at the moment).
repo: https://github.com/luwes/ai-media-element
demo: https://luwes.github.io/ai-media-element/

arpu · 2023-12-26T23:33:47Z

Does ONNX deprecate the WebGL backend?

avie41 · 2024-02-19T13:44:40Z

Hi luwes, xenova,
Did you finally manage to implement real-time transcription with Whisper? Do you think it is still too early to consider it, given the processing time the encoder requires during inference?

everythinginjs · 2024-05-17T04:02:46Z

Hi @xenova,
This is a must-have feature and I'm looking forward to it. Any updates?

xenova · 2024-09-06T10:18:05Z

This is now possible with Transformers.js v3: https://x.com/xenovacom/status/1799110540700078422 🥳
Online demo: https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu

whisper-realtime.mp4

I'll close this issue once Transformers.js v3 is officially out and #545 is merged 🚀

vjeux · 2024-09-06T23:09:15Z

paschaldev · 2024-11-06T12:43:59Z

@xenova I tried the demo, the latency is still poor...

What can be done to improve this? Smaller models? Custom GPUs?

See a video preview here: https://streamable.com/m7oyq1

vishnusureshperumbavoor · 2025-01-06T15:41:35Z

It has been showing "loading model" for a while.

vjeux added the enhancement label on Nov 19, 2023

xenova mentioned this issue on Jan 27, 2024 in "🚀🚀🚀 Transformers.js V3 🚀🚀🚀" #545 (merged, 13 tasks)

xenova closed this as completed in #545 on Oct 18, 2024
