Whisper in web-llm with WebGPU? #68

sandorkonya · 2023-04-25T09:36:39Z

Great Repository!

Is it within your scope to implement a WebGPU-accelerated version of Whisper?

Not sure if this helps, but there is a C port of Whisper with a CPU implementation, and as mentioned in this discussion, the main thing that needs to be offloaded to the GPU is the GGML_OP_MUL_MAT operator.

thx
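For readers wondering what offloading a matrix multiplication to WebGPU might look like, here is a rough sketch. It is not taken from whisper.cpp or web-llm: the names `matmulWGSL` and `matmulReference` are made up for illustration, the kernel assumes dense row-major f32 matrices, and it ignores the quantized formats and layout conventions that GGML_OP_MUL_MAT actually handles. The CPU function is only there to check a GPU result against.

```typescript
// Hypothetical WGSL compute kernel: one invocation per element of C = A * B.
// Assumption: A is m x k, B is k x n, C is m x n, all row-major f32.
const matmulWGSL: string = `
struct Dims { m: u32, k: u32, n: u32 }

@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> c: array<f32>;
@group(0) @binding(3) var<uniform> dims: Dims;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let row = gid.y;
  let col = gid.x;
  // Skip invocations that fall outside the output matrix.
  if (row >= dims.m || col >= dims.n) { return; }
  var acc = 0.0;
  for (var i = 0u; i < dims.k; i = i + 1u) {
    acc = acc + a[row * dims.k + i] * b[i * dims.n + col];
  }
  c[row * dims.n + col] = acc;
}
`;

// CPU reference with the same layout assumptions as the kernel above.
function matmulReference(
  a: Float32Array,
  b: Float32Array,
  m: number,
  k: number,
  n: number
): Float32Array {
  const c = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let acc = 0;
      for (let i = 0; i < k; i++) {
        acc += a[row * k + i] * b[i * n + col];
      }
      c[row * n + col] = acc;
    }
  }
  return c;
}
```

A naive kernel like this is unlikely to be fast enough on its own. Tiling through workgroup memory is the usual next step, but that is beyond this sketch.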

tqchen · 2023-04-25T14:17:22Z

great suggestion, yes this is something that we can push for

sandorkonya · 2023-04-25T18:31:39Z

@tqchen my ultimate goal would be to get it running as efficiently as possible on an Android edge device.

There is already a solution in the onnx framework, based on the recent merge, but I am not sure when it will be usable on Android.

There were some who tried with GPU delegates, but no success yet.

Any idea how one could solve it on an edge (Android) device?

DustinBrett · 2023-04-26T04:54:27Z

There is also a demo of Whisper running via WebAssembly in that repo. https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk.wasm

sandorkonya · 2023-04-26T07:33:03Z

> There is also a demo of Whisper running via WebAssembly in that repo. https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk.wasm

Yes, but it runs on the CPU. I hope that with a GPU version one could reach real-time inference.
