Whisper on webGPU? #100
Comments
As I understand it, it's now simply a matter of changing the execution provider to JSEP. The C++ port uses the GGML format for the model, whereas this repo uses ONNX models with onnxruntime to run inference, so the two implementations are different. WebGPU support for onnxruntime (see this PR: [js/web] WebGPU backend via JSEP #14579) was merged today, and an official release build should come soon enough, so I believe we don't have to worry about CUDA or DirectML endpoints; JSEP does the work for us. It's only a matter of updating the onnxruntime dependency and using JSEP as the execution provider. @xenova, correct me if I'm wrong.
Yep, that's correct! It should be as simple as changing the execution provider to … Hopefully they will make the release soon, but in the meantime, I'll do some testing by building the main branch locally.
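A minimal sketch of what switching the execution provider could look like, assuming the `executionProviders` session option of onnxruntime-web and the provider names `webgpu` and `wasm`. The helper `pickExecutionProviders` and the model path `model.onnx` are hypothetical names used only for illustration; they do not come from this thread.

```typescript
// Hypothetical helper (not part of onnxruntime-web): order the execution
// providers so that WebGPU is tried first when it is available, with the
// WebAssembly (CPU) backend as the fallback.
function pickExecutionProviders(hasWebGPU: boolean): string[] {
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}

// Assumed usage with onnxruntime-web (shown as a comment, not executed here):
//   import * as ort from "onnxruntime-web";
//   const session = await ort.InferenceSession.create("model.onnx", {
//     executionProviders: pickExecutionProviders(true),
//   });
```

Listing a fallback provider keeps the same code path working in browsers where WebGPU is not yet available.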
@DK013 & @xenova, thank you for the clarification! I would like to find a way to use the GPUs on edge devices (Android mobile) for inference. As far as I understand, WebGPU currently works on Windows & iOS (my assumption based on this blog post), so do we have to wait until WebGPU targets Android devices too? Or am I simply wrong, and onnxruntime won't be the way to go for edge devices? Best regards
Yes, you are correct. WebGPU would need to be available in your browser, as onnxruntime just uses the API provided by the browser. That said, you might not have to wait for very long. As stated in the blog post you linked: "This initial release of WebGPU is available on ChromeOS, macOS, and Windows. Support for other platforms is coming later this year." If you'd like to test while you develop (so you can be ready when it releases fully), you can test using Chrome Canary. As demoed here, some users have already got WebGPU running on their Android devices with this browser (which is just an experimental version of Chrome).
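Since availability depends on the browser, a page can check for WebGPU at runtime before choosing a backend. The sketch below assumes the standard `navigator.gpu.requestAdapter()` entry point of the WebGPU API; the function name `hasWebGPU` and the small `NavigatorLike` types are hypothetical and exist only so the example compiles without WebGPU typings.

```typescript
// Minimal structural types (assumptions for this sketch), so the code
// type-checks without the WebGPU type definitions installed.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Returns true only if the WebGPU entry point exists AND an adapter is granted.
// The presence of navigator.gpu alone does not guarantee a usable GPU.
async function hasWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure while requesting an adapter as "not available".
    return false;
  }
}

// Assumed usage in a browser (not executed here):
//   const ok = await hasWebGPU(navigator as unknown as NavigatorLike);
```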
@xenova how can we use GPU power with Node.js? I tried to build a local server with Node; everything works, but it is very slow on an AMD 5950X. I would like to use my RTX 4070 Ti.
@xenova, is there any news? Will we be able to use WebGPU with transformers.js any time soon?
AFAIU, onnxruntime's support for WebGPU is still pretty minimal/experimental, so it likely isn't able to run Whisper today. The overview issue is here: microsoft/onnxruntime#15796. There doesn't seem to be much up-to-date, detailed documentation about the current status publicly available, but as of May many operators had yet to be ported: microsoft/onnxruntime#15952
ort-web on WebGPU now has good ops coverage, and we can run most models that transformers.js supports. Whisper is fine; it is part of our test suite.
Thanks for the update, @guschmue! Is there a GH issue for the problem you're describing? Is it this one? microsoft/onnxruntime#17373
That issue covers a couple of problems: missing ops resulted in cross-device copies, and missing io-bindings resulted in a lot of cross-device copies. I think we have fixed most of those. But this decoder issue has been part of it too, i.e. the io-bindings should have gained much more than they did.
What about Node.js? Will WebGPU/GPU acceleration be available on the server/desktop side without a browser?
@xenova I am curious to try. Do you have builds with WebGPU? I've built onnxruntime with the jsep option, but I am not entirely sure which spots to change in
Additionally, another optimization should be done: STFT
For anyone coming here who hasn't seen it yet: there is WebGPU support now, thanks to Xenova's efforts described here. The code is in this branch: https://github.com/xenova/whisper-web/tree/experimental-webgpu
There is an experimental code path in Dawn that one could use to make onnxruntime work with WebGPU on Node.js.
Somewhat related to this thread.
Is it within scope to implement a webGPU accelerated version of Whisper?
Not sure if this helps, but there is a C port of Whisper with a CPU implementation, and as mentioned in this discussion, the main thing that needs to be offloaded to the GPU is the GGML_OP_MUL_MAT operator.
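For reference, the operator in question is a matrix multiplication. The sketch below is a naive CPU version over plain row-major arrays, meant only to show the computation a GPU kernel would take over; it is not ggml's implementation and does not follow ggml's tensor layout, and the function name `matMul` is hypothetical.

```typescript
// Naive row-major matrix multiply: C (m x n) = A (m x k) * B (k x n).
// Illustration only; a GPU port would run the inner products in parallel.
function matMul(
  a: Float32Array,
  b: Float32Array,
  m: number,
  k: number,
  n: number
): Float32Array {
  if (a.length !== m * k || b.length !== k * n) {
    throw new Error("matrix dimensions do not match the array lengths");
  }
  const c = new Float32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let p = 0; p < k; p++) {
        sum += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = sum;
    }
  }
  return c;
}
```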