
[Feature request] Update ONNX runtime to version 1.15.1 #298

Closed
ocavue opened this issue Sep 12, 2023 · 2 comments · Fixed by #545
Labels: enhancement (New feature or request)

Comments

ocavue (Contributor) commented Sep 12, 2023

Name of the feature

Currently, the latest transformers.js depends on onnxruntime-web v1.14.0. Since onnxruntime-web v1.15.1 was released months ago, I'd love to see this dependency updated.

Reason for request

ONNX Runtime added preview support for WebGPU in version 1.15.0, and I'd love to try it with transformers.js. I understand that additional work may be required for full WebGPU support, but updating the ONNX Runtime version could be a good starting point. I really appreciate your work here.
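
For illustration, here is a rough sketch of what opting into WebGPU might look like once the dependency is bumped. The `'webgpu'` execution provider name follows later onnxruntime-web releases and is an assumption here; the exact setup in 1.15.x may differ.

```ts
import * as ort from 'onnxruntime-web';

// Try the WebGPU execution provider first and fall back to WASM if the
// browser (or this onnxruntime-web build) doesn't support it.
async function createSession(modelUrl: string): Promise<ort.InferenceSession> {
  try {
    return await ort.InferenceSession.create(modelUrl, {
      executionProviders: ['webgpu'],
    });
  } catch {
    return await ort.InferenceSession.create(modelUrl, {
      executionProviders: ['wasm'],
    });
  }
}
```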

Additional context

N/A

ocavue added the enhancement label on Sep 12, 2023
xenova (Collaborator) commented Sep 12, 2023

Hi there 👋 The main reason we haven't yet updated to 1.15 (or above) is that WebGPU support at that stage was very incomplete and was not exposed as a simple execution provider.

Now that support is nearing completion, quite a few encoder-only models work quite well with it! However, for models with decoders there is a performance bottleneck caused by the lack of IO binding. See here.

See here to keep up to date with the progress.
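
To illustrate the bottleneck: without IO binding, every `session.run()` call materializes its outputs on the CPU, so a decoder's intermediate tensors (e.g. the key/value cache) cross the GPU↔CPU boundary on every generated token. Below is a rough sketch of such a loop; the input/output names (`input_ids`, `logits`) are hypothetical and stand in for whatever the exported model uses.

```ts
import * as ort from 'onnxruntime-web';

// Greedy decoding without IO binding: inputs are uploaded and all outputs
// are copied back to the CPU on every step. In a real cached decoder the
// present key/values would also cross the GPU<->CPU boundary each step.
async function greedyDecode(
  session: ort.InferenceSession,
  promptIds: bigint[],
  maxNewTokens: number,
  vocabSize: number,
): Promise<bigint[]> {
  const tokens = [...promptIds];
  for (let step = 0; step < maxNewTokens; step++) {
    // CPU -> GPU: the (growing) input sequence is uploaded on every call.
    const feeds = {
      input_ids: new ort.Tensor(
        'int64',
        BigInt64Array.from(tokens),
        [1, tokens.length],
      ),
    };

    // GPU -> CPU: run() downloads all outputs, including any KV cache.
    const results = await session.run(feeds);
    const logits = results['logits'].data as Float32Array;

    // Argmax over the vocabulary at the last sequence position.
    const offset = (tokens.length - 1) * vocabSize;
    let best = 0;
    for (let i = 1; i < vocabSize; i++) {
      if (logits[offset + i] > logits[offset + best]) best = i;
    }
    tokens.push(BigInt(best));
  }
  return tokens;
}
```

With IO binding, those intermediate tensors could stay GPU-resident between steps, which is exactly the gap being tracked above.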

dakenf commented Sep 12, 2023

I think the next release will be huge, since it also includes GPU support for Node. I'll also add LLM support to the Attention operator; this not only increases speed but also reduces VRAM usage for Stable Diffusion from 10 GB to ~5.6 GB with the fp32 model.
Also, fp16 support is almost done, so stay tuned.
