ONNX support #165
Comments
FWIW, I recently pushed WASM support to `ort`.
Oh wow, that's great! I see you landed pykeio/ort@092907a a bit after I created this issue :) I'm working on getting the GPT2 example working from WASM and I'll comment with how it goes!

Is there a WebGPU or WebGL execution provider btw? The ONNX Runtime website says: …
I couldn't find any documentation on how to actually use either backend. I think it may be automatically available just by compiling with …
Thanks! It looks like a WASM build with …
I am using …
I got a simple MNIST test working on `wasm32-unknown-emscripten`.
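(Very roughly, a minimal test along those lines with the `ort` crate might look like the following. This is a sketch against ort's 1.x-era API from memory - exact names and signatures differ between versions, and `mnist.onnx` is a placeholder path, not a file from this thread.)

```rust
// Sketch only: ort's 1.x-era Environment/SessionBuilder API. Signatures
// have changed across ort versions, so treat these names as approximate.
use ndarray::{Array, CowArray};
use ort::{Environment, SessionBuilder, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let environment = Environment::builder().with_name("mnist").build()?.into_arc();
    let session = SessionBuilder::new(&environment)?
        .with_model_from_file("mnist.onnx")?; // placeholder model path

    // MNIST models conventionally take a 1x1x28x28 f32 image tensor.
    let input = CowArray::from(Array::<f32, _>::zeros((1, 1, 28, 28)).into_dyn());
    let outputs = session.run(vec![Value::from_array(session.allocator(), &input)?])?;

    // Typically one output: a 1x10 vector of class scores.
    let scores = outputs[0].try_extract::<f32>()?;
    println!("{:?}", scores.view());
    Ok(())
}
```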
@decahedron1 Could you post your test code somewhere please?

The emscripten thing makes sense. Even if we compiled the rest of the code without emscripten, we'd still need all the emscripten runtime components to actually make ONNX Runtime itself work.

@katopz I know we spoke in #159 (comment) about you exploring …. Ideally, we'd first test that …
@VivekPanyam Certainly: https://github.com/decahedron1/carton-ort-wasm-example

It seems like WebGPU support with Microsoft ONNX Runtime would be much more difficult than I was anticipating - you'd have to somehow include their JavaScript code (slightly more info in the PR - microsoft/onnxruntime#14579) and connect it to the proper places, which I'm not sure is even possible with …
Thank you! I'll check it out.

Okay, so then I think we have a few potential solutions:

1. …. Straightforward, but could cause issues if a model works with ….
2. Use …. This provides a consistent user experience. I think we'd need to explore inference performance and supported operators vs ….
3. Integrate all three runtimes into a single runner, and allow users to do the following (a hypothetical sketch is below): …
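(A purely hypothetical sketch of what that user-facing selection could look like - every name here is invented for illustration and is not Carton's actual API.)

```rust
// Hypothetical only: a single runner wrapping ort, tract, and wonnx,
// with the engine chosen per model load. None of these names are real
// Carton (or crate) APIs.
#[derive(Clone, Copy)]
pub enum OnnxBackend {
    Auto, // runner decides: ort on native targets, tract/wonnx on WASM
    Ort,
    Tract,
    Wonnx,
}

pub struct LoadOptions {
    pub backend: OnnxBackend,
    pub allow_webgpu: bool, // explicit opt-in to GPU use
}

fn select_backend(opts: &LoadOptions, webgpu_available: bool) -> &'static str {
    match opts.backend {
        OnnxBackend::Ort => "ort",
        OnnxBackend::Tract => "tract",
        OnnxBackend::Wonnx => "wonnx",
        OnnxBackend::Auto => {
            if cfg!(target_arch = "wasm32") {
                if opts.allow_webgpu && webgpu_available { "wonnx" } else { "tract" }
            } else {
                "ort"
            }
        }
    }
}

fn main() {
    let opts = LoadOptions { backend: OnnxBackend::Auto, allow_webgpu: true };
    println!("would run with: {}", select_backend(&opts, false));
}
```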
I think there might be a way to do option 3 in a way where it has a clean user experience, but we'd have to be careful about the default logic. I think it would be confusing to users/could break things if we changed the default implementation selection logic after the runner was released. Maybe a hybrid of 1 and 3 would work and users can decide to use WebGPU or not at inference time.

**Proposal**

I think we should start by implementing a runner that uses `ort`. So it'll always use … And if we want to, there's nothing stopping us from extending that to …

Thoughts?

Also @decahedron1, would you be open to building/helping build a runner for Carton that uses …? If so, @katopz could continue exploring …
Will do, …
@VivekPanyam I generally agree with your assessment. I have no experience with CPU-based implementations of WebGPU, apart from the fact that we use it in CI to run some tests.

An important thing to consider is the support for ops, which differs between the engines.
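(For illustration, a tiny sketch of the kind of pre-flight check that difference implies: given the op types a model uses, reject any engine that can't cover them all. The op lists here are made up, not any engine's real coverage.)

```rust
use std::collections::HashSet;

// Illustrative pre-flight check: given the op types a model uses (however
// they were extracted, e.g. from the ONNX protobuf), list the ones a
// candidate engine can't handle so we can fall back to another engine.
fn unsupported_ops<'a>(model_ops: &'a [String], engine_ops: &HashSet<&str>) -> Vec<&'a str> {
    model_ops
        .iter()
        .map(String::as_str)
        .filter(|op| !engine_ops.contains(*op))
        .collect()
}

fn main() {
    // Made-up data: neither list reflects any real engine.
    let model_ops = vec!["Conv".to_string(), "Relu".to_string(), "SomeContribOp".to_string()];
    let engine_ops: HashSet<&str> = ["Conv", "Relu", "MatMul"].into_iter().collect();
    println!("{:?}", unsupported_ops(&model_ops, &engine_ops)); // ["SomeContribOp"]
}
```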
@pixelspark That makes sense. So explicit opt-in is probably a safe bet (as long as we can design that in a way that isn't confusing to users).

@pixelspark @decahedron1 Thank you both for taking the time to provide your thoughts!
In general, please try to keep issues focused on their original topic. For more open-ended conversations, consider creating a discussion. Thanks!
@katopz do you want to build an ONNX runner using …?
Sorry to say, but not real soon, because …

In the meantime you can assign that task to anyone.
@VivekPanyam I also agree with your assessment on using pykeio/ort by default and having …

I also agree with @pixelspark's comment on considering support for ops. It's more than reasonable to assume ONNX Runtime supports all standard operator kernels. With contrib and custom ops there seems to be support, but I'd be careful starting out. At the moment pykeio/ort seems to support ONNX Runtime v1.15.1 whereas ONNX Runtime's latest version is v1.16.0, so for a given custom op it's worth verifying its support in pykeio/ort first.
@katopz I'm sorry to hear that. I hope things work out in a way you'd like them to.
@mstfbl Makes sense, thanks! If anyone is interested in implementing a runner with …
There are many different ways of running an ONNX model from Rust:

**tract**

"Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference"

Notes: …

**wonnx**

"A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web"

Notes:
- `wgpu` supports Vulkan and there are software implementations of it (e.g. SwiftShader), but not sure how plug-and-play it is.

**ort**

"A Rust wrapper for ONNX Runtime"

Notes: …
If we're going to have one "official" ONNX runner, it should probably use `ort`. Unfortunately, since `ort` doesn't have WASM support, we need another solution for running from WASM environments. This could be:

- Use `ort` on desktop, `tract` on WASM without GPU, and `wonnx` on WASM with GPUs. This seems like a complex solution, especially because they don't all support the same set of ONNX operators.
- Use `tract` everywhere, but don't have GPU support.
- Use `wonnx` everywhere, but require GPU/WebGPU.

@kali @pixelspark @decahedron1 If you get a chance, I'd really appreciate any thoughts you have on the above. Thank you!
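(For the first of those options, the per-target split itself is mechanical - something like the compile-time selection below. This is a sketch: the `webgpu` cargo feature and the module contents are invented placeholders.)

```rust
// Sketch of per-target engine selection. Each module would wrap the
// corresponding crate; here they only report a name. The `webgpu` cargo
// feature is hypothetical and would need to be declared in Cargo.toml.
#[cfg(not(target_arch = "wasm32"))]
mod backend {
    pub const NAME: &str = "ort"; // ONNX Runtime on native targets
}

#[cfg(all(target_arch = "wasm32", not(feature = "webgpu")))]
mod backend {
    pub const NAME: &str = "tract"; // pure-Rust CPU inference in WASM
}

#[cfg(all(target_arch = "wasm32", feature = "webgpu"))]
mod backend {
    pub const NAME: &str = "wonnx"; // WebGPU-accelerated inference in WASM
}

fn main() {
    println!("selected ONNX backend: {}", backend::NAME);
}
```

The drawback called out above still applies to this structure: each branch supports a different operator set, so a model that loads on one target may fail on another.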