
ONNX support #165

Open
VivekPanyam opened this issue Sep 29, 2023 · 17 comments
Labels
new ml framework Adding support for a new runner/ML framework

Comments

@VivekPanyam
Owner

There are many different ways of running an ONNX model from Rust:

tract

"Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference".

Notes:
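For reference, a minimal tract inference sketch based on its README (the model path and input shape are placeholders, and the exact API can vary between tract versions):

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        // Load the ONNX model from disk (hypothetical path).
        .model_for_path("model.onnx")?
        // tract wants concrete input shapes so it can optimize ahead of time.
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)))?
        .into_optimized()?
        .into_runnable()?;

    // Run inference on a dummy all-zero input tensor.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
    let outputs = model.run(tvec!(input.into()))?;
    println!("{:?}", outputs[0]);
    Ok(())
}
```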

wonnx

"A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web"

Notes:

  • This uses wgpu under the hood so it supports a lot of platforms
  • Importantly, this supports WASM and WebGPU.
  • I'm unclear on how strong its CPU inference support is. wgpu supports Vulkan and there are software implementations of Vulkan (e.g. SwiftShader), but I'm not sure how plug-and-play that is.
  • Can it run in WASM without WebGPU?
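
For reference, a rough wonnx usage sketch based on its README (the model path and the input name "input" are placeholders, and API details may differ by version):

```rust
use std::collections::HashMap;

async fn run_wonnx() -> anyhow::Result<()> {
    // Session setup is async because acquiring a WebGPU device is async.
    let session = wonnx::Session::from_path("model.onnx").await?;

    // Inputs are passed by name; f32 slices convert into wonnx input tensors.
    let data: Vec<f32> = vec![0.0; 3 * 224 * 224];
    let mut inputs = HashMap::new();
    inputs.insert("input".to_string(), data.as_slice().into());

    let outputs = session.run(&inputs).await?;
    println!("output names: {:?}", outputs.keys());
    Ok(())
}
```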

ort

"A Rust wrapper for ONNX Runtime"

Notes:

  • Rust bindings for the official ONNX Runtime
  • Seems to be used in prod
  • Doesn't appear to support WASM yet. The underlying runtime does support it so maybe that's coming soon. There's an issue about it with recent activity.
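
For reference, a rough sketch against the ort 1.x API discussed in this thread (the model path and input shape are placeholders; the API changed substantially in later ort versions):

```rust
use std::sync::Arc;
use ort::{Environment, GraphOptimizationLevel, SessionBuilder, Value};

fn main() -> anyhow::Result<()> {
    let environment = Arc::new(Environment::builder().with_name("onnx-runner").build()?);

    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .with_model_from_file("model.onnx")?;

    // ort 1.x takes dynamically-shaped ndarray CowArrays as inputs.
    let input = ndarray::CowArray::from(ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into_dyn());
    let outputs = session.run(vec![Value::from_array(session.allocator(), &input)?])?;
    println!("{} output(s)", outputs.len());
    Ok(())
}
```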

If we're going to have one "official" ONNX runner, it should probably use ort. Unfortunately, since ort doesn't have WASM support, we need another solution for running from WASM environments.

This could be:

  • One "official" ONNX runner for Carton that uses ort on desktop, tract on WASM without GPU, and wonnx on WASM with GPUs. This seems like a complex solution, especially because they don't all support the same set of ONNX operators.
  • Use tract everywhere, but don't have GPU support
  • Use wonnx everywhere, but require GPU/WebGPU

@kali @pixelspark @decahedron1 If you get a chance, I'd really appreciate any thoughts you have on the above. Thank you!

@VivekPanyam VivekPanyam added the new ml framework Adding support for a new runner/ML framework label Sep 29, 2023
@decahedron1

FWIW, I recently pushed ort v1.15.5 which adds support for WASM.

@VivekPanyam
Owner Author

Oh wow, that's great! I see you landed pykeio/ort@092907a a bit after I created this issue :)

I'm working on getting the GPT2 example working from WASM and I'll comment with how it goes!

Is there a WebGPU or WebGL execution provider btw?

The ONNX Runtime website says:

you have the option to use webgl or webgpu for GPU processing, and WebAssembly (wasm, alias to cpu) for CPU processing. All ONNX operators are supported by WASM but only a subset are currently supported by WebGL and WebGPU.

@decahedron1

you have the option to use webgl or webgpu for GPU processing, and WebAssembly (wasm, alias to cpu) for CPU processing. All ONNX operators are supported by WASM but only a subset are currently supported by WebGL and WebGPU.

I couldn't find any documentation on how to actually use either backend. I think it may be automatically available just by compiling with --use_jsep but I'm not sure. I'll keep looking into it.

@VivekPanyam
Owner Author

Thanks! It looks like a WASM build with 1.15.5 fails:

  1. close_lib_handle is not defined for WASM

https://github.com/pykeio/ort/blob/bca00dc96d8e6fd047fa44ebd5c5287517ed0af1/src/session.rs#L759-L767

  2. std::os::unix::ffi::OsStrExt doesn't exist on WASM

https://github.com/pykeio/ort/blob/bca00dc96d8e6fd047fa44ebd5c5287517ed0af1/src/session.rs#L5-L6

I am using wasm32-unknown-unknown though. I don't believe using wasm32-wasi would fix it, but I noticed you're building using emscripten. Does that work?
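
For illustration only (this is not the actual ort code): the second error typically means unix-only items need cfg gates so that wasm32 and other non-unix targets still compile, roughly like:

```rust
// Illustrative sketch -- not ort's implementation.
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;

#[cfg(unix)]
fn os_str_bytes(s: &std::ffi::OsStr) -> &[u8] {
    // On unix, an OsStr is just raw bytes.
    s.as_bytes()
}

#[cfg(not(unix))]
fn os_str_bytes(s: &std::ffi::OsStr) -> &[u8] {
    // Elsewhere (including wasm32), assume the path is valid UTF-8.
    s.to_str().expect("expected a UTF-8 path").as_bytes()
}
```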

@decahedron1

I got a simple MNIST test working on wasm32-unknown-emscripten. Since ONNX Runtime itself is compiled with Emscripten, I don't believe it would work on wasm32-unknown-unknown either way.

@VivekPanyam
Owner Author

@decahedron1 Could you post your test code somewhere please?

The emscripten thing makes sense. Even if we compiled the rest of the code without emscripten, we'd still need all the emscripten runtime components to actually make the ONNX Runtime itself work.

@katopz I know we spoke in #159 (comment) about you exploring wonnx and WASM, but would you be open to trying to get this working with ort?

Ideally, we'd first test that ort works from WASM (with and without WebGPU) and then we can build a basic ONNX runner that supports Linux, macOS and WASM.

@decahedron1

decahedron1 commented Sep 30, 2023

@VivekPanyam Certainly: https://github.com/decahedron1/carton-ort-wasm-example

It seems like WebGPU support with Microsoft ONNX Runtime would be much more difficult than I was anticipating: you'd have to somehow include their JavaScript code (slightly more info in the PR: microsoft/onnxruntime#14579) and connect it to the proper places, which I'm not sure is even possible with --build_wasm_static_lib. So wonnx might be worth exploring for GPU acceleration on the web.

@VivekPanyam
Owner Author

Thank you! I'll check it out.

Okay, so then I think we have a few potential solutions:

1. wonnx on platforms with WebGPU available and ort everywhere else.

Straightforward, but could cause issues if a model works with ort but fails with wonnx (or vice versa).

2. Use wonnx everywhere (if it can also run without GPUs).

This provides a consistent user experience.

I think we'd need to explore inference performance and supported operators vs ort if we decide to look into the second approach.

3. Integrate all three runtimes into a single runner

Another approach is to integrate all three runtimes into a single runner and allow users to do the following:

  • specify disallowed implementations at packing time or inference time
  • explicitly pick an implementation to run with (at packing or inference time)
  • set a priority order (at packing or inference time)
  • leave it up to the runner to decide based on some logic (if none of the above are specified)

I think there might be a way to do option 3 in a way where it has a clean user experience, but we'd have to be careful about the default logic. I think it would be confusing to users/could break things if we changed the default implementation selection logic after the runner was released.
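
To make option 3 concrete, here's a hypothetical sketch of what the selection options could look like (none of these types exist in Carton today; it's just to illustrate the idea):

```rust
// Hypothetical sketch only -- not an existing Carton API.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum OnnxImpl {
    Ort,
    Tract,
    Wonnx,
}

/// Options a user could set at packing time and/or inference time.
#[derive(Default, Debug)]
struct OnnxRunnerOptions {
    /// Implementations the user never wants used.
    disallowed: Vec<OnnxImpl>,
    /// Force a specific implementation, overriding everything else.
    force: Option<OnnxImpl>,
    /// Preference order; the runner picks the first available, allowed entry.
    priority: Vec<OnnxImpl>,
}

impl OnnxRunnerOptions {
    /// Pick an implementation given what's available in the current environment.
    fn select(&self, available: &[OnnxImpl]) -> Option<OnnxImpl> {
        if let Some(forced) = self.force {
            return available.contains(&forced).then_some(forced);
        }
        let order = if self.priority.is_empty() {
            // Default logic -- the part that would need to stay stable across releases.
            vec![OnnxImpl::Ort, OnnxImpl::Wonnx, OnnxImpl::Tract]
        } else {
            self.priority.clone()
        };
        order
            .into_iter()
            .find(|i| available.contains(i) && !self.disallowed.contains(i))
    }
}
```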

Maybe a hybrid of 1 and 3 would work and users can decide to use WebGPU or not at inference time.

Proposal

I think we should start by implementing a runner that uses ort everywhere it's supported. We can then add in WebGPU support with wonnx and make it an explicit opt-in at inference time.

So it'll always use ort (and the "official" ONNX Runtime) unless you explicitly tell it to use wonnx with WebGPU.

And if we want to, there's nothing stopping us from extending that to tract. ort is always the default and everything else is an explicit opt-in.

Thoughts?

Also @decahedron1, would you be open to building/helping build a runner for Carton that uses ort?

If so, @katopz could continue exploring wonnx.

@katopz

katopz commented Sep 30, 2023

Will do; wonnx and ort are on my waiting list. Anyway, yesterday I tried to explore/build/compile the native/WASM examples from https://github.com/huggingface/candle (yes, I'm still evaluating things here). I'd like to know your thoughts on the candle approach?

@pixelspark

@VivekPanyam I generally agree with your assessment. wonnx is an option if you are looking for a relatively lightweight (and Rust native) way to run ONNX models on GPU. In essence wonnx translates ONNX models to wgsl shaders and executes these using wgpu on the GPU.

I have no experience with CPU-based implementations of WebGPU, apart from the fact that we use it in CI to run some tests. wonnx can't run on the web in WASM if the browser does not offer (or has disabled) WebGPU support. On the web, wgpu is merely a passthrough layer to the underlying browser-implemented WebGPU API (which in Firefox again is based on wgpu by the way!).

An important thing to consider is the support for ops, which differs between the engines. wonnx certainly does not support all ops. Additionally, wonnx uses 'ahead of time' compilation of shaders (which means all shapes need to be known in advance; there is shape inference functionality for this), and because of this certain ops with dynamic shapes are not supported and will be very hard to support in the future.
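
As a side note, one way a runner could probe at runtime whether a usable WebGPU adapter exists (and fall back instead of failing) is to ask wgpu for an adapter. This is a sketch against the ~0.17-era wgpu API; names may differ slightly in other versions:

```rust
// Sketch: probe for a usable wgpu adapter before opting into wonnx.
async fn webgpu_usable() -> bool {
    let instance = wgpu::Instance::default();
    instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .is_some()
}
```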

@VivekPanyam
Owner Author

@pixelspark That makes sense. So explicit opt-in is probably a safe bet (as long as we can design that in a way that isn't confusing to users).

@pixelspark @decahedron1 Thank you both for taking the time to provide your thoughts!

@VivekPanyam
Owner Author

Will do; wonnx and ort are on my waiting list. Anyway, yesterday I tried to explore/build/compile the native/WASM examples from https://github.com/huggingface/candle (yes, I'm still evaluating things here). I'd like to know your thoughts on the candle approach?

@katopz see #164

In general, please try to keep issues focused on their original topic. For more open ended conversations, consider creating a discussion. Thanks!

@VivekPanyam
Owner Author

@katopz do you want to build an ONNX runner using ort (and then we can add wonnx and WebGPU support once the runner is working)?

@katopz

katopz commented Oct 3, 2023

Sorry to say, but not anytime soon, because:

  1. I still have no idea how to accomplish that yet.
  2. I will get fired in the next 2 months, so there's no time for hobby projects yet. 🫠

In the meantime you can assign that task to anyone.

@mstfbl

mstfbl commented Oct 3, 2023

@VivekPanyam I also agree with your assessment on using pykeio/ort by default and having wonnx as an explicit opt-in. A case could be made for making wonnx the default when WebGPU is available, pending the performance seen in experiments comparing the two Rust ONNX wrappers.

I also agree with @pixelspark's comment on considering support for ops. It's more than reasonable to assume ONNX Runtime supports all operator kernels. With contrib and custom ops there seems to be support, but I'd be careful starting out. At the moment pykeio/ort seems to support ONNX Runtime v1.15.1 whereas ONNX Runtime's latest version is v1.16.0, so for a given custom op it's worth verifying its support in pykeio/ort first.

@VivekPanyam
Owner Author

@katopz I'm sorry to hear that. I hope things work out in a way you'd like them to.

@VivekPanyam
Owner Author

@mstfbl Makes sense, thanks!

If anyone is interested in implementing a runner with ort, feel free to comment below :)
