
Support for GPU/CUDA #461

Closed
haixuanTao opened this issue Mar 30, 2021 · 5 comments

haixuanTao commented Mar 30, 2021

I tried this package on the PyTorch example and it seems there's currently no GPU/CUDA support.

Is it a feature that is going to be added in the future?

Thanks in advance for this package :)

haixuanTao changed the title from "Support for GPU" to "Support for GPU/CUDA" on Mar 30, 2021
kali (Collaborator) commented Mar 30, 2021

Thanks for your interest.

That is correct: there is no CUDA or GPU support, and it is not on the roadmap. We are not putting much energy into the PC platform, as tract's main focus is arm32 and arm64 devices. If something happens on the GPU front, it won't come from us.

haixuanTao (Author)

Makes sense.

The best option for GPU ONNX in Rust at the moment seems to be https://github.com/nbigaouette/onnxruntime-rs

I'll close the issue.

dvc94ch commented Apr 13, 2022

That might make sense for training the NN on a GPU. GPU support in tract would probably be more along the lines of doing inference on a mobile GPU. It might make sense to reopen the issue, as it's not resolved and other people might be interested in its status.

dsseng commented Apr 24, 2022

+1 for reopening. Yes, I believe support for, e.g., Vulkan would boost the library's capabilities. I found this project and liked it because Rust is definitely more maintainable than C. With hardware acceleration, tract would be a really great tool for mobile as well (mobile devices have GL/CL/Vulkan to accelerate inference, and those are a lot more power-efficient). I would like to work on a hardware acceleration approach if that is wanted.

One idea is to create loadable C-ABI libraries for acceleration methods, as sketched below. It would require defining an API and a binary-stable interface for C-style libraries (built with Rust, C/C++, or many other languages), which the core discovers and loads using libloading. Each library would declare attributes such as the CPU features required to load it (e.g. AVX2; SIMD could also help accelerate inference). The API should include generic buffer-management functions (create, free, transfer to device/to CPU, query location (shm, unified memory, dedicated memory, RDMA, etc.), import/export platform buffers) and implementations of the operators.
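A minimal sketch of how such a plugin could be loaded with libloading; the `tract_accel_describe` symbol and the `AccelDescriptor` layout are hypothetical, not an existing tract interface:

```rust
// Hypothetical C-ABI accelerator plugin interface; the symbol name and
// descriptor layout below are illustrative only.
use libloading::{Library, Symbol};

#[repr(C)]
pub struct AccelDescriptor {
    pub abi_version: u32,
    // CPU features required to load this plugin, e.g. "avx2".
    pub required_cpu_features: *const std::os::raw::c_char,
    pub matmul_f32: unsafe extern "C" fn(
        m: usize, k: usize, n: usize,
        a: *const f32, b: *const f32, c: *mut f32,
    ) -> i32,
}

fn load_backend(path: &str) -> Result<&'static AccelDescriptor, Box<dyn std::error::Error>> {
    unsafe {
        let lib = Library::new(path)?;
        // Every plugin exports one well-known symbol describing itself.
        let describe: Symbol<unsafe extern "C" fn() -> *const AccelDescriptor> =
            lib.get(b"tract_accel_describe\0")?;
        let desc = &*describe();
        if desc.abi_version != 1 {
            return Err("plugin ABI version mismatch".into());
        }
        // Keep the library mapped for the lifetime of the process, so the
        // function pointers in the descriptor stay valid.
        std::mem::forget(lib);
        Ok(desc)
    }
}
```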

I propose implementing either a wgpu or a direct Vulkan backend. The former is preferred, as it should be able to activate the GPU on nearly anything: Vulkan on Windows and Linux (incl. Android), and on macOS via MoltenVK; Metal on macOS and iOS; an OpenGL fallback on Linux (and Android) with older GPUs; a DirectX fallback on Windows when Vulkan is not available; and WebGPU with a WebGL fallback under WebAssembly.
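For illustration, a minimal wgpu probe (assuming roughly the wgpu 0.19-era API, with pollster as the async executor) that asks for whatever adapter the platform can provide, which is exactly the fallback chain described above:

```rust
// Let wgpu pick Vulkan, Metal, DX12, or GL, whichever the platform supports.
fn main() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::all(), // Vulkan | Metal | DX12 | GL
        ..Default::default()
    });
    let adapter = pollster::block_on(
        instance.request_adapter(&wgpu::RequestAdapterOptions::default()),
    )
    .expect("no GPU adapter available");
    let info = adapter.get_info();
    println!("backend: {:?}, device: {}", info.backend, info.name);
    // A compute backend would go on to request a Device/Queue and dispatch
    // compute shaders for the hot operators (matmul, conv, ...).
}
```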

kali (Collaborator) commented Apr 24, 2022

FYI, tract-linalg already provides runtime-switchable implementations for some high-cost operations (matrix multiplication, sigmoid, and tanh, mostly). It is already accelerated with SIMD, with specific code for some Intel, armv7, and armv8 configurations.

One possible path to accelerating tract when a GPU is available is to implement the matrix multiplication on the GPU. I think there is an MVP here with local changes only (in tract-linalg). We could then move on to lowering more operators into tract-linalg and discussing buffer locality and the like, which would require some awareness from tract-core and tract-data.
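For illustration, a sketch of what that runtime-switchable shape could look like; the trait and names here are hypothetical, not tract-linalg's actual interface:

```rust
// Hypothetical runtime-switchable matmul, in the spirit of tract-linalg's
// dispatch; names and signatures are illustrative, not the real API.
trait MatMulF32: Send + Sync {
    fn run(&self, m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]);
}

struct NaiveCpu;
impl MatMulF32 for NaiveCpu {
    fn run(&self, m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
        // Reference row-major matmul: C[m,n] = A[m,k] * B[k,n].
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0;
                for p in 0..k {
                    acc += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = acc;
            }
        }
    }
}

// Picked once at startup: GPU if present, else the best CPU kernel.
fn select_matmul(gpu_available: bool) -> Box<dyn MatMulF32> {
    if gpu_available {
        // A Box::new(GpuMatMul::init()) would live here.
        Box::new(NaiveCpu) // placeholder until a GPU impl exists
    } else {
        Box::new(NaiveCpu)
    }
}
```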

Another option is to use tract-core as infrastructure and semantic pivot but, instead of optimizing the network in tract, substitute some operators with external ones from another crate that provides acceleration. This path is used by some teams who have specific hardware with capabilities they want to use without exposing them, as it allows things to happen outside of tract. That is not a requirement, though: such a crate could also be part of tract. One source of friction for external consumers of tract-core is its stability, on which I have not made commitments so far, as I considered it internal, even though the API is de facto pretty stable.

The reason I closed this issue is not that I don't want it to happen: I don't have the resources for it. tract is mostly a one-man show so far, so I must pick my battles. If somebody else picks this one up and starts working on it, I can help with guidance and discuss changes to make it easier, but I can't do the heavy lifting. Until somebody does, this conversation is just daydreaming, and I believe issues have to be actionable.

I'm gonna move this topic to the discussion section at this stage.

sonos locked and limited conversation to collaborators on Apr 24, 2022
kali converted this issue into discussion #688 on Apr 24, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
