
Support for GPU/CUDA #461

Closed
haixuanTao opened this issue Mar 30, 2021 · 5 comments

haixuanTao commented Mar 30, 2021

I tried this package on the PyTorch example and it seems there's currently no GPU/CUDA support.

Is it a feature that is going to be added in the future?

Thanks in advance for this package :)

haixuanTao changed the title from "Support for GPU" to "Support for GPU/CUDA" on Mar 30, 2021
kali (Collaborator) commented Mar 30, 2021

Thanks for your interest.

That is correct: there is no CUDA or GPU support, and it is not on the roadmap. We are not putting much energy into the PC platform, as tract's main focus is arm32 and arm64 devices. If something happens on the GPU front, it won't come from us.

haixuanTao (Author)

Makes sense.

The best option for GPU ONNX in Rust at the moment seems to be https://github.com/nbigaouette/onnxruntime-rs

I'll close the issue.

dvc94ch commented Apr 13, 2022

That might make sense for training the NN on a GPU. GPU support in tract would probably be more along the lines of doing inference on a mobile GPU. It might make sense to reopen the issue, as it's not resolved and other people might be interested in its status.

dsseng commented Apr 24, 2022

+1 for reopening. Yes, I believe support for, e.g., Vulkan would boost the library's capabilities. I found this project and liked it because Rust is definitely more maintainable than C. With hardware acceleration, tract would be a really great tool for mobile as well (mobile devices have GL/CL/Vulkan to accelerate inference, and those are a lot more power-efficient). I would like to work on a hardware acceleration approach if that is wanted.

One idea is to create loadable C-ABI libraries for acceleration methods, as sketched below. It would require defining an API and a binary-stable interface for C-style libraries (built with Rust, C/C++, or many other languages), which the core discovers and loads using libloading. Each library would declare attributes such as the CPU features required to load it (e.g. AVX2; SIMD could also help accelerate inference). The API should include generic buffer-management functions (create, free, transfer to device/to CPU, query location (shm, unified memory, dedicated memory, RDMA, etc.), import/export platform buffers) and implementations of the operators.
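A minimal sketch of how such a plugin could be loaded with libloading; the `tract_accel_describe` symbol and the `AccelDescriptor` layout are hypothetical, not an existing tract interface:

```rust
// Hypothetical C-ABI accelerator plugin interface; the symbol name and
// descriptor layout below are illustrative only.
use libloading::{Library, Symbol};

#[repr(C)]
pub struct AccelDescriptor {
    pub abi_version: u32,
    // CPU features required to load this plugin, e.g. "avx2".
    pub required_cpu_features: *const std::os::raw::c_char,
    pub matmul_f32: unsafe extern "C" fn(
        m: usize, k: usize, n: usize,
        a: *const f32, b: *const f32, c: *mut f32,
    ) -> i32,
}

fn load_backend(path: &str) -> Result<&'static AccelDescriptor, Box<dyn std::error::Error>> {
    unsafe {
        let lib = Library::new(path)?;
        // Every plugin exports one well-known symbol describing itself.
        let describe: Symbol<unsafe extern "C" fn() -> *const AccelDescriptor> =
            lib.get(b"tract_accel_describe\0")?;
        let desc = &*describe();
        if desc.abi_version != 1 {
            return Err("plugin ABI version mismatch".into());
        }
        // Keep the library mapped for the lifetime of the process, so the
        // function pointers in the descriptor stay valid.
        std::mem::forget(lib);
        Ok(desc)
    }
}
```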

I propose implementing either a wgpu or a direct Vulkan backend. The former is preferred, as it should be able to activate the GPU on nearly anything: Vulkan on Windows and Linux (incl. Android), and on macOS via MoltenVK; Metal on macOS and iOS; an OpenGL fallback on Linux (and Android) with older GPUs; a DirectX fallback on Windows when Vulkan is not available; and WebGPU with a WebGL fallback under WebAssembly.
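For illustration, a minimal wgpu probe (assuming roughly the wgpu 0.19-era API, with pollster as the async executor) that asks for whatever adapter the platform can provide, which is exactly the fallback chain described above:

```rust
// Let wgpu pick Vulkan, Metal, DX12, or GL, whichever the platform supports.
fn main() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::all(), // Vulkan | Metal | DX12 | GL
        ..Default::default()
    });
    let adapter = pollster::block_on(
        instance.request_adapter(&wgpu::RequestAdapterOptions::default()),
    )
    .expect("no GPU adapter available");
    let info = adapter.get_info();
    println!("backend: {:?}, device: {}", info.backend, info.name);
    // A compute backend would go on to request a Device/Queue and dispatch
    // compute shaders for the hot operators (matmul, conv, ...).
}
```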

kali (Collaborator) commented Apr 24, 2022

FYI, tract-linalg already provides runtime-switchable implementations for some high-cost operations (matrix multiplication, sigmoid, and tanh, mostly). It is already accelerated with SIMD, with specific code for some Intel, armv7, and armv8 configurations.

One possible path to accelerating tract when a GPU is available is to implement the matrix multiplication on the GPU. I think there is an MVP here with local changes only (in tract-linalg). We could then move on to lowering more operators into tract-linalg and discussing buffer locality and the like, which would require some awareness from tract-core and tract-data.
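For illustration, a sketch of what that runtime-switchable shape could look like; the trait and names here are hypothetical, not tract-linalg's actual interface:

```rust
// Hypothetical runtime-switchable matmul, in the spirit of tract-linalg's
// dispatch; names and signatures are illustrative, not the real API.
trait MatMulF32: Send + Sync {
    fn run(&self, m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]);
}

struct NaiveCpu;
impl MatMulF32 for NaiveCpu {
    fn run(&self, m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
        // Reference row-major matmul: C[m,n] = A[m,k] * B[k,n].
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0;
                for p in 0..k {
                    acc += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = acc;
            }
        }
    }
}

// Picked once at startup: GPU if present, else the best CPU kernel.
fn select_matmul(gpu_available: bool) -> Box<dyn MatMulF32> {
    if gpu_available {
        // A Box::new(GpuMatMul::init()) would live here.
        Box::new(NaiveCpu) // placeholder until a GPU impl exists
    } else {
        Box::new(NaiveCpu)
    }
}
```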

Another option is to use tract-core as infrastructure and semantic pivot but, instead of optimizing the network in tract, substitute some operators with external ones from another crate that provides acceleration. This path is used by some teams who have specific hardware with capabilities they want to use without exposing them, as it allows things to happen outside of tract. That is not a requirement, though: such a crate could also be part of tract. One source of friction for external consumers of tract-core is its stability, on which I have not made commitments so far, as I considered it internal, even though the API is de facto pretty stable.

The reason I closed this issue is not that I don't want it to happen: I don't have the resources for it. tract is mostly a one-man show so far, so I must pick my battles. If somebody else picks this one up and starts working on it, I can help with guidance and discuss changes to make it easier, but I can't do the heavy lifting. Until somebody does, this conversation is just daydreaming, and I believe issues have to be actionable.

I'm gonna move this topic to the discussion section at this stage.

sonos locked and limited conversation to collaborators on Apr 24, 2022
kali converted this issue into discussion #688 on Apr 24, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
