This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Support for GPU/CUDA #461
Comments
Thanks for your interest. This is correct: there is no CUDA or GPU support, and it is not on the roadmap. We are not putting too much energy into the PC platform; tract's main focus is arm32 and arm64 devices. If something happens on the GPU front, it won't come from us.
Makes sense. The best option for GPU ONNX in Rust at the moment seems to be https://github.com/nbigaouette/onnxruntime-rs. I'll close the issue.
That might make sense for training the NN on a GPU. GPU support in tract is probably more along the lines of doing inference on a mobile GPU. It might make sense to reopen the issue, as it's not resolved and other people might be interested in its status.
+1 for reopening. Yes, I believe support for, e.g., Vulkan would boost the library's capabilities. I found this project and liked it because Rust is definitely more maintainable than C. With hardware acceleration, tract will be a really great tool for mobile as well (mobile devices have GL/CL/Vulkan to accelerate inference, and those are a lot more power-efficient). I would like to work on some HW acceleration approach if that is wanted. One idea is to create loadable C-ABI libraries for accel methods. It would require creating an API and a binary-stable interface to C-style libraries (built with Rust, C/C++, or many other languages), which are discovered and loaded by the core using […]. I propose implementing either […].
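The loadable C-ABI library idea above could be sketched roughly as follows: a plugin exports one entry symbol describing its kernels through a `#[repr(C)]` struct, so the core can resolve it at runtime (e.g. via `dlopen`/`dlsym`) regardless of what language built the plugin. All names here (`AccelKernel`, `accel_kernel_entry`, `cpu_sgemm`) are hypothetical, not an actual tract API:

```rust
// Sketch of a binary-stable contract a hypothetical accelerator plugin could
// export. None of these names exist in tract; they only illustrate the idea
// of a C-ABI boundary between the core and an out-of-tree backend.

/// Plain-C-compatible descriptor the core would look up in the shared library.
#[repr(C)]
pub struct AccelKernel {
    /// ABI version, so the core can reject incompatible plugins.
    pub abi_version: u32,
    /// f32 GEMM: c[m*n] += a[m*k] * b[k*n], all matrices row-major.
    pub sgemm: unsafe extern "C" fn(m: usize, k: usize, n: usize, a: *const f32, b: *const f32, c: *mut f32),
}

/// Reference (CPU) kernel; a real plugin would dispatch to Vulkan/CL/etc.
unsafe extern "C" fn cpu_sgemm(m: usize, k: usize, n: usize, a: *const f32, b: *const f32, c: *mut f32) {
    let a = std::slice::from_raw_parts(a, m * k);
    let b = std::slice::from_raw_parts(b, k * n);
    let c = std::slice::from_raw_parts_mut(c, m * n);
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                c[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
}

/// The single symbol the core would resolve from the plugin library.
#[no_mangle]
pub extern "C" fn accel_kernel_entry() -> AccelKernel {
    AccelKernel { abi_version: 1, sgemm: cpu_sgemm }
}

fn main() {
    // Pretend we loaded the plugin and resolved the entry point.
    let kernel = accel_kernel_entry();
    let a = [1.0f32, 2.0, 3.0, 4.0]; // 2x2
    let b = [5.0f32, 6.0, 7.0, 8.0]; // 2x2
    let mut c = [0.0f32; 4];
    unsafe { (kernel.sgemm)(2, 2, 2, a.as_ptr(), b.as_ptr(), c.as_mut_ptr()) };
    println!("{:?}", c); // [19.0, 22.0, 43.0, 50.0]
}
```

The `abi_version` field is what makes the interface evolvable: the core can refuse to load a plugin compiled against an older struct layout instead of crashing.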
FYI, tract-linalg already provides runtime-switchable implementations for some high-cost operations (matrix multiplication, sigmoid and tanh, mostly). It is already accelerated with SIMD (with specific code for intel v2, and some armv7 and armv8 configurations). One possible path to accelerating tract when a GPU is available is to implement the matrix multiplication on the GPU. I think there is an MVP here with local changes only (in tract-linalg). We could then move on to lowering more operators into tract-linalg, and discuss buffer locality and the like, which would require some awareness from tract-core and tract-data.

Another option is to use tract-core as infrastructure and semantic pivot, but instead of optimizing the network in tract, substitute some operators with external ones from another crate that provides acceleration. This path is used by some teams who have specific hardware with capabilities they want to use without exposing themselves, as it allows things to happen outside of tract. That is not a requirement; such a crate could also be part of tract. Also, a source of friction for external consumers of tract-core is tract-core stability, on which I have not made commitments so far, as I considered it internal, even though the API is de facto pretty stable.

The reason I closed this as an issue is not that I don't want it to happen: I don't have the resources for it. tract is mostly a one-man show so far, so I must pick my battles. If somebody else picks this one up and starts working on it, I can help with guidance and discuss changes to make it easier, but I can't do the heavy lifting. Until somebody does, this conversation is just daydreaming, while I believe issues have to be actionable. I'm going to move this topic to the discussion section at this stage.
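The runtime-switchable kernel idea described above can be sketched as a dispatcher that probes available backends once and hands back a function pointer. This is only an illustration of the pattern, under assumed names (`MatMul`, `select_matmul`, `gpu_available`); the real tract-linalg machinery is more elaborate:

```rust
// Minimal sketch of runtime kernel selection, in the spirit of what the
// comment describes for tract-linalg. All names here are hypothetical.

/// Signature every backend must satisfy: c (m x n) = a (m x k) * b (k x n), row-major.
type MatMul = fn(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]);

/// Portable fallback; SIMD or GPU variants would share this exact signature.
fn generic_matmul(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Stand-in for a capability probe; a GPU backend would be detected here.
fn gpu_available() -> bool {
    false // assumption: no GPU in this sketch
}

/// Picks the best implementation once, e.g. at startup.
fn select_matmul() -> MatMul {
    if gpu_available() {
        // A real build would return a GPU-dispatching kernel here.
        generic_matmul
    } else {
        generic_matmul
    }
}

fn main() {
    let matmul = select_matmul();
    let a = [1.0f32, 0.0, 0.0, 1.0]; // 2x2 identity
    let b = [3.0f32, 4.0, 5.0, 6.0];
    let mut c = [0.0f32; 4];
    matmul(2, 2, 2, &a, &b, &mut c);
    println!("{:?}", c); // [3.0, 4.0, 5.0, 6.0]
}
```

Because every backend shares one signature, a GPU matmul could slot in behind the same dispatch point with changes local to the linalg layer, which is the MVP the comment describes.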
I tried this package on the PyTorch example, and it seems that there's currently no GPU/CUDA support.
Is this a feature that is going to be added in the future?
Thanks in advance for this package :)