As stated, the CUDA code in the candle-kernels repository seems to contain only kernel functions. When I want to implement a new operator (such as nonzero), it seems that all higher-level, host-side logic has to be written in Rust, which means I cannot use the device_vector from Thrust or the flagged APIs from CUB. This makes implementing my algorithms considerably harder. For example, to implement nonzero, would I have to reimplement algorithms like exclusive_scan and scatter myself under the current approach?
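To make the nonzero example concrete, here is a minimal CPU reference in Rust of the decomposition I have in mind: flag, exclusive scan, then scatter. This is only a sketch under my own assumptions (a flat `f32` input, `u32` indices, and the helper names `exclusive_scan` and `nonzero` are made up for illustration). It is not code from candle, and a GPU version would run each step as a kernel or a library call instead of a loop.

```rust
/// Exclusive prefix sum over 0/1 flags.
/// Returns the per-element offsets and the total number of set flags.
fn exclusive_scan(flags: &[u32]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::with_capacity(flags.len());
    let mut running = 0u32;
    for &f in flags {
        // Each element sees the sum of everything strictly before it.
        offsets.push(running);
        running += f;
    }
    (offsets, running)
}

/// CPU reference for `nonzero` on a flat slice (hypothetical helper).
fn nonzero(input: &[f32]) -> Vec<u32> {
    // Step 1: flag every element that is not zero.
    let flags: Vec<u32> = input.iter().map(|&x| (x != 0.0) as u32).collect();

    // Step 2: the exclusive scan turns flags into output positions.
    let (offsets, count) = exclusive_scan(&flags);

    // Step 3: scatter the index of each flagged element to its position.
    let mut out = vec![0u32; count as usize];
    for i in 0..input.len() {
        if flags[i] == 1 {
            out[offsets[i] as usize] = i as u32;
        }
    }
    out
}
```

Calling `nonzero(&[0.0, 3.0, 0.0, 0.0, 5.0, 1.0])` yields the indices 1, 4 and 5. Steps 2 and 3 are exactly the parts that Thrust and CUB already provide on the GPU, which is why I would rather not rewrite them by hand.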
I am hoping for a better way to utilize the CUDA ecosystem!
Specifically, I'm interested in how to:
Incorporate host functions in CUDA code to facilitate the use of libraries like Thrust and CUB.
Effectively leverage these libraries to implement algorithms and operators that are not natively supported in the current codebase.
Any guidance or best practices for achieving this would be greatly appreciated.
(Translated from Chinese using an LLM, so it might be a little bit formal ^_^)
I have finished a GPU version of nonzero: candle-nonzero. It uses FFI to invoke CUDA functions.
I'm still wondering what the best way is to integrate it into this project🧐
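For readers unfamiliar with the FFI approach, the rough shape of the boundary is sketched below. Everything here is an assumption for illustration: the name `nonzero_host`, its signature, and the wrapper `nonzero_ffi` are hypothetical and are not taken from candle-nonzero. In a real build, `nonzero_host` would only be declared in an `extern "C" { ... }` block and implemented in a `.cu` file, where the host code is free to use Thrust or CUB. Here it is a plain Rust stand-in so that the sketch runs without a GPU.

```rust
use std::os::raw::c_int;

/// Stand-in for a host function exported from CUDA code with C linkage
/// (hypothetical name and signature). It writes the indices of the nonzero
/// elements of `input` into `out` and returns how many it wrote.
unsafe extern "C" fn nonzero_host(input: *const f32, n: c_int, out: *mut u32) -> c_int {
    let mut count = 0;
    for i in 0..n as usize {
        // SAFETY: the caller guarantees both pointers cover `n` elements.
        unsafe {
            if *input.add(i) != 0.0 {
                *out.add(count) = i as u32;
                count += 1;
            }
        }
    }
    count as c_int
}

/// Safe wrapper: owns the buffers, checks sizes, and hides the raw pointers.
fn nonzero_ffi(input: &[f32]) -> Vec<u32> {
    assert!(input.len() <= c_int::MAX as usize, "input too large for c_int");
    let n = input.len() as c_int;

    // Worst case: every element is nonzero, so reserve `n` output slots.
    let mut out = vec![0u32; input.len()];

    // SAFETY: `input` and `out` both hold exactly `n` elements.
    let count = unsafe { nonzero_host(input.as_ptr(), n, out.as_mut_ptr()) };

    // Keep only the indices that were actually written.
    out.truncate(count as usize);
    out
}
```

The point of the wrapper is that the `unsafe` call stays in one place, so callers only see slices and a `Vec`. With real device memory the pointers would come from the CUDA allocation rather than from host slices, and that is the part I am unsure how to fit into this project.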