[DeviceAPI] Support "GetCurrentStream" #16689
This PR introduces a new function, `GetCurrentStream`, to the device API, which returns the current stream of the given device. It also updates CUDA's `CreateStream` to create a non-blocking stream, so that execution on this stream can overlap with execution on other streams. Finally, it changes the `GPUCopy` of the CUDA device API to always use `cudaMemcpyAsync`.
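A minimal sketch of the "current stream" bookkeeping that `GetCurrentStream` reads from, assuming the stream is tracked per host thread and per device, and that an unset slot means the default (null) stream. `StreamHandle`, `CurrentStreamRegistry`, and `ThreadLocalRegistry` are hypothetical names for illustration, not TVM's implementation. In a CUDA backend the handle would presumably be a `cudaStream_t`, for example one created with `cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking)`.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical opaque stream handle. In a CUDA backend this would presumably be a
// cudaStream_t; here it is just an untyped pointer.
using StreamHandle = void*;

// Hypothetical registry mapping device ids to the stream currently in use.
class CurrentStreamRegistry {
 public:
  // Make `stream` the current stream for `device_id`.
  void SetStream(int device_id, StreamHandle stream) {
    assert(device_id >= 0 && "device id must be non-negative");
    streams_[device_id] = stream;
  }

  // Return the stream last set for `device_id`, or nullptr (the default stream)
  // if SetStream has not been called for that device on this thread.
  StreamHandle GetCurrentStream(int device_id) const {
    auto it = streams_.find(device_id);
    return it == streams_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<int, StreamHandle> streams_;
};

// One registry per host thread, so each thread sees its own current stream.
CurrentStreamRegistry& ThreadLocalRegistry() {
  thread_local CurrentStreamRegistry registry;
  return registry;
}
```

Keeping the registry `thread_local` is a design assumption of this sketch: each host thread can then target its own stream without interfering with the others.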
I think this portion of the commit needs to be reverted. Prior to this commit, the
This function is used in many locations which relied on the previous semantics. For example:
Agreed that this makes the behavior more relaxed than before. On the other hand, from the DeviceAPI's point of view, we don't really guarantee synchronous behavior in the generic DeviceAPI:
One possible middle ground would be to update CopyTo to always call StreamSync before it returns. This would preserve the original usage of CopyTo, while still allowing the low-level device API to perform asynchronous copies, which generally provide more optimization opportunities.
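The proposed middle ground can be sketched as a host-only model. This is hypothetical and not TVM's actual DeviceAPI: `MockStream`, `CopyDataFromToAsync`, and this `CopyTo` signature are illustrative, and the "asynchronous" stream simply defers enqueued work until `Sync()` is called, standing in for a device stream driven by something like `cudaMemcpyAsync`.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical in-order stream: enqueued work runs only when the stream is
// synchronized, modelling copies that are asynchronous with respect to the host.
class MockStream {
 public:
  void Enqueue(std::function<void()> work) { pending_.push_back(std::move(work)); }

  // Wait for all previously enqueued work (here: run it in order, then clear).
  void Sync() {
    for (auto& work : pending_) work();
    pending_.clear();
  }

  bool Idle() const { return pending_.empty(); }

 private:
  std::vector<std::function<void()>> pending_;
};

// Low-level copy: only enqueues the transfer and returns immediately,
// so `to` must not be read until the stream has been synchronized.
void CopyDataFromToAsync(const void* from, void* to, std::size_t nbytes, MockStream& stream) {
  stream.Enqueue([from, to, nbytes] { std::memcpy(to, from, nbytes); });
}

// High-level copy with the proposed semantics: issue the async copy, then
// synchronize before returning, so callers can read `to` right away.
void CopyTo(const void* from, void* to, std::size_t nbytes, MockStream& stream) {
  CopyDataFromToAsync(from, to, nbytes, stream);
  stream.Sync();
}
```

The model also exposes the usual hazard of asynchronous copies: because the enqueued work holds only raw pointers, both buffers must stay alive until the stream is synchronized.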
Actually, it is great that this topic came up! The original logic would cause issues for backends like Metal/Vulkan, since copies there are asynchronous. We also explicitly documented the possible sync/async behavior in the NDArray interface.
#16716 contains the follow-up.
Thank you for the quick turnaround on the fix, and I like it. I agree that most GPU frameworks are asynchronous by design, and by necessity. My concern was mainly that it was a change in the existing
Ah, I had thought that was intentional: absent any explicit opt-in, GPU operations would be synchronized on an attempt to read, but all sequences of GPU operations would be asynchronous. With the stream parameter, transfers to the GPU would also be async. I like the change: the most common API is synchronous, while all internal APIs are asynchronous.