Error while freeing DeviceBuffer warning when using multiple GPUs #1454
I have been working on something similar. Perhaps the main difference in my code is that I've replaced […]. I wonder if there is something that is actually thread-local rather than task-local.
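To illustrate the distinction being raised (a generic Julia sketch, unrelated to CUDA.jl's actual internals): task-local storage travels with a task even when the scheduler migrates it across OS threads, whereas state indexed by thread id does not:

```julia
# Task-local state: tied to the current task; follows the task if the
# scheduler migrates it to another OS thread.
task_local_storage(:device_id, 1)
@assert task_local_storage(:device_id) == 1

# "Thread-local" state: indexed by thread id. If the task migrates,
# it suddenly reads a different slot than the one it wrote.
per_thread_state = fill(0, Threads.nthreads())
per_thread_state[Threads.threadid()] = 1
@assert per_thread_state[Threads.threadid()] == 1  # only valid while still on the same thread
```

If CUDA.jl caches the active context in such a per-thread slot while the task-local state says otherwise, a migrated task could see the two disagree.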
Thanks for the information.
Ah... it also appears with […].
I can't reproduce this. Maybe this is due to task migration, and the thread-bound state is desynchronized from the task-local one. Could you try:

```diff
diff --git a/src/pool.jl b/src/pool.jl
index 9a18f51b0..664b3af40 100644
--- a/src/pool.jl
+++ b/src/pool.jl
@@ -352,17 +352,9 @@ Releases a buffer `buf` to the memory pool.
     return
 end
 @inline function _free(buf::Mem.DeviceBuffer; stream::Union{Nothing,CuStream})
-    # NOTE: this function is often called from finalizers, from which we can't switch tasks,
-    #       so we need to take care not to call managed functions (i.e. functions that may
-    #       initialize the CUDA context) because querying the active context using
-    #       `current_context()` takes a lock
-
-    # verify that the caller has called `context!` already, which eagerly activates the
-    # context (i.e. doesn't only set it in the state, but configures the CUDA APIs).
-    handle_ref = Ref{CUcontext}()
-    cuCtxGetCurrent(handle_ref)
-    if buf.ctx.handle != handle_ref[]
-        error("Trying to free $buf from a different context than the one it was allocated from ($(handle_ref[]))")
+    # verify that the caller has switched contexts
+    if buf.ctx != context()
+        error("Trying to free $buf from an unrelated context")
     end
     dev = current_device()
```
Perhaps setting the environment variable `JULIA_EXCLUSIVE` helps. This will pin threads to specific processors: https://docs.julialang.org/en/v1/manual/environment-variables/#JULIA_EXCLUSIVE
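For reference, a minimal invocation sketch (the script name is a placeholder, and `--threads=auto` is just one sensible choice):

```shell
# Pin Julia's threads to specific cores for the multi-GPU run by setting
# JULIA_EXCLUSIVE before launching (multi_gpu_run.jl is a placeholder):
#
#   JULIA_EXCLUSIVE=1 julia --threads=auto multi_gpu_run.jl
#
# Sanity check that the variable is visible to the child process:
JULIA_EXCLUSIVE=1 sh -c 'echo "JULIA_EXCLUSIVE=$JULIA_EXCLUSIVE"'
```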
That shouldn't matter. Being pinned to a processor does not change semantics.
This seems to solve the issue. By the way, the version inside the […].
There are different branches. For example, see: |
The Issue
My application (in the package SolidStateDetectors.jl) is a 3D field solver with a custom kernel implemented using KernelAbstractions.jl and Adapt.jl.
To my knowledge, a single field calculation runs without issues on a single NVIDIA GPU (in the end via CUDA.jl).
I just started to run multiple field simulations in parallel on multiple GPUs, as stated in the documentation, and I get warning messages like:
The calculations seem to run through and the results appear to be correct, but I guess those CUDA warnings should not be produced nonetheless.
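For context, the multi-GPU pattern in the CUDA.jl documentation looks roughly like this (a sketch; `run_simulation` is a hypothetical stand-in for the actual field calculation, not part of CUDA.jl or my package):

```julia
using CUDA

# Sketch of the documented multi-GPU pattern: spawn one task per device;
# each task switches to its own device before doing any work.
asyncmap(collect(devices())) do dev
    device!(dev)         # bind this task to one GPU
    run_simulation(dev)  # hypothetical per-device workload
end
```

The warning below appears intermittently when running this kind of loop.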
MWE
The issue does not always appear; if it doesn't, just execute the loop again.
Please let me know whether you can reproduce it, and whether you want an example
where I extract the necessary parts from my package. That would take some time, though.
Manifest.toml
Version info
Details on Julia:
Details on CUDA:
Stacktrace