-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Distributed] Make worker state variable threadsafe #42239
Conversation
Tentatively marking backport-1.7 (i.e. assuming a happy form of this patch falls out and merges), as the original patch targeted an issue that appeared on 1.7. |
exeflags = ("--startup-file=no", | ||
"--check-bounds=yes", | ||
"--depwarn=error", | ||
"--threads=2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test seems invalid for now
e6bd8fb
to
986a8d6
Compare
# Wait on the spawned tasks on the owner | ||
@sync begin | ||
Threads.@spawn fetch_from_owner(wait, recv) | ||
Threads.@spawn fetch_from_owner(wait, send) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a very bad experience with this wait pattern, as it doesn't catch deadlocks: if the sender fails before actually sending a value, the receiver will wait forever. Similarly, if the receiver fails before retrieving the message, the sender will wait forever (if the channel is unbuffered). The @sync
block will only "forward" that failure, if all tasks finished, which never happens in this case. That's why I've resorted to a timedwait
.
IIRC, I saw some work on an eager version of @sync
, Experimental.@sync
maybe, but it's still experimental (i.e. discouraged for my use case). I couldn't find it quickly. #32677 describes some related ideas. I was looking into a custom @sync
a la
tasks = ...
# new sync end:
@sync for t in tasks
@async try
wait(t)
catch
for t2 in tasks
istaskdone(t2) || yieldto(t2, InterruptException())
end
end
end
but yieldto
is, again, discouraged and people much more adept than me suggest this is a bad idea (golang/go#29011 (comment)).
Moved to JuliaLang/Distributed.jl#4 |
Partial revert of #41722
This is the change that I suspect caused the hangs we observed on CI.