Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Distributed] Make worker state variable threadsafe #42239

Closed
wants to merge 2 commits into from

Conversation

vchuravy
Copy link
Member

Partial revert of #41722

This is the change that I suspect caused the hangs we observed on CI.

@Sacha0
Copy link
Member

Sacha0 commented Sep 13, 2021

Tentatively marking backport-1.7 (i.e. assuming a happy form of this patch falls out and merges), as the original patch targeted an issue that appeared on 1.7.

exeflags = ("--startup-file=no",
"--check-bounds=yes",
"--depwarn=error",
"--threads=2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test seems invalid for now

@vchuravy vchuravy force-pushed the vc/distributed_ts_state branch from e6bd8fb to 986a8d6 Compare September 29, 2021 03:51
Comment on lines +47 to +51
# Wait on the spawned tasks on the owner
@sync begin
Threads.@spawn fetch_from_owner(wait, recv)
Threads.@spawn fetch_from_owner(wait, send)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a very bad experience with this wait pattern, as it doesn't catch deadlocks: if the sender fails before actually sending a value, the receiver will wait forever. Similarly, if the receiver fails before retrieving the message, the sender will wait forever (if the channel is unbuffered). The @sync block will only "forward" that failure, if all tasks finished, which never happens in this case. That's why I've resorted to a timedwait.

IIRC, I saw some work on an eager version of @sync, Experimental.@sync maybe, but it's still experimental (i.e. discouraged for my use case). I couldn't find it quickly. #32677 describes some related ideas. I was looking into a custom @sync a la

tasks = ...
# new sync end:
@sync for t in tasks
    @async try
        wait(t)
    catch
        for t2 in tasks
            istaskdone(t2) || yieldto(t2, InterruptException())
        end
    end
end

but yieldto is, again, discouraged and people much more adept than me suggest this is a bad idea (golang/go#29011 (comment)).

@vchuravy
Copy link
Member Author

Moved to JuliaLang/Distributed.jl#4

@vchuravy vchuravy closed this Oct 28, 2023
@vchuravy vchuravy deleted the vc/distributed_ts_state branch October 28, 2023 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrency violation on interplay between Distributed and Base.Threads
5 participants