Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix a race condition in AbstractParallelProcessor #3844

Merged

Conversation

slarse
Copy link
Collaborator

@slarse slarse commented Mar 16, 2021

Fix #3806

Currently just a test to reproduce the problem. Want to see if it works on GitHub Actions machines, which I believe have 2 cores. So it should work.

This PR fixes a race condition in AbstractParallelProcessor. 67c8aa3 introduces a test that reproduces the problem, and 40a9602 fixes the problem by keeping track of the last job submitted for each processor and then waiting (indefinitely) for these to terminate. We only need to keep track of the latest job, as each processor only handles one job at a time.

We could also busy-wait on the processors queue to fill back up, which is somewhat less complex, but less efficient and less explicit. I prefer the current solution.

@slarse
Copy link
Collaborator Author

slarse commented Mar 16, 2021

Test does indeed work. Now to see if I can fix this without spinning.

@slarse slarse changed the title wip: fix: Fix race condition in AbstractParallelProcessor wip: fix: Wait for processors to complete in AbstractParallelProcessor Mar 16, 2021
@slarse slarse changed the title wip: fix: Wait for processors to complete in AbstractParallelProcessor review: fix: Wait for processors to complete in AbstractParallelProcessor Mar 16, 2021
@monperrus
Copy link
Collaborator

LGTM. Will merge.

As a side note for future work, parallelization through a higher level of abstraction such as the actor model would be less error-prone.

@slarse
Copy link
Collaborator Author

slarse commented Mar 22, 2021

As a side note for future work, parallelization through a higher level of abstraction such as the actor model would be less error-prone.

Possibly. I was thinking that fork/join might be a good fit here, as it's very easy to apply recursively, and the standard library has good support for it. The downside is that the client code (i.e. the processor itself) must then be aware of the parallelism. The upside would be to grant the client code significantly more control (just fork when appropriate), and in doing so one could actually mutate the Spoon tree in parallel (if done carefully). That's not really possible right now, as the tree is scanned sequentially, and the processing of each element is delegated to the first available thread.

I'm not sure how to make advanced parallelism possible through just a subclass of AbstractProcessor. I think it needs to be implemented as a scanner, which in turn has special support for parallel processors.

But that's a topic for another day, this PR was just a bugfix with the current architecture :)

@monperrus monperrus changed the title review: fix: Wait for processors to complete in AbstractParallelProcessor fix: fix a race condition in AbstractParallelProcessor Mar 24, 2021
@monperrus monperrus merged commit e839271 into INRIA:master Mar 24, 2021
@monperrus
Copy link
Collaborator

Thanks a lot @slarse

@slarse slarse deleted the issue/3806-fix-parallel-processor-race-condition branch March 24, 2021 07:47
@monperrus monperrus mentioned this pull request Aug 19, 2021
woutersmeenk pushed a commit to woutersmeenk/spoon that referenced this pull request Aug 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug/chore: ParallelProcessor tests are flaky
2 participants