ignore: gracefully quit worker threads upon panic in ParallelVisitor#3010
ignore: gracefully quit worker threads upon panic in ParallelVisitor#3010cosmicexplorer wants to merge 4 commits intoBurntSushi:masterfrom
Conversation
| + std::panic::UnwindSafe | ||
| + Send | ||
| + Sync | ||
| + 'static, |
There was a problem hiding this comment.
Unfortunately, I think this is plausibly too big of a breaking change for me to stomach in a semver compatible release. And I don't have the bandwidth to do a semver incompatible release right now. Which kind of makes this PR stuck.
One possible way to get this unstuck is to add a new API that avoid this problem and deprecate the old one. But I'd only want to go this route if it was a very small addition that doesn't significantly complicate the implementation. I'm not sure if such a solution exists.
Otherwise, I'm inclined to leave this bug open for now, and just something to address in the next semver incompatible release.
|
Closing, but keeping #3010 open. |
|
Also, thank you for your work on this! Even though I ended up not taking this PR, your work will be something worth building on for |
|
Wanted to confirm here that:
And I am super glad to hear I would like you to know that as usual, it is extremely difficult to improve upon the kind of work you do. I have tried to argue with your comments in my head and largely failed. I have spent months investigating this, and I have had to introduce some immense complexity in order to make a pretty minor use case more robust. Great work. I'm also looking at the API, and while I'm not done with mine yet, I would absolutely recommend considering the use of I also think the deadlock that occurred here is a symptom of a more general misalignment of the threading model to the input distribution. I think work stealing + recursive data generation (so each work item can always produce more) is necessarily prone to this deadlock. I do not have a proof of this, and I also do not have a more appropriate answer myself yet. I do understand that work stealing is a good approach to avoid having a single very large directory fill up its own queue. I also wonder whether there is any benefit to maximizing thread locality. Finally, I was informed about a very widely available syscall I am actually very confident that supporting
Again, very impressed by your engineering judgement, and I appreciate when you write a comment about decisions you didn't take too. Keep it up!! |
Fixes #3009.
Problem
WalkParallel::visit()will nondeterministically hang if theParallelVisitor::visit()implementation panics. This also occurs when providing a closure toWalkParallel::run(). Minimal repro is provided in #3009:The above code will nondeterministically hang, because of an infinite loop when no new work is available:
ripgrep/crates/ignore/src/walk.rs
Lines 1695 to 1707 in de4baa1
Solution
quit_nowflag in our wait loop.run()method and setquit_nowbefore propagating the panic.Breaking change: In order to ensure soundness, we also enforce that the filter method provided to
WalkBuilder#filter_entry()isUnwindSafe. Users can circumvent this by wrapping inAssertUnwindSafeas needed.Result
The added test
panic_in_parallel()always succeeds instead of hanging.