Never block a thread on the PeerManager event handling lock #2280
Conversation
Force-pushed from 8aede17 to f868d86
Codecov Report
Patch coverage:

@@            Coverage Diff             @@
##              main    #2280       +/-   ##
============================================
+ Coverage    90.94%   91.97%    +1.03%
============================================
  Files          104      104
  Lines        52741    64504    +11763
  Branches     52741    64504    +11763
============================================
+ Hits         47966    59330    +11364
- Misses        4775     5174      +399

☔ View full report in Codecov by Sentry.
Can't this also happen now with ChannelManager?
We used to, I think, but we no longer do; a second thread trying to process events in CM will now return immediately.
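For illustration, a minimal sketch of that early-return pattern using an atomic flag (the struct and field names here are hypothetical, and this is not the actual ChannelManager code):

use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical illustration of "return immediately if someone else is processing".
struct EventProcessor {
    processing: AtomicBool,
}

impl EventProcessor {
    fn process_events(&self) {
        // If another thread is already in here, return immediately instead
        // of blocking behind it.
        if self.processing.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire).is_err() {
            return;
        }
        // ... handle events ...
        self.processing.store(false, Ordering::Release);
    }
}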
I would suggest encapsulating the locking logic in a struct and doing it a bit differently, by incrementing a counter:
use std::sync::atomic::{AtomicI32, Ordering};
pub struct CountingMutex {
counter: AtomicI32,
}
impl CountingMutex {
pub fn new() -> CountingMutex {
CountingMutex {
counter: AtomicI32::new(0),
}
}
pub fn try_lock(&self) -> bool {
self.counter.fetch_add(1, Ordering::AcqRel) == 0
}
pub fn try_unlock(&self) -> bool {
let prev = self.counter.fetch_add(-1, Ordering::AcqRel);
debug_assert!(prev > 0, "CountingMutex is in inconsistent state");
if prev == 1 {
return true;
}
self.counter.store(1, Ordering::Release);
false
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
let lock = CountingMutex::new();
// Lock, unlock
assert!(lock.try_lock());
assert!(lock.try_unlock());
assert!(lock.try_lock());
assert!(!lock.try_lock());
assert!(!lock.try_unlock());
assert!(lock.try_unlock());
}
}
Your alternative suggestion doesn't have the "go around again" logic that we need from this PR. We could definitely encapsulate it, though then we're adding a bunch of extra match'es and returning an enum, which would give us almost as much code to call the encapsulated struct as to actually do the atomics.
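For illustration, a hypothetical sketch of the enum-returning encapsulation being described; the names are made up and not from the codebase:

// Purely illustrative: what an enum-based API might look like.
enum LockAttempt {
    // We acquired the processing "lock"; run the loop body.
    Acquired,
    // Another thread is already processing; the caller should return.
    AlreadyProcessing,
}

enum UnlockAttempt {
    // Nobody signaled us while we were processing; we're done.
    Done,
    // At least one other thread signaled; go around the loop again.
    GoAround,
}

// Every call site then needs a match on the returned enum, e.g.
// match self.event_processing_state.begin() {
//     LockAttempt::Acquired => {},
//     LockAttempt::AlreadyProcessing => return,
// }
// which is roughly as much code as doing the two atomic operations inline.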
Sorry, forgot to show the usage:
pub fn process_events(&self) {
if !self.event_processing_state.try_lock() {
return;
}
loop {
// The body of the function.
if self.event_processing_state.try_unlock() {
return;
}
}
}
I view such encapsulation not so much as reducing the amount of code, but as reducing the complexity.
Right, but we don't have to go around more than once; we only need to go around one more time if any number of other threads signaled to us that they wanted to process events.
Right, that is why I set the counter back to 1 in try_unlock. But we need to loop a third time if, during the second loop, another thread reached this function. Right?
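To make the sequencing concrete, here is a small test against the CountingMutex sketch above; this is an editorial illustration, not code from the PR:

#[test]
fn go_around_again_when_signaled() {
    let lock = CountingMutex::new();
    // Thread A starts processing events.
    assert!(lock.try_lock());
    // Thread B arrives while A is processing and returns immediately.
    assert!(!lock.try_lock());
    // A finishes its first pass; B signaled, so A must go around again.
    assert!(!lock.try_unlock());
    // Thread C arrives during A's second pass.
    assert!(!lock.try_lock());
    // A finishes its second pass; C signaled, so a third pass is needed.
    assert!(!lock.try_unlock());
    // Nothing arrived during the third pass; A can finally stop.
    assert!(lock.try_unlock());
}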
Ah, apologies, I'd missed the extra store(1) there; indeed, your solution is much simpler and works fine.
Took your design, but left out the struct. This is really simple logic, and while encapsulation is generally great, it's also the case that bugs tend to creep in at the border. So for things that are simple enough to be read in-line, I tend to prefer to avoid adding a boundary that requires context switching to read and to make sure we're using an API correctly.
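A rough sketch of what the inlined version might look like, reusing the event_processing_state: AtomicI32 field name from the usage example above; this is an approximation, not the exact code from the PR:

pub fn process_events(&self) {
    // Signal that we want to process events. If another thread is already
    // processing, it will see the incremented counter and loop again, so we
    // can return immediately instead of blocking.
    if self.event_processing_state.fetch_add(1, Ordering::AcqRel) != 0 {
        return;
    }
    loop {
        // ... handle all pending events ...

        // If nobody signaled us while we were processing, we're done.
        if self.event_processing_state.fetch_add(-1, Ordering::AcqRel) == 1 {
            return;
        }
        // Otherwise collapse any number of signals into one and go around again.
        self.event_processing_state.store(1, Ordering::Release);
    }
}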
Force-pushed from 7f35f63 to 47cff1e
LGTM after squash.
Force-pushed from 47cff1e to 7925c52
Pushed a smoke test.
LGTM after squash and CI fix
Huh, looks like the test is failing on all Windows builds, which is.....strange af?
If there's a thread currently handling PeerManager events, the next thread which attempts to handle events will block on the first and then handle events after the first completes. (Later threads will return immediately to avoid blocking more than one thread.) This works fine as long as the user has a spare thread to leave blocked, but if they don't (e.g. are running with a single-threaded tokio runtime) this can lead to a full deadlock.

Instead, here, we never block waiting on another event processing thread, returning immediately after signaling that the first thread should start over once it's complete to ensure all events are handled.

While this could lead to starvation as we cause one thread to go around and around and around again, the risk of that should be relatively low as event handling should be pretty quick, and it's certainly better than deadlocking.

Fixes lightningdevkit/rapid-gossip-sync-server#32

Atomic lock simplification suggestion from @andrei-21
Force-pushed from 3a68150 to 0c034e9
Huh! TIL windowz doesn't bother interrupting a thread if you only have one CPU core and the other threads waiting to run are in the same process...what a terrible OS. Anyway, squashed with a trivial fix for Winblowz and the above spelling issue:
If there's a thread currently handling PeerManager events, the next thread which attempts to handle events will block on the first and then handle events after the first completes. (Later threads will return immediately to avoid blocking more than one thread.) This works fine as long as the user has a spare thread to leave blocked, but if they don't (e.g. are running with a single-threaded tokio runtime) this can lead to a full deadlock.
Instead, here, we never block waiting on another event processing thread, returning immediately after signaling that the first thread should start over once it's complete to ensure all events are handled.
While this could lead to starvation as we cause one thread to go around and around and around again, the risk of that should be relatively low as event handling should be pretty quick, and it's certainly better than deadlocking.
Fixes lightningdevkit/rapid-gossip-sync-server#32
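For context, a contrived sketch of the single-threaded-runtime deadlock described above. This is illustrative only (not the rapid-gossip-sync-server code), assumes a tokio dependency with the full feature set, and it intentionally hangs when the first task is polled before the second:

use std::sync::{Arc, Mutex};
use tokio::task::LocalSet;

fn main() {
    let rt = tokio::runtime::Builder::new_current_thread().enable_all().build().unwrap();
    let local = LocalSet::new();
    local.block_on(&rt, async {
        let lock = Arc::new(Mutex::new(()));

        let lock_a = Arc::clone(&lock);
        let task_a = tokio::task::spawn_local(async move {
            // Task A takes the lock and holds it across an await point,
            // yielding back to the single-threaded executor.
            let _guard = lock_a.lock().unwrap();
            tokio::task::yield_now().await;
        });

        let lock_b = Arc::clone(&lock);
        let task_b = tokio::task::spawn_local(async move {
            // Task B blocks the only worker thread waiting for the lock, so
            // task A can never be polled again to release it: deadlock.
            let _guard = lock_b.lock().unwrap();
        });

        let _ = tokio::join!(task_a, task_b);
    });
}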