fix(banking-stage): shutdown log spam by OliverNChalk · Pull Request #9417 · anza-xyz/agave

OliverNChalk · 2025-12-04T21:33:58Z

Problem

The scheduler controller will shutdown if the upstream sender disconnects. This is seen as an "unexpected exit" by the manager thread and the scheduler is restarted infinitely (until the shutdown signal reaches the manager).

Summary of Changes

If we shutdown the scheduler due to an upstream disconnect, then we shutdown he manager too as there is no point in spawning additional schedulers.

codecov-commenter · 2025-12-04T22:06:10Z

Codecov Report

❌ Patch coverage is 78.26087% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.6%. Comparing base (16af42c) to head (ebf62ae).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #9417     +/-   ##
=========================================
- Coverage    82.7%    82.6%   -0.2%     
=========================================
  Files         843      852      +9     
  Lines      315498   318151   +2653     
=========================================
+ Hits       261105   262935   +1830     
- Misses      54393    55216    +823

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tao-stones

lgtm, assume you find way to test it?

OliverNChalk · 2025-12-05T20:44:23Z

lgtm, assume you find way to test it?

Yep, tested manually on devnet by starting & stopping an rpc

t-nelson · 2025-12-05T20:50:08Z

        },
        time::Instant,
    },
+    tokio_util::sync::CancellationToken,


is this suitable outside async contexts?

Pretty sure, it's just either an atomic operation if the runtime isnt parked or a syscall to unpark the runtime (should be a write syscall for our single threaded runtime).

Fairly sure this is where we get to on our sync caller side:

pub fn wake(self) { // The actual wakeup call is delegated through a virtual function call // to the implementation which is defined by the executor. // Don't call `drop` -- the waker will be consumed by `wake`. let this = ManuallyDrop::new(self); // SAFETY: This is safe because `Waker::from_raw` is the only way // to initialize `wake` and `data` requiring the user to acknowledge // that the contract of `RawWaker` is upheld. unsafe { (this.waker.vtable.wake)(this.waker.data) }; }

After this the queued waker runs which is type erased. Most likely this is the wake method on the other side:

pub(crate) fn wake(&self) -> io::Result<()> { // The epoll emulation on some illumos systems currently requires // the eventfd to be read before an edge-triggered read event is // generated. // See https://www.illumos.org/issues/16700. #[cfg(target_os = "illumos")] self.reset()?; let buf: [u8; 8] = 1u64.to_ne_bytes(); match (&self.fd).write(&buf) { Ok(_) => Ok(()), Err(ref err) if err.kind() == io::ErrorKind::WouldBlock => { // Writing only blocks if the counter is going to overflow. // So we'll reset the counter to 0 and wake it again. self.reset()?; self.wake() } Err(err) => Err(err), } }

tao-stones

Thanks for sorting it out for both internal and external paths.

t-nelson · 2026-01-07T03:11:01Z

this will be superseded by #9786, right?

OliverNChalk · 2026-01-07T08:30:23Z

this will be superseded by #9786, right?

No these are unrelated. Cavey's fix is a perf issue that results in "poh tick reached" spam. This fix results in "worker shutdown unexpectedly" spam. Basically the workers see the exit signal before the management thread and it causes the management thread to spam warn/error.

EDIT: Forgot about this PR so will refresh myself and determine if its mergeable

t-nelson · 2026-01-07T14:58:45Z

wow. so much spam. the logs are just like my inbox

apfitzge · 2026-01-20T15:27:26Z

anything holding the merge of this up?

OliverNChalk · 2026-01-20T16:52:20Z

Was waiting on a trent response then forgot. Have synced master in, re self-reviewed, and fixed a typo (recieve -> receive). Re-running CI with latest master then will request re-sign off for merge.

t-nelson · 2026-01-20T18:46:30Z

r+ sme sign off

i think i'm fine with this fix, but it's necessity hints at an architectural flaw. seems like the exit bool should be enough, but we're not using it correctly

OliverNChalk · 2026-01-20T18:59:55Z

r+ sme sign off

i think i'm fine with this fix, but it's necessity hints at an architectural flaw. seems like the exit bool should be enough, but we're not using it correctly

IMO the flaw is in Agave, it uses an exit bool and all threads shutdown simultaneously. In a civilized binary we would have a shutdown sequence rather than a bunch of racey threads all shutting down at the same time. To this end the banking stage actually enforces a reasonable shutdown order (manager waits for all workers to exit cleanly before it itself exits).

apfitzge

question about non-vote threads

OliverNChalk marked this pull request as ready for review December 4, 2025 21:34

OliverNChalk requested review from steviez and tao-stones December 4, 2025 21:34

tao-stones reviewed Dec 4, 2025

View reviewed changes

Comment thread core/src/banking_stage.rs

tao-stones previously approved these changes Dec 5, 2025

View reviewed changes

OliverNChalk dismissed tao-stones’s stale review via 202d36d December 5, 2025 16:54

OliverNChalk requested review from apfitzge and tao-stones December 5, 2025 19:22

OliverNChalk force-pushed the fix/banking-shutdown-log-spam branch from 4298e26 to 216272c Compare December 5, 2025 20:29

t-nelson reviewed Dec 5, 2025

View reviewed changes

tao-stones reviewed Dec 5, 2025

View reviewed changes

Comment thread core/src/banking_stage.rs

OliverNChalk force-pushed the fix/banking-shutdown-log-spam branch from 35069b9 to e091a3b Compare December 5, 2025 21:28

tao-stones previously approved these changes Dec 5, 2025

View reviewed changes

fix(banking-stage): shutdown log spam

ebf62ae

OliverNChalk dismissed tao-stones’s stale review via ebf62ae January 20, 2026 16:51

OliverNChalk force-pushed the fix/banking-shutdown-log-spam branch from e091a3b to ebf62ae Compare January 20, 2026 16:51

apfitzge reviewed Jan 21, 2026

View reviewed changes

Comment thread core/src/banking_stage/vote_worker.rs

apfitzge approved these changes Jan 21, 2026

View reviewed changes

OliverNChalk added this pull request to the merge queue Jan 21, 2026

Merged via the queue into anza-xyz:master with commit 5826527 Jan 21, 2026
47 checks passed

OliverNChalk deleted the fix/banking-shutdown-log-spam branch January 21, 2026 16:20

Conversation

OliverNChalk commented Dec 4, 2025

Problem

Summary of Changes

Uh oh!

codecov-commenter commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

tao-stones left a comment

Choose a reason for hiding this comment

Uh oh!

OliverNChalk commented Dec 5, 2025

Uh oh!

t-nelson Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

OliverNChalk Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tao-stones left a comment

Choose a reason for hiding this comment

Uh oh!

t-nelson commented Jan 7, 2026

Uh oh!

OliverNChalk commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

t-nelson commented Jan 7, 2026

Uh oh!

apfitzge commented Jan 20, 2026

Uh oh!

OliverNChalk commented Jan 20, 2026

Uh oh!

t-nelson commented Jan 20, 2026

Uh oh!

OliverNChalk commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

apfitzge left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov-commenter commented Dec 4, 2025 •

edited

Loading

OliverNChalk Dec 5, 2025 •

edited

Loading

OliverNChalk commented Jan 7, 2026 •

edited

Loading

OliverNChalk commented Jan 20, 2026 •

edited

Loading