
fix: mitigate head-of-line blocking in QUIC stream handling #10937

Draft
stablebits wants to merge 1 commit into anza-xyz:master from stablebits:hol

Conversation

@stablebits

stablebits commented Mar 3, 2026

Note: I'll do more measurements, but making it available for general feedback and review in the meantime.

Replace sequential stream processing in handle_connection() with concurrent polling of multiple streams within a single tokio task.

Problem

See https://docs.google.com/document/d/1nS7gsPqHG-2q9_rVgkChCIeZzByB2hBTbR8Tsc5inUk/edit?tab=t.0#heading=h.s49mngh00e8b

Streams are processed one at a time: accept → read all chunks → accept next. When packet loss causes a stream's read to block waiting on missing data, all subsequent streams are stalled — reintroducing HTTP/2-style HOL blocking that QUIC was designed to eliminate.

Moreover, in Quinn (and QUIC doesn't mandate any specific retransmission strategy) retransmissions are stream-local and scheduled in a round-robin fashion along with regular data from other streams. With lots of pending streams on the client side, it may take quite some time (multiple RTTs, depending on the number of pending streams and current CWND) for lost data to be delivered.
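
For reference, the sequential flow in current master looks roughly like this (an illustrative sketch against quinn's public API, not the actual handle_connection() code; timeouts, stats and error handling are simplified, and process_chunk is a made-up placeholder):

use bytes::Bytes;
use quinn::Connection;

// Illustrative sketch of the sequential flow: every stream is drained to
// completion before the next one is accepted, so a stream stuck waiting on a
// retransmitted packet stalls all streams behind it.
async fn handle_connection_sequential(connection: Connection) -> anyhow::Result<()> {
    loop {
        // Accept the next incoming unidirectional stream.
        let mut stream = connection.accept_uni().await?;
        // Read all of its chunks before accepting the next stream. If a chunk
        // is missing due to packet loss, this await blocks here.
        while let Some(chunk) = stream.read_chunk(usize::MAX, true).await? {
            process_chunk(chunk.bytes);
        }
    }
}

// Placeholder for packet reassembly / forwarding.
fn process_chunk(_bytes: Bytes) {}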

Summary of Changes

Replace the sequential one-stream-at-a-time loop with a multi-stream polling mechanism that processes up to 8 (configurable) concurrent streams per connection. The default parallelism of 8 streams should hopefully be sufficient to resolve HoL issues for most practical use cases (i.e. anything short of severe packet loss or reordering affecting many streams).
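
A rough sketch of the idea against quinn's public API (illustrative only: the actual patch differs in slot bookkeeping, per-slot deadlines, stats and chunk handling; MAX_STREAMS_PER_CONNECTION matches the constant mentioned below, while drain_stream and process_stream are made-up placeholders):

use std::{
    future::{poll_fn, Future},
    pin::Pin,
    task::Poll,
};

use bytes::Bytes;
use quinn::{Connection, RecvStream};

const MAX_STREAMS_PER_CONNECTION: usize = 8;

// A slot holds a "drain this stream to completion" future.
type SlotFuture = Pin<Box<dyn Future<Output = Result<Vec<Bytes>, quinn::ReadError>>>>;

// Read all chunks of one stream; missing data blocks only this slot.
async fn drain_stream(mut stream: RecvStream) -> Result<Vec<Bytes>, quinn::ReadError> {
    let mut chunks = Vec::new();
    while let Some(chunk) = stream.read_chunk(usize::MAX, true).await? {
        chunks.push(chunk.bytes);
    }
    Ok(chunks)
}

// Resolves as soon as any active slot finishes: a fixed scan over at most
// MAX_STREAMS_PER_CONNECTION entries, with no extra queue or per-poll allocation.
fn poll_slots(
    slots: &mut [Option<SlotFuture>; MAX_STREAMS_PER_CONNECTION],
) -> impl Future<Output = (usize, Result<Vec<Bytes>, quinn::ReadError>)> + '_ {
    poll_fn(move |cx| {
        for (i, slot) in slots.iter_mut().enumerate() {
            if let Some(fut) = slot {
                if let Poll::Ready(out) = fut.as_mut().poll(cx) {
                    return Poll::Ready((i, out));
                }
            }
        }
        Poll::Pending
    })
}

async fn handle_connection_concurrent(connection: Connection) -> Result<(), quinn::ConnectionError> {
    let mut slots: [Option<SlotFuture>; MAX_STREAMS_PER_CONNECTION] =
        std::array::from_fn(|_| None);
    loop {
        let free_slot = slots.iter().position(|s| s.is_none());
        let has_active = slots.iter().any(|s| s.is_some());
        tokio::select! {
            // Keep accepting new streams while a slot is free, even if other
            // streams are still blocked on retransmitted data.
            res = connection.accept_uni(), if free_slot.is_some() => {
                let fut: SlotFuture = Box::pin(drain_stream(res?));
                slots[free_slot.unwrap()] = Some(fut);
            }
            // Make progress on whichever in-flight stream is ready next.
            (idx, done) = poll_slots(&mut slots), if has_active => {
                slots[idx] = None;
                if let Ok(chunks) = done {
                    process_stream(chunks);
                }
            }
        }
    }
}

// Placeholder for packet reassembly / forwarding.
fn process_stream(_chunks: Vec<Bytes>) {}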

Compared to a possible alternative using FuturesUnordered (a sketch of that variant follows this list):

  • Current code does a fixed scan of at most 8 slots. FuturesUnordered avoids scanning but adds queue bookkeeping overhead.
  • Current code is virtually allocation-free after connection setup. FuturesUnordered usually allocates per pushed future/task node.
  • FuturesUnordered is better if fairness/ready-queue behavior is a hard requirement. In our case (short streams, send_fairness disabled on sender), this is arguably not required.
  • If we later want to raise concurrent streams per connection to dozens/hundreds, FuturesUnordered might become more compelling.
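
For comparison, a minimal sketch of the FuturesUnordered variant (same caveats as above: illustrative only, reusing the same placeholder names):

use bytes::Bytes;
use futures::{stream::FuturesUnordered, StreamExt};
use quinn::Connection;

const MAX_STREAMS_PER_CONNECTION: usize = 8;

async fn handle_connection_unordered(connection: Connection) -> Result<(), quinn::ConnectionError> {
    // Every pushed future gets its own node inside the queue (per-push
    // allocation), but only futures that were actually woken get polled.
    let mut in_flight = FuturesUnordered::new();
    loop {
        let can_accept = in_flight.len() < MAX_STREAMS_PER_CONNECTION;
        let has_in_flight = !in_flight.is_empty();
        tokio::select! {
            res = connection.accept_uni(), if can_accept => {
                let mut stream = res?;
                in_flight.push(async move {
                    let mut chunks = Vec::new();
                    // Error handling omitted: stop reading on any stream error.
                    while let Ok(Some(chunk)) = stream.read_chunk(usize::MAX, true).await {
                        chunks.push(chunk.bytes);
                    }
                    chunks
                });
            }
            Some(chunks) = in_flight.next(), if has_in_flight => {
                process_stream(chunks);
            }
        }
    }
}

// Placeholder for packet reassembly / forwarding.
fn process_stream(_chunks: Vec<Bytes>) {}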

At MAX_STREAMS_PER_CONNECTION = 8 and short stream sizes (even with 4K transactions later), the current approach is likely a bit more efficient in the hot path. To be measured, though.

@codecov-commenter

codecov-commenter commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 87.02703% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.1%. Comparing base (578e52e) to head (851776c).

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #10937    +/-   ##
========================================
  Coverage    83.0%    83.1%            
========================================
  Files         839      839            
  Lines      317370   317498   +128     
========================================
+ Hits       263721   263843   +122     
- Misses      53649    53655     +6     

@alexpyattaev

FYI no point tagging us individually, networking gets tagged automagically by github.

gregcusack assigned gregcusack and unassigned gregcusack Mar 5, 2026
@gregcusack

> FYI no point tagging us individually, networking gets tagged automagically by github.

personally i like getting tagged individually. so i know that i need to look at it instead of just anyone on network team

@alexpyattaev

Scale of the problem in current master:

image

We are delaying some streams' reassembly by 160 ms on average

image

Here by 280 ms on average

@alexpyattaev

With this PR it actually somehow got worse though:

image

Could be that we are not dropping as many blocked streams and reassembling them eventually instead? We get way more delayed streams than before... Will need more data.

@stablebits
Author

stablebits commented Mar 6, 2026

@alexpyattaev

> Could be that we are not dropping as many blocked streams and reassembling them eventually instead? We get way more delayed streams than before... Will need more data.

Timed-out streams aren't accounted for in either case. For the current master to record them, it would have needed something like this:

diff --git a/streamer/src/nonblocking/quic.rs b/streamer/src/nonblocking/quic.rs
index 7da1162ac3..12aa9c5631 100644
--- a/streamer/src/nonblocking/quic.rs
+++ b/streamer/src/nonblocking/quic.rs
@@ -657,6 +657,16 @@ async fn handle_connection<Q, C>(
                     stats
                         .total_stream_read_timeouts
                         .fetch_add(1, Ordering::Relaxed);
+
+                    let total_latency = accum.start_time.elapsed();
+                    debug!("Stream reassembly cancelled {}", total_latency.as_millis());
+                    stats
+                        .reassembly_delayed_streams
+                        .fetch_add(1, Ordering::Relaxed);
+                    stats
+                        .reassembly_delayed_streams_cumulative_delay_us
+                        .fetch_add(total_latency.as_micros() as usize, Ordering::Relaxed);
+
                     break;
                 }
             };

One more interesting question is whether 2s timeouts may cause some clients to reset their connections (but not sure yet whether this could have any impact on our new metrics).

Yet another possibility is that we encountered an outlier sample or (perhaps more likely) that there is a bug in this implementation.

Can we enable these messages:

--- a/streamer/src/nonblocking/quic.rs
+++ b/streamer/src/nonblocking/quic.rs
@@ -796,7 +796,7 @@ async fn handle_connection<Q, C>(
                 let now = Instant::now();
                 for (i, slot) in slots.iter_mut().enumerate() {
                     if slot.is_active() && now >= slot.deadline {
-                        debug!("Timeout in receiving on stream slot {i}");
+                        info!("Timeout in receiving on stream slot {i}");
                         slot.deactivate();
                         active_count -= 1;
                         stats.active_streams.fetch_sub(1, Ordering::Relaxed);
@@ -946,7 +946,7 @@ fn handle_chunks(
     let packet_size = packet.meta().size;
     let total_latency = accum.start_time.elapsed();
     if total_latency > rtt.mul_f32(LATE_REASSEMBLY_THRESHOLD) {
-        debug!("Stream reassembly dealyed {}", total_latency.as_millis());
+        info!("Stream reassembly dealyed {}", total_latency.as_millis());
         stats
             .reassembly_delayed_streams
             .fetch_add(1, Ordering::Relaxed);

and collect logs?

Updated:

We have stream_read_timeouts in both cases and the max counts are low in all cases (<10).

Screenshot 2026-03-06 at 10 02 55

@stablebits
Author

stablebits commented Mar 6, 2026

But another aspect is that the new 'reassembly' metrics can't show the positive effect of the HoL mitigation: stream reassembly time itself won't improve. We'd also need to somehow measure the time during which accept_uni() couldn't be called, or even better, how long already-pending streams in quinn could not be accepted.
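
One crude way to approximate the first of these (a hypothetical sketch only, not part of this PR; the ACCEPT_BLOCKED_TIME_US counter is made up) would be to time how long the sequential loop spends draining a stream, i.e. how long it cannot call accept_uni() again:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Instant;

// Hypothetical counter; not an existing field in the streamer stats.
static ACCEPT_BLOCKED_TIME_US: AtomicUsize = AtomicUsize::new(0);

async fn handle_connection_sequential_timed(
    connection: quinn::Connection,
) -> Result<(), quinn::ConnectionError> {
    loop {
        let mut stream = connection.accept_uni().await?;
        // While this stream is being drained, accept_uni() cannot run, so any
        // stream quinn has already opened sits unaccepted for this long.
        let busy_since = Instant::now();
        while let Ok(Some(_chunk)) = stream.read_chunk(usize::MAX, true).await {}
        ACCEPT_BLOCKED_TIME_US
            .fetch_add(busy_since.elapsed().as_micros() as usize, Ordering::Relaxed);
    }
}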

Let's get back to this topic after the new SWQoS. Converted to Draft.

Updated:

Just to complete the data analysis (possible root cause).

So the most likely explanation is that there are cases where multiple (likely consecutive) packets are lost. Let's consider this scenario, assuming every stream fits in a single packet.

Packets: 1, 2, 3, 4, 5 with each packet hosting a single stream.

  1. Packet 1 arrived
  2. accept_uni() returns stream_id=1 -> the highest_accepted_id that we've seen is 1.
  3. Packets 2 and 3 are lost.
  4. Packet 4 arrives with stream_id=4

Now, the way quinn (and QUIC in general) works is that the arrival of stream_id=N will trigger the opening of all streams from highest_accepted_id up to and including N.

  1. accept_uni() returns stream_id=2 -> we add it to the pending list and start measuring delay time.
  2. accept_uni() returns stream_id=3 -> same as above; now we have 2 pending streams
  3. accept_uni() returns stream_id=4 -> data is available
  4. ...

When the data lost in packets 2 and 3 is recovered, the new code records reassembly delay for 2 streams (IDs 2 and 3).

The current master code may record it for stream_id=2 only, because if data for both streams is recovered at the same time (likely), it will accept_uni() stream_id=3 only after this recovery, when its data is already available.

stablebits marked this pull request as draft March 6, 2026