fix: mitigate head-of-line blocking in QUIC stream handling #10937
stablebits wants to merge 1 commit into anza-xyz:master
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##           master    #10937   +/- ##
========================================
  Coverage    83.0%     83.1%
========================================
  Files         839       839
  Lines      317370    317498   +128
========================================
+ Hits       263721    263843   +122
- Misses      53649     53655     +6
FYI no point tagging us individually, networking gets tagged automagically by github.

personally i like getting tagged individually. so i know that i need to look at it instead of just anyone on network team
But another aspect is that the new 'reassembly' metrics can't show the positive effect of the HoL mitigation: stream reassembly time won't improve. We'd also need to somehow measure the time during which

Let's get back to this topic after the new SWQoS. Converted to Draft status.

Updated: just to complete the data analysis (possible root cause). The most likely explanation is that there are cases where multiple (likely sequential) packets are lost. Let's consider this scenario, with every stream fitting in a single packet. Packets 1, 2, 3, 4, 5, each packet carrying a single stream.

Now, the way quinn (and QUIC in general) works, the arrival of stream_id=N triggers the opening of all streams from highest_accepted_id up to and including N.

When the data lost in packets 2 and 3 is recovered, the new code records the delayed time for 2 streams (IDs 2 and 3). The current master code may record it for stream_id=2 only: if the data for both streams is recovered at the same time (likely), it will accept_uni() stream_id=3 only after this recovery, and its data will already be available.




Note: I'll do more measurements, but I'm making it available for general feedback and review in the meantime.
Replace sequential stream processing in handle_connection() with concurrent polling of multiple streams within a single tokio task.

Problem
See https://docs.google.com/document/d/1nS7gsPqHG-2q9_rVgkChCIeZzByB2hBTbR8Tsc5inUk/edit?tab=t.0#heading=h.s49mngh00e8b
Streams are processed one at a time: accept → read all chunks → accept next. When packet loss causes a stream's read to block waiting on missing data, all subsequent streams are stalled, reintroducing the HTTP/2-style HoL blocking that QUIC was designed to eliminate.
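To make the failure mode concrete, here is roughly the shape of such a sequential loop. This is an illustrative sketch, not the exact handle_connection() code in this repository; it assumes quinn's accept_uni()/read_chunk() APIs:

```rust
// Illustrative sketch of the sequential pattern described above (not the actual
// agave code): the next stream is accepted only after the previous one has been
// fully read, so a single read blocked on lost data stalls every later stream.
async fn handle_connection_sequential(connection: quinn::Connection) {
    while let Ok(mut stream) = connection.accept_uni().await {
        // If a chunk of this stream was lost, read_chunk() stays pending until the
        // retransmission arrives; no other stream is accepted in the meantime.
        while let Ok(Some(_chunk)) = stream.read_chunk(usize::MAX, true).await {
            // ... reassemble the transaction from the chunk (omitted)
        }
    }
}
```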
Moreover, in Quinn (QUIC itself doesn't mandate any specific retransmission strategy), retransmissions are stream-local and are scheduled round-robin along with regular data from other streams. With many pending streams on the client side, it may take quite some time (multiple RTTs, depending on the number of pending streams and the current CWND) for lost data to be delivered.
Summary of Changes
Replace the sequential one-stream-at-a-time loop with a multi-stream polling mechanism that processes up to 8 (configurable) concurrent streams per connection. The default of 8 concurrent streams should hopefully be sufficient to resolve HoL issues for most practical use cases (i.e. anything short of severe packet loss or reordering affecting many streams).
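A minimal sketch of this shape is below. It is not the PR diff itself: it assumes quinn's accept_uni()/read_chunk() APIs and tokio::select!, the read_stream helper is hypothetical, and it boxes the per-stream futures for brevity, whereas the actual change keeps fixed per-slot state so it stays allocation-free after connection setup:

```rust
// Illustrative sketch only: a fixed table of up to MAX_STREAMS_PER_CONNECTION
// in-flight stream reads, all driven from a single tokio task. A stream whose
// data is missing simply stays Pending in its slot and no longer blocks the rest.
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::task::Poll;

const MAX_STREAMS_PER_CONNECTION: usize = 8;

// Hypothetical per-stream handler: drains one stream chunk by chunk.
async fn read_stream(mut stream: quinn::RecvStream) {
    while let Ok(Some(_chunk)) = stream.read_chunk(usize::MAX, true).await {
        // ... reassemble and forward the transaction (omitted)
    }
}

async fn handle_connection(connection: quinn::Connection) {
    // One slot per concurrently processed stream; None means the slot is free.
    let mut slots: Vec<Option<Pin<Box<dyn Future<Output = ()> + Send>>>> =
        (0..MAX_STREAMS_PER_CONNECTION).map(|_| None).collect();

    enum Event {
        Accepted(quinn::RecvStream),
        Finished(usize),
        Closed,
    }

    loop {
        let free_slot = slots.iter().position(Option::is_none);
        let any_busy = slots.iter().any(Option::is_some);

        let event = tokio::select! {
            // Accept a new stream only while a slot is free.
            res = connection.accept_uni(), if free_slot.is_some() => {
                match res {
                    Ok(stream) => Event::Accepted(stream),
                    Err(_) => Event::Closed,
                }
            }
            // Fixed scan over the occupied slots: a completed stream frees its
            // slot, while streams blocked on lost data just return Pending.
            idx = poll_fn(|cx| {
                for (i, slot) in slots.iter_mut().enumerate() {
                    if let Some(fut) = slot {
                        if fut.as_mut().poll(cx).is_ready() {
                            return Poll::Ready(i);
                        }
                    }
                }
                Poll::Pending
            }), if any_busy => Event::Finished(idx),
        };

        match event {
            Event::Accepted(stream) => {
                slots[free_slot.unwrap()] = Some(Box::pin(read_stream(stream)));
            }
            Event::Finished(idx) => slots[idx] = None,
            // A real implementation would drain the remaining slots before returning.
            Event::Closed => break,
        }
    }
}
```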
Comparing to a possible alternative using FuturesUnordered (a rough sketch of that variant follows below for contrast):

- The current code does a fixed scan of at most 8 slots; FuturesUnordered avoids scanning but adds queue bookkeeping overhead.
- The current code is virtually allocation-free after connection setup; FuturesUnordered usually allocates per pushed future/task node.
- FuturesUnordered is better if fairness/ready-queue behavior is a hard requirement. In our case (short streams, send_fairness disabled on the sender), this is arguably not required.
- If we later want to raise concurrent streams per connection to dozens or hundreds, FuturesUnordered might become more compelling.
- At MAX_STREAMS_PER_CONNECTION = 8 and short stream sizes (even with 4K transactions later), the current approach is likely a bit more efficient in the hot path. To be measured, though.
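For contrast, here is a minimal sketch of what the FuturesUnordered variant could look like. It is illustrative only: it reuses the hypothetical read_stream helper from the sketch above and assumes the futures_util crate is available:

```rust
// Illustrative FuturesUnordered variant: completed stream reads are surfaced by
// the internal ready queue instead of a fixed slot scan, at the cost of queue
// bookkeeping and one allocation per pushed future.
use futures_util::stream::{FuturesUnordered, StreamExt};

const MAX_STREAMS_PER_CONNECTION: usize = 8;

async fn handle_connection_unordered(connection: quinn::Connection) {
    let mut in_flight = FuturesUnordered::new();
    loop {
        tokio::select! {
            // Accept new streams only while we are below the concurrency cap.
            res = connection.accept_uni(), if in_flight.len() < MAX_STREAMS_PER_CONNECTION => {
                match res {
                    Ok(stream) => in_flight.push(read_stream(stream)),
                    Err(_) => break, // connection closed
                }
            }
            // Drive the in-flight reads; each completed one is simply dropped here.
            Some(()) = in_flight.next() => {}
        }
    }
}
```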