[Core][Scheduler] Fix FCFS queue ordering for skipped waiting requests by harsh543 · Pull Request #33522 · vllm-project/vllm

harsh543 · 2026-02-01T20:53:11Z

Purpose

Fix FCFS (First-Come-First-Served) queue ordering violation in the v1 scheduler where skipped waiting requests jump ahead of other requests, causing increased tail latency under resource pressure.

Motivation: The vLLM v1 scheduler maintains a waiting queue following FCFS policy. When requests cannot be immediately scheduled (due to resource constraints), they should maintain their relative arrival order. However, the current implementation violates this by prepending skipped requests to the front of the queue.

Without this fix:

Requests that arrive earlier get pushed backward when later requests are skipped
Under sustained load with resource pressure, some requests experience unbounded delays
Tail latency (p99/p100) increases significantly as queue ordering becomes chaotic
Fairness guarantees are violated — a request's wait time depends on unrelated scheduling decisions

Example of the bug:

Initial waiting: [Req1, Req2, Req3, Req4]  (Req1 arrived first)
                      ↓ (Req2 can't schedule due to KV blocks)
After scheduling: [Req2, Req1, Req3, Req4]  ❌ Req2 jumped ahead of Req1!

With sustained pressure, Req1 keeps getting pushed back indefinitely.

With this fix:

Initial waiting: [Req1, Req2, Req3, Req4]
                      ↓ (Req2 can't schedule)
After scheduling: [Req1, Req3, Req4, Req2]  ✅ Order preserved, Req2 goes to tail

Scenario	Before (Bug)	After (Fix)
Skipped request placement	Front of queue	Tail of queue
FCFS ordering	Violated under pressure	Preserved
Tail latency (p99/p100)	Increased unpredictably	Stable/reduced
Request starvation risk	Possible	Eliminated

Root Cause

The schedule() function uses prepend_request() (front insertion) instead of add_request() (tail insertion) when returning skipped requests to the waiting queue:

# WRONG: Inserts at FRONT, violating FCFS
skipped_waiting_requests.prepend_request(request)
...
self.waiting.prepend_requests(skipped_waiting_requests)

This affects all skip scenarios:

Waiting for remote KV blocks (WAITING_FOR_REMOTE_KVS)
Waiting for FSM compilation (WAITING_FOR_FSM)
Waiting for streaming input (WAITING_FOR_STREAMING_REQ)
Max LoRA limit exceeded
KV cache allocation failure
Async KV loading

Changes

Replace all prepend_request() calls with add_request() for skipped waiting requests:

# CORRECT: Appends to TAIL, preserving FCFS
skipped_waiting_requests.add_request(request)
...
for request in skipped_waiting_requests:
    self.waiting.add_request(request)

Files changed:

vllm/v1/core/sched/scheduler.py: 7 call sites updated in schedule()
tests/v1/core/test_scheduler.py: Test updated to verify correct FCFS behavior

Note: _preempt_request() intentionally keeps prepend_request() — preempted requests were already running and interrupted, so they should get priority to prevent starvation. This is semantically different from "skipped waiting" requests.

Backward Compatibility

Fully backward compatible:

No API changes
No configuration changes
Only affects internal queue ordering
All requests still get scheduled eventually
Existing behavior preserved for preempted requests

Test Plan

# Run the updated FCFS ordering test
pytest tests/v1/core/test_scheduler.py::test_skipped_requests_fcfs_order -v

# Run full scheduler test suite
pytest tests/v1/core/test_scheduler.py -v

Test Result

Test test_skipped_requests_fcfs_order verifies:

Create 4 requests: [req_0, req_1, req_2, req_3]
Set first 2 to WAITING_FOR_REMOTE_KVS (will be skipped)
Schedule with max_num_seqs=1 (only req_2 can run)
Verify waiting queue order: [req_3, req_0, req_1]
- req_3: stayed in waiting (never processed, capacity full)
- req_0: skipped → added to tail
- req_1: skipped → added to tail

Before fix: Test expected [req_0, req_1, req_3] (front insertion)
After fix: Test expects [req_3, req_0, req_1] (tail insertion) ✅

Impact

Metric	Expected Change
p50 latency	No change
p99 latency	Improved (less queue reordering)
p100 latency	Improved (no request starvation)
Throughput	No change
Memory usage	No change

Fixes #27441

🤖 Generated with Claude Code

Replace prepend_request() with add_request() when returning skipped requests to the waiting queue. Previously, skipped requests were inserted at the front of the queue, causing them to jump ahead of other waiting requests and violating FCFS ordering. With this fix, skipped requests go to the tail of the queue, preserving arrival order and reducing tail latency under resource pressure. Fixes vllm-project#27441 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-02-01T20:53:21Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

dosubot · 2026-02-01T20:53:21Z

Related Documentation

No published documentation to review for changes on this repository.

Write your first living document

^{How did I do? Any feedback?}

gemini-code-assist

Code Review

This pull request effectively addresses a First-Come, First-Served (FCFS) ordering violation for skipped requests in the scheduler. By replacing prepend_request with add_request, skipped requests are now correctly appended to the tail of the waiting queue, preserving their order. Additionally, the change to handle KV cache allocation failures by continuing to the next request instead of breaking the scheduling loop is a significant improvement that mitigates head-of-line blocking. The accompanying test updates and the new fairness test correctly validate these important fixes. The changes are well-implemented and improve the scheduler's correctness and performance under resource pressure.

orozery · 2026-02-02T07:51:38Z

I actually think that the current implementation is the correct way to ensure FCFS.
The scheduler processes waiting requests serially.
Every waiting request that is skipped should be returned to its original position in the self.waiting list.
This is achieved by prepending the list of skipped requests back to the head (position 0) of the waiting list.

harsh543 · 2026-02-02T17:12:39Z

~~Original analysis below was incorrect — see my correction in the follow-up comments.~~

@orozery Thanks for the review! After further analysis, I realize you're correct about the algorithm.

Corrected analysis of prepend+prepend:

Initial: [Req1, Req2, Req3, Req4]

1. Skip Req1 → skipped.prepend_request(Req1) → skipped = [Req1]
2. Skip Req2 → skipped.prepend_request(Req2) → skipped = [Req2, Req1]  ← First reversal
3. Req3 scheduled
4. waiting = [Req4]
5. waiting.prepend_requests(skipped) → extendleft([Req2, Req1])
   - appendleft(Req2) → [Req2, Req4]
   - appendleft(Req1) → [Req1, Req2, Req4]  ← Second reversal cancels first!

Result: [Req1, Req2, Req4] ✅ Correct FCFS order!

The two reversals (prepend during collection + extendleft during reinsertion) cancel out and preserve FCFS order. My original worked example missed the extendleft reversal step.

The real question: If the current prepend+prepend implementation is algorithmically correct, what's causing the shuffling shown in the issue author's visualization? That's what needs investigation.

Apologies for the initial confusion!

harsh543 · 2026-02-02T17:26:53Z

Follow-up clarification:

@orozery You're conceptually correct that prepending skipped requests back to the head is the right approach for FCFS — I initially missed this. The two prepend operations (collection + reinsertion) should create two reversals that cancel out.

However, @CennyMo identified the root cause in the issue thread:

# PR #14002 changed the behavior:
# BEFORE:
self._deque.extendleft(requests)  # extendleft naturally reverses

# AFTER (or at some point):
self._deque.extendleft(reversed(requests))  # Double reversal!

The current prepend_requests implementation uses extendleft(requests) without reversed(), but if prepend_request is building the skipped list in reversed order AND extendleft also reverses, the two should cancel out.

The issue author's visualization shows shuffling occurring in practice, which suggests either:

A mismatch in the reversal logic at some point in the code path
An edge case where the cancellation doesn't work as expected

The add_request() approach in this PR sidesteps the reversal complexity entirely — it's more explicit and less error-prone, even if it changes the semantic to tail placement (which also helps with HOL blocking under KV pressure).

Would you prefer we:

Keep the PR as-is (tail placement, avoids reversal complexity, prevents HOL blocking)
Fix the reversal bug directly while keeping head placement (if that's the preferred FCFS semantic)

Happy to adjust based on your guidance!

orozery · 2026-02-02T17:46:13Z

I actually fixed the reversing bug in #32173 and added a unit test which verifies it.

njhill · 2026-02-02T23:17:03Z

It sounds like the problem is just that we didn't close #27441 after it was fixed by #32173?

If so let's close both #27441 and this PR.

harsh543 · 2026-03-02T01:30:11Z

Reviewed the relevant shutdown/structured-output paths in the repo (EngineCore + scheduler/StructuredOutputManager) and didn't find any remaining issues or conflicting comments. I've tested as much as possible—can we move this forward for now, @njhill?

harsh543 and others added 3 commits January 31, 2026 19:43

v1 scheduler: skip KV-blocked waiting requests

79fa1e6

tests: avoid allocate_slots mock in KV fairness test

ffe9a25

harsh543 requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners February 1, 2026 20:53

mergify bot added the v1 label Feb 1, 2026

gemini-code-assist bot reviewed Feb 1, 2026

View reviewed changes

harsh543 marked this pull request as draft February 4, 2026 20:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Core][Scheduler] Fix FCFS queue ordering for skipped waiting requests#33522

[Core][Scheduler] Fix FCFS queue ordering for skipped waiting requests#33522
harsh543 wants to merge 3 commits intovllm-project:mainfrom
harsh543:fix-scheduler-queue-order

harsh543 commented Feb 1, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 1, 2026

Uh oh!

dosubot bot commented Feb 1, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

orozery commented Feb 2, 2026

Uh oh!

harsh543 commented Feb 2, 2026 •

edited

Loading

Uh oh!

harsh543 commented Feb 2, 2026

Uh oh!

orozery commented Feb 2, 2026

Uh oh!

njhill commented Feb 2, 2026

Uh oh!

harsh543 commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

harsh543 commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Root Cause

Changes

Backward Compatibility

Test Plan

Test Result

Impact

Uh oh!

github-actions bot commented Feb 1, 2026

Uh oh!

dosubot bot commented Feb 1, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

orozery commented Feb 2, 2026

Uh oh!

harsh543 commented Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

harsh543 commented Feb 2, 2026

Uh oh!

orozery commented Feb 2, 2026

Uh oh!

njhill commented Feb 2, 2026

Uh oh!

harsh543 commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

harsh543 commented Feb 1, 2026 •

edited

Loading

harsh543 commented Feb 2, 2026 •

edited

Loading