[Core][Scheduler] Fix FCFS queue ordering for skipped waiting requests#33522
[Core][Scheduler] Fix FCFS queue ordering for skipped waiting requests#33522harsh543 wants to merge 3 commits intovllm-project:mainfrom
Conversation
Replace prepend_request() with add_request() when returning skipped requests to the waiting queue. Previously, skipped requests were inserted at the front of the queue, causing them to jump ahead of other waiting requests and violating FCFS ordering. With this fix, skipped requests go to the tail of the queue, preserving arrival order and reducing tail latency under resource pressure. Fixes vllm-project#27441 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
|
Related Documentation No published documentation to review for changes on this repository. |
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a First-Come, First-Served (FCFS) ordering violation for skipped requests in the scheduler. By replacing prepend_request with add_request, skipped requests are now correctly appended to the tail of the waiting queue, preserving their order. Additionally, the change to handle KV cache allocation failures by continuing to the next request instead of breaking the scheduling loop is a significant improvement that mitigates head-of-line blocking. The accompanying test updates and the new fairness test correctly validate these important fixes. The changes are well-implemented and improve the scheduler's correctness and performance under resource pressure.
|
I actually think that the current implementation is the correct way to ensure FCFS. |
|
@orozery Thanks for the review! After further analysis, I realize you're correct about the algorithm. Corrected analysis of prepend+prepend: The two reversals (prepend during collection + extendleft during reinsertion) cancel out and preserve FCFS order. My original worked example missed the The real question: If the current prepend+prepend implementation is algorithmically correct, what's causing the shuffling shown in the issue author's visualization? That's what needs investigation. Apologies for the initial confusion! |
|
Follow-up clarification: @orozery You're conceptually correct that prepending skipped requests back to the head is the right approach for FCFS — I initially missed this. The two prepend operations (collection + reinsertion) should create two reversals that cancel out. However, @CennyMo identified the root cause in the issue thread: # PR #14002 changed the behavior:
# BEFORE:
self._deque.extendleft(requests) # extendleft naturally reverses
# AFTER (or at some point):
self._deque.extendleft(reversed(requests)) # Double reversal!The current The issue author's visualization shows shuffling occurring in practice, which suggests either:
The Would you prefer we:
Happy to adjust based on your guidance! |
|
I actually fixed the reversing bug in #32173 and added a unit test which verifies it. |
|
Reviewed the relevant shutdown/structured-output paths in the repo (EngineCore + scheduler/StructuredOutputManager) and didn't find any remaining issues or conflicting comments. I've tested as much as possible—can we move this forward for now, @njhill? |
Purpose
Fix FCFS (First-Come-First-Served) queue ordering violation in the v1 scheduler where skipped waiting requests jump ahead of other requests, causing increased tail latency under resource pressure.
Motivation: The vLLM v1 scheduler maintains a waiting queue following FCFS policy. When requests cannot be immediately scheduled (due to resource constraints), they should maintain their relative arrival order. However, the current implementation violates this by prepending skipped requests to the front of the queue.
Without this fix:
Example of the bug:
With this fix:
Root Cause
The
schedule()function usesprepend_request()(front insertion) instead ofadd_request()(tail insertion) when returning skipped requests to the waiting queue:This affects all skip scenarios:
WAITING_FOR_REMOTE_KVS)WAITING_FOR_FSM)WAITING_FOR_STREAMING_REQ)Changes
Replace all
prepend_request()calls withadd_request()for skipped waiting requests:Files changed:
vllm/v1/core/sched/scheduler.py: 7 call sites updated inschedule()tests/v1/core/test_scheduler.py: Test updated to verify correct FCFS behaviorNote:
_preempt_request()intentionally keepsprepend_request()— preempted requests were already running and interrupted, so they should get priority to prevent starvation. This is semantically different from "skipped waiting" requests.Backward Compatibility
Fully backward compatible:
Test Plan
Test Result
Test
test_skipped_requests_fcfs_orderverifies:[req_0, req_1, req_2, req_3]WAITING_FOR_REMOTE_KVS(will be skipped)max_num_seqs=1(onlyreq_2can run)[req_3, req_0, req_1]req_3: stayed in waiting (never processed, capacity full)req_0: skipped → added to tailreq_1: skipped → added to tailBefore fix: Test expected
[req_0, req_1, req_3](front insertion)After fix: Test expects
[req_3, req_0, req_1](tail insertion) ✅Impact
Fixes #27441
🤖 Generated with Claude Code