perf: replace O(n²) queue.remove() with O(n) single-pass partition in chunk scheduler#1613
perf: replace O(n²) queue.remove() with O(n) single-pass partition in chunk scheduler#1613dubin555 wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e0006a3d90
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| continue | ||
| queue.remove(request) | ||
| waiting_for_chunk_list.append(request) | ||
| queue[:] = keep |
There was a problem hiding this comment.
Avoid list-slice assignment on scheduler waiting queue
Replacing removals with queue[:] = keep assumes queue implements mutable slice assignment, but _process_chunk_queue() is called from process_pending_chunks() with the scheduler's self.waiting queue object (created via vLLM request-queue abstractions), not just plain lists. Those queue types expose methods like pop_request/remove_requests and may not support __setitem__ slicing, so async-chunk scheduling can fail at runtime with a type error as soon as this path runs.
Useful? React with 👍 / 👎.
| @@ -0,0 +1,373 @@ | |||
| """Standalone test for the O(n^2) -> O(n) fix in _process_chunk_queue(). | |||
There was a problem hiding this comment.
is this file added to the proper folder?
vllm-omni-reviewer Code ReviewNo critical issues found.
|
|
Thanks for the review! Regarding the test file location — you're right, I'll move it to the appropriate test directory. Could you point me to the preferred test folder for scheduler-related tests? Regarding |
e0006a3 to
14c6468
Compare
… chunk scheduler Replace repeated queue.remove() calls with a single-pass partition that separates completed items from remaining items in O(n) time. The original approach called list.remove() for each completed item, which is O(n) per removal, making the total O(n²) for k completions.
14c6468 to
8c16b3d
Compare
|
Updated — moved the test file from repo root to |
|
@Shirley125 PTAL |
I think the test files should be added to the simple unit test pipeline |
|
Good point! Added |
hsliuustc0106
left a comment
There was a problem hiding this comment.
Code Review Report
📋 Summary
| Item | Details |
|---|---|
| PR | perf: replace O(n²) queue.remove() with O(n) single-pass partition in chunk scheduler |
| Author | @dubin555 |
| Files Changed | chunk_transfer_adapter.py, test_fix_on2_queue_remove.py |
| Changes | +385 / -3 |
🎯 Purpose
Optimize _process_chunk_queue() in chunk_transfer_adapter.py from O(n²) to O(n) complexity.
Original Problem:
- The code snapshots the queue, iterates over it, and calls
queue.remove(request)for each item to be moved - Since
deque.remove()is O(n) (linear scan), and it's called inside an O(n) loop → O(n²) total - This becomes a scheduling bottleneck at high concurrency
Fix Approach:
- Build a
keeplist of requests that should stay in the queue - Replace queue contents with
queue[:] = keepat the end — single O(n) operation - Same semantics, same element ordering, no API changes
📊 Performance Benchmarks (Worst Case: 50% stay, 50% removed)
| N | Original | Fixed | Speedup |
|---|---|---|---|
| 1,000 | 6.9ms | 0.18ms | 37.8x |
| 5,000 | 183ms | 1.0ms | 176.9x |
| 10,000 | 719ms | 1.9ms | 368.6x |
| 50,000 | 18.0s | 10.7ms | 1,678x |
The original shows clear O(n²) growth (doubling N ≈ 4× time), while the fix shows O(n) growth (doubling N ≈ 2× time).
🔍 Code Changes
# Before (O(n²)):
queue_snapshot = list(queue)
for request in queue_snapshot:
# ... logic ...
queue.remove(request) # O(n) per call
waiting_for_chunk_list.append(request)
# After (O(n)):
keep: list[Any] = []
for request in list(queue):
# ... logic ...
if should_stay:
keep.append(request)
else:
waiting_for_chunk_list.append(request)
queue[:] = keep # Single O(n) replacement✅ Test Coverage
Comprehensive test suite (test_fix_on2_queue_remove.py) with:
| Test | Coverage |
|---|---|
test_all_new_requests_moved |
All WAITING requests moved |
test_ready_chunks_stay_in_queue |
Ready-chunk requests stay |
test_finished_requests_stay_in_queue |
Finished requests stay |
test_waiting_for_chunk_with_finished_load |
Load-finished requests get target_status |
test_mixed_scenario |
All code paths exercised |
test_with_plain_list_queue |
Works with plain list (not just deque) |
test_empty_queue |
Edge case: empty queue |
test_order_preservation |
Element ordering preserved |
test_original_vs_fixed_equivalence |
100 randomized equivalence trials |
test_performance_improvement |
Regression test for O(n) scaling |
All tests pass with pytest -m 'core_model and cpu'.
🔍 Review Findings
✅ Strengths
- Solid algorithmic improvement — Clear O(n²) → O(n) optimization
- Excellent benchmark data — Real numbers showing massive speedup at scale
- Comprehensive test coverage — 100 randomized equivalence trials ensure behavioral parity
- Minimal code change — Only 8 lines modified in production code
- Good documentation — PR description clearly explains the problem and solution
- CI markers added —
pytestmark = [pytest.mark.core_model, pytest.mark.cpu]for proper pipeline inclusion
⚠️ Minor Observations
- Type annotation:
keep: list[Any]could be more specific if request type is known - Test file location: Test file is standalone and doesn't require vllm to be installed — good for isolation but consider if it should be integrated with existing test infrastructure
📝 Verdict
| Rating | Notes |
|---|---|
| APPROVE ✅ | Excellent performance optimization with thorough testing |
Rationale:
- Clear algorithmic improvement with measurable impact
- Comprehensive test coverage proving behavioral equivalence
- Minimal, well-documented code changes
- Real-world benchmarks demonstrate significant value at scale
Kudos: The 100 randomized equivalence trials are particularly thorough — this is exactly the kind of testing needed for algorithm changes. 👏
Reviewed by: vllm-omni-reviewer MCP tool 🤖
|
Thanks for you work! But please check chatgpt bot's review --- Avoid list-slice assignment on scheduler waiting queue. After testing this PR, I found that using slicing on the waiting queue indeed raises an error. |
…hunk scheduler Collect requests to remove in a single pass, then call queue.remove_requests() once instead of calling queue.remove() per item inside a loop. Uses the queue's native remove_requests() method which is compatible with vLLM's request queue abstractions (not plain lists).
922529c to
469aa18
Compare
|
Thanks @Shirley125 — good catch. You're right that Updated the approach: instead of slice assignment, I now collect requests to remove in a single pass, then call |
|
For context on why def remove_requests(self, requests):
requests_to_remove = set(requests) # O(k)
filtered = [req for req in self if req not in requests_to_remove] # O(n)
self.clear()
self.extend(filtered)So collecting items to remove in a single pass and calling |
| finished_load_reqs.remove(request.request_id) | ||
| self.requests_with_ready_chunks.add(request.request_id) | ||
| continue | ||
| queue.remove(request) |
There was a problem hiding this comment.
Can we try using pop_request?
Thanks, your solution works. However, there is a potential issue here. If |
| to_remove.append(request) | ||
| waiting_for_chunk_list.append(request) | ||
| if to_remove: | ||
| queue.remove_requests(to_remove) |
There was a problem hiding this comment.
remove_requests() is not a standard list method. Does every queue type passed here implement it? The old queue.remove() worked on any list-like.
|
Closing this PR — the |
Purpose
Replace O(n²)
queue.remove()pattern with O(n) single-pass partition in_process_chunk_queue().In
chunk_transfer_adapter.py, the current code snapshots the queue, iterates over the snapshot, and for each request that needs to be moved to the "waiting for chunk" list, callsqueue.remove(request). Sincedeque.remove()is O(n) (linear scan to find the element), and this is called inside an O(n) loop, the total cost is O(n²) for n pending requests.At high concurrency this becomes a scheduling bottleneck. The fix builds a
keeplist of requests that should stay in the queue, then replaces the queue contents withqueue[:] = keepat the end — a single O(n) operation. Same semantics, same element ordering, no API changes.Benchmark (worst case: 50% stay, 50% removed):
The original shows clear O(n²) growth (doubling N ≈ 4× the time), while the fix shows O(n) growth (doubling N ≈ 2× the time).
Test Plan
Standalone test suite (
test_fix_on2_queue_remove.py) covering:Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.