Skip to content

perf: replace O(n²) queue.remove() with O(n) single-pass partition in chunk scheduler#1613

Closed
dubin555 wants to merge 2 commits into
vllm-project:mainfrom
dubin555:oss-scout/verify-fix-on2-queue-remove
Closed

perf: replace O(n²) queue.remove() with O(n) single-pass partition in chunk scheduler#1613
dubin555 wants to merge 2 commits into
vllm-project:mainfrom
dubin555:oss-scout/verify-fix-on2-queue-remove

Conversation

@dubin555
Copy link
Copy Markdown
Contributor

@dubin555 dubin555 commented Mar 2, 2026

Purpose

Replace O(n²) queue.remove() pattern with O(n) single-pass partition in _process_chunk_queue().

In chunk_transfer_adapter.py, the current code snapshots the queue, iterates over the snapshot, and for each request that needs to be moved to the "waiting for chunk" list, calls queue.remove(request). Since deque.remove() is O(n) (linear scan to find the element), and this is called inside an O(n) loop, the total cost is O(n²) for n pending requests.

At high concurrency this becomes a scheduling bottleneck. The fix builds a keep list of requests that should stay in the queue, then replaces the queue contents with queue[:] = keep at the end — a single O(n) operation. Same semantics, same element ordering, no API changes.

Benchmark (worst case: 50% stay, 50% removed):

N Original Fixed Speedup
1,000 6.9ms 0.18ms 37.8x
5,000 183ms 1.0ms 176.9x
10,000 719ms 1.9ms 368.6x
50,000 18.0s 10.7ms 1,678x

The original shows clear O(n²) growth (doubling N ≈ 4× the time), while the fix shows O(n) growth (doubling N ≈ 2× the time).

Test Plan

Standalone test suite (test_fix_on2_queue_remove.py) covering:

  • All-new requests moved to waiting list
  • Ready-chunk requests stay in queue
  • Finished requests stay in queue
  • Waiting-for-chunk with finished load
  • Mixed scenario with all code paths
  • Plain list queue (not just deque)
  • Empty queue
  • Order preservation
  • 100 randomized equivalence trials (original vs fixed output must be identical)
  • Performance regression test confirming O(n) scaling
python3 test_fix_on2_queue_remove.py

Test Result

Testing _process_chunk_queue O(n^2) -> O(n) fix
============================================================
  PASS: test_all_new_requests_moved
  PASS: test_ready_chunks_stay_in_queue
  PASS: test_finished_requests_stay_in_queue
  PASS: test_waiting_for_chunk_with_finished_load
  PASS: test_mixed_scenario
  PASS: test_with_plain_list_queue
  PASS: test_empty_queue
  PASS: test_original_vs_fixed_equivalence (100 random trials)
  PASS: test_performance_improvement
        N=5000: original=194678µs, fixed=1438µs, speedup=135.4x
  PASS: test_order_preservation
============================================================
ALL TESTS PASSED

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

@dubin555 dubin555 requested a review from hsliuustc0106 as a code owner March 2, 2026 13:55
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0006a3d90

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

continue
queue.remove(request)
waiting_for_chunk_list.append(request)
queue[:] = keep
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid list-slice assignment on scheduler waiting queue

Replacing removals with queue[:] = keep assumes queue implements mutable slice assignment, but _process_chunk_queue() is called from process_pending_chunks() with the scheduler's self.waiting queue object (created via vLLM request-queue abstractions), not just plain lists. Those queue types expose methods like pop_request/remove_requests and may not support __setitem__ slicing, so async-chunk scheduling can fail at runtime with a type error as soon as this path runs.

Useful? React with 👍 / 👎.

@@ -0,0 +1,373 @@
"""Standalone test for the O(n^2) -> O(n) fix in _process_chunk_queue().
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this file added to the proper folder?

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

vllm-omni-reviewer Code Review

No critical issues found.
This is a well-executed performance PR:

  • Comprehensive tests with correctness and performance verification
  • Clear benchmark showing >5x speedup at N=5000
  • Verified equivalence between original and fixed implementations (100 random trials)
  • Minimal, focused change targeting the specific O(n^2) bottleneck
    The implementation correctly transforms the O(n^2) queue.remove() pattern into an O(n) single-pass partition while preserving identical behavior.

@dubin555
Copy link
Copy Markdown
Contributor Author

dubin555 commented Mar 2, 2026

Thanks for the review!

Regarding the test file location — you're right, I'll move it to the appropriate test directory. Could you point me to the preferred test folder for scheduler-related tests?

Regarding queue[:] = keep — good catch. The slice assignment preserves the original list object reference, which is important since self.waiting may be referenced elsewhere. I'll add a comment clarifying this choice.

@dubin555 dubin555 force-pushed the oss-scout/verify-fix-on2-queue-remove branch from e0006a3 to 14c6468 Compare March 2, 2026 14:20
… chunk scheduler

Replace repeated queue.remove() calls with a single-pass partition that
separates completed items from remaining items in O(n) time. The original
approach called list.remove() for each completed item, which is O(n) per
removal, making the total O(n²) for k completions.
@dubin555 dubin555 force-pushed the oss-scout/verify-fix-on2-queue-remove branch from 14c6468 to 8c16b3d Compare March 2, 2026 14:20
@dubin555
Copy link
Copy Markdown
Contributor Author

dubin555 commented Mar 2, 2026

Updated — moved the test file from repo root to tests/test_fix_on2_queue_remove.py and removed the benchmark artifact. Also cleaned up the commit to only include the source change and the test.

@amy-why-3459
Copy link
Copy Markdown
Contributor

@Shirley125 PTAL

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

Updated — moved the test file from repo root to tests/test_fix_on2_queue_remove.py and removed the benchmark artifact. Also cleaned up the commit to only include the source change and the test.

I think the test files should be added to the simple unit test pipeline

@dubin555
Copy link
Copy Markdown
Contributor Author

dubin555 commented Mar 2, 2026

Good point! Added pytestmark = [pytest.mark.core_model, pytest.mark.cpu] to the test file so it gets picked up by the simple unit test pipeline (pytest -m 'core_model and cpu'). See the latest commit.

Copy link
Copy Markdown
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Report

📋 Summary

Item Details
PR perf: replace O(n²) queue.remove() with O(n) single-pass partition in chunk scheduler
Author @dubin555
Files Changed chunk_transfer_adapter.py, test_fix_on2_queue_remove.py
Changes +385 / -3

🎯 Purpose

Optimize _process_chunk_queue() in chunk_transfer_adapter.py from O(n²) to O(n) complexity.

Original Problem:

  • The code snapshots the queue, iterates over it, and calls queue.remove(request) for each item to be moved
  • Since deque.remove() is O(n) (linear scan), and it's called inside an O(n) loop → O(n²) total
  • This becomes a scheduling bottleneck at high concurrency

Fix Approach:

  • Build a keep list of requests that should stay in the queue
  • Replace queue contents with queue[:] = keep at the end — single O(n) operation
  • Same semantics, same element ordering, no API changes

📊 Performance Benchmarks (Worst Case: 50% stay, 50% removed)

N Original Fixed Speedup
1,000 6.9ms 0.18ms 37.8x
5,000 183ms 1.0ms 176.9x
10,000 719ms 1.9ms 368.6x
50,000 18.0s 10.7ms 1,678x

The original shows clear O(n²) growth (doubling N ≈ 4× time), while the fix shows O(n) growth (doubling N ≈ 2× time).


🔍 Code Changes

# Before (O(n²)):
queue_snapshot = list(queue)
for request in queue_snapshot:
    # ... logic ...
    queue.remove(request)  # O(n) per call
    waiting_for_chunk_list.append(request)

# After (O(n)):
keep: list[Any] = []
for request in list(queue):
    # ... logic ...
    if should_stay:
        keep.append(request)
    else:
        waiting_for_chunk_list.append(request)
queue[:] = keep  # Single O(n) replacement

✅ Test Coverage

Comprehensive test suite (test_fix_on2_queue_remove.py) with:

Test Coverage
test_all_new_requests_moved All WAITING requests moved
test_ready_chunks_stay_in_queue Ready-chunk requests stay
test_finished_requests_stay_in_queue Finished requests stay
test_waiting_for_chunk_with_finished_load Load-finished requests get target_status
test_mixed_scenario All code paths exercised
test_with_plain_list_queue Works with plain list (not just deque)
test_empty_queue Edge case: empty queue
test_order_preservation Element ordering preserved
test_original_vs_fixed_equivalence 100 randomized equivalence trials
test_performance_improvement Regression test for O(n) scaling

All tests pass with pytest -m 'core_model and cpu'.


🔍 Review Findings

✅ Strengths

  1. Solid algorithmic improvement — Clear O(n²) → O(n) optimization
  2. Excellent benchmark data — Real numbers showing massive speedup at scale
  3. Comprehensive test coverage — 100 randomized equivalence trials ensure behavioral parity
  4. Minimal code change — Only 8 lines modified in production code
  5. Good documentation — PR description clearly explains the problem and solution
  6. CI markers addedpytestmark = [pytest.mark.core_model, pytest.mark.cpu] for proper pipeline inclusion

⚠️ Minor Observations

  1. Type annotation: keep: list[Any] could be more specific if request type is known
  2. Test file location: Test file is standalone and doesn't require vllm to be installed — good for isolation but consider if it should be integrated with existing test infrastructure

📝 Verdict

Rating Notes
APPROVE Excellent performance optimization with thorough testing

Rationale:

  • Clear algorithmic improvement with measurable impact
  • Comprehensive test coverage proving behavioral equivalence
  • Minimal, well-documented code changes
  • Real-world benchmarks demonstrate significant value at scale

Kudos: The 100 randomized equivalence trials are particularly thorough — this is exactly the kind of testing needed for algorithm changes. 👏


Reviewed by: vllm-omni-reviewer MCP tool 🤖

@Shirley125
Copy link
Copy Markdown
Contributor

Thanks for you work! But please check chatgpt bot's review --- Avoid list-slice assignment on scheduler waiting queue. After testing this PR, I found that using slicing on the waiting queue indeed raises an error.

…hunk scheduler

Collect requests to remove in a single pass, then call queue.remove_requests()
once instead of calling queue.remove() per item inside a loop.

Uses the queue's native remove_requests() method which is compatible with
vLLM's request queue abstractions (not plain lists).
@dubin555 dubin555 force-pushed the oss-scout/verify-fix-on2-queue-remove branch from 922529c to 469aa18 Compare March 3, 2026 02:58
@dubin555
Copy link
Copy Markdown
Contributor Author

dubin555 commented Mar 3, 2026

Thanks @Shirley125 — good catch. You're right that queue[:] = keep doesn't work with vLLM's request queue abstractions.

Updated the approach: instead of slice assignment, I now collect requests to remove in a single pass, then call queue.remove_requests(to_remove) which uses the queue's native batch removal method. This avoids the O(n*k) overhead of per-item queue.remove() while being compatible with the actual queue type.

@dubin555
Copy link
Copy Markdown
Contributor Author

dubin555 commented Mar 3, 2026

For context on why remove_requests() is the right approach: vLLM's FCFSRequestQueue.remove_requests() (request_queue.py:109-116) internally builds a set and filters in O(n), instead of the per-item remove() which is O(n) each call:

def remove_requests(self, requests):
    requests_to_remove = set(requests)  # O(k)
    filtered = [req for req in self if req not in requests_to_remove]  # O(n)
    self.clear()
    self.extend(filtered)

So collecting items to remove in a single pass and calling remove_requests() once gives the same O(n) performance as the original single-pass partition approach, but uses the queue's own API correctly.

finished_load_reqs.remove(request.request_id)
self.requests_with_ready_chunks.add(request.request_id)
continue
queue.remove(request)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try using pop_request?

@Shirley125
Copy link
Copy Markdown
Contributor

For context on why remove_requests() is the right approach: vLLM's FCFSRequestQueue.remove_requests() (request_queue.py:109-116) internally builds a set and filters in O(n), instead of the per-item remove() which is O(n) each call:

def remove_requests(self, requests):
    requests_to_remove = set(requests)  # O(k)
    filtered = [req for req in self if req not in requests_to_remove]  # O(n)
    self.clear()
    self.extend(filtered)

So collecting items to remove in a single pass and calling remove_requests() once gives the same O(n) performance as the original single-pass partition approach, but uses the queue's own API correctly.

Thanks, your solution works. However, there is a potential issue here. If queue is waiting, we can use the remove_requests() method. But if queue is the running queue (which is a list[Request]), this method cannot be used, and we may need to use the previous slicing-based approach instead.

Copy link
Copy Markdown
Collaborator

@lishunyang12 lishunyang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing inline

to_remove.append(request)
waiting_for_chunk_list.append(request)
if to_remove:
queue.remove_requests(to_remove)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove_requests() is not a standard list method. Does every queue type passed here implement it? The old queue.remove() worked on any list-like.

@dubin555
Copy link
Copy Markdown
Contributor Author

dubin555 commented Mar 5, 2026

Closing this PR — the remove_requests() method is not available on plain list (the running queue), so this change would cause AttributeError when processing running requests. Thanks for the review catch.

@dubin555 dubin555 closed this Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants