Commit

FIX vllm-project#7592, keeping chunked prefill performance untouched
noooop committed Aug 27, 2024
1 parent 98312de commit 408b727
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion vllm/core/scheduler.py
@@ -511,7 +511,9 @@ def _schedule_running(
         # to keep all the sequence groups in the RUNNING state.

         if enable_chunking:
-            # Once chunked prefill is enabled, the policy is changed to prioritize decode requests.
+            # By default, vLLM scheduler prioritizes prefills.
+            # Once chunked prefill is enabled,
+            # the policy is changed to prioritize decode requests.
             self.running = deque(
                 sorted(
                     self.running,
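
The hunk above is cut off before the sort key, so the actual ordering criterion is not visible here. As a rough illustration of what "prioritize decode requests" can mean when re-sorting a running queue, the following is a minimal sketch, not the code from this commit. `SeqGroup`, `is_prefill`, and `prioritize_decodes` are hypothetical names invented for the example and are not vLLM's API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SeqGroup:
    """Hypothetical stand-in for a sequence group (not vLLM's class)."""
    request_id: str
    is_prefill: bool  # True while the prompt is still being processed


def prioritize_decodes(running: deque) -> deque:
    """Return a new deque with decode-phase groups ahead of prefill-phase ones.

    sorted() is stable, so groups in the same phase keep their relative order.
    """
    # False sorts before True, so decodes (is_prefill=False) come first.
    return deque(sorted(running, key=lambda g: g.is_prefill))


running = deque([
    SeqGroup("a", is_prefill=True),
    SeqGroup("b", is_prefill=False),
    SeqGroup("c", is_prefill=True),
    SeqGroup("d", is_prefill=False),
])
reordered = prioritize_decodes(running)
print([g.request_id for g in reordered])  # ['b', 'd', 'a', 'c']
```

Because the sort is stable, a key like this only moves decodes to the front and does not otherwise reshuffle the queue. Whether the real key looks like this is an assumption.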