feat: Priority-based scheduling optimization (including default priority, preemption toggle, priority-based metrics, etc.) #17026
Conversation
Summary of Changes (Gemini Code Assist): This pull request enhances the existing priority-based scheduling system by addressing key functional and observability gaps. It provides a mechanism for assigning default priorities to requests, offers a configurable switch for preemption, and integrates priority-aware metrics. These changes aim to make the scheduler more robust, adaptable to diverse operational needs, and transparent in its performance characteristics.
Code Review
This pull request introduces several valuable optimizations for priority-based scheduling, including support for a default priority, a toggle for preemption, and priority-based metrics. The changes are well-structured and address the motivations outlined in the description.
My review focuses on improving code maintainability and performance by addressing code duplication. I've identified several instances where similar logic is repeated across different parts of the codebase. By refactoring these into helper methods, the code can be made cleaner, more efficient, and easier to maintain. Specifically, I've pointed out opportunities for this in scheduler.py, scheduler_metrics_mixin.py, and collector.py.
```python
if self.enable_priority_scheduling:
    num_running_reqs_by_priority.clear()
    for req in self.running_batch.reqs:
        num_running_reqs_by_priority[req.priority] += 1
```
This block for calculating num_running_reqs_by_priority is inefficient as it runs inside a loop over self.waiting_queue. It is also a duplicate of the logic at lines 1859-1861.
This repeated calculation can impact performance, especially with a large waiting queue. Furthermore, its current placement might not accurately reflect the final state of self.running_batch for logging, as it's updated on each iteration.
A better approach is to calculate num_running_reqs_by_priority only once, after the loop completes and just before log_prefill_stats is called. This ensures correctness and improves efficiency.
I recommend removing this block and the one at lines 1859-1861, and then adding the calculation logic right before the log_prefill_stats call around line 2003.
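The suggested restructuring can be sketched as follows. This is illustrative only, not the actual scheduler code: `Req` here is a stand-in dataclass, and in the real code the result would be computed from `self.running_batch.reqs` right before `log_prefill_stats`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Req:  # placeholder for the scheduler's request type
    priority: int


def count_reqs_by_priority(reqs):
    """Tally requests per priority level in a single pass."""
    counts = defaultdict(int)
    for req in reqs:
        counts[req.priority] += 1
    return dict(counts)


# Compute once after the admission loop completes, just before logging
# prefill stats, instead of recomputing on every waiting-queue iteration.
running = [Req(0), Req(1), Req(1), Req(2)]
num_running_reqs_by_priority = count_reqs_by_priority(running)
print(num_running_reqs_by_priority)  # {0: 1, 1: 2, 2: 1}
```

Moving the tally out of the loop makes it O(running batch) once, rather than O(waiting queue × running batch).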
```python
if self.enable_priority_scheduling:
    num_queue_reqs_by_priority: dict[int, int] = defaultdict(int)
    for req in self.waiting_queue:
        num_queue_reqs_by_priority[req.priority] += 1
    self.stats.num_queue_reqs_by_priority = num_queue_reqs_by_priority
```
The logic for counting requests by priority is repeated multiple times in this file for different queues (e.g., lines 257-265, 278-286, and also in log_decode_stats). This code duplication makes the code harder to maintain.
To improve this, you could extract the counting logic into a helper method. For example:
```python
from collections import defaultdict
from typing import Dict, Iterable


def _compute_reqs_by_priority(self, req_queue: Iterable[Req]) -> Dict[int, int]:
    """Computes the number of requests for each priority in a queue."""
    counts = defaultdict(int)
    for req in req_queue:
        counts[req.priority] += 1
    return counts
```

Then you can use it like this:

```python
if self.enable_priority_scheduling:
    self.stats.num_queue_reqs_by_priority = self._compute_reqs_by_priority(self.waiting_queue)
```

This would make the code cleaner and more maintainable.
```python
if stats.num_running_reqs_by_priority:
    for key, value in stats.num_running_reqs_by_priority.items():
        self._log_gauge(self.num_running_reqs, value, priority=key)
else:
    self._log_gauge(self.num_running_reqs, stats.num_running_reqs)
```
This if/else block for logging gauges based on whether per-priority data is available is repeated for multiple metrics (e.g., num_queue_reqs, num_prefill_prealloc_queue_reqs, etc.). This leads to code duplication and makes the log_stats method lengthy.
Consider refactoring this logic into a helper method to reduce duplication and improve readability. For example:
```python
def _log_gauge_by_priority(self, gauge, total_value: Union[int, float], by_priority_dict: Dict[int, int]):
    if by_priority_dict:
        for priority, value in by_priority_dict.items():
            self._log_gauge(gauge, value, priority=priority)
    else:
        self._log_gauge(gauge, total_value)
```

You could then simplify the calls in log_stats like this:

```python
self._log_gauge_by_priority(
    self.num_running_reqs,
    stats.num_running_reqs,
    stats.num_running_reqs_by_priority,
)
```
/tag-and-rerun-ci

/tag-run-ci-label
@harrisonlimh would appreciate it if you could take a look at this PR.

@harrisonlimh Thanks for reviewing! 🙏

Hi! Thank you so much for the contribution! I will make sure to review the PR in the next two days.
```python
prefill_max_requests: Optional[int] = None
schedule_policy: str = "fcfs"
enable_priority_scheduling: bool = False
enable_try_preemption_by_priority: bool = False
```
Preemption is enabled by default in priority scheduling, so this flag is redundant. Would you be able to update it to disable_try_preemption_by_priority and use it as a toggle to disable the behavior instead?
```python
self.try_preemption = (
    self.enable_priority_scheduling
    and not self.server_args.disable_try_preemption_by_priority
)
```
FYI - an alternative way of disabling preemption is to set priority_scheduling_preemption_threshold to a value greater than abs(highest_priority_int - lowest_priority_int). Sharing it in case the mentioned need to disable preemption is urgent for the business.
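As a numeric illustration of that workaround (a sketch assuming integer priorities; the real knob is the priority_scheduling_preemption_threshold server argument mentioned above):

```python
# Pick a threshold strictly larger than the widest possible priority gap,
# so no pair of requests can ever differ by enough to trigger preemption.
priorities = [-2, 0, 1, 5]  # example priority values in use (assumed)

max_gap = abs(max(priorities) - min(priorities))  # 5 - (-2) = 7
threshold = max_gap + 1  # any value > max_gap effectively disables preemption
print(threshold)  # 8
```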
The parameters have already been updated as required. We would still like to be able to explicitly disable preemption via a parameter, which makes the usage clearer and more explicit.
Thank you for updating it!
Would it be possible to add one or two helper functions that calculate per-priority request counts in scheduler.py and scheduler_metrics_mixin.py and consolidate the usage?
Perhaps one that (a) takes a list of requests and returns an xxx_reqs_by_priority dictionary, or (b) additionally takes the dictionary and updates it in place.
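Variant (b) might look like this. It is a sketch only: `Req` is a placeholder for the scheduler's actual request type, and the function name is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Req:  # placeholder for the scheduler's request type
    priority: int


def update_reqs_by_priority(reqs: Iterable[Req], counts: Dict[int, int]) -> None:
    """Variant (b): accumulate per-priority counts into an existing dict in place."""
    for req in reqs:
        counts[req.priority] = counts.get(req.priority, 0) + 1


# The in-place form lets one dict accumulate counts across several queues.
counts: Dict[int, int] = {}
update_reqs_by_priority([Req(1), Req(1), Req(3)], counts)
update_reqs_by_priority([Req(3)], counts)
print(counts)  # {1: 2, 3: 2}
```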
Thank you very much — I’ll work on fixing these issues.
# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/managers/scheduler_metrics_mixin.py
#   python/sglang/srt/managers/tokenizer_manager.py
#   python/sglang/srt/metrics/collector.py
/run-failed-ci

LGTM! Thank you!

@hnyls2002 Thanks!
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/rerun-stage stage-c-test-8-gpu-h20

✅ Triggered

/rerun-ut test_priority_metrics.py

❌ No test file found matching
…ity, preemption toggle, priority-based metrics, etc.) (sgl-project#17026) Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Motivation
During the use of priority-based scheduling, we identified several issues:
The original priority scheduling logic does not support setting a default priority.
The preemption logic introduces certain risks in our specific business scenario, so we need a configurable toggle to enable or disable preemption.
We require metrics to support performance analysis based on priority-based scheduling.
Modifications
Building on the original priority-based scheduling, this PR implements several optimizations: support for setting a default priority, a toggle for the preemption logic, and priority-based metrics.

Benchmarking and Profiling
Queueing behavior under high and low priority
Starting from concurrency level 40, requests begin to queue, with the number of queued low-priority requests significantly higher than that of high-priority ones, which aligns with expectations.

Checklist