[diffusion] refactor: make timestep scheduler request-local#23716

Merged: mickqian merged 19 commits into main from diffusion-stateless-scheduler on Apr 26, 2026

@mickqian (Collaborator) commented Apr 25, 2026

Introduce batch-local scheduler object

PipelineStages are designed to be free of side effects: they should not change any global state outside of Req.
So instead of sharing stage-global scheduler state across requests, each request-local scheduler object should hold the mutable denoising-loop state (for example step indices and multistep buffers), and stages will clone the pipeline scheduler template into the request.

Since the scheduler contains no weights, this change should not add any noticeable VRAM usage.
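
To make the cloning idea concrete, here is a minimal sketch. It is not the code from this PR: `ToyScheduler`, `ToyReq`, and `attach_scheduler` are hypothetical stand-ins, and the only assumption carried over from the description is that a stage deep-copies a pipeline-level template into the request so that step indices and multistep buffers stay request-local.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyScheduler:
    """Hypothetical scheduler: static config plus mutable denoising-loop state."""

    num_train_timesteps: int = 1000
    step_index: Optional[int] = None  # advanced on every denoising step
    model_outputs: List[float] = field(default_factory=list)  # multistep buffer

    def step(self, model_output: float) -> None:
        # Advance the loop state; a real scheduler would also update the latents.
        self.step_index = 0 if self.step_index is None else self.step_index + 1
        self.model_outputs.append(model_output)


@dataclass
class ToyReq:
    """Hypothetical request object that owns its own scheduler."""

    scheduler: Optional[ToyScheduler] = None


def attach_scheduler(req: ToyReq, template: ToyScheduler) -> ToyReq:
    # Clone the pipeline-level template so every mutation stays inside the request.
    req.scheduler = copy.deepcopy(template)
    return req


template = ToyScheduler()
req_a = attach_scheduler(ToyReq(), template)
req_b = attach_scheduler(ToyReq(), template)

req_a.scheduler.step(0.1)
req_a.scheduler.step(0.2)
req_b.scheduler.step(0.9)
```

In this sketch the template is never stepped, and the two requests advance independently. The copy only duplicates small Python-side state, which matches the point above about VRAM.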

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@github-actions github-actions Bot added the diffusion SGLang Diffusion label Apr 25, 2026
@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request refactors the scheduler management across multiple multimodal generation stages to ensure request isolation and prevent state leakage during concurrent execution. The changes move the scheduler instance from a shared class attribute to the request batch, utilizing deep copies to maintain independent states. The review feedback identifies several critical issues, including incorrect attribute access for diffusers scheduler configurations, potential state leakage in causal_denoising.py, and the use of uninitialized scheduler copies in denoising_dmd.py.

Comment thread python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising.py Outdated
Comment thread python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising_dmd.py Outdated
Comment thread python/sglang/multimodal_gen/runtime/pipelines_core/stages/denoising_dmd.py Outdated
    autocast_enabled = (
        target_dtype != torch.float32
    ) and not server_args.disable_autocast
    scheduler = batch.scheduler or self.scheduler

medium

This assignment uses self.scheduler directly if batch.scheduler is missing. Since self.scheduler is a shared instance attribute of the stage, any modifications to its state during the denoising process will leak across concurrent requests, violating the stateless design goal of this PR. A copy.deepcopy should be used here to ensure request isolation (note that import copy would also need to be added to this file).
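
The leak described in this comment can be reproduced with a toy example. This is only a sketch under stated assumptions: `ToyScheduler`, `resolve_shared`, and `resolve_isolated` are hypothetical names, and `SimpleNamespace` stands in for the real batch object. The first helper mirrors the flagged fallback; the second shows the deep-copy direction the comment suggests.

```python
import copy
from types import SimpleNamespace


class ToyScheduler:
    """Hypothetical scheduler with a single piece of mutable loop state."""

    def __init__(self) -> None:
        self.step_index = 0

    def step(self) -> None:
        self.step_index += 1


def resolve_shared(batch, stage_scheduler):
    # Flagged pattern: falls back to the shared stage-level instance.
    return batch.scheduler or stage_scheduler


def resolve_isolated(batch, stage_scheduler):
    # Suggested direction: fall back to a private copy stored on the batch.
    if batch.scheduler is None:
        batch.scheduler = copy.deepcopy(stage_scheduler)
    return batch.scheduler


# Shared fallback: stepping mutates the stage-level scheduler.
stage_scheduler = ToyScheduler()
batch_1 = SimpleNamespace(scheduler=None)
resolve_shared(batch_1, stage_scheduler).step()
leaked_index = stage_scheduler.step_index

# Isolated fallback: the stage-level scheduler stays untouched.
stage_scheduler = ToyScheduler()
batch_2 = SimpleNamespace(scheduler=None)
resolve_isolated(batch_2, stage_scheduler).step()
isolated_index = stage_scheduler.step_index
```

With the shared fallback, the next request would start from the mutated `step_index`; with the isolated fallback, only the batch-owned copy advances.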

@mickqian (Collaborator, Author) commented:

/tag-and-rerun-ci

@mickqian changed the title from "[diffusion] refactor: make timestep preparation stage state-less" to "[diffusion] refactor: make timestep scheduler request-local" on Apr 25, 2026

@mickqian (Collaborator, Author) commented:

On the concern that this PR might introduce scheduler-clone overhead:

Request-local isolated schedulers are only needed when a request can run concurrently with another request or outlive the stage-local scheduler state, for example in grouped execution, true multi-request batch execution, or disaggregation-side scheduler reconstruction.

So the ownership model is:

  • reuse the stage-local scheduler for the sequential path, and
  • isolate only for concurrent/disaggregated execution.

This keeps the existing sequential performance behavior while providing the right isolation point for future grouped requests.
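
The ownership model above could be sketched roughly as follows. The `ExecutionMode` enum, `NEEDS_ISOLATION`, and `scheduler_for` are hypothetical names chosen for illustration and are not taken from the merged code; the sketch only assumes that the caller knows which execution path a request is on.

```python
import copy
from enum import Enum, auto


class ExecutionMode(Enum):
    """Hypothetical labels for the execution paths named in the comment."""

    SEQUENTIAL = auto()
    GROUPED = auto()
    MULTI_REQUEST_BATCH = auto()
    DISAGGREGATED = auto()


# Paths where a request may overlap with another or outlive stage-local state.
NEEDS_ISOLATION = frozenset(
    {
        ExecutionMode.GROUPED,
        ExecutionMode.MULTI_REQUEST_BATCH,
        ExecutionMode.DISAGGREGATED,
    }
)


class ToyScheduler:
    """Hypothetical scheduler with a single piece of mutable loop state."""

    def __init__(self) -> None:
        self.step_index = 0


def scheduler_for(mode: ExecutionMode, stage_scheduler: ToyScheduler) -> ToyScheduler:
    # Pay for a clone only when isolation is actually required.
    if mode in NEEDS_ISOLATION:
        return copy.deepcopy(stage_scheduler)
    # Sequential path: reuse the stage-local scheduler, no copy overhead.
    return stage_scheduler


stage_scheduler = ToyScheduler()
sequential = scheduler_for(ExecutionMode.SEQUENTIAL, stage_scheduler)
grouped = scheduler_for(ExecutionMode.GROUPED, stage_scheduler)
grouped.step_index += 1
```

Here the sequential path returns the very same object, so it keeps its current cost, while the grouped path gets a private copy whose state changes do not reach the stage-local scheduler.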

@mickqian mickqian merged commit d49a037 into main Apr 26, 2026
89 of 105 checks passed
@mickqian mickqian deleted the diffusion-stateless-scheduler branch April 26, 2026 07:59
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026
Labels

diffusion SGLang Diffusion run-ci
