[diffusion] refactor: make timestep scheduler request-local #23716
Code Review
This pull request refactors the scheduler management across multiple multimodal generation stages to ensure request isolation and prevent state leakage during concurrent execution. The changes move the scheduler instance from a shared class attribute to the request batch, utilizing deep copies to maintain independent states. The review feedback identifies several critical issues, including incorrect attribute access for diffusers scheduler configurations, potential state leakage in causal_denoising.py, and the use of uninitialized scheduler copies in denoising_dmd.py.
    autocast_enabled = (
        target_dtype != torch.float32
    ) and not server_args.disable_autocast
    scheduler = batch.scheduler or self.scheduler
This assignment falls back to self.scheduler directly when batch.scheduler is missing. Because self.scheduler is a shared instance attribute of the stage, any change to its state during the denoising process leaks across concurrent requests, which violates the stateless design goal of this PR. Use copy.deepcopy here to ensure request isolation (this also requires adding import copy to this file).
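The fallback suggested above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `ToyScheduler`, `Batch`, `Stage`, and `resolve_scheduler` are hypothetical stand-ins, and only `copy.deepcopy` is a real library call. The sketch also checks `is None` rather than relying on truthiness, which is an assumption about the intended behavior.

```python
import copy


class ToyScheduler:
    """Hypothetical stand-in for a scheduler with mutable loop state."""

    def __init__(self):
        self.step_index = 0

    def step(self):
        # Advancing the denoising loop mutates the scheduler.
        self.step_index += 1


class Batch:
    """Hypothetical request batch that may carry its own scheduler."""

    def __init__(self, scheduler=None):
        self.scheduler = scheduler


class Stage:
    """Hypothetical stage holding a shared scheduler template."""

    def __init__(self, scheduler):
        self.scheduler = scheduler  # shared across requests, never stepped

    def resolve_scheduler(self, batch):
        # Clone the shared template on first use so that state changes
        # stay inside the request instead of leaking to the stage.
        if batch.scheduler is None:
            batch.scheduler = copy.deepcopy(self.scheduler)
        return batch.scheduler


stage = Stage(ToyScheduler())
first, second = Batch(), Batch()
s1 = stage.resolve_scheduler(first)
s1.step()
s1.step()
s2 = stage.resolve_scheduler(second)
```

With this shape, stepping `s1` leaves both the stage template and `s2` at step index 0, and a repeated call for the same batch returns the clone already stored on it.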
/tag-and-rerun-ci
On the scheduler-clone overhead this PR might introduce: request-local, isolated schedulers are needed only when a request can run concurrently with another request or outlive the stage-local scheduler state, for example in grouped execution, true multi-request batch execution, or disaggregation-side scheduler reconstruction. So the ownership model is:
This keeps the existing sequential performance behavior while providing the right isolation point for future grouped requests.
Introduce batch-local scheduler object
PipelineStages are designed to be free of side effects: they should not change global state outside of Req.
So instead of sharing stage-global scheduler state across requests, each request's scheduler object should own the mutable denoising-loop state (for example step indices and multistep buffers), and stages clone the pipeline scheduler template into the request.
Since a scheduler contains no weights, this change should not add any noticeable VRAM usage.
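To illustrate why the mutable loop state belongs to the request, here is a small sketch that interleaves two requests, once with a shared scheduler and once with per-request clones. Everything in it is an assumption made for illustration: `ToyMultistepScheduler`, `Req`, and `run_interleaved` are toy names, and the averaging "multistep" rule is not a real diffusion scheduler.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyMultistepScheduler:
    """Toy scheduler: keeps a step index and a two-entry multistep buffer."""

    step_index: int = 0
    history: List[float] = field(default_factory=list)

    def step(self, model_output: float) -> float:
        # Keep only the last two outputs and return their mean.
        self.history = (self.history + [model_output])[-2:]
        self.step_index += 1
        return sum(self.history) / len(self.history)


@dataclass
class Req:
    """Toy request that owns (or shares) a scheduler."""

    scheduler: Optional[ToyMultistepScheduler] = None


def run_interleaved(template: ToyMultistepScheduler, clone: bool):
    """Run three interleaved steps for two requests."""
    reqs = [Req(), Req()]
    for req in reqs:
        req.scheduler = copy.deepcopy(template) if clone else template
    outputs: List[List[float]] = [[], []]
    for step in range(3):
        for i, req in enumerate(reqs):
            outputs[i].append(req.scheduler.step(float(i * 10 + step)))
    return outputs, [req.scheduler.step_index for req in reqs]


shared_out, shared_idx = run_interleaved(ToyMultistepScheduler(), clone=False)
cloned_template = ToyMultistepScheduler()
cloned_out, cloned_idx = run_interleaved(cloned_template, clone=True)
```

In the shared run both requests advance the same step index and mix each other's buffer entries, while in the cloned run each request sees only its own three steps and the template is left untouched.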