[diffusion] refactor: introduce component residency manager (#23771)
Conversation
Code Review
This pull request introduces a centralized PipelineResidencyManager to coordinate the loading and offloading of model components (such as VAEs, DiTs, and text encoders) across different pipeline stages. This system replaces legacy stage-local behavior with configurable residency strategies (static, dynamic, or disabled) and integrates hooks into the pipeline executors to manage component lifecycles. Additionally, the PR adds a compatibility wrapper for Flash Attention v3 kernels and updates various stages to declare their component usage. Feedback focuses on a potential type error in text encoding attention masks, the lack of prefetching in the vanilla D2H strategy, and the use of hardcoded memory buffer values.
```python
            self._trace("prefetch_skip", use, strategy, module)
            return
        self._trace("prefetch", use, strategy, module)
```
Prefetching is explicitly disabled for VanillaD2HStrategy. This strategy is used for components like text encoders, which can be large and would benefit from asynchronous H2D transfers to hide latency. Unless there is a specific reason to avoid overlapping these transfers (e.g., memory pressure concerns that aren't already handled by the manager), prefetching should be enabled for this strategy as well.
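One way to act on this feedback is to make prefetch support a declarative capability on each strategy instead of a hardcoded skip. The sketch below is a minimal toy model, not the PR's actual code: the class and method names (`ResidencyStrategy`, `maybe_prefetch`, the `_trace` event log) are assumptions that only mirror the shape of the diff.

```python
# Toy sketch: strategies opt in to async prefetch via a class attribute,
# so enabling it for the D2H strategy is a one-line change. All names
# here are illustrative assumptions, not the PR's real API.

class ResidencyStrategy:
    supports_prefetch = True  # default: allow overlapping H2D transfers


class VanillaD2HStrategy(ResidencyStrategy):
    # Per the review suggestion, text encoders managed by this strategy
    # benefit from prefetch, so leave the capability enabled.
    supports_prefetch = True


class DisabledStrategy(ResidencyStrategy):
    supports_prefetch = False


class Manager:
    def __init__(self):
        self.trace = []

    def _trace(self, event, module):
        self.trace.append((event, module))

    def maybe_prefetch(self, strategy, module):
        # Consult the strategy's declared capability rather than
        # special-casing strategy types here.
        if not strategy.supports_prefetch:
            self._trace("prefetch_skip", module)
            return
        self._trace("prefetch", module)
        # A real implementation would issue a non-blocking H2D copy on a
        # side CUDA stream at this point.


mgr = Manager()
mgr.maybe_prefetch(VanillaD2HStrategy(), "text_encoder")
mgr.maybe_prefetch(DisabledStrategy(), "vae")
print(mgr.trace)
```

With this shape, memory-pressure concerns can still veto an individual prefetch inside `maybe_prefetch` without disabling the capability for the whole strategy class.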
```python
        memory_usage = getattr(self.pipeline, "memory_usages", {}).get(component_name)
        if memory_usage is None:
            return False
        return memory_usage + 2.0 < current_platform.get_available_gpu_memory()
```
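The review summary above flags the hardcoded memory buffer here (the `2.0`). A minimal sketch of making it configurable follows; the default constant, the environment variable name, and the standalone `can_load` helper are all assumptions for illustration, not the PR's actual interface.

```python
# Sketch: replace the hardcoded 2.0 GiB safety buffer with a configurable
# value. SGLANG_GPU_MEMORY_BUFFER_GB is a hypothetical env var name.
import os

DEFAULT_GPU_MEMORY_BUFFER_GB = 2.0


def can_load(memory_usage_gb, available_gb, buffer_gb=None):
    """Return True if the component fits with a safety margin to spare."""
    if memory_usage_gb is None:
        # Unknown footprint: conservatively refuse to load.
        return False
    if buffer_gb is None:
        buffer_gb = float(
            os.environ.get("SGLANG_GPU_MEMORY_BUFFER_GB",
                           DEFAULT_GPU_MEMORY_BUFFER_GB)
        )
    return memory_usage_gb + buffer_gb < available_gb


print(can_load(10.0, 13.0))                 # fits under the default buffer
print(can_load(10.0, 13.0, buffer_gb=4.0))  # rejected by a stricter buffer
print(can_load(None, 13.0))                 # unknown usage is rejected
```

Exposing the buffer as a parameter (with an environment override) keeps the current default behavior while letting deployments on fragmented or shared GPUs tune the margin.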
Motivation
Component management is crucial for the latency of diffusion pipeline inference, especially as modern diffusion models require larger sub-components (e.g., Mistral or Qwen as text encoders) and more components (dual-DiT for Wan and LTX), whose combined parameter sizes exceed the memory of most modern GPUs.
Currently the model management code is scattered across each pipeline's pre- and post-hooks, making it hard to apply advanced coordination.
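To make the intended coordination concrete, here is a toy sketch of a manager that tracks which components each stage declares, loads them before the stage runs, and offloads anything no later stage needs. Every name in it (`ResidencyManager`, `run_stage`, the per-stage component lists) is an illustrative assumption; the real `PipelineResidencyManager` additionally dispatches to per-component residency strategies.

```python
# Toy model of centralized residency coordination across pipeline stages.
# Names and data shapes are assumptions, not the PR's actual API.

class ResidencyManager:
    def __init__(self, stage_components):
        # stage_components: one list of component names per pipeline stage.
        self.stage_components = stage_components
        self.resident = set()   # components currently on the GPU
        self.log = []           # (action, component) audit trail

    def run_stage(self, i):
        needed = set(self.stage_components[i])
        # Load components this stage declared but which are not resident.
        for name in sorted(needed - self.resident):
            self.resident.add(name)
            self.log.append(("load", name))
        # Offload components that no subsequent stage will use.
        still_needed = set().union(*map(set, self.stage_components[i + 1:]))
        for name in sorted(self.resident - still_needed):
            self.resident.discard(name)
            self.log.append(("offload", name))


stages = [["text_encoder"], ["dit"], ["dit", "vae"]]
mgr = ResidencyManager(stages)
for i in range(len(stages)):
    mgr.run_stage(i)
print(mgr.log)
```

Because the manager sees every stage's declared usage up front, it can keep a component resident across consecutive stages (the DiT here) instead of each stage's hooks independently offloading and reloading it.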
Modifications
- `ComponentResidencyStrategy` to abstract and cover all pre-existing module management techniques (including layerwise offload, snapshot, and resident mode for LTX pre-merged LoRA)
- `ComponentResidencyManager` to serve as a global manager coordinating component placement, minimizing latency while making full use of VRAM. The manager calls the preparation, prefetch, and release hooks defined by the aforementioned strategies.

Accuracy Tests
Speed Tests and Profiling
Checklist
TODO
Review and Merge Process
`/tag-and-rerun-ci`, `/tag-run-ci-label`, `/rerun-failed-ci`