[diffusion] Generalize layerwise offload residency mixin to all components #24593

Merged
mickqian merged 43 commits into sgl-project:main from mickqian:codex/component-residency-strategy-compat
May 16, 2026

Collaborator

@mickqian mickqian commented May 7, 2026

Summary

  • Rename the DiT-specific layerwise offload mixin to LayerwiseOffloadableModuleMixin.
  • Resolve layerwise residency by module capability before falling back to existing component CPU-offload flags.
  • Add --layerwise-offload-components to select layerwise offload by pipeline component name, with --layerwise-offload-modules accepted as an alias.
  • Keep --dit-layerwise-offload legacy behavior: when no component is named, only default DiT components are configured; encoder / VAE / bridge / upsampler / vocoder must be selected explicitly.
  • Disable conflicting component CPU/FSDP offload flags when the same component is explicitly selected for layerwise offload, and keep layer buffers resident to avoid releasing shared buffers such as RoPE caches.
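
The selection rules in the summary can be sketched as follows. This is a minimal illustration, not the code in this PR: `build_parser`, `select_layerwise_components`, `parse_bool`, and `DEFAULT_DIT_COMPONENTS` are hypothetical names, and treating `transformer` as the only default DiT component is an assumption. Only the flag names and the special value `all` come from the description above.

```python
import argparse
import warnings

# Assumption: which pipeline components count as "default DiT components".
DEFAULT_DIT_COMPONENTS = ("transformer",)


def parse_bool(text: str) -> bool:
    """Parse 'true'/'false' style flag values."""
    return text.strip().lower() in ("1", "true", "yes")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dit-layerwise-offload", type=parse_bool, default=False)
    # Two option strings share one destination, so the second acts as an alias.
    parser.add_argument(
        "--layerwise-offload-components",
        "--layerwise-offload-modules",
        dest="layerwise_offload_components",
        nargs="+",
        default=None,
    )
    return parser


def select_layerwise_components(args, available):
    """Return the component names to configure, in pipeline order."""
    requested = args.layerwise_offload_components
    if requested is None:
        # Legacy path: no component named, so only default DiT components apply.
        if not args.dit_layerwise_offload:
            return []
        return [name for name in available if name in DEFAULT_DIT_COMPONENTS]
    if "all" in requested:
        return list(available)
    for name in requested:
        if name not in available:
            # Unknown names are reported and skipped rather than raising.
            warnings.warn(f"Unknown layerwise offload component: {name!r}")
    return [name for name in available if name in requested]
```

Returning names in pipeline order rather than argument order is a design choice of this sketch; it keeps the result independent of how the user ordered the flag values.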

Validation

Remote H200 container /sgl-workspace/sglang at fdf022713606dc2e6262975145d94e7f7d504a0d:

  • PYTHONPATH=/sgl-workspace/sglang/python python -m pytest python/sglang/multimodal_gen/test/unit/test_layerwise_offload.py python/sglang/multimodal_gen/test/unit/test_server_args.py -> 41 passed, 2 warnings
  • Z-Image 512x512, 1 step, seed 0, torch_sdpa backend:
    • --dit-layerwise-offload true --dit-offload-prefetch-size 0 -> PASS, enabled ['transformer']
    • --layerwise-offload-components transformer --dit-offload-prefetch-size 0 -> PASS, enabled ['transformer']
    • --layerwise-offload-components transformer text_encoder --dit-offload-prefetch-size 0 -> PASS, enabled ['text_encoder', 'transformer']
    • --layerwise-offload-components all --dit-offload-prefetch-size 0 -> PASS, enabled ['text_encoder', 'vae', 'transformer']
    • --layerwise-offload-modules transformer --dit-offload-prefetch-size 0 -> PASS, enabled ['transformer']
    • --layerwise-offload-components missing_component --dit-offload-prefetch-size 0 -> PASS with warning and no layerwise component
  • All six generated PNG files have identical sha256: 0f0dba3e7d97aa3be19ef7d6d1cd3ea0e727c322153a8a4f9904089b8e9ee4c1
  • No local tests were run.
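
For anyone repeating the hash comparison, a check along these lines would do. It is a sketch using only the standard library; `sha256_of_file` and `outputs_match` are hypothetical helper names and are not part of this PR.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(paths) -> bool:
    """True when every file has the same digest; False for an empty list."""
    digests = {sha256_of_file(Path(p)) for p in paths}
    return len(digests) == 1
```

Calling `outputs_match` on the generated PNG paths returns `True` only when all of them are byte-identical.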

CI States

Latest PR Test: Run #25928755979
Latest PR Test (Extra): ⚠️ Not enabled — add run-ci-extra label to opt in.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the layerwise offload mechanism by renaming OffloadableDiTMixin to LayerwiseOffloadableModuleMixin and centralizing residency strategy logic in component_manager.py. It introduces helper functions like is_layerwise_offloaded_module and should_cpu_offload_component to simplify offload decisions across various model components. Feedback includes suggestions to remove a redundant bool() call, simplify multi-line tuple unpacking for better readability, and remove an unnecessary trailing comma in a tuple assignment.
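
The "capability first, flag second" ordering described here can be illustrated with a small sketch. It does not reproduce `component_manager.py`: `Residency`, `LayerwiseCapable`, and `resolve_residency` are hypothetical names, and the three-way outcome is an assumption made for illustration.

```python
from enum import Enum

import_note = "stdlib only"  # no torch dependency in this sketch


class Residency(Enum):
    LAYERWISE = "layerwise"
    CPU_OFFLOAD = "cpu_offload"
    RESIDENT = "resident"


class LayerwiseCapable:
    """Hypothetical stand-in for a mixin that exposes offload managers."""

    def __init__(self, managers=None):
        self.layerwise_offload_managers = list(managers or [])


def resolve_residency(module, cpu_offload_flag: bool) -> Residency:
    """Check module capability first, then fall back to the component flag."""
    managers = getattr(module, "layerwise_offload_managers", None)
    if isinstance(module, LayerwiseCapable) and managers:
        return Residency.LAYERWISE
    if cpu_offload_flag:
        return Residency.CPU_OFFLOAD
    return Residency.RESIDENT
```

With this ordering, a module that already has active layerwise managers is never also handed to the component-level CPU offload path, which matches the conflict rule stated in the PR summary.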

def is_layerwise_offloaded_module(module: torch.nn.Module) -> bool:
    return (
        isinstance(module, LayerwiseOffloadableModuleMixin)
        and bool(module.layerwise_offload_managers)

medium

The bool() call here is redundant. In Python, an empty list is evaluated as False in a boolean context, so you can check for non-emptiness directly. The pythonic way is to use the list itself in the condition.

Suggested change
-        and bool(module.layerwise_offload_managers)
+        and module.layerwise_offload_managers
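
One point that may be worth weighing before applying this suggestion: Python's `and` returns its last evaluated operand, not a coerced boolean. The sketch below uses hypothetical stand-in classes (`Mixin`, `Module`) rather than the real SGLang types, and shows that the two forms agree on truthiness but not on return type when the function is annotated `-> bool`.

```python
class Mixin:
    """Hypothetical stand-in for the offloadable mixin."""


class Module(Mixin):
    def __init__(self, managers):
        self.layerwise_offload_managers = managers


def with_bool(module) -> bool:
    # Always returns a real bool.
    return isinstance(module, Mixin) and bool(module.layerwise_offload_managers)


def without_bool(module):
    # Returns False if the isinstance check fails, otherwise the list itself.
    return isinstance(module, Mixin) and module.layerwise_offload_managers


empty = Module([])
active = Module(["manager"])

# Truthiness is identical in both forms.
same_truthiness = all(
    bool(with_bool(m)) == bool(without_bool(m)) for m in (empty, active, object())
)

# The returned objects differ: a bool versus the underlying list.
returns_list = isinstance(without_bool(active), list)
returns_bool = isinstance(with_bool(active), bool)
```

So dropping `bool()` is safe inside an `if` condition, but a function whose signature promises `bool` would then return a list for any module that passes the `isinstance` check.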

Comment on lines +707 to +711
        (shift_msa, scale_msa, gate_msa), (
            shift_mlp,
            scale_mlp,
            gate_mlp,
        ) = temb_mod_params_img

medium

This multi-line tuple unpacking seems unnecessary as the line is not excessively long. It could be simplified to a single line for better readability and to reduce vertical space.

        (shift_msa, scale_msa, gate_msa), (shift_mlp, scale_mlp, gate_mlp) = temb_mod_params_img

-            x_valid_lens,
-            cap_valid_lens,
-        ) = self.patchify_and_embed(
+        (x, cap_feats, x_size, x_valid_lens, cap_valid_lens,) = self.patchify_and_embed(

medium

The trailing comma in this tuple unpacking is unnecessary. While valid syntax, it's typically used to define a single-element tuple. For multi-element tuples, it's unconventional and can be removed for clarity.

Suggested change
-        (x, cap_feats, x_size, x_valid_lens, cap_valid_lens,) = self.patchify_and_embed(
+        (x, cap_feats, x_size, x_valid_lens, cap_valid_lens) = self.patchify_and_embed(
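
A short sketch of the point about the trailing comma: in a multi-element unpacking target it has no effect on what gets bound, whereas a single-element target needs it. `patchify_stub` is a hypothetical stand-in that returns five placeholder values; it is not the real `patchify_and_embed`.

```python
def patchify_stub():
    # Hypothetical stand-in returning five placeholder values.
    return ("x", "cap_feats", "x_size", "x_valid_lens", "cap_valid_lens")


# With and without the trailing comma: the same five names are bound.
(a1, b1, c1, d1, e1,) = patchify_stub()
(a2, b2, c2, d2, e2) = patchify_stub()
same_bindings = (a1, b1, c1, d1, e1) == (a2, b2, c2, d2, e2)

# Single-element target: here the comma is what makes it an unpacking.
(only,) = ["value"]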

@mickqian mickqian requested a review from wisclmy0611 as a code owner May 7, 2026 09:26
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 7, 2026
@mickqian mickqian changed the title from "[diffusion] Generalize layerwise offload residency mixin" to "[diffusion] Generalize layerwise offload residency mixin to all components" May 8, 2026
mickqian added 3 commits May 8, 2026 20:31
…ency-strategy-compat

# Conflicts:
#	python/sglang/multimodal_gen/configs/pipeline_configs/base.py
#	python/sglang/multimodal_gen/configs/pipeline_configs/model_deployment_config.py
#	python/sglang/multimodal_gen/configs/pipeline_configs/mova.py
#	python/sglang/multimodal_gen/configs/pipeline_configs/wan.py
#	python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py
#	python/sglang/multimodal_gen/runtime/server_args.py
#	python/sglang/multimodal_gen/test/unit/test_server_args.py
@mickqian mickqian merged commit 416fdbb into sgl-project:main May 16, 2026
124 of 132 checks passed
Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026
zijiexia added a commit to zijiexia/sglang that referenced this pull request Jun 4, 2026
…ents

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), lora, run-ci
