[diffusion] Add performance mode defaults by mickqian · Pull Request #24491 · sgl-project/sglang

mickqian · 2026-05-06T07:18:04Z

Summary

Add --performance-mode (also spelled --mode) for diffusion server defaults, with the modes auto, throughput, memory, and balanced, plus the aliases aggressive, conservative, and balance.
Auto-select FSDP+CFG only for high-confidence multi-GPU Qwen/Wan CFG cases, gated by the least available memory across selected GPUs.
Move the auto-tune decision logic into runtime/server_args_auto_tune.py; ServerArgs now only invokes the resolver.
Declare model-specific auto-tune hints on PipelineConfig methods instead of hard-coding Qwen/Wan/MOVA class-name checks in ServerArgs.
Preserve explicit FSDP, offload, and parallelism flags; --enable-cfg-parallel false now explicitly disables CFG parallelism.
Update the diffusion OOM guidance and add a concise deployment cookbook for CPU offload, FSDP, CFG, SP, and TP choices.
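A minimal sketch of how the flag, its aliases, and a tri-state `--enable-cfg-parallel` could be parsed with Python's `argparse`. The alias pairing (aggressive maps to throughput, conservative to memory, balance to balanced) is an assumption based on the order the summary lists them, and the helper names are hypothetical. Later commits in this PR rename a preset to `speed` and merge `balanced` into `auto`, so the merged mode names may differ.

```python
import argparse

# Hypothetical alias table: pairing assumed from the order in the PR summary.
MODE_ALIASES = {
    "aggressive": "throughput",
    "conservative": "memory",
    "balance": "balanced",
}
CANONICAL_MODES = ("auto", "throughput", "memory", "balanced")


def normalize_mode(value: str) -> str:
    """Map an alias to its canonical mode name; reject unknown values."""
    mode = MODE_ALIASES.get(value.lower(), value.lower())
    if mode not in CANONICAL_MODES:
        raise argparse.ArgumentTypeError(f"unknown performance mode: {value!r}")
    return mode


def parse_tristate(value: str) -> bool:
    """Parse true/false so an explicit 'false' is distinct from 'not given'."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--performance-mode",
        "--mode",
        dest="performance_mode",
        type=normalize_mode,
        default="auto",
    )
    # default=None means "unset", so auto-tuning may fill it in later;
    # a bare flag means True, and an explicit true/false is always preserved.
    parser.add_argument(
        "--enable-cfg-parallel",
        type=parse_tristate,
        nargs="?",
        const=True,
        default=None,
    )
    return parser
```

Using `default=None` rather than `False` is what lets a resolver tell "unset" apart from an explicit `--enable-cfg-parallel false`.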

TODO

Support a performance mode that makes no adjustments.

gemini-code-assist

Code Review

This pull request adds a --performance-mode CLI flag with presets to automate memory and performance configurations for diffusion models, supported by a new Deployment Cookbook and improved OOM diagnostics. Reviewers suggested enhancing GPU memory detection accuracy, maintaining encoder offloading in memory-constrained modes, and refactoring duplicated offload logic.
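The summary's rule of auto-selecting FSDP+CFG only for multi-GPU Qwen/Wan CFG cases, gated by the least available memory across the selected GPUs, might look like the sketch below. The threshold value, the `model_family` strings, and the direction of the gate (enable only when every GPU clears the threshold) are assumptions; the PR states only that the minimum across GPUs is used. In PyTorch, per-device free memory can be read with `torch.cuda.mem_get_info(device)`, which returns `(free, total)` in bytes; the sketch takes the free values as a plain list so it runs without a GPU.

```python
from dataclasses import dataclass
from typing import Sequence

GIB = 1024 ** 3
# Hypothetical threshold for this sketch; the PR does not state a value.
MIN_FREE_GIB_FOR_FSDP_CFG = 40.0


@dataclass
class AutoDecision:
    use_fsdp_inference: bool
    enable_cfg_parallel: bool


def decide_fsdp_cfg(
    free_bytes_per_gpu: Sequence[int],
    model_family: str,
    uses_cfg: bool,
) -> AutoDecision:
    """Pick FSDP+CFG only for multi-GPU Qwen/Wan CFG cases with enough memory.

    The gate uses the *least* free memory across the selected GPUs, since
    the most constrained device decides whether the configuration fits.
    """
    multi_gpu = len(free_bytes_per_gpu) >= 2
    eligible = multi_gpu and uses_cfg and model_family in ("qwen", "wan")
    if not eligible:
        return AutoDecision(False, False)
    min_free_gib = min(free_bytes_per_gpu) / GIB
    enabled = min_free_gib >= MIN_FREE_GIB_FOR_FSDP_CFG
    return AutoDecision(enabled, enabled)
```

Gating on the minimum rather than the average keeps one nearly full GPU from being hidden by an idle one.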

gemini-code-assist · 2026-05-06T07:20:06Z

+            if self.use_fsdp_inference:
+                self._set_gpu_resident_defaults(use_fsdp=True)
+                return


In memory mode, if use_fsdp_inference is explicitly set to True, calling _set_gpu_resident_defaults(use_fsdp=True) will disable CPU offloading for the text and image encoders (setting them to False). This is counter-intuitive for a mode intended to minimize GPU memory usage. It would be better to still enable offloading for these components even when FSDP is used for the DiT.

if self.use_fsdp_inference:
    if self.text_encoder_cpu_offload is None:
        self.text_encoder_cpu_offload = True
    if self.image_encoder_cpu_offload is None:
        self.image_encoder_cpu_offload = True
    if self.dit_cpu_offload is None:
        self.dit_cpu_offload = False
    return
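To see the effect of the reviewer's suggestion, here is a minimal runnable sketch that wraps it in a stand-in dataclass. `OffloadArgs` and `apply_memory_mode_defaults` are hypothetical names, and the non-FSDP branch (offload every unset component) is an assumption added only to make the example complete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OffloadArgs:
    # None means "not set explicitly"; defaults only ever fill None fields.
    use_fsdp_inference: bool = False
    dit_cpu_offload: Optional[bool] = None
    text_encoder_cpu_offload: Optional[bool] = None
    image_encoder_cpu_offload: Optional[bool] = None

    def apply_memory_mode_defaults(self) -> None:
        """Memory-mode defaults following the review suggestion."""
        if self.use_fsdp_inference:
            # FSDP already shards the DiT across GPUs, so keep it resident,
            # but still offload the encoders to minimize GPU memory.
            if self.text_encoder_cpu_offload is None:
                self.text_encoder_cpu_offload = True
            if self.image_encoder_cpu_offload is None:
                self.image_encoder_cpu_offload = True
            if self.dit_cpu_offload is None:
                self.dit_cpu_offload = False
            return
        # Assumed non-FSDP behavior for this sketch: offload every unset component.
        for name in ("dit_cpu_offload", "text_encoder_cpu_offload", "image_encoder_cpu_offload"):
            if getattr(self, name) is None:
                setattr(self, name, True)
```

An explicit user choice such as `text_encoder_cpu_offload=False` survives, because only `None` fields are touched.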

gemini-code-assist · 2026-05-06T07:20:06Z

+                self._is_wan_or_mova_pipeline()
+                and not envs.SGLANG_CACHE_DIT_ENABLED
+                and current_platform.enable_dit_layerwise_offload_for_wan_by_default()
+            ):
+                if self.dit_layerwise_offload is None:
+                    self.dit_layerwise_offload = True
+                if self.dit_cpu_offload is None:
+                    self.dit_cpu_offload = False
+                if self.text_encoder_cpu_offload is None:
+                    self.text_encoder_cpu_offload = True
+                if self.image_encoder_cpu_offload is None:
+                    self.image_encoder_cpu_offload = True


The logic for auto-enabling layerwise offload for Wan/Mova models is duplicated here and in _adjust_platform_specific (lines 960-994). Consider refactoring this into a helper method to improve maintainability and ensure consistency across different performance modes.
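One way to act on the deduplication suggestion, sketched under assumptions: a single helper applies the layerwise-offload defaults, and a per-pipeline hint method replaces the `_is_wan_or_mova_pipeline()` class check, in the spirit of the summary's move to hints on `PipelineConfig`. `prefers_dit_layerwise_offload`, `ServerArgsSketch`, and `_maybe_enable_layerwise_offload` are hypothetical names; the `cache_dit_enabled` and `platform_enables_layerwise` fields stand in for `envs.SGLANG_CACHE_DIT_ENABLED` and `current_platform.enable_dit_layerwise_offload_for_wan_by_default()`.

```python
from dataclasses import dataclass
from typing import Optional


class PipelineConfig:
    """Base config; subclasses declare auto-tune hints (hypothetical method)."""

    def prefers_dit_layerwise_offload(self) -> bool:
        return False


class WanPipelineConfig(PipelineConfig):
    def prefers_dit_layerwise_offload(self) -> bool:
        return True


@dataclass
class ServerArgsSketch:
    pipeline_config: PipelineConfig
    cache_dit_enabled: bool = False          # stand-in for the env flag
    platform_enables_layerwise: bool = True  # stand-in for the platform hook
    dit_layerwise_offload: Optional[bool] = None
    dit_cpu_offload: Optional[bool] = None
    text_encoder_cpu_offload: Optional[bool] = None
    image_encoder_cpu_offload: Optional[bool] = None

    def _maybe_enable_layerwise_offload(self) -> bool:
        """Shared helper; returns True if the layerwise defaults were applied."""
        if not (
            self.pipeline_config.prefers_dit_layerwise_offload()
            and not self.cache_dit_enabled
            and self.platform_enables_layerwise
        ):
            return False
        defaults = {
            "dit_layerwise_offload": True,
            "dit_cpu_offload": False,
            "text_encoder_cpu_offload": True,
            "image_encoder_cpu_offload": True,
        }
        for name, value in defaults.items():
            if getattr(self, name) is None:  # never override explicit flags
                setattr(self, name, value)
        return True
```

Both the performance-mode path and `_adjust_platform_specific` could then call the same helper, so the four defaults cannot drift apart.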

mickqian · 2026-05-07T07:07:01Z

/tag-and-rerun-ci


Add diffusion performance mode defaults

66b0a0b

mickqian requested review from ping1jing2, wisclmy0611 and yhyang201 as code owners May 6, 2026 07:18

github-actions Bot added documentation Improvements or additions to documentation diffusion SGLang Diffusion labels May 6, 2026

mickqian mentioned this pull request May 6, 2026

[codex] Add diffusion performance mode defaults #24419

Closed

gemini-code-assist Bot reviewed May 6, 2026


github-actions Bot added the run-ci label May 7, 2026

mickqian added 15 commits May 7, 2026 15:27

Refactor diffusion performance auto tune policy

82f2304

Apply black formatting to auto tune resolver

a231fca

Rename diffusion performance preset to speed

ef3eaec

Collect deployment auto tune hints

5f9d65f

upd

ccefae5

auto-tune z-image

3186ec0

Merge remote-tracking branch 'origin/main' into codex/diffusion-performance-mode-clean (Conflicts: python/sglang/multimodal_gen/runtime/server_args.py)

8bf7975

Merge remote-tracking branch 'origin/main' into codex/diffusion-performance-mode-clean

eb7444e

upd

e45d05f

upd

155c914

upd

42c2e9d

upd

9108ba4

upd

62d671b

upd

7974064

lint

132491c

mickqian requested review from CatherineSue, Ying1123, hnyls2002, key4ng and merrymercy as code owners May 12, 2026 07:17

github-actions Bot added npu model-gateway labels May 12, 2026

mickqian added 6 commits May 12, 2026 15:30

lint

90f5d3e

upd

c909cd9

Merge remote-tracking branch 'origin/main' into codex/diffusion-performance-mode-clean

59eec65

upd

c214fd8

merge balanced and auto mode

50c3997

fix

031ba4d

mickqian requested review from BBuf and yingluosanqian as code owners May 12, 2026 16:10

mickqian added 13 commits May 13, 2026 15:18

Refine diffusion auto performance mode

acba38e

Tighten auto performance defaults

38b5e04

Narrow auto offload residency tuning

c69f635

Fix auto layerwise offload ordering

4989205

Merge remote-tracking branch 'origin/main' into codex/diffusion-performance-mode-clean (Conflicts: python/sglang/srt/speculative/eagle_worker_v2.py)

8bfc955

upd

c912573

Update auto performance mode unit expectations

67c65d8

Preserve LTX snapshot DiT offload in auto mode

a94c8e7

Print diffusion performance logs per run

f974077

Remove duplicate baseline improvement logs

b7eaadc

Disable warmup for LTX HQ CI case

337d20c

Avoid LTX snapshot warmup DiT overlap

2023929

Tighten stable LTX perf baselines

3091f2e

mickqian merged commit ff70aea into sgl-project:main May 13, 2026
100 of 138 checks passed

This was referenced May 25, 2026

fix(ci): enforce legacy docs/ gate in Lint workflow zijiexia/sglang#4

Closed

fix(ci): enforce legacy docs/ gate in Lint workflow #26322

Merged

Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026

[diffusion] feat: add performance mode server args (sgl-project#24491)

31ea54c

alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026

[diffusion] feat: add performance mode server args (sgl-project#24491)

85e6d43

zijiexia mentioned this pull request Jun 4, 2026

docs: sync legacy docs/-only updates into docs_new (Mintlify) #27308

Merged

zijiexia added a commit to zijiexia/sglang that referenced this pull request Jun 4, 2026

docs_new: port sgl-project#24491 — diffusion performance-mode server …

64d3cdd

…args + deployment cookbook Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mickqian merged 35 commits into sgl-project:main from mickqian:codex/diffusion-performance-mode-clean
