
[diffusion][CI]: Add individual component accuracy CI for diffusion models #18709

Merged
BBuf merged 35 commits into sgl-project:main from Ratish1:feat/accuracy-test
Apr 1, 2026

Conversation

Collaborator

@Ratish1 Ratish1 commented Feb 12, 2026

Motivation

This PR introduces component-level accuracy testing for the diffusion runtime in sglang.multimodal_gen. Related: #12987

This PR adds a dedicated component-accuracy framework for validating SGLang runtime components against their reference Hugging Face implementations.

For individual component accuracy, the reference side is:

  • diffusers for transformer and vae
  • transformers for text_encoder

That is the core value of this PR. It gives us a targeted correctness layer between raw checkpoint loading and full
serving tests.

This is especially important for diffusion runtime components because they are often not raw HF modules. In practice they may differ in:

  • parameter naming
  • fused versus split tensor layouts
  • tensor-parallel sharding
  • runtime-specific initialization and loading paths

A component-level parity framework is therefore necessary to validate the actual runtime module implementation instead of relying only on end-to-end pipeline behavior.

Modification

Add a component-accuracy framework for diffusion runtime modules

This PR adds a dedicated accuracy flow for comparing runtime-loaded SGLang components against Hugging Face reference components.

The framework supports the main component families used by the diffusion runtime:

  • transformer
  • VAE
  • text encoder

The comparison flow is:

  1. resolve the concrete component source for a testcase
  2. build runtime-style ServerArgs
  3. initialize the required distributed/runtime state
  4. load the SGLang component through the actual runtime loader path
  5. load the corresponding Hugging Face reference component
  6. align reference weights into the runtime component
  7. run both components
  8. compare outputs with cosine similarity

This gives us a stable component-level correctness signal that can run inside CI.
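The final step of the flow above can be sketched in miniature. Everything here (the helper names, the 0.999 threshold) is illustrative rather than the PR's actual API, and the real framework compares flattened tensors rather than Python lists:

```python
import math

def cosine_similarity(a, b):
    # Flattened component outputs are compared as plain vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def check_parity(runtime_out, reference_out, threshold=0.999):
    # Pass when the runtime and reference outputs are nearly parallel.
    sim = cosine_similarity(runtime_out, reference_out)
    return sim >= threshold, sim

ok, sim = check_parity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Cosine similarity is a reasonable parity metric here because it is insensitive to small uniform scale differences while still catching layout or loading mistakes, which tend to destroy directional agreement.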

Use the real SGLang runtime loader path

The SGLang side of the comparison is loaded through the actual runtime component loader stack.

That means the framework exercises the same runtime-side loading logic used by the codebase rather than constructing fake
test-only modules. This is important because the goal is to validate the real runtime implementation, including:

  • runtime model class resolution
  • distributed/model-parallel setup
  • component-specific loader logic
  • runtime-specific tensor layouts

Compare against Hugging Face reference implementations

The reference side is loaded from the corresponding Hugging Face implementation for each component family:

  • transformer and vae use diffusers
  • text_encoder uses transformers

This is intentionally a component-level reference path, so that the framework can compare one runtime component at a time and localize issues much more precisely than end-to-end tests allow.

Add weight-alignment logic for runtime parity testing

A core part of this PR is the weight-alignment path used before comparison.

This exists because the runtime component and the Hugging Face reference component do not always expose weights in the same form. The framework therefore handles:

  • parameter name remapping
  • known fused tensor layouts such as QKV and gate/up projections
  • tensor-parallel-aware copying into sharded runtime parameters

This allows the SGLang runtime component to be compared in the layout it actually uses in the codebase, while still
grounding the comparison in the Hugging Face reference weights.
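As a toy illustration of the alignment step, with 1-D lists standing in for weight tensors (the real code operates on named PyTorch parameters, and the actual fused-QKV split rules are more involved than an even slice):

```python
def fuse_qkv(q, k, v):
    # Concatenate separate HF q/k/v weights into a single fused QKV
    # tensor, matching a runtime layout that stores one projection.
    return q + k + v

def shard_for_tp(fused, tp_rank, tp_size):
    # Copy the slice of a fused weight belonging to one tensor-parallel
    # rank, assuming an even split along the output dimension.
    chunk = len(fused) // tp_size
    return fused[tp_rank * chunk:(tp_rank + 1) * chunk]

fused = fuse_qkv([1, 2], [3, 4], [5, 6])
rank0 = shard_for_tp(fused, 0, 2)
```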

Add forward adapters for component-level comparison

The PR introduces native hook/input adapters that build deterministic synthetic inputs for component comparisons.

These adapters exist so that the framework can produce a valid shared input bundle for both:

  • the SGLang runtime component
  • the Hugging Face reference component

This is needed because the two sides do not always expose identical forward signatures. The adapters normalize those differences so the framework can run a clean parity comparison.
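A minimal sketch of the deterministic-input idea (the helper name and shapes are hypothetical; the framework builds real tensors, but the seeding principle is the same):

```python
import random

def make_deterministic_inputs(seed, rows, cols):
    # A fixed seed guarantees both sides receive identical inputs, so
    # any output difference is attributable to the components themselves.
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

sglang_input = make_deterministic_inputs(42, 2, 4)
reference_input = make_deterministic_inputs(42, 2, 4)
```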

Add staged low-memory execution for selected 1-GPU cases

Some large 1-GPU cases cannot safely keep both the runtime component and the reference component resident at the same time.

For these cases, the framework supports staged execution:

  • run the SGLang side first
  • move the output to CPU
  • release the first stage
  • run the reference side
  • compare CPU outputs

This keeps the framework usable for memory-constrained cases without changing the comparison contract.
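The staged flow reduces to a simple sequential harness; the callables and names below are illustrative, and the real framework additionally moves outputs to CPU and frees GPU memory between the two stages:

```python
import gc

def run_staged(run_sglang, run_reference, compare):
    # Stage 1: run the runtime component and keep only its output.
    runtime_out = run_sglang()
    gc.collect()  # stand-in for releasing the first stage's GPU memory
    # Stage 2: only now run the reference component.
    reference_out = run_reference()
    # Compare on CPU so neither model needs to stay resident.
    return compare(runtime_out, reference_out)

passed = run_staged(lambda: [1.0, 2.0],
                    lambda: [1.0, 2.0],
                    lambda a, b: a == b)
```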

Add distributed-aware runtime setup and cleanup

The framework initializes the same distributed/model-parallel context needed for the selected testcase topology,
including 1-GPU and 2-GPU accuracy runs.

It also performs explicit cleanup after each case so that:

  • runtime state does not leak between cases
  • multi-GPU runs exit cleanly
  • failures are easier to localize per testcase
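The per-case cleanup guarantee boils down to a try/finally pattern, sketched here with hypothetical setup/teardown hooks (in the real code, teardown destroys the distributed process group):

```python
def run_case(setup, teardown, body):
    # Teardown always runs, so a failing case cannot leak runtime state
    # into the next one, and failures stay localized per testcase.
    state = setup()
    try:
        return body(state)
    finally:
        teardown(state)

events = []
result = run_case(lambda: events.append("setup") or "ctx",
                  lambda s: events.append("teardown"),
                  lambda s: events.append("body") or "ok")
```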

Add testcase policy for thresholds and unsupported boundaries

The framework also includes testcase policy for:

  • threshold overrides where distributed/runtime variance is acceptable
  • unsupported reference-loading boundaries that should be skipped rather than reported as false failures

This keeps the reported results meaningful and avoids mixing together:

  • real runtime correctness issues
  • unsupported comparison boundaries
  • expected minor variance in multi-GPU parity runs
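A sketch of what such a policy table can look like; the model names are borrowed from the adapters in this PR, but every threshold, override, and skip reason below is made up for illustration:

```python
DEFAULT_THRESHOLD = 0.999

# Relax thresholds where multi-GPU variance is expected (hypothetical values).
THRESHOLD_OVERRIDES = {
    ("flux", "transformer", 2): 0.995,
}

# Skip cases whose reference side cannot be loaded, rather than fail them.
SKIPS = {
    ("wan", "text_encoder"): "reference loading unsupported",
}

def resolve_policy(model, component, tp_size):
    # Returns ("skip", reason) or ("run", threshold) for one testcase.
    if (model, component) in SKIPS:
        return ("skip", SKIPS[(model, component)])
    threshold = THRESHOLD_OVERRIDES.get((model, component, tp_size),
                                        DEFAULT_THRESHOLD)
    return ("run", threshold)
```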

In short, this PR adds the first component-accuracy framework for the diffusion runtime and grounds it against the
corresponding Hugging Face reference implementations.

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Feb 12, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @Ratish1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive accuracy testing framework for diffusion models within SGLang. It establishes a robust CI process by enabling component-level comparisons against established Hugging Face baselines. The framework includes an extensible engine for loading and aligning model weights, along with dedicated adapters for various diffusion model architectures, ensuring consistent and reliable accuracy validation across different configurations and GPU setups.

Highlights

  • Component-level Accuracy Tests: Added component-level accuracy tests for VAE, transformer, and text encoder components of SGLang diffusion models, comparing them against Hugging Face Diffusers/Transformers baselines.
  • Deterministic Inputs and Centralized Configuration: Implemented deterministic inputs and centralized thresholds/skips to ensure CI stability and manage model-specific variations.
  • New Accuracy Engine: Introduced a new accuracy engine responsible for component loading, weight alignment, and cosine-similarity checks between SGLang and reference models.
  • Model-Specific Adapters: Developed model-specific adapters for Flux, Wan, Qwen, ZImage, and Hunyuan models to handle input generation for accuracy tests.
  • Test Suites: Created 1-GPU and 2-GPU test suites (test_accuracy_*) to cover various testing scenarios.


Changelog
  • python/sglang/multimodal_gen/test/server/accuracy_adapters.py
    • Added a new file defining abstract ComponentAdapter and concrete implementations for Flux, Wan, Qwen, ZImage, and Hunyuan models to manage input generation and model execution for accuracy tests.
  • python/sglang/multimodal_gen/test/server/accuracy_config.py
    • Added a new file to centralize default accuracy thresholds, per-case overrides, and explicit skip reasons for VAE, Transformer, and Text Encoder components.
  • python/sglang/multimodal_gen/test/server/accuracy_utils.py
    • Added a new file providing utility functions for deterministic random number generation, tensor statistics logging, and robust extraction of output tensors from model results.
  • python/sglang/multimodal_gen/test/server/component_accuracy.py
    • Added a new file implementing the core AccuracyEngine responsible for loading SGLang and reference model components, performing TP-aware weight transfer, and conducting cosine similarity checks.
  • python/sglang/multimodal_gen/test/server/test_accuracy_1_gpu_a.py
    • Added a new file containing pytest-based accuracy tests for VAE, Transformer, and Text Encoder components on a single GPU for a specific set of diffusion models.
  • python/sglang/multimodal_gen/test/server/test_accuracy_1_gpu_b.py
    • Added a new file containing pytest-based accuracy tests for VAE, Transformer, and Text Encoder components on a single GPU for another specific set of diffusion models.
  • python/sglang/multimodal_gen/test/server/test_accuracy_2_gpu_a.py
    • Added a new file containing pytest-based accuracy tests for VAE, Transformer, and Text Encoder components across two GPUs for a specific set of diffusion models, utilizing torchrun.
  • python/sglang/multimodal_gen/test/server/test_accuracy_2_gpu_b.py
    • Added a new file containing pytest-based accuracy tests for VAE, Transformer, and Text Encoder components across two GPUs for another specific set of diffusion models, utilizing torchrun.
  • test/registered/diffusion/test_diffusion_component_accuracy_1gpu_a.py
    • Added a new CI registration file to run the 1-GPU accuracy test suite A.
  • test/registered/diffusion/test_diffusion_component_accuracy_1gpu_b.py
    • Added a new CI registration file to run the 1-GPU accuracy test suite B.
  • test/registered/diffusion/test_diffusion_component_accuracy_2gpu_a.py
    • Added a new CI registration file to run the 2-GPU accuracy test suite A.
  • test/registered/diffusion/test_diffusion_component_accuracy_2gpu_b.py
    • Added a new CI registration file to run the 2-GPU accuracy test suite B.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive and well-structured framework for component-level accuracy testing of diffusion models. The separation of concerns into adapters, configuration, and a core engine is excellent. My feedback focuses on improving maintainability and robustness by refactoring for conciseness, reducing reliance on fragile string-based logic, and hardening utility functions against unexpected inputs.

@zhaochenyang20
Collaborator

Nicely done.

- replace model-specific adapter flow with generic hook-based component profiles
- move component comparison execution into a unified native-hook accuracy engine
- wire all 1-GPU/2-GPU accuracy suites to the native hook execution path
- add topology-aware parallel orchestration for mixed component test suites
@Ratish1 Ratish1 marked this pull request as ready for review March 2, 2026 18:38
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@mickqian
Collaborator

mickqian commented Mar 8, 2026

Bravo.
Is it working fine now? Could you attach some logs?

@Ratish1
Collaborator Author

Ratish1 commented Mar 8, 2026

Bravo. Is it working fine now? Could you attach some logs?

Yes, I ran it locally and it works fine. I can attach logs after rerunning once more, since there are 4 test files to run and the 2-GPU runs take some time.

@Ratish1
Collaborator Author

Ratish1 commented Mar 9, 2026

Hey @mickqian, my bad for the late reply; I got caught up in some other work. Here are the logs for all 4 test files below:

1 GPU A & B:

[screenshots: 1-GPU suite A and B test logs]

2 GPU A & B:

[screenshots: 2-GPU suite A and B test logs]

Collaborator

@BBuf BBuf left a comment


LGTM now.

@BBuf
Collaborator

BBuf commented Mar 28, 2026

@mickqian I think the PR is ready now. Do you have any other suggestions?

@BBuf
Collaborator

BBuf commented Mar 28, 2026

We should try to keep CI runtime short — ideally each test completes within a few minutes.


@Ratish1
Collaborator Author

Ratish1 commented Mar 28, 2026

We should try to keep CI runtime short — ideally each test completes within a few minutes.

Yes, I think the text encoders are the problem there. I'll skip similar models that share the same components.

@Ratish1
Collaborator Author

Ratish1 commented Mar 28, 2026

Fixed CI runtime @BBuf @mickqian

@BBuf
Collaborator

BBuf commented Mar 30, 2026

/tag-and-rerun-ci

@ping1jing2
Collaborator

/rerun-failed-ci

@BBuf
Collaborator

BBuf commented Apr 1, 2026

/rerun-failed-ci

@Ratish1
Collaborator Author

Ratish1 commented Apr 1, 2026

/rerun-failed-ci


@Ratish1
Collaborator Author

Ratish1 commented Apr 1, 2026

TODO: wire the tests into run_suite.py.

@mickqian
Collaborator

mickqian commented Apr 3, 2026

could you add a sleep after each server shutdown, to make CI more reliable?

@Ratish1
Collaborator Author

Ratish1 commented Apr 3, 2026

could you add a sleep after each server shutdown, to make CI more reliable?

Yes, will do it in this PR #21960 . Thanks.


Labels

diffusion (SGLang Diffusion), Multi-modal (multi-modal language model), run-ci
