Nextstep online e2e#2107
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b65437cd72
```
"--vae-use-slicing",
"--vae-use-tiling",
```
Exercise an actual VAE optimization path for NextStep
For stepfun-ai/NextStep-1.1, these flags do not change server behavior today: vllm_omni/diffusion/models/nextstep_1_1/modeling_flux_vae.py implements AutoencoderKL.decode() as a plain decoder call, the class never defines use_slicing, and vllm_omni/diffusion/registry.py only applies vae_use_slicing when that attribute exists. In other words, this parametrization cannot catch regressions in VAE slicing or tiling for NextStep; it only ends up validating TP=2, which gives the nightly L4 expansion suite false coverage for the advertised VAE paths.
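The failure mode described above can be sketched with a minimal, dependency-free example of the attribute-existence guard (the class and function names below are illustrative stand-ins, not the actual vllm_omni code):

```python
class PlainVAE:
    """Stand-in for a VAE whose decode() is a plain decoder call
    (as described for modeling_flux_vae.py); it exposes no slicing hook."""

    def decode(self, latents):
        return latents


class SlicingVAE(PlainVAE):
    """Stand-in for a VAE that actually supports slicing."""

    def __init__(self):
        self.use_slicing = False

    def enable_slicing(self):
        self.use_slicing = True


def apply_vae_use_slicing(vae) -> bool:
    # Mirrors the guard described for registry.py: the option only takes
    # effect when the VAE object exposes the corresponding hook, so for a
    # plain VAE the flag is a silent no-op.
    if hasattr(vae, "enable_slicing"):
        vae.enable_slicing()
        return True
    return False


assert apply_vae_use_slicing(PlainVAE()) is False   # flag silently ignored
assert apply_vae_use_slicing(SlicingVAE()) is True  # hook actually invoked
```

Because the flag is dropped silently for a plain VAE, the parametrized test still passes, which is exactly why it cannot detect regressions in the slicing/tiling paths.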
hsliuustc0106 left a comment
Test Evidence Request
Thank you for adding NextStep-1.1 online serving coverage! Before merging, could you please provide some baseline test evidence to help us understand the model's characteristics and validate the CI configuration?
Requested Evidence
1. Latency
- Time to generate a single image with the test config (TP=2, 256x256, 2 steps)
- If available, also include latency for a more realistic config (e.g., 512x512, 20-30 steps)
2. Accuracy / Correctness
- Sample output image(s) from a test run, OR
- Validation that `send_diffusion_request` returns a valid image with expected dimensions
3. Memory
- Peak VRAM usage with TP=2 on L4 (or your XPU equivalent)
- This helps validate that 2× L4 is sufficient (per your PR description) vs. needing TP=4 on 4× L4
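For item 2, the dimension check can be done without any imaging library by reading the PNG header directly. A self-contained sketch (the byte offsets follow the PNG spec; `make_png` just builds a tiny in-memory PNG so the check can be demonstrated):

```python
import struct
import zlib


def png_dimensions(data: bytes):
    # PNG layout: 8-byte signature, then the IHDR chunk:
    # length(4) + type "IHDR"(4) + width(4) + height(4) + ...
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def make_png(width: int, height: int) -> bytes:
    """Build a minimal all-black RGB PNG in memory (for demonstration only)."""
    def chunk(typ: bytes, payload: bytes) -> bytes:
        return (struct.pack(">I", len(payload)) + typ + payload
                + struct.pack(">I", zlib.crc32(typ + payload)))

    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline: 1 filter byte + 3 bytes (RGB) per pixel.
    raw = b"".join(b"\x00" + b"\x00\x00\x00" * width for _ in range(height))
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


assert png_dimensions(make_png(256, 256)) == (256, 256)
```

In the actual test, `png_dimensions` would be applied to the image bytes returned by `send_diffusion_request`, asserting they match the requested resolution.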
How to Provide
You can add this directly in the PR description or as a comment. Example format:
### Test Results (Intel XPU / 2× L4)
| Config | Resolution | Steps | Latency | Peak Memory |
|--------|------------|-------|---------|-------------|
| TP=2 | 256x256 | 2 | X.XX s | XX GB |
| TP=2 | 512x512 | 28 | XX.XX s | XX GB |
Sample output: [attach image or describe validation]
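The latency column above can be collected with a simple wall-clock wrapper. A minimal stand-alone sketch; `send_diffusion_request` is stubbed here, while the real call comes from `OpenAIClientHandler` against the running `OmniServer`:

```python
import time


def send_diffusion_request(prompt, width, height, steps):
    # Stub standing in for OpenAIClientHandler.send_diffusion_request;
    # the real call returns an image payload from the server.
    time.sleep(0.01)
    return {"width": width, "height": height}


def timed_request(**kwargs):
    # perf_counter is monotonic and suitable for measuring elapsed time.
    start = time.perf_counter()
    image = send_diffusion_request(**kwargs)
    latency = time.perf_counter() - start
    return image, latency


image, latency = timed_request(prompt="a red apple", width=256, height=256, steps=2)
assert image["width"] == 256 and latency > 0
print(f"latency: {latency:.2f} s")
```

Running this once per config (TP=2, 256x256 @ 2 steps; 512x512 @ 28 steps) would fill in the Latency column of the table.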
This information helps us:
- Validate the CI hardware allocation is appropriate
- Establish a baseline for future regression detection
- Document expected behavior for this model
Note: The docs/readthedocs.org failure appears transient (this PR doesn't modify documentation). It should pass on re-run.
```python
from tests.utils import hardware_marks


# Same host class as FLUX.2-klein expansion (g6.12xlarge / gpu_4_queue); TP=2 needs 2 devices.
TWO_CARD_L4_MARKS = hardware_marks(res={"cuda": "L4"}, num_cards=2)
```
I tested it locally on XPU but removed it from the PR to stay consistent with the existing diffusion E2E tests, which currently target CUDA L4. Would you prefer that I also include XPU coverage, or should we keep it aligned with the CUDA-only tests for now?
@hsliuustc0106 Thanks for the checklist. I've updated the PR description with test results and sample outputs. For memory, we see about ~20.2 GB peak reserved per GPU (PyTorch worker log on Intel XPU, TP=2) and about ~16–16.6 GB allocated, depending on resolution; reserved vs. allocated is called out in the description. I also changed the E2E config to 512×512 @ 20 steps so the test matches a more realistic setting.

Please fix the CI error. Thanks.
Force-pushed 6ad2084 to db5ad57.
Force-pushed 8f5962d to 26c69f4.
Are you ready? If not, I can remove the label so we can save some resources.
Signed-off-by: Joshna-Medisetty <joshna.medisetty@intel.com>
@Gaohan123 PR is ready and passes tests on L4.

Hi @Gaohan123, the PR is ready for merge. Could you please add the label? Thank you.

Hi, I just added the nightly label for L4-level tests to double-check.
The failing tests are unrelated. We may need a force merge.

Seems like #2435. It's OK not to merge main if there is no conflict. Thanks for your patience and continuous tracking of this PR; it has really been a while. But since you happen to have done another merge, let's wait one more time (after that issue is resolved -> you update the branch) 🙏
Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Purpose
Add an online serving E2E test for NextStep-1.1 text-to-image using `OmniServer` and `OpenAIClientHandler.send_diffusion_request`, consistent with the other online diffusion tests in the repo. Supports broader L4 diffusion online coverage under RFC #1832.

What's included
- Model: `stepfun-ai/NextStep-1.1`
- Pipeline: `NextStep11Pipeline` (single lightweight case; no Cache-DiT / Ulysses / FP8 stack)
- Markers: `advanced_model`, `diffusion`, `L4`, `distributed_cuda(2)`

Test plan
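The request the test sends can be sketched as a plain payload builder, using the updated config discussed in the review thread (512×512 @ 20 steps). The parameter names below follow common diffusion-API conventions and are illustrative, not the exact client signature:

```python
def build_request(model="stepfun-ai/NextStep-1.1", width=512, height=512, steps=20):
    """Illustrative payload for an online text-to-image request; the real
    test passes equivalent arguments through send_diffusion_request."""
    return {
        "model": model,
        "width": width,
        "height": height,
        "num_inference_steps": steps,
    }


req = build_request()
assert req["width"] == 512 and req["num_inference_steps"] == 20
```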
Test Results (Intel XPU / 2× L4)
Sample Output
256×256 @ 2 steps: [attached image]
256×256 @ 20 steps: [attached image]
512×512 @ 15 steps: [attached image]
512×512 @ 20 steps: [attached image]
512×512 @ 28 steps: [attached image]