[Docs][CI] doc update & L4 example test for text-to-image page by fhfuih · Pull Request #1910 · vllm-project/vllm-omni

fhfuih · 2026-03-16T06:48:07Z

Purpose

As per #1244, the example documentation needs to be updated.

Additionally, following the recently established multi-level testing system, this PR adds L4 tests for all the example code snippets that appear on the "Online Serving/Text to Image" and "Offline Inference/Text to Image" doc pages. It can also serve as a template for future contributions of other example tests.

Test Plan

Methods

Per an offline discussion with @yenuo26 and @wtomin, the offline version dynamically extracts all Python and Bash code blocks from the markdown file and runs them directly.
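A minimal sketch of the extraction step, for illustration only. The PR itself adds a `mistune` dependency and walks the markdown AST; this stand-in uses line-based regexes instead, handles only backtick fences, and the function name `extract_code_blocks` is hypothetical. It records the nearest preceding heading for each Python or Bash block so the heading can later feed the test-case ID.

```python
import re
from typing import List, Tuple

# ATX heading such as "## Basic Usage" (only recognized outside code fences).
HEADING_RE = re.compile(r"^#{1,6}\s+(.+?)\s*$")
# Opening line of a backtick fence; captures the fence and its language tag.
FENCE_RE = re.compile(r"^\s*(`{3,})\s*([\w+-]*)")


def extract_code_blocks(markdown: str, langs=("python", "bash")) -> List[Tuple[str, str, str]]:
    """Return (heading, lang, code) for every fenced block whose language is in `langs`."""
    blocks = []
    heading = ""
    fence = None  # backtick run that opened the current block, or None outside a block
    lang = ""
    buf: List[str] = []
    for line in markdown.splitlines():
        if fence is None:
            m = HEADING_RE.match(line)
            if m:
                heading = m.group(1)
                continue
            m = FENCE_RE.match(line)
            if m:
                fence, lang, buf = m.group(1), m.group(2).lower(), []
        else:
            stripped = line.strip()
            # A closing fence is at least as long as the opener and contains only backticks.
            if stripped.startswith(fence) and stripped.strip("`") == "":
                if lang in langs:
                    blocks.append((heading, lang, "\n".join(buf)))
                fence = None
            else:
                buf.append(line)  # "#" lines inside a block are code, not headings
    return blocks
```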

Dynamic extraction is currently not used in the online version, because there we would additionally have to tell server-launching scripts apart from client-request-sending scripts, which would make the code much messier. Hence, the online version copies the code from the code blocks. (This is more flexible for complex setups, but future doc changes must be synced to the test script by hand.)
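For the offline tests, each extracted block still has to be executed. The sketch below assumes a hypothetical `run_snippet` helper; the PR's actual runner lives in `tests/examples/conftest.py` and may differ. Python blocks are written to `snippet.py` inside the test case directory and run with the current interpreter, Bash blocks go through `bash -c`, and a non-zero exit code raises so the test fails.

```python
import subprocess
import sys
from pathlib import Path


def run_snippet(lang: str, code: str, case_dir: Path, timeout: float = 3600) -> subprocess.CompletedProcess:
    """Run one extracted snippet with `case_dir` as the working directory (hypothetical helper).

    Python snippets are saved as snippet.py so the exact script that ran is kept
    next to the images it produced; Bash snippets are passed to `bash -c`.
    """
    case_dir.mkdir(parents=True, exist_ok=True)
    if lang == "python":
        (case_dir / "snippet.py").write_text(code)
        cmd = [sys.executable, "snippet.py"]  # relative path works because cwd=case_dir
    elif lang == "bash":
        cmd = ["bash", "-c", code]
    else:
        raise ValueError(f"unsupported snippet language: {lang}")
    # check=True turns a non-zero exit code into CalledProcessError, failing the test.
    # The generous default timeout reflects that examples keep their full inference steps.
    return subprocess.run(cmd, cwd=case_dir, capture_output=True, text=True, timeout=timeout, check=True)
```

Relative output paths in the docs (such as `coffee.png`) then land in the test case directory because of `cwd=case_dir`.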

Runtime config

Currently, we deliberately do not lower num_inference_steps in the examples, so a test script may run 50 inference steps. This fully recreates the real-world scenario of running these examples.

Test naming rule

NOTE (open to discussion): Because the online and offline tests take different approaches (described above), their test cases follow different naming conventions:

The offline tests extract their scripts dynamically, which only works in combination with pytest parametrization. So the naming must be test_{name of the single test function}[distinguisher in the parametrization ID].
The online tests are hand-written and copied from the corresponding code blocks, so each must be defined as a separate test function. Hence the naming must be test_{distinguisher in the function name}[dummy parametrization ID used only for omni_server dependency injection].
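The offline IDs such as `basic_usage_001` can be derived from the section heading plus a per-heading counter. This is a hedged sketch: `slugify` and `case_ids` are hypothetical names, and the exact slug rules used in the PR may differ.

```python
import re
from collections import defaultdict
from typing import Iterable, List


def slugify(heading: str) -> str:
    """'Local CLI Usage' -> 'local_cli_usage' (hypothetical slug rule)."""
    return re.sub(r"[^0-9a-z]+", "_", heading.lower()).strip("_")


def case_ids(headings: Iterable[str]) -> List[str]:
    """Number the snippets under each heading: basic_usage_001, basic_usage_002, ..."""
    counters = defaultdict(int)
    ids = []
    for heading in headings:
        slug = slugify(heading)
        counters[slug] += 1
        ids.append(f"{slug}_{counters[slug]:03d}")
    return ids
```

The resulting list can be passed as `ids=` to `pytest.mark.parametrize`, which yields names like `test_text_to_image[basic_usage_001]` for a single `test_text_to_image` function.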

> pytest tests/examples/offline_inference/test_text_to_image.py tests/examples/online_serving/test_text_to_image.py --collect-only

<Dir vllm-omni>
  <Package tests>
    <Dir examples>
      <Package offline_inference>
        <Module test_text_to_image.py>
          <Function test_text_to_image[basic_usage_001]>
          <Function test_text_to_image[basic_usage_002]>
          <Function test_text_to_image[basic_usage_003]>
          <Function test_text_to_image[local_cli_usage_001]>
          <Function test_text_to_image[local_cli_usage_002]>
          <Function test_text_to_image[local_cli_usage_003]>
          <Function test_text_to_image[lora_001]>
          <Function test_text_to_image[web_ui_demo_001]>
      <Package online_serving>
        <Module test_text_to_image.py>
          <Function test_api_calls_001[omni_server0]>
          <Function test_api_calls_002[omni_server0]>
          <Function test_lora_001[omni_server0]>
          <Function test_api_calls_003>
          <Function test_lora_002>

Exclusion rules

Gradio scripts are excluded from these tests.
- Example: test_api_calls_003
Tests that largely overlap with other existing tests may be excluded.
- Example: test_lora_002
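One way to apply these exclusion rules in the parametrized offline tests is to compute a skip reason per case. In this sketch, which is an assumption rather than the PR's code, Python snippets that import `gradio` are skipped automatically, while anything else (for example a Bash command that launches a Gradio script, or a case that overlaps another test) goes in a hand-maintained `manual_skips` map. `imports_module` and `skip_reason` are hypothetical helpers.

```python
import ast
from typing import Dict, Optional


def imports_module(code: str, module: str) -> bool:
    """True if the Python source imports `module` or one of its submodules."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        if any(n == module or n.startswith(module + ".") for n in names):
            return True
    return False


def skip_reason(case_id: str, lang: str, code: str, manual_skips: Dict[str, str]) -> Optional[str]:
    """Return why a snippet should be skipped, or None if it should run."""
    if case_id in manual_skips:  # hand-curated exclusions, e.g. overlapping tests
        return manual_skips[case_id]
    if lang == "python" and imports_module(code, "gradio"):
        return "Gradio UI scripts are excluded"
    return None
```

In a test, a non-`None` result would be passed to `pytest.skip(reason)`.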

Output folder structure

Three-layer folder structure:

root output dir: optionally set the OUTPUT_DIR environment variable; otherwise pytest auto-creates one under /tmp.
doc page dir: set manually via a global variable in each test_XXX.py file; may use opinionated abbreviations. Example: example_offline_t2i
test case dir: should match the test case name, e.g., basic_usage_001. This is done automatically when code is extracted dynamically from markdown, but it must be handled by hand when code content is copied manually.
Inside each test case dir: the relevant files produced by the script, such as output.png. (The dynamically extracted Python scripts are also saved here.)

Example file output:

├── example_offline_t2i
│   ├── basic_usage_001
│   │   ├── coffee.png
│   │   └── snippet.py
│   ├── basic_usage_002
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   ├── 2.jpg
│   │   └── snippet.py
│   ├── basic_usage_003
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   └── snippet.py
│   └── local_cli_usage_001
│       └── outputs
│           └── coffee.png
└── example_online_t2i
    ├── api_calls_001
    │   └── api_calls_001.png
    ├── api_calls_002
    │   └── api_calls_002.png
    └── lora_001
        └── lora_001.png
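The three-layer layout can be reproduced with two small helpers. This is a sketch under assumptions: `output_root` and `case_output_dir` are hypothetical names, and `tempfile.mkdtemp` stands in for the directory pytest would create. In a real `conftest.py` the root would come from a session-scoped fixture (for example via `tmp_path_factory.mktemp`) so that every test case shares one root.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def output_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Layer 1: OUTPUT_DIR if set, else a fresh temp dir (call once per session and reuse)."""
    env = os.environ if env is None else env
    root = env.get("OUTPUT_DIR")
    if root:
        return Path(root)
    # Stand-in for pytest's tmp_path_factory (assumption, not the PR's code).
    return Path(tempfile.mkdtemp(prefix="example_outputs_"))


def case_output_dir(root: Path, page_dir: str, case_id: str) -> Path:
    """Layers 2 and 3: <root>/<doc page dir>/<test case dir>, created if missing."""
    case_dir = root / page_dir / case_id
    case_dir.mkdir(parents=True, exist_ok=True)
    return case_dir
```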

Test Result

Passed locally.
See the bottom comment for CI results.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
The test results. Please paste the results comparison before and after, or the e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Copilot

Pull request overview

Adds L4 “examples” tests for the text-to-image documentation (online serving + offline inference), and introduces shared helpers/fixtures to extract and execute README code blocks.

Changes:

Add online serving text-to-image example tests and offline inference tests driven by README snippet extraction.
Introduce tests/examples/conftest.py with shared output-dir fixture, README snippet extraction, and subprocess helpers.
Refactor existing online serving Qwen Omni example tests to reuse shared helpers; update docs/snippets accordingly.

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 6 comments.


| File | Description |
| --- | --- |
| tests/examples/online_serving/test_text_to_image.py | New online-serving text-to-image example tests (curl + Python client + LoRA). |
| tests/examples/offline_inference/test_text_to_image.py | New offline text-to-image example tests that execute extracted README snippets and validate outputs. |
| tests/examples/conftest.py | New shared infra: output dir fixture, README snippet parsing, runner, subprocess + parsing helpers. |
| tests/examples/online_serving/test_qwen3_omni.py | Remove local helper duplication; import shared helpers; align module markers/docstring. |
| tests/examples/online_serving/test_qwen2_5_omni.py | Same refactor as qwen3_omni. |
| pyproject.toml | Add `mistune` dependency for README AST parsing in example tests. |
| examples/offline_inference/text_to_image/README.md | Fix example code (don’t assign `.save()` result) and replace non-ASCII comma characters. |
| docs/contributing/ci/tests_style.md | Document examples→tests mapping in the test style guide. |


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc23c952e7


wtomin · 2026-03-16T08:54:22Z

How can this L4 text-to-image example test be triggered temporarily on the CI machine? I think the H100 machine can accommodate the VRAM requirements of the majority of models.

fhfuih · 2026-03-16T10:02:00Z


I just edited the pipeline YAMLs to temporarily do so. Could you help add a ready label to trigger CI?

wtomin · 2026-03-16T10:04:03Z

When it is ready to merge (which I suppose should be after 3/17), please let me know. @fhfuih

yenuo26 · 2026-03-16T12:13:16Z

+Use one of the following patterns depending on page type:
+
+- **Dynamic code-block extraction (preferred for offline docs)**
+  - Extract Python/Bash code blocks from markdown AST analyzer, then execute them directly in tests. See [https://github.com/vllm-project/vllm-omni/blob/main/tests/examples/offline_inference/test_text_to_image.py](https://github.com/vllm-project/vllm-omni/blob/main/tests/examples/offline_inference/test_text_to_image.py) for reference implementation.


Perhaps we can also briefly explain here how to write such test cases and what parameters ExampleRunner supports.

Done. Please check again

hsliuustc0106 · 2026-03-18T10:12:38Z

conflicts

fhfuih · 2026-03-19T08:00:04Z

This file explicitly excludes the newly added .inc.md file.

Previously, without this file, all *.md files in this folder were implicitly added to the docs. Now that the L4 tests have additional guides, I put them in separate subfiles to avoid cluttering CI_5levels.md. This follows the same design pattern as docs/getting_started/installation/...

fhfuih · 2026-03-19T08:02:51Z

Apart from adding doc test guides to the L4 test documentation, I also changed the previous `<details><summary>...` fold block to the MkDocs-native `???+ example ...` fold block syntax. Everything inside the former block is treated as HTML, not markdown, so all formatting is lost. The original content of this block is unchanged apart from indentation.

fhfuih · 2026-03-19T08:05:48Z

All tests passed, with one exception: FLUX.2-dev seems to fail on the CI machine, reporting a missing model_index.json. This may be because FLUX requires accepting an additional user agreement and Hugging Face somehow blocks the resource loading, although there is no timeout- or auth-related error.

~~After skipping this case and adding a TODO note in the test script, all other tests work fine.~~

Please see the updated comment below

The latest test (FLUX skipped, all others passed, took 23 minutes): https://buildkite.com/vllm/vllm-omni/builds/4447/steps/canvas?sid=019d04fa-620d-4f33-b33c-3f9c2b81f8e2
A previous test with only FLUX failing: https://buildkite.com/vllm/vllm-omni/builds/4441/steps/canvas?sid=019d04bf-4e2f-4b30-8def-52b41a7369a6
The CI YAML changes were reverted after the above tests finished.

@hsliuustc0106 @wtomin @yenuo26 @congw729 PTAL. I also added some self-comment (annotations) to some of my file changes above, for the sake of clarification.

hsliuustc0106

Review Summary

This PR establishes a solid foundation for L4 documentation example testing with a well-designed approach:

Dynamic extraction from markdown for offline tests keeps docs and tests synchronized
Copied scripts for online tests handle server/client complexity appropriately
Comprehensive documentation for future contributors in doc_example_tests.inc.md

✅ Validated

Gate checks passing (DCO, pre-commit, mergeable, build, CI)
Test infrastructure design follows good patterns
Documentation updates are appropriate and complete
PR description includes clear test plan and naming conventions
Skip conditions for known issues (FLUX.2-dev, Web UI Demo) are reasonable

📝 Minor (non-blocking)

PR description checklist has docs unchecked but docs are updated - minor inconsistency
write_zimage_lora duplication noted as TODO in code

Good work on establishing this testing pattern for the project!

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

fhfuih · 2026-03-19T10:16:54Z

After the recent rebase, the HF token issue is fixed, and the FLUX model now runs as well. All tests pass in this CI run: https://buildkite.com/vllm/vllm-omni/builds/4472/steps/canvas?sid=019d056d-120a-4acb-9924-3457e40176e5

The temporary modifications to the CI pipelines have been reverted.

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

fhfuih · 2026-03-23T02:25:44Z

Fixed a merge conflict in the docs. This does not affect the previous test results.

…project#1910) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> Signed-off-by: hongyi.zhang <hongyi.zhang@bytedance.com>

…project#1910) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

fhfuih requested a review from hsliuustc0106 as a code owner March 16, 2026 06:48

Copilot AI review requested due to automatic review settings March 16, 2026 06:48

fhfuih mentioned this pull request Mar 16, 2026

[RFC]: Supplement use cases for L1, L3, and L4 JiusiServe/vllm-omni#163

Closed


Copilot AI reviewed Mar 16, 2026


chatgpt-codex-connector Bot reviewed Mar 16, 2026


Comment thread tests/examples/online_serving/test_qwen2_5_omni.py

fhfuih force-pushed the test-example branch from fc23c95 to 7f419f9 March 16, 2026 07:32

Copilot started reviewing on behalf of fhfuih March 16, 2026 07:40

fhfuih changed the title ~~L4 test for text-to-image doc example code (online+offline)~~ [Docs][CI] doc update & L4 example test for text-to-image page Mar 16, 2026

wtomin mentioned this pull request Mar 16, 2026

[RFC]: Diffusion Offline Examples Docs & Test #1244

Open
