Merged
34 commits
3009359
re-cherry-pick commits
sniper35 Feb 18, 2026
1af0cd0
fix OmniDiffusionRequest params
sniper35 Feb 18, 2026
ce1609d
fix cfg2
sniper35 Feb 18, 2026
962617d
address comments
sniper35 Feb 18, 2026
5638b15
no need to change the path
sniper35 Feb 18, 2026
607fcd1
defensive import
sniper35 Feb 18, 2026
0d07ea9
test doc
sniper35 Feb 18, 2026
6b0b680
create empty file to be overwritten by main
sniper35 Feb 18, 2026
1cff682
remove related tests
sniper35 Feb 18, 2026
505112b
sync test files to upstream/main
sniper35 Feb 18, 2026
2b45ad5
remove not used
sniper35 Feb 18, 2026
227ee81
refactoring
sniper35 Feb 18, 2026
ad44346
update docs for StepFun
sniper35 Feb 18, 2026
92755b6
address comments
sniper35 Feb 18, 2026
4438860
fix tests
sniper35 Feb 18, 2026
8861894
fix CI
sniper35 Feb 18, 2026
80593ca
restore files changed by mistake
sniper35 Feb 22, 2026
6b64db6
remove redundant changes
sniper35 Feb 22, 2026
0f8c5cd
remove unwanted changes
sniper35 Feb 22, 2026
2fd69db
remove unwanted changes
sniper35 Feb 22, 2026
93765f6
add comments for non-diffuser models
sniper35 Feb 22, 2026
a394d6f
remove unnecessary changes
sniper35 Feb 22, 2026
9f593fd
addressed comments
sniper35 Feb 22, 2026
308bd63
addressed comments
sniper35 Feb 22, 2026
be99090
Merge branch 'main' into nextstep_1
sniper35 Feb 22, 2026
8f6f613
Merge branch 'main' into nextstep_1
hsliuustc0106 Feb 23, 2026
5e7249f
update feature support for stepfun
sniper35 Feb 25, 2026
50e6ebb
Merge branch 'main' into nextstep_1
sniper35 Feb 25, 2026
d053bf4
VAE-Patch-Parallel, need to test
sniper35 Feb 25, 2026
5db26f3
add CFG-Parallel and TP
sniper35 Feb 25, 2026
ff0499e
addressed comments
sniper35 Feb 25, 2026
36b02c0
refactoring && cleanup
sniper35 Feb 25, 2026
4c1ba59
ruff check
sniper35 Feb 25, 2026
5aafaf9
Merge branch 'main' into nextstep_1
sniper35 Feb 25, 2026
1 change: 1 addition & 0 deletions docs/models/supported_models.md
@@ -40,6 +40,7 @@ th {
 |`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-CustomVoice | `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` |
 |`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-VoiceDesign | `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign` |
 |`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-Base | `Qwen/Qwen3-TTS-12Hz-0.6B-Base` |
+|`NextStep11Pipeline` | NextStep-1.1 | `stepfun-ai/NextStep-1.1` |


## List of Supported Models for NPU
1 change: 1 addition & 0 deletions docs/user_guide/diffusion_acceleration.md
@@ -65,6 +65,7 @@ The following table shows which models are currently supported by each accelerat
 | **Stable-Diffusion3.5** | `stabilityai/stable-diffusion-3.5` | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
 | **Bagel** | `ByteDance-Seed/BAGEL-7B-MoT` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
 | **FLUX.1-dev** | `black-forest-labs/FLUX.1-dev` | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
+| **NextStep-1.1** | `stepfun-ai/NextStep-1.1` | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |

### VideoGen

34 changes: 31 additions & 3 deletions examples/offline_inference/text_to_image/README.md
@@ -1,6 +1,6 @@
 # Text-To-Image

-This folder provides several entrypoints for experimenting with `Qwen/Qwen-Image` `Qwen/Qwen-Image-2512` `Tongyi-MAI/Z-Image-Turbo` using vLLM-Omni:
+This folder provides several entrypoints for experimenting with `Qwen/Qwen-Image`, `Qwen/Qwen-Image-2512`, `Tongyi-MAI/Z-Image-Turbo`, and `stepfun-ai/NextStep-1.1` using vLLM-Omni. NextStep-1.1 has a different architecture, so it uses its own pipeline class and accepts additional command-line arguments.

 - `text_to_image.py`: command-line script for single image generation with advanced options.
 - `web_demo.py`: lightweight Gradio UI for interactive prompt/seed/CFG exploration.
@@ -74,6 +74,8 @@ if __name__ == "__main__":

 ## Local CLI Usage

+### Qwen/Tongyi Models
+
 ```bash
 python text_to_image.py \
   --model Tongyi-MAI/Z-Image-Turbo \
@@ -87,7 +89,26 @@ python text_to_image.py \
   --output outputs/coffee.png
 ```

-Key arguments:
+### NextStep Models
+
+NextStep-1.1 accepts additional arguments:
+```bash
+python text_to_image.py \
+  --model stepfun-ai/NextStep-1.1 \
+  --prompt "A baby panda wearing an Iron Man mask, holding a board with 'NextStep-1' written on it" \
+  --height 512 \
+  --width 512 \
+  --num-inference-steps 28 \
+  --guidance-scale 7.5 \
+  --guidance-scale-2 1.0 \
+  --cfg-schedule constant \
+  --output nextstep_output.png \
+  --seed 42
+```
+
+### Key Arguments
+
+**Common arguments:**

 - `--prompt`: text description (string).
 - `--seed`: integer seed for deterministic sampling.
@@ -98,8 +119,15 @@ Key arguments:
 - `--output`: path to save the generated PNG.
 - `--vae-use-slicing`: enable VAE slicing for memory optimization.
 - `--vae-use-tiling`: enable VAE tiling for memory optimization.
-- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](../../../docs/user_guide/diffusion/parallelism_acceleration.md#cfg-parallel).
+- `--cfg-parallel-size`: set it to 2 to enable CFG Parallel. See more examples in [`user_guide`](../../../docs/user_guide/diffusion_acceleration.md#using-cfg-parallel).
 - `--enable-cpu-offload`: enable CPU offloading for diffusion models.
+- `--guidance-scale`: classifier-free guidance scale.
+
+**NextStep-1.1 specific:**
+- `--guidance-scale-2`: secondary guidance scale, e.g. image-level CFG (default: 1.0).
+- `--timesteps-shift`: timesteps shift parameter for sampling (default: 1.0).
+- `--cfg-schedule`: CFG schedule type, "constant" or "linear" (default: "constant").
+- `--use-norm`: apply layer normalization to sampled tokens.

> ℹ️ If you encounter OOM errors, try using `--vae-use-slicing` and `--vae-use-tiling` to reduce memory usage.
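
To check how the NextStep-1.1 flags listed in this README hunk parse and what their defaults are, here is a minimal standalone sketch that mirrors the `argparse` definitions this PR adds to `text_to_image.py`. The function name `build_nextstep_parser` is hypothetical and used only for illustration; only the four flags and their defaults come from the diff.

```python
import argparse


def build_nextstep_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: a parser holding only the NextStep-1.1 flags."""
    parser = argparse.ArgumentParser(description="NextStep-1.1 flag sketch")
    parser.add_argument("--guidance-scale-2", type=float, default=1.0)
    parser.add_argument("--timesteps-shift", type=float, default=1.0)
    parser.add_argument(
        "--cfg-schedule",
        type=str,
        default="constant",
        choices=["constant", "linear"],
    )
    # store_true means the flag defaults to False when omitted.
    parser.add_argument("--use-norm", action="store_true")
    return parser


# No flags given: every option falls back to its default.
defaults = build_nextstep_parser().parse_args([])

# Explicit flags: argparse maps "--guidance-scale-2" to args.guidance_scale_2.
custom = build_nextstep_parser().parse_args(
    ["--cfg-schedule", "linear", "--use-norm", "--guidance-scale-2", "2.5"]
)
```

Because `--cfg-schedule` declares `choices`, any value other than `constant` or `linear` is rejected by `argparse` before the script runs.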

91 changes: 73 additions & 18 deletions examples/offline_inference/text_to_image/text_to_image.py
@@ -16,13 +16,26 @@
 from vllm_omni.platforms import current_omni_platform


+def is_nextstep_model(model_name: str) -> bool:
+    """Check if the model is a NextStep model by reading its config."""
+    from vllm.transformers_utils.config import get_hf_file_to_dict
+
+    try:
+        cfg = get_hf_file_to_dict("config.json", model_name)
+        if cfg and cfg.get("model_type") == "nextstep":
+            return True
+    except Exception:
+        pass
+    return False
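
The detection function in this hunk couples the config lookup with the decision itself. As a minimal sketch, assuming the only signal is `model_type == "nextstep"` in `config.json` (as in this diff), the decision can be factored into a pure function that is testable offline. The helper name `config_is_nextstep` is hypothetical and not part of the PR.

```python
from typing import Any, Mapping, Optional


def config_is_nextstep(cfg: Optional[Mapping[str, Any]]) -> bool:
    """Return True when a parsed config.json declares model_type 'nextstep'.

    Hypothetical helper: it mirrors the condition used in the diff, but takes
    an already-loaded config so no network access or vLLM import is needed.
    """
    # A missing or empty config counts as "not NextStep", matching the
    # fall-through `return False` in the function from the diff.
    return bool(cfg) and cfg.get("model_type") == "nextstep"
```

Keeping the lookup separate from the decision also makes it easier to narrow the broad `except Exception` later, since only the lookup can raise.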


 def parse_args() -> argparse.Namespace:
-    parser = argparse.ArgumentParser(description="Generate an image with Qwen-Image.")
+    parser = argparse.ArgumentParser(description="Generate an image with supported diffusion models.")
     parser.add_argument(
         "--model",
         default="Qwen/Qwen-Image",
         help="Diffusion model name or local path. Supported models: "
-        "Qwen/Qwen-Image, Tongyi-MAI/Z-Image-Turbo, Qwen/Qwen-Image-2512",
+        "Qwen/Qwen-Image, Tongyi-MAI/Z-Image-Turbo, Qwen/Qwen-Image-2512, stepfun-ai/NextStep-1.1",
     )
     parser.add_argument("--prompt", default="a cup of coffee on the table", help="Text prompt for image generation.")
     parser.add_argument(
@@ -153,16 +166,43 @@ def parse_args() -> argparse.Namespace:
         default=1,
         help="Number of ranks used for VAE patch/tile parallelism (decode/encode).",
     )
+    # NextStep-1.1 specific arguments
+    parser.add_argument(
+        "--guidance-scale-2",
+        type=float,
+        default=1.0,
+        help="Secondary guidance scale (e.g. image-level CFG for NextStep-1.1).",
+    )
+    parser.add_argument(
+        "--timesteps-shift",
+        type=float,
+        default=1.0,
+        help="[NextStep-1.1 only] Timesteps shift parameter for sampling.",
+    )
+    parser.add_argument(
+        "--cfg-schedule",
+        type=str,
+        default="constant",
+        choices=["constant", "linear"],
+        help="[NextStep-1.1 only] CFG schedule type.",
+    )
+    parser.add_argument(
+        "--use-norm",
+        action="store_true",
+        help="[NextStep-1.1 only] Apply layer normalization to sampled tokens.",
+    )
     return parser.parse_args()


 def main():
     args = parse_args()
     generator = torch.Generator(device=current_omni_platform.device_type).manual_seed(args.seed)
+    use_nextstep = is_nextstep_model(args.model)

     # Configure cache based on backend type
     cache_config = None
-    if args.cache_backend == "cache_dit":
+    cache_backend = args.cache_backend
+
+    if cache_backend == "cache_dit":
         # cache-dit configuration: Hybrid DBCache + SCM + TaylorSeer
         # All parameters marked with [cache-dit only] in DiffusionCacheConfig
         cache_config = {
@@ -179,7 +219,7 @@ def main():
             "scm_steps_mask_policy": None,  # SCM mask policy: None (disabled), "slow", "medium", "fast", "ultra"
             "scm_steps_policy": "dynamic",  # SCM steps policy: "dynamic" or "static"
         }
-    elif args.cache_backend == "tea_cache":
+    elif cache_backend == "tea_cache":
         # TeaCache configuration
         # All parameters marked with [tea_cache only] in DiffusionCacheConfig
         cache_config = {
@@ -213,19 +253,24 @@ def main():
     elif args.quantization:
         quant_kwargs["quantization"] = args.quantization

-    omni = Omni(
-        model=args.model,
-        enable_layerwise_offload=args.enable_layerwise_offload,
-        vae_use_slicing=args.vae_use_slicing,
-        vae_use_tiling=args.vae_use_tiling,
-        cache_backend=args.cache_backend,
-        cache_config=cache_config,
-        enable_cache_dit_summary=args.enable_cache_dit_summary,
-        parallel_config=parallel_config,
-        enforce_eager=args.enforce_eager,
-        enable_cpu_offload=args.enable_cpu_offload,
+    # Initialize Omni with model-specific settings
+    omni_kwargs = {
+        "model": args.model,
+        "enable_layerwise_offload": args.enable_layerwise_offload,
+        "vae_use_slicing": args.vae_use_slicing,
+        "vae_use_tiling": args.vae_use_tiling,
+        "cache_backend": cache_backend,
+        "cache_config": cache_config,
+        "enable_cache_dit_summary": args.enable_cache_dit_summary,
+        "parallel_config": parallel_config,
+        "enforce_eager": args.enforce_eager,
+        "enable_cpu_offload": args.enable_cpu_offload,
         **quant_kwargs,
-    )
+    }
+    if use_nextstep:
+        # NextStep-1.1 requires explicit pipeline class
+        omni_kwargs["model_class_name"] = "NextStep11Pipeline"
+    omni = Omni(**omni_kwargs)
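
The conditional keyword construction in this hunk can be sketched without importing vLLM-Omni. This is an illustration only: `build_omni_kwargs` is a hypothetical helper, and it assumes, as the diff does, that `model_class_name` is set only when a NextStep model is detected.

```python
from typing import Any, Dict


def build_omni_kwargs(model: str, use_nextstep: bool, **common: Any) -> Dict[str, Any]:
    """Hypothetical helper: assemble constructor kwargs as a plain dict."""
    kwargs: Dict[str, Any] = {"model": model, **common}
    if use_nextstep:
        # Only NextStep models get an explicit pipeline class, as in the diff.
        kwargs["model_class_name"] = "NextStep11Pipeline"
    return kwargs


qwen_kwargs = build_omni_kwargs("Qwen/Qwen-Image", use_nextstep=False, enforce_eager=True)
nextstep_kwargs = build_omni_kwargs("stepfun-ai/NextStep-1.1", use_nextstep=True, enforce_eager=True)
```

Building a dict first and unpacking it with `**` keeps the optional key out of the call for every other model, so their behavior is unchanged.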

     if profiler_enabled:
         print("[Profiler] Starting profiling...")
@@ -236,7 +281,7 @@ def main():
     print("Generation Configuration:")
     print(f" Model: {args.model}")
     print(f" Inference steps: {args.num_inference_steps}")
-    print(f" Cache backend: {args.cache_backend if args.cache_backend else 'None (no acceleration)'}")
+    print(f" Cache backend: {cache_backend if cache_backend else 'None (no acceleration)'}")
     print(f" Quantization: {args.quantization if args.quantization else 'None (BF16)'}")
     if ignored_layers:
         print(f" Ignored layers: {ignored_layers}")
@@ -250,6 +295,13 @@ def main():
     print(f"{'=' * 60}\n")

     generation_start = time.perf_counter()
+
+    extra_args = {
+        "timesteps_shift": args.timesteps_shift,
+        "cfg_schedule": args.cfg_schedule,
+        "use_norm": args.use_norm,
+    }
+
     outputs = omni.generate(
         {
             "prompt": args.prompt,
@@ -261,10 +313,13 @@ def main():
             generator=generator,
             true_cfg_scale=args.cfg_scale,
             guidance_scale=args.guidance_scale,
+            guidance_scale_2=args.guidance_scale_2,
             num_inference_steps=args.num_inference_steps,
             num_outputs_per_prompt=args.num_images_per_prompt,
+            extra_args=extra_args,
         ),
     )
+
     generation_end = time.perf_counter()
     generation_time = generation_end - generation_start
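
A small sketch of how the parsed flags map onto the `extra_args` dictionary built in this hunk. `build_extra_args` is a hypothetical name; the three keys are the ones used in the diff, and the sample values are the CLI defaults.

```python
import argparse
from typing import Any, Dict


def build_extra_args(args: argparse.Namespace) -> Dict[str, Any]:
    """Hypothetical helper: collect the NextStep-1.1 sampling options."""
    return {
        "timesteps_shift": args.timesteps_shift,
        "cfg_schedule": args.cfg_schedule,
        "use_norm": args.use_norm,
    }


# A namespace carrying the default values declared by the new CLI flags.
example = build_extra_args(
    argparse.Namespace(timesteps_shift=1.0, cfg_schedule="constant", use_norm=False)
)
```

In the diff these values are forwarded for every model. Whether non-NextStep pipelines ignore them is not shown in this PR excerpt.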
