[Quantization] Redo Z-Image text encoder FP8 online quantization#3279

Merged
lishunyang12 merged 40 commits into
vllm-project:mainfrom
Isotr0py:redo-encoder-quant
May 6, 2026

Conversation

@Isotr0py
Member

Purpose

Test Plan

python examples/offline_inference/text_to_image/text_to_image.py --model /mnt/data0/LLM/Z-Image-Turbo/ --width 512 --height 512 --quantization fp8 --tensor-parallel-size 2

Test Result

INFO 04-30 18:20:24 [diffusers_loader.py:355] Loading weights took 8.20 seconds
INFO 04-30 18:20:25 [diffusion_model_runner.py:142] Model loading took 7.7360 GiB and 11.881425 seconds
INFO 04-30 18:20:25 [diffusion_model_runner.py:147] Model runner: Model loaded successfully.
INFO 04-30 18:20:25 [diffusion_model_runner.py:142] Model loading took 7.7371 GiB and 11.893948 seconds
INFO 04-30 18:20:25 [diffusion_model_runner.py:147] Model runner: Model loaded successfully.
INFO 04-30 18:20:25 [diffusion_model_runner.py:86] Model runner: transformer compiled with torch.compile.
INFO 04-30 18:20:25 [diffusion_model_runner.py:188] Model runner: Initialization complete.
INFO 04-30 18:20:25 [diffusion_model_runner.py:86] Model runner: transformer compiled with torch.compile.
INFO 04-30 18:20:25 [diffusion_model_runner.py:188] Model runner: Initialization complete.
INFO 04-30 18:20:26 [diffusion_worker.py:199] Worker 0: Process-scoped GPU memory after model loading: 8.25 GiB.
INFO 04-30 18:20:26 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 04-30 18:20:26 [diffusion_worker.py:97] Worker 0: Initialization complete.
INFO 04-30 18:20:26 [diffusion_worker.py:703] Worker 0: Scheduler loop started.
INFO 04-30 18:20:26 [diffusion_worker.py:613] Worker 0 ready to receive requests via shared memory
INFO 04-30 18:20:26 [diffusion_worker.py:199] Worker 1: Process-scoped GPU memory after model loading: 8.27 GiB.
INFO 04-30 18:20:26 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:1, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 04-30 18:20:26 [diffusion_worker.py:97] Worker 1: Initialization complete.
INFO 04-30 18:20:26 [diffusion_worker.py:703] Worker 1: Scheduler loop started.
INFO 04-30 18:20:26 [diffusion_worker.py:613] Worker 1 ready to receive requests via shared memory
INFO 04-30 18:20:26 [diffusion_engine.py:443] dummy run to warm up the model
INFO 04-30 18:20:26 [kv_transfer_manager.py:1268] Rank-aware KV receive: rank 1 independently receiving (from_tp=2, to_tp=2)
WARNING 04-30 18:20:26 [kv_transfer_manager.py:985] No connector available for receiving KV cache
INFO 04-30 18:20:26 [kv_transfer_manager.py:1268] Rank-aware KV receive: rank 0 independently receiving (from_tp=2, to_tp=2)
WARNING 04-30 18:20:26 [kv_transfer_manager.py:985] No connector available for receiving KV cache
WARNING 04-30 18:20:26 [pipeline_z_image.py:541] strength parameter (0.60) is only applicable for image-to-image (I2I) generation. It will be ignored for text-to-image (T2I) generation.
INFO 04-30 18:20:26 [marlin_utils.py:433] Marlin kernel can achieve better performance for small size_n with experimental use_atomic_add feature. You can consider set environment variable VLLM_MARLIN_USE_ATOMIC_ADD to 1 if possible.
INFO 04-30 18:20:31 [diffusion_model_runner.py:213] Peak GPU memory (this request): 8.78 GB reserved, 8.34 GB allocated, 0.44 GB pool overhead (5.0%)
INFO 04-30 18:20:31 [inline_stage_diffusion_client.py:63] [InlineStageDiffusionClient] Stage-0 initialized inline (batch_size=1)
INFO 04-30 18:20:31 [async_omni_engine.py:823] [AsyncOmniEngine] Stage 0 initialized (diffusion, batch_size=1)
INFO 04-30 18:20:31 [orchestrator.py:192] [Orchestrator] Starting event loop
INFO 04-30 18:20:31 [async_omni_engine.py:377] [AsyncOmniEngine] Orchestrator ready with 1 stages
INFO 04-30 18:20:31 [omni_base.py:162] [Omni] AsyncOmniEngine initialized in 28.80 seconds
INFO 04-30 18:20:31 [omni_base.py:181] [Omni] Initialized with 1 stages for model /mnt/data0/LLM/Z-Image-Turbo/

============================================================
Generation Configuration:
  Model: /mnt/data0/LLM/Z-Image-Turbo/
  Inference steps: 50
  Cache backend: None (no acceleration)
  Quantization: fp8
  Parallel configuration: tensor_parallel_size=2, ulysses_degree=1, ulysses_mode=strict, ring_degree=1, cfg_parallel_size=1, vae_patch_parallel_size=1, enable_expert_parallel=False.
  CPU offload: False; CPU Layerwise Offload: False
  Image size: 1024x1024
============================================================

INFO 04-30 18:20:31 [orchestrator.py:894] [Orchestrator] _handle_add_request: stage=0 req=0_1e0f8561-372e-466f-bbd5-d2a534740a7a prompt_type=dict original_prompt_type=dict final_stage=0 num_sampling_params=1
Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s]
INFO 04-30 18:20:31 [kv_transfer_manager.py:1268] Rank-aware KV receive: rank 0 independently receiving (from_tp=2, to_tp=2)
WARNING 04-30 18:20:31 [kv_transfer_manager.py:985] No connector available for receiving KV cache
INFO 04-30 18:20:31 [kv_transfer_manager.py:1268] Rank-aware KV receive: rank 1 independently receiving (from_tp=2, to_tp=2)
WARNING 04-30 18:20:31 [pipeline_z_image.py:541] strength parameter (0.60) is only applicable for image-to-image (I2I) generation. It will be ignored for text-to-image (T2I) generation.
WARNING 04-30 18:20:31 [kv_transfer_manager.py:985] No connector available for receiving KV cache
INFO 04-30 18:22:03 [diffusion_model_runner.py:213] Peak GPU memory (this request): 12.17 GB reserved, 10.13 GB allocated, 2.04 GB pool overhead (16.8%)
INFO 04-30 18:22:03 [diffusion_engine.py:127] Generation completed successfully.
INFO 04-30 18:22:03 [diffusion_engine.py:174] Post-processing completed in 0.0336 seconds
INFO 04-30 18:22:03 [diffusion_engine.py:177] DiffusionEngine.step breakdown: preprocess=0.00 ms, add_req_and_wait=91458.09 ms, postprocess=33.60 ms, total=91492.05 ms
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:31<00:00, 91.49s/it]
Total generation time: 91.4952 seconds (91495.16 ms)
INFO 04-30 18:22:03 [text_to_image.py:499] Outputs: [OmniRequestOutput(request_id='0_1e0f8561-372e-466f-bbd5-d2a534740a7a', finished=True, stage_id=0, final_output_type='image', request_output=OmniRequestOutput(request_id='0_1e0f8561-372e-466f-bbd5-d2a534740a7a', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt={'prompt': 'a cup of coffee on the table', 'negative_prompt': None}, latents=None, metrics={'preprocess_time_ms': 0.0, 'diffusion_engine_exec_time_ms': 91458.09239707887, 'diffusion_engine_total_time_ms': 91492.05475999042, 'image_num': 1, 'resolution': 640, 'postprocess_time_ms': 33.60368404537439}, multimodal_output={}, custom_output={}, stage_durations={'queue_wait_ms': 0.609284732490778, 'stage_0_gen_ms': 91493.41702461243}, peak_memory_mb=12462.0), images=[1 PIL Images], prompt=None, latents=None, metrics={}, multimodal_output={}, custom_output={}, stage_durations={'queue_wait_ms': 0.609284732490778, 'stage_0_gen_ms': 91493.41702461243}, peak_memory_mb=12462.0)]
Saved generated image to qwen_image_output.png
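
As a reading aid for the two "Peak GPU memory" lines in the log above, the sketch below checks that the reported pool overhead matches reserved minus allocated, and that the percentage matches overhead divided by reserved. How the logger actually computes these values is an assumption here (it presumably works from unrounded byte counts); only the printed numbers are taken from the log.

```python
import re

# The two peak-memory messages copied from the log above
# (warm-up request, then the real request).
LOG_LINES = [
    "Peak GPU memory (this request): 8.78 GB reserved, 8.34 GB allocated, 0.44 GB pool overhead (5.0%)",
    "Peak GPU memory (this request): 12.17 GB reserved, 10.13 GB allocated, 2.04 GB pool overhead (16.8%)",
]

PATTERN = re.compile(
    r"([\d.]+) GB reserved, ([\d.]+) GB allocated, ([\d.]+) GB pool overhead \(([\d.]+)%\)"
)


def check_line(line):
    """Return (overhead_matches, percent_matches) for one log message."""
    reserved, allocated, overhead, percent = map(float, PATTERN.search(line).groups())
    # Assumed relationship: overhead = reserved - allocated.
    derived_overhead = round(reserved - allocated, 2)
    # Assumed relationship: percent = overhead / reserved.
    derived_percent = round(100 * overhead / reserved, 1)
    return derived_overhead == overhead, derived_percent == percent


results = [check_line(line) for line in LOG_LINES]
print(results)
```

Under these assumptions both lines are internally consistent, so the gap between the 8.78 GB warm-up peak and the 12.17 GB request peak reflects a real difference in reserved memory rather than a change in how the figures are reported.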
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
@Isotr0py Isotr0py requested a review from hsliuustc0106 as a code owner April 30, 2026 10:25

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

download_safetensors_index_file_from_hf(
model_name_or_path,
index_file_with_subfolder,
self.load_config.download_dir,

P1: Skip index download when no safetensors index exists

When a component has .safetensors weights but no index file (common for single-file components like vae/diffusion_pytorch_model.safetensors), available_index_file is empty and index_file_with_subfolder becomes None; this code still calls download_safetensors_index_file_from_hf(...) with that None value. In vLLM, that helper forwards index_file to hf_hub_download(filename=...), so remote loading fails at runtime for these repos (local-path loads may not hit this path).
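
The guard this review comment asks for could look roughly like the sketch below. It is not the code that was merged: `maybe_download_index` and `fake_download` are hypothetical names, and the real downloader is replaced by a stand-in passed as a parameter so the example runs without network access.

```python
from typing import Callable, Optional


def maybe_download_index(
    model_name_or_path: str,
    index_file_with_subfolder: Optional[str],
    download_dir: Optional[str],
    download_fn: Callable[[str, str, Optional[str]], None],
) -> bool:
    """Call download_fn only when an index file name is actually known.

    Returns True if the download was attempted, False if it was skipped.
    """
    if index_file_with_subfolder is None:
        # Single-file component (e.g. vae/diffusion_pytorch_model.safetensors):
        # there is no index to fetch, so skip the hub call entirely.
        return False
    download_fn(model_name_or_path, index_file_with_subfolder, download_dir)
    return True


# Hypothetical stand-in for the real downloader; it only records its arguments.
calls = []


def fake_download(repo: str, index_file: str, cache_dir: Optional[str]) -> None:
    calls.append((repo, index_file, cache_dir))


skipped = maybe_download_index("org/model", None, None, fake_download)
fetched = maybe_download_index(
    "org/model",
    "transformer/diffusion_pytorch_model.safetensors.index.json",
    None,
    fake_download,
)
print(skipped, fetched, calls)
```

Returning a flag instead of raising keeps single-file components on the normal loading path, which is the behavior the comment describes for repos without an index file.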

@Isotr0py Isotr0py added the ready label to trigger buildkite CI label Apr 30, 2026
@Isotr0py Isotr0py requested a review from lishunyang12 April 30, 2026 15:26
@Isotr0py
Member Author

@lishunyang12 CI should pass this time.

@lishunyang12
Collaborator

Please check again; CI still seems to be failing. Thanks.

@hsliuustc0106 hsliuustc0106 added the merge-test label to trigger buildkite merge test CI label Apr 30, 2026
@Isotr0py
Member Author

Isotr0py commented May 1, 2026

=================================================================== short test summary info ====================================================================
--
FAILED tests/e2e/online_serving/test_mimo_audio.py::test_audio_to_text_audio_001[omni_server0] - AssertionError: The audio content is not same as the text
=============================================== 1 failed, 4 passed, 1 skipped, 25 warnings in 711.27s (0:11:51) ================================================

I think the failing CI job is an omni-model test (test_mimo_audio) rather than anything touched here, so it should be unrelated to this PR.

Comment thread vllm_omni/diffusion/model_loader/diffusers_loader.py Outdated
Isotr0py and others added 3 commits May 3, 2026 23:45
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
@Isotr0py
Member Author

Isotr0py commented May 5, 2026

@lishunyang12 @hsliuustc0106 All merge-tests should pass this time.

@Isotr0py Isotr0py added this to the v0.20.0 milestone May 5, 2026
@lishunyang12 lishunyang12 enabled auto-merge (squash) May 6, 2026 10:04
@lishunyang12 lishunyang12 merged commit 0a8204c into vllm-project:main May 6, 2026
8 checks passed
@Isotr0py Isotr0py deleted the redo-encoder-quant branch May 7, 2026 01:27
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
…m-project#3279)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
pi314ever added a commit to pi314ever/vllm-omni that referenced this pull request May 19, 2026
commit a3d4ed809d56977eb632e8a63aae1fc090a790e3
Author: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Date:   Wed May 20 00:14:08 2026 +0800

    [Quantization][tools] Add diffusion quantization output comparison tool (#3175)

    Signed-off-by: david6666666 <530634352@qq.com>
    Signed-off-by: David Chen <530634352@qq.com>

commit 3c58868c9a4fb7f0b1754d07738d1f87d3af5dae
Author: dengyunyang <584797741@qq.com>
Date:   Tue May 19 22:22:27 2026 +0800

    [BugFix] fix mult cli timeout with get kv (#3741)

    Signed-off-by: dengyunyang <584797741@qq.com>

commit da5361879395d45d5017fb575a7446cb36774bf4
Author: Shin <shin@yixiaoer.sg>
Date:   Tue May 19 19:56:38 2026 +0800

    [Recipe] Qwen/Qwen-Image-Edit (#3684)

    Signed-off-by: yixiaoer <shin@yixiaoer.sg>

commit 18186db216319684e3e0d2c268d6a0409525fc2e
Author: Schatten <3192396192@qq.com>
Date:   Tue May 19 19:23:45 2026 +0800

    [Cleanup] Remove unused build_base_engine_args after #1115 (#3720)

    Signed-off-by: Schatten <czhengt@qq.com>

commit 14e5baceaf240e78d1a0c5dcc883563db23eb703
Author: Lu <luludachiever@gmail.com>
Date:   Tue May 19 19:19:58 2026 +0800

    [Qwen-Image] Drop unused vision tower from text encoder (#3608)

    Signed-off-by: lulugoodcoder <luludachiever@gmail.com>
    Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>

commit 2af2a50e0e2981ec2eef32e704f5a66c3d451c95
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Tue May 19 15:22:02 2026 +0800

    [CI] improve Buildkite testcase statistics reports (#3543)

    Signed-off-by: wangyu <410167048@qq.com>

commit bd83ac9b4b6f7f3a64d13a1695d5a51e73164075
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Tue May 19 14:28:47 2026 +0800

    [CI] invalid_param reliability suite and weekly http_invalid jobs (#3652)

    Signed-off-by: wangyu <410167048@qq.com>

commit e277feacaf859c1aa3f2f7354d6fc396cf06ba5d
Author: chickeyton <ngton2014@gmail.com>
Date:   Tue May 19 12:08:20 2026 +0800

    [large-scale-serving] Integrate OmniCoordinator into stage engine pipeline (#3569)

    Signed-off-by: chickeyton <ngton2014@gmail.com>
    Signed-off-by: herotai214 <herotai214@gmail.com>
    Co-authored-by: herotai214 <herotai214@gmail.com>

commit ca9fd0b71ce04fa6283154c0ee7f32fcfc2eaf11
Author: JiaHong <2360655509@qq.com>
Date:   Tue May 19 11:52:41 2026 +0800

    Reject non-positive Flux2 Klein inference steps (#3717)

    Signed-off-by: MmMaiIIi <2360655509@qq.com>

commit 3ac739817f5afce9b5a291c2eddaccf5c1927cab
Author: JiaHong <2360655509@qq.com>
Date:   Tue May 19 11:30:54 2026 +0800

    [Bugfix] Reject empty prompts in Flux2 Klein diffusion pipeline (#3711)

    Signed-off-by: MmMaiIIi <2360655509@qq.com>
    Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>

commit 1fa734419ec6b578537aa5267c0d42f006499201
Author: bjf-frz <frz123db@gmail.com>
Date:   Tue May 19 11:30:33 2026 +0800

    [Refactor]Rename diffusion benchmark backend to endpoint (#3137)

    Signed-off-by: bjf-frz <frz123db@gmail.com>
    Signed-off-by: bjfwhite <baijingfan1@huawei.com>
    Co-authored-by: bjfwhite <baijingfan1@huawei.com>

commit 2c6b1bb0c0b814aa562770737e8d0a6dd7c848f7
Author: fan2956 <zhoufan53@huawei.com>
Date:   Tue May 19 10:24:27 2026 +0800

    [Bugfix] Fix hunyuanimage3 dit quant storageshape mismatch error (#3694)

    Signed-off-by: fan2956 <zhoufan53@huawei.com>

commit e2ed1c457455f8460182873111882b46829dc2df
Author: Daniel Huang <daniel1.huang@intel.com>
Date:   Mon May 18 19:19:16 2026 -0700

    Disable sampler kernel for XPU test (#3718)

    Signed-off-by: Daniel Huang <daniel1.huang@intel.com>

commit 89f8819525589141fd825ce4f0d1e1be9cf3660b
Author: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
Date:   Tue May 19 09:42:17 2026 +0800

    [Feature] Add support for Pipeline Parallel and integrate it into Wan 2.2 (#2322)

    Signed-off-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>

commit 475a4002b0136235b4feb22d4a1e4b221ca5e112
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Mon May 18 18:58:15 2026 -0500

    [XPU] set flash_attn as default diffusion attn backend and fix k_len for cross_attn (#3525)

    Signed-off-by: Chendi Xue <chendi.xue@intel.com>

commit ab59673f21d804729557ac53d4c839e6d7353afb
Author: Sy03 <1370724210@qq.com>
Date:   Tue May 19 02:59:23 2026 +0800

    [Bugfix][Qwen3-Omni] Handle short Code2Wav chunk outputs (#3687)

    Signed-off-by: Sy03 <1370724210@qq.com>
    Co-authored-by: amy-why-3459 <wuhaiyan17@huawei.com>

commit 821286794f1afaac7d44d7a75371e87527b30d22
Author: lyj-jjj <liuyingjun5@huawei.com>
Date:   Tue May 19 00:35:30 2026 +0800

    [HY-Imgae3.0] support hunyuan image3 dit fa-fp8 on npu (#3540)

    Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
    Co-authored-by: Cursor <cursoragent@cursor.com>

commit 309e5c38c665b91a9818f03dd5c515878caf0e53
Author: amy-why-3459 <wuhaiyan17@huawei.com>
Date:   Mon May 18 21:25:37 2026 +0800

    [BugFix][CI]Fixing occasional CI failures (#3623)

    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

commit f4115bd7716e1d29c8233bc8a69125dfdd35b3d1
Author: Ding Zuhao <e1583181@u.nus.edu>
Date:   Mon May 18 21:12:46 2026 +0800

    [Bugfix] Fix SenseNova U1 broken import after SupportsModuleOffload  (#3691)

    Signed-off-by: nussejzz <nussejzz@users.noreply.github.com>
    Co-authored-by: nussejzz <nussejzz@users.noreply.github.com>

commit dbc589dbca09df88714ba433ee241c3aa6690235
Author: Lancer <maruixiang6688@gmail.com>
Date:   Mon May 18 17:23:40 2026 +0800

    [Bugfix] fix diffusion quantization benchmarking for Omni outputs (#3653)

    Signed-off-by: Lancer <maruixiang6688@gmail.com>

commit 990566aef10c69ac1fa3073437be0a3333b3dc15
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Mon May 18 05:18:18 2026 -0400

    [Bugfix][TTS] Drop meaningless TTFT from speech-endpoint benchmarks (#3674)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit 6d37e77fb2d9b9f4625a022ccffcafdff3134ef7
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Sun May 17 22:06:56 2026 -0500

    [XPU]  update dockerfile and CI to 0.21.0 (#3675)

    Signed-off-by: Chendi Xue <chendi.xue@intel.com>

commit 4ba8e14981bb80a1835a7956357ebf32011b0c27
Author: wuhang <wuhang6@huawei.com>
Date:   Mon May 18 08:54:57 2026 +0800

    Fix diffusion engine cleanup lifecycle (#3494)

    Signed-off-by: wuhang <wuhang6@huawei.com>
    Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit c99df1ebd9f8007639507a6ba6e5dea09e0abd9c
Author: Sy03 <1370724210@qq.com>
Date:   Mon May 18 04:58:59 2026 +0800

    [TTS][Perf] Optimize Qwen3-TTS high-concurrency serving (#3662)

    Signed-off-by: Sy03 <1370724210@qq.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>

commit 0a395f9de11469255d8347a1ce48df56fef74888
Author: bjf-frz <frz123db@gmail.com>
Date:   Mon May 18 00:59:46 2026 +0800

    [SKILL]Add diffusion perf skill (#3461)

    Signed-off-by: bjf-frz <frz123db@gmail.com>

commit c0e132d973276e5c1213bd03d930718ff056fd57
Author: Hongsheng Liu <liuhongsheng4@huawei.com>
Date:   Mon May 18 00:02:34 2026 +0800

    [Doc] Reorganize available recipes into a table (#3671)

    Signed-off-by: hsliu <liuhongsheng4@huawei.com>
    Co-authored-by: deepseek-v4-pro <noreply@anthropic.com>

commit 471ddfe025db12bf6f117eb6dd66c40343849c21
Author: Hongsheng Liu <liuhongsheng4@huawei.com>
Date:   Sun May 17 23:36:46 2026 +0800

    [Doc] Simplify template example subtitle (#3669)

    Signed-off-by: hsliu <liuhongsheng4@huawei.com>
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

commit 8cfc9179e6545ee45c90be36cfdba43afcec788e
Author: Mike Qiu <qdy220091330@gmail.com>
Date:   Sun May 17 23:30:24 2026 +0800

    Fix reasoning_parser crash: reconstruct StructuredOutputsConfig from dict (#2845)

    Signed-off-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
    Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 0da9ffdb0d3023482e1e90d6563a3e379ed6a160
Author: Mike Qiu <qdy220091330@gmail.com>
Date:   Sun May 17 23:05:34 2026 +0800

    Fix output finish reason issue for audio chunk in stream mode (#2849)

    Signed-off-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
    Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 4e880537501d2b2935c97ddfe3dfdf2679d3e2dc
Author: TaffyOfficial <2587297563@qq.com>
Date:   Sun May 17 22:42:59 2026 +0800

    [BugFix][HunyuanImage3] Set MRoPE dynamic_arg_dims so graph mode can compile (#3630)

    Signed-off-by: TaffyOfficial <2324465096@qq.com>
    Co-authored-by: TaffyOfficial <2324465096@qq.com>
    Co-authored-by: Codex <codex@openai.com>

commit 768943b8791abf30a1cc7b1cf82cbbad5d5ee247
Author: Reid <61492567+reidliu41@users.noreply.github.com>
Date:   Sun May 17 22:10:26 2026 +0800

      [Frontend]Handle audio generate engine errors consistently (#3316)

    Signed-off-by: reidliu41 <reid201711@gmail.com>
    Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>

commit 220db62b3f6a7877e0eb39f3cb8f15ec219d4136
Author: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Date:   Sun May 17 21:58:44 2026 +0800

    [Bugfix] Adapt LTX-2 connector arg with diffusers 0.38.0 (#3661)

    Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

commit 5549b7f44a0bfa75c294d397f8742208e253c3d1
Author: Kevin H. Luu <khluu000@gmail.com>
Date:   Sun May 17 04:02:28 2026 -0700

    [CI/Build] Enable twine upload to PyPI (#3667)

    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

commit bc26cad19a0443cc4f444d5bb843e55c1ac3e2f4
Author: Kevin H. Luu <khluu000@gmail.com>
Date:   Sun May 17 03:40:14 2026 -0700

    [CI/Build] Unify release pipeline with NIGHTLY=1 option, add x86_64/aarch64 image builds (#3428)

    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit 9c5e35f485a7d2037330ea74535e8373c739f350
Author: Alex Brooks <albrooks@redhat.com>
Date:   Sat May 16 17:33:03 2026 -0600

    [Config Refactor] Support Recursive Merging for Engine Args (#3009)

    Signed-off-by: Alex Brooks <albrooks@redhat.com>
    Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit a64ebf103b35fa48f42accf444e4f027c992009e
Author: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Date:   Sun May 17 07:32:34 2026 +0800

    [Refactor] Migrate and clean up TTS configs: CosyVoice3, OmniVoice, VoxCPM (#3338)

    Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
    Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

commit c08959ee040281ecd310293adeb82067fa2e5932
Author: TJian <tunjian.tan@embeddedllm.com>
Date:   Sat May 16 23:15:59 2026 +0800

    [ROCm] [CI] [Bugfix] Upgrade vllm version to v0.21.0 and ROCm 7.2.2 (#3659)

    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

commit c5ac295e3c9f0b3425843b15964824a89cd271ae
Author: rongfu.leng <lenronfu@gmail.com>
Date:   Sat May 16 21:11:34 2026 +0800

    [Feat] Add helios support cache dit (#3470)

    Signed-off-by: rongfu.leng <lenronfu@gmail.com>

commit ea35a0cc4a35dcdb674af76d8279c084a6aaa181
Author: Zeng Chuang <zengchuang3@huawei.com>
Date:   Sat May 16 20:51:31 2026 +0800

    [Bugfix]update process name for dit stage (#3602)

    Signed-off-by: zengchuang <zengchuang3@huawei.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 0f4853ff86f3fd840f9404535c89961a48eb13e2
Author: wuhang <wuhang6@huawei.com>
Date:   Sat May 16 20:50:29 2026 +0800

    [Bugfix] Support diffusion worker dead detect when use inline engine (#3214)

    Signed-off-by: wuhang <wuhang6@huawei.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit b5e163cfcabbfdea73469c014766d104d2231e10
Author: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Date:   Sat May 16 20:08:51 2026 +0800

    [CI][Accuracy] Add Qwen-Image-2512 Qwen-Image-Edit-2511 pixel accuracy tests (#3502)

    Signed-off-by: david6666666 <530634352@qq.com>

commit d647e7e4cfa3c50bed50cc07e465365bc9627f0b
Author: dengyunyang <584797741@qq.com>
Date:   Sat May 16 19:35:05 2026 +0800

    [Hunyuanimage 3.0] hunyuan accuracy test (#3655)

    Signed-off-by: dengyunyang <584797741@qq.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 33220b1e39c51d87b982dd1d5e6abd8e20aa8b5a
Author: Nick Cao <ncao@redhat.com>
Date:   Sat May 16 07:34:04 2026 -0400

    [BugFix] Finish async_chunk requests without pad-token injection (#3613)

    Signed-off-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: Claude <noreply@anthropic.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit eb4e60ee64f2e5cd785b43fdd3af9ff7822b5a4f
Author: Zhou Taichang <tzhouam@connect.ust.hk>
Date:   Sat May 16 18:18:43 2026 +0800

    [Rebase] Rebase to vllm v0.21.0 (#3530)

    Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
    Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
    Signed-off-by: NumberWan <wantszkin2003@gmail.com>
    Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
    Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
    Signed-off-by: Dnoob <dxpouo@gmail.com>
    Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
    Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
    Signed-off-by: rein yang <ruiruyang2@gmail.com>
    Signed-off-by: Nick Cao <ncao@redhat.com>
    Signed-off-by: zhumingjue <zhumingjue@huawei.com>
    Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
    Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
    Signed-off-by: gcanlin <canlinguosdu@gmail.com>
    Signed-off-by: wangyu <410167048@qq.com>
    Signed-off-by: weizhoublue <weizhoublue@github.com>
    Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
    Signed-off-by: dengyunyang <584797741@qq.com>
    Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
    Signed-off-by: David Chen <530634352@qq.com>
    Signed-off-by: Jie Liu <33612777+keeper-jie@users.noreply.github.com>
    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
    Signed-off-by: princepride <wangzhipeng628@gmail.com>
    Signed-off-by: natureofnature <wzliu@connect.hku.hk>
    Signed-off-by: bjf-frz <frz123db@gmail.com>
    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
    Signed-off-by: KexiongYu <yukexiong1@huawei.com>
    Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
    Signed-off-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
    Co-authored-by: NumberWan <wantszkin2003@gmail.com>
    Co-authored-by: dsinghvi <divyanshsinghvi@gmail.com>
    Co-authored-by: Dnoob <dxpouo@gmail.com>
    Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
    Co-authored-by: Samit <285365963@qq.com>
    Co-authored-by: rein yang <73573651+R2-Y@users.noreply.github.com>
    Co-authored-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: zhumingjue138 <zhumingjue@huawei.com>
    Co-authored-by: Ricardo Noriega <rnoriega@redhat.com>
    Co-authored-by: lyj-jjj <liuyingjun5@huawei.com>
    Co-authored-by: Cursor <cursoragent@cursor.com>
    Co-authored-by: gcanlin <canlinguosdu@gmail.com>
    Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
    Co-authored-by: weizhoublue <45163302+weizhoublue@users.noreply.github.com>
    Co-authored-by: weizhoublue <weizhoublue@github.com>
    Co-authored-by: dengyunyang <584797741@qq.com>
    Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
    Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
    Co-authored-by: Jie Liu <33612777+keeper-jie@users.noreply.github.com>
    Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
    Co-authored-by: NATURE <wzliu@connect.hku.hk>
    Co-authored-by: bjf-frz <frz123db@gmail.com>
    Co-authored-by: amy-why-3459 <wuhaiyan17@huawei.com>
    Co-authored-by: Y. Fisher <yukexiong1@huawei.com>
    Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

commit 5e1986206f7381757d51c507dcbd54b553889fb1
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Fri May 15 11:35:44 2026 -0400

    [CI] Replace c=128 perf cell with c=16; loosen new-cell baselines (#3637)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit d1c65bdffaa21799d6a4dc34086ceb68dea9fe9d
Author: amy-why-3459 <wuhaiyan17@huawei.com>
Date:   Fri May 15 22:50:03 2026 +0800

    [BugFix] fix ci (#3650)

    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

commit d18168ccbbb1b3735b43d25e712ad248e9a29ffa
Author: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Date:   Fri May 15 17:55:09 2026 +0800

    [bugfix] Fix diffusers backend input bug after #2913 (#3644)

    Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
    Signed-off-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

commit 779bf3118bf642fbd8bb35f9416b443829f6c604
Author: Y. Fisher <yukexiong1@huawei.com>
Date:   Fri May 15 17:35:22 2026 +0800

    [Bugfix] fix compatibility of _hunyuan_image3_unpack_packed_topk between vllm / vllm ascend (#3640)

    Signed-off-by: KexiongYu <yukexiong1@huawei.com>

commit e7ee5de09f2fb32debadf4b42f193baf27042c69
Author: amy-why-3459 <wuhaiyan17@huawei.com>
Date:   Fri May 15 17:07:42 2026 +0800

    [BugFix] Fix the issue of thinker requests being preempted, causing shape mismatch. (#3147)

    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

commit 440c718d2a6beb052a18c23163112a5ed5413d6d
Author: bjf-frz <frz123db@gmail.com>
Date:   Fri May 15 16:42:18 2026 +0800

    [Bugfix]Fix multimodal cache routing for AR replicas (#3605)

    Signed-off-by: bjf-frz <frz123db@gmail.com>

commit 82a0b3a46763d8be64c3613265297e2a2271faa4
Author: NATURE <wzliu@connect.hku.hk>
Date:   Fri May 15 14:14:08 2026 +0800

    [2/5] [core] Refactor communication layer: PR 2 of 5, Qwen3 Omni non-async (#2677)

    Signed-off-by: natureofnature <wzliu@connect.hku.hk>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit c7178d89bb7a70817f239febc84c3b21a714dae7
Author: 汪志鹏 <wangzhipeng628@gmail.com>
Date:   Fri May 15 13:28:40 2026 +0800

    [Bugfix] UnspecifiedOmniPlatform.get_device_count returns 0 instead o… (#3636)

    Signed-off-by: princepride <wangzhipeng628@gmail.com>

commit fdb0efea946c35d2ee68f57274dadd0a616e561e
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Fri May 15 11:50:22 2026 +0800

    [CI] add cuda marker to Diffusion X2V function pytest (#3625)

    Signed-off-by: wangyu <410167048@qq.com>

commit 90f5b3c3a10b8c6032bfb82d6e112ec6d70b761a
Author: Jie Liu <33612777+keeper-jie@users.noreply.github.com>
Date:   Fri May 15 11:43:05 2026 +0800

    Update streaming_speech_client.py to solve Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice voice problem (#3380)

    Signed-off-by: Jie Liu <33612777+keeper-jie@users.noreply.github.com>
    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
    Co-authored-by: Yueqian Lin <linyueqian@outlook.com>

commit bbc00f9f86e5bf54633737bedb7964ea4003e37d
Author: lyj-jjj <liuyingjun5@huawei.com>
Date:   Fri May 15 11:22:59 2026 +0800

    [BugFix] fix(omni): isolate diffusion KV-cache dtype from vLLM --kv-cache-dtype #3585 (#3596)

    Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
    Co-authored-by: Cursor <cursoragent@cursor.com>

commit adb2291c2770a66a8658718780ff3b597591dc6d
Author: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Date:   Fri May 15 09:38:47 2026 +0800

    Update WeChat group QR code (#3624)

    Signed-off-by: David Chen <530634352@qq.com>

commit 4f13b871f949d29da952d7582a21d982330f4213
Author: Canlin Guo <canlinguosdu@gmail.com>
Date:   Thu May 14 20:37:51 2026 +0800

    [CI] Add Qwen3-TTS tests for ready tag (#3600)

    Signed-off-by: gcanlin <canlinguosdu@gmail.com>

commit 94254e015f3164a54ef66c042b8bce1a1abee34b
Author: dengyunyang <584797741@qq.com>
Date:   Thu May 14 20:07:12 2026 +0800

    [BugFix] fix shm connector (#3583)

    Signed-off-by: dengyunyang <584797741@qq.com>
    Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
    Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>

commit f7161b07d0126fc933c89eb057113cec089fc5d3
Author: bjf-frz <frz123db@gmail.com>
Date:   Thu May 14 17:27:59 2026 +0800

    [Bugfix]Allow HunyuanImage3 AR sampler batching (#3590)

    Signed-off-by: bjf-frz <frz123db@gmail.com>
    Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>

commit c0b7509f0789c79199f55e13dc7320ab22d95e97
Author: Hongsheng Liu <liuhongsheng4@huawei.com>
Date:   Thu May 14 17:23:55 2026 +0800

    update v0.20.0 readme (#3594)

    Signed-off-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
    Co-authored-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>

commit 3f63aaf982bcba327b7e5150faf6ccc242f84eaa
Author: TaffyOfficial <2587297563@qq.com>
Date:   Thu May 14 16:58:40 2026 +0800

    [Feature] HunyuanImage-3.0 IT2I: multi-image input + prompt API cleanup (#3444)

    Signed-off-by: TaffyOfficial <2324465096@qq.com>
    Signed-off-by: TaffyOfficial <wu15922848573@outlook.com>
    Signed-off-by: skf1999 <13234016272@163.com>
    Signed-off-by: zuiho <2324465096@qq.com>
    Signed-off-by: Claude Code <noreply@anthropic.com>
    Signed-off-by: zuiho <wu15922848573@outlook.com>
    Signed-off-by: TaffyOfficial <2587297563@qq.com>
    Co-authored-by: TaffyOfficial <2324465096@qq.com>
    Co-authored-by: TaffyOfficial <wu15922848573@outlook.com>
    Co-authored-by: skf1999 <13234016272@163.com>

commit c4f859bf56ef294e0e70b7ea6befdfc5b3f0880b
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Thu May 14 02:26:59 2026 -0400

    [CI] Harden Qwen3-TTS perf nightly: enable Base voice_clone, add c=64/128, 2-GPU split (#3491)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit 0d9d57acd90f6b6418cf8ccc91c76991a84103e6
Author: wuhang <wuhang6@huawei.com>
Date:   Thu May 14 11:25:55 2026 +0800

    [Entrypoint][Refactor] Make field type hint more concrete (#3139)

    Signed-off-by: wuhang <wuhang6@huawei.com>

commit 51b4b1131e2811942d16fe984eaa1890a6112e44
Author: Y. Fisher <yukexiong1@huawei.com>
Date:   Thu May 14 11:17:15 2026 +0800

    [Bugfix]: Fix online serving failure when using deploy config (#3537)

    Signed-off-by: KexiongYu <yukexiong1@huawei.com>
    Signed-off-by: Y. Fisher <yukexiong1@huawei.com>

commit e818dba016c390b7a85afb2cb941af8f2928fe3f
Author: zhumingjue138 <zhumingjue@huawei.com>
Date:   Thu May 14 10:47:14 2026 +0800

    [Test] Add stability tests for HunyuanImage-3-Instruct (#3504)

    Signed-off-by: zhumingjue <zhumingjue@huawei.com>

commit 754d2e52fcbf3230b015457595991a1e6c9c2f6b
Author: Alex Brooks <albrooks@redhat.com>
Date:   Wed May 13 14:20:18 2026 -0600

    [BugFix] Refresh TeaCache when num_inference_steps=None (#2240)

    Signed-off-by: Alex Brooks <albrooks@redhat.com>

commit 9de9d1f7b593e5fc8884bcdd3456e062950f076f
Author: vraiti <vraiti@redhat.com>
Date:   Wed May 13 15:33:55 2026 -0400

    [Model] Add TP-aware MistralEncoder for FLUX.2-dev TP (#2465)

    Signed-off-by: vraiti <vraiti@redhat.com>
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

commit efd955674b608833533626fec21dfb7bacc8f009
Author: dengyunyang <584797741@qq.com>
Date:   Wed May 13 22:40:18 2026 +0800

    [Bugfix][HunyuanImage3.0] Fix KV reuse compatibility in SP scenarios (#3546)

    Signed-off-by: dengyunyang <584797741@qq.com>

commit 4d3eed152a697412c966d2ac97e0009b92490b5e
Author: Y. Fisher <yukexiong1@huawei.com>
Date:   Wed May 13 22:22:43 2026 +0800

    [Feat][Config] Support additional_config for diffusion worker (#3020)

    Signed-off-by: KexiongYu <yukexiong1@huawei.com>
    Signed-off-by: Y. Fisher <yukexiong1@huawei.com>

commit 16a84b29d51165a47152c540babce56392dfdc0e
Author: Zeng Chuang <zengchuang1005@gmail.com>
Date:   Wed May 13 22:10:35 2026 +0800

    [Bugfix] Add bot_task option of think_recaption for hunyuanimage3 it2i (#3551)

    Signed-off-by: zengchuang <zengchuang3@huawei.com>

commit b9cb57b6310de8bbc85a278e165ddf0690a5667c
Author: TJian <tunjian.tan@embeddedllm.com>
Date:   Wed May 13 20:50:57 2026 +0800

    [ROCm] Bugfix wan22 (#3463)

    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

commit 2e8e3057bcefb9edcc62b3370914ed0e1352e44e
Author: amy-why-3459 <wuhaiyan17@huawei.com>
Date:   Wed May 13 17:54:21 2026 +0800

    [skip ci][Tests] Splitting Qwen3-omni's performance test cases (#3501)

    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

commit a715abd4474f8c31084692b2637885088193d8c1
Author: hxhhhlalala <hyh_hh@163.com>
Date:   Wed May 13 17:14:42 2026 +0800

    [NPU][Quant] Add W8A8 MXFP8 online/offline quantization support for Wan2.2 T2V / I2V / TI2V inference on Ascend NPU (#3140)

    Signed-off-by: hyh_hh <huyinghong1@huawei.com>
    Co-authored-by: hyh_hh <huyinghong1@huawei.com>

commit b6bdc5997f73c85e3544f4e21c28049119fa7b63
Author: weizhoublue <45163302+weizhoublue@users.noreply.github.com>
Date:   Wed May 13 16:22:48 2026 +0800

    Fix: NPU AR model runner prefix cache key flattening (#3568)

    Signed-off-by: weizhoublue <weizhoublue@github.com>
    Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
    Co-authored-by: weizhoublue <weizhoublue@github.com>

commit 631251a1f8573fc1fcc325041bf1b3bf347226be
Author: knlnguyen1802 <knlnguyen1802@gmail.com>
Date:   Wed May 13 15:31:48 2026 +0800

    [Bugfix, rl] Diffusion worker SIGKILL under Ray actor (exitcode -9) (#3533)

    Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
    Co-authored-by: Samit <285365963@qq.com>

commit b6f29ee6145bf353b084557b80792dee2d5a7149
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Wed May 13 14:25:56 2026 +0800

    [CI][Bugfix] skip fp8 Z-Image quality gate (#3531) and add torchdiffeq dev extra (#3563)

    Signed-off-by: wangyu <410167048@qq.com>

commit 0ab1d3005694473a4684959c809d3fd84a00ae69
Author: Canlin Guo <canlinguosdu@gmail.com>
Date:   Wed May 13 12:58:17 2026 +0800

    [CI][Test] Add NPU nightly tests (#3480)

    Signed-off-by: gcanlin <canlinguosdu@gmail.com>

commit 56ca7dd612bbd2298426ce34147845d29197e0b4
Author: lyj-jjj <liuyingjun5@huawei.com>
Date:   Wed May 13 11:46:00 2026 +0800

    support online FP8 quantization for FA on NPU #2236 (#2640)

    Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
    Signed-off-by: gcanlin <canlinguosdu@gmail.com>
    Co-authored-by: Cursor <cursoragent@cursor.com>
    Co-authored-by: gcanlin <canlinguosdu@gmail.com>

commit 83bbe39d39bb6c6db9278ba5e9bd3aee37ce0040
Author: Ricardo Noriega <rnoriega@redhat.com>
Date:   Wed May 13 04:25:02 2026 +0200

    Bump diffusers minimum version to >=0.38.0 (#3349)

    Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>

commit 5313cf6d4800ec9dc438686f7e32eeee48bbb022
Author: Nick Cao <ncao@redhat.com>
Date:   Tue May 12 22:08:23 2026 -0400

    [Bugfix] Fix omni processing test for non-multimodal talker stage (#3559)

    Signed-off-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

commit c167b9d69190299070159e053a0be0a6db7f2cc1
Author: zhumingjue138 <zhumingjue@huawei.com>
Date:   Wed May 13 09:15:20 2026 +0800

    [Bugfix] Fix the issue where the qwen3-omni model long-term stability test sometimes gets stuck without sending requests. (#3468)

    Signed-off-by: zhumingjue <zhumingjue@huawei.com>

commit dca369d448cd714d36bfaab7d54ab9e3449de306
Author: Nick Cao <ncao@redhat.com>
Date:   Tue May 12 11:24:13 2026 -0400

    [Perf] Remove dead audio_tower and visual from Qwen2.5-Omni talker stage (#3425)

    Signed-off-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit f4b28f239848db9f12121e1d760ef204b128e0be
Author: rein yang <73573651+R2-Y@users.noreply.github.com>
Date:   Tue May 12 22:10:10 2026 +0800

    [CI] update daily omni min accuracy (#3536)

    Signed-off-by: rein yang <ruiruyang2@gmail.com>

commit aa1184d737f2e908f1467b04e13b8df3aae12e53
Author: knlnguyen1802 <knlnguyen1802@gmail.com>
Date:   Tue May 12 14:55:30 2026 +0800

    [bugfix, rl] Fix race condition bug on async running for diffusion model  (#3379)

    Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
    Co-authored-by: Samit <285365963@qq.com>

commit d7ea5d5979c78fb697e7497dd1aed75bf886a9cf
Author: Dnoob <dxpouo@gmail.com>
Date:   Tue May 12 14:39:58 2026 +0800

    [New Model] Add support for tencent/Covo-Audio-Chat (#2293)

    Signed-off-by: Dnoob <dxpouo@gmail.com>
    Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
    Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit e40076e872b0ac4a458ca3abb069bfbf9806935d
Author: dsinghvi <divyanshsinghvi@gmail.com>
Date:   Tue May 12 12:00:13 2026 +0530

    [Refactor] msgspec standardisation for data entry key names and improved type checks  (#3149)

    Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
    Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

commit 9a60e11e3dbac99e2414b8c0eaa747119f9c61bd
Author: NumberWan <wantszkin2003@gmail.com>
Date:   Tue May 12 13:59:04 2026 +0800

    [Nightly CI] Remove TP case (#3534)

    Signed-off-by: NumberWan <wantszkin2003@gmail.com>

commit fe72d078caa30212244ad7d023fcf7af9531c176
Author: Saad Al-Tohamy <92796871+saadaltohamy@users.noreply.github.com>
Date:   Tue May 12 08:10:04 2026 +0300

    [FIX] Ensure `extra_params` are correctly merged into sampling params in `_create_diffusion_speech()` (#3320)

    Signed-off-by: saadaltohamy <saad_altohamy@yahoo.com>
    Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>

commit 0d91fbbbb7fe8b0e3b59e35403d2d2123969ae3f
Author: dengyunyang <584797741@qq.com>
Date:   Tue May 12 12:13:30 2026 +0800

    [Bugfix] Align the AR and DiT prompt formatting across both online and offline modes. (#3516)

    Signed-off-by: dengyunyang <584797741@qq.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit ce621be29bdce423f4d6fd2e248aa20f443135ca
Author: amy-why-3459 <wuhaiyan17@huawei.com>
Date:   Tue May 12 11:56:10 2026 +0800

    [BugFix] Modify the splicing method of streaming audio output. (#3438)

    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

commit dd5626e6079ddfeea5430bd597fddc868e83d99b
Author: bjf-frz <frz123db@gmail.com>
Date:   Tue May 12 10:44:20 2026 +0800

    [Recipes]update Wan2.2-I2V gpu part (#3271)

    Signed-off-by: bjf-frz <frz123db@gmail.com>

commit ac5fbed6c05115d2605bfc71922eddc69471e14d
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Tue May 12 10:23:56 2026 +0800

    [CI][Bugfix] Improve e2e latency logging, update response classes to include detailed latency documentation and add startup time logging (#3246)

    Signed-off-by: wangyu <410167048@qq.com>
    Signed-off-by: [Your Name] <your.email@example.com>

commit 955fcff828705e685e1ad119ebd117940f480481
Author: Lancer <maruixiang6688@gmail.com>
Date:   Tue May 12 10:08:28 2026 +0800

    [Chore] explicit .float() conversion in Helios's optimized_scale function (#3529)

    Signed-off-by: Lancer <maruixiang6688@gmail.com>

commit 4bca522f01ca49f04bb9a6cfa14c7c8839013b0c
Author: ChenWenjing <54166744+Shirley125@users.noreply.github.com>
Date:   Tue May 12 01:09:10 2026 +0800

    [bugfix][ci] avoid Whisper transcript deduplication in realtime audio test (#3417)

    Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>

commit bd4ede391b58295335061102fb534007e3e149af
Author: Nick Cao <ncao@redhat.com>
Date:   Mon May 11 12:04:56 2026 -0400

    [Perf] Remove dead audio_tower and visual from Qwen3-Omni talker stage (#3296)

    Signed-off-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit 6be59f7d19e11427605a727ff5142c980c9ae19c
Author: Junhong Liu <ljh_lbj@163.com>
Date:   Mon May 11 22:56:54 2026 +0800

    [Fix] Fix RMSNorm inductor KeyError under HSDP + torch.compile (#3460)

    Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>

commit a33e2eb5885472e4a87f9c431a7792967046fcb1
Author: Y. Fisher <yukexiong1@huawei.com>
Date:   Mon May 11 22:49:19 2026 +0800

    [Config] Add HunyuanImage3 deploy configs (#3172)

    Signed-off-by: KexiongYu <yukexiong1@huawei.com>
    Signed-off-by: Y. Fisher <yukexiong1@huawei.com>

commit c9a8556c24ade154b09b55a39acd36a1697a1f1f
Author: 汪志鹏 <wangzhipeng628@gmail.com>
Date:   Mon May 11 22:19:00 2026 +0800

    [New Model]: Add sensenova u1 support (#3319)

    Signed-off-by: princepride <wangzhipeng628@gmail.com>
    Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

commit 2cdffcea6b0117216f29ba329bebda814d090645
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Mon May 11 21:56:08 2026 +0800

    [CI] skip failing diffusion and accuracy cases (#3432, #3256, #3257, #3488) (#3507)

    Signed-off-by: wangyu <410167048@qq.com>

commit 3f27ffbd4de71df4bede265bcf4f8212e6bfa07a
Author: wuhang <wuhang6@huawei.com>
Date:   Mon May 11 20:16:05 2026 +0800

    [Misc] Clean logs for image gen task (#3414)

    Signed-off-by: wuhang <wuhang6@huawei.com>

commit 3bf4f2850c254c45152e53224b1462a1c450581e
Author: dengyunyang <584797741@qq.com>
Date:   Mon May 11 19:34:58 2026 +0800

    [Bug][Hunyuanimage 3.0] Fix differing AR encode behavior between online and offline (#3500)

    Signed-off-by: dengyunyang <584797741@qq.com>

commit 5e263b6929ef7cb19c37800db5257f700f41871c
Author: Canlin Guo <canlinguosdu@gmail.com>
Date:   Mon May 11 11:27:35 2026 +0800

    [BugFix] Rename attention_config to diffusion_attention_config (#3489)

    Signed-off-by: gcanlin <canlinguosdu@gmail.com>

commit e1088026faa5e4ee7b27ae3cd835fdc74f6431c0
Author: Baoyuan Qi <qibaoyuan@126.com>
Date:   Mon May 11 10:05:46 2026 +0800

    [Performance] Improve MiMo-Audio tokenizer decoding performance (#2183)

    Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>
    Co-authored-by: Jialong Liu <88185941+Galleons2029@users.noreply.github.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit ef7f2f1cd2158bd55d00ee811eeb09608468841a
Author: Canlin Guo <canlinguosdu@gmail.com>
Date:   Mon May 11 09:13:07 2026 +0800

    [Docs] Refactor the attention backend docs/skill (#3475)

    Signed-off-by: gcanlin <canlinguosdu@gmail.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 67e0e10c85e87e9934162390abe62eb075a6b2bd
Author: TJian <tunjian.tan@embeddedllm.com>
Date:   Mon May 11 07:53:27 2026 +0800

    [ROCm] [CI] Add the same skip ci logic as CUDA CI (#3482)

    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

commit 2c73cf3522ae6b462eeac6c0cbec639c6e86b4a4
Author: Sy03 <1370724210@qq.com>
Date:   Mon May 11 07:07:38 2026 +0800

    [Perf] Fix Qwen3-TTS latency regression (#3485)

    Signed-off-by: Sy03 <1370724210@qq.com>

commit 857356d5b72f4b27a1f0a5f795f21463f190163b
Author: dengyunyang <584797741@qq.com>
Date:   Sun May 10 22:08:24 2026 +0800

    [Feature] hunyuanimage support flash attn (#2981)

    Signed-off-by: dengyunyang <584797741@qq.com>

commit 11c4c7f0ff7f25eecec1b875dc3a44ed6060e9ba
Author: Canlin Guo <canlinguosdu@gmail.com>
Date:   Sun May 10 11:59:05 2026 +0800

    [Diffusion][Attention] Support per-role attention backend via CLI (#2681)

    Signed-off-by: gcanlin <canlinguosdu@gmail.com>
    Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 26d481fc847a584c3f385a9ddcce002af1bbd319
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Sun May 10 07:00:29 2026 +0800

    [CI] Remove VLLM_TEST_CLEAN_GPU_MEMORY to avoid environment variable pollution that causes unnecessary GPU detection, thereby slowing down test case execution. (#3446)

    Signed-off-by: wangyu <410167048@qq.com>
    Signed-off-by: [Your Name] <your.email@example.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 77480215f5c854b030364a3e352862228f98de1a
Author: wuhang <wuhang6@huawei.com>
Date:   Sat May 9 21:13:18 2026 +0800

    [CI][Nightly] Shard nightly Diffusion X2I H100 lanes and centralize shard definitions (#3455)

    Signed-off-by: wuhang <wuhang6@huawei.com>

commit c4a099004411f0aa5d30ad05ed4e7fe6876e58e0
Author: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Date:   Sat May 9 04:55:05 2026 -0400

    (Phase 1)Add ModelOpt FP8 auto-detect support for diffusion checkpoints #2709 (#2913)

    Signed-off-by: roG0d <rodgarcas98@gmail.com>
    Signed-off-by: roG0d <baonudesifeizhai@gmail.com>
    Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
    Co-authored-by: roG0d <rodgarcas98@gmail.com>

commit 40a07e0d809e3c2dc07de52ef977ca364a1dc2cb
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Sat May 9 16:17:57 2026 +0800

    [CI] Refine nightly pytest command in Omni · Function Test with H100 to avoid duplicate testing. (#3459)

    Signed-off-by: wangyu <410167048@qq.com>

commit 0e81ef28707631fc6335bf083cf3df9966851403
Author: zhumingjue138 <zhumingjue@huawei.com>
Date:   Sat May 9 16:17:43 2026 +0800

    [CI] Update merge condition to skip L3 merges during weekly test and update doc (#3197)

    Signed-off-by: zhumingjue <zhumingjue@huawei.com>

commit ac69cbd27ecbf67e3a994c15c55d9ee65dacbd16
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Sat May 9 03:22:43 2026 -0400

    [Test] Restore tts mark and omni_runner_function fixture for Voxtral TTS (#3462)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit d460673647dd97a2ba3976a8e8bcce3a2527a61e
Author: Wallbreazzz <110282866+Wallbreazzz@users.noreply.github.com>
Date:   Sat May 9 14:58:56 2026 +0800

    Fix NPU code predictor device mismatch in concurrent mode (#3453)

    Co-authored-by: houzechen <h00875519@china.huawei.com>

commit f6e3dece09ad3a72d20a119a9341551cdb25065c
Author: akshatvishu <33392262+akshatvishu@users.noreply.github.com>
Date:   Sat May 9 08:10:36 2026 +0530

    [Feature] Add FP8 quantization for Voxtral TTS (#3036)

    Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
    Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
    Signed-off-by: akshatvishu <33392262+akshatvishu@users.noreply.github.com>
    Co-authored-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
    Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>

commit de3a2917107a7f2da68b35157d83735c1dc35897
Author: Lancer <maruixiang6688@gmail.com>
Date:   Sat May 9 09:13:37 2026 +0800

    [Bugfix] fix OmniGen2 offload and dtype mismatch (#2560)

    Signed-off-by: Lancer <maruixiang6688@gmail.com>
    Signed-off-by: Lancer <402430575@qq.com>

commit c481ccee2b405e2a580b4f050cbc795cdb1e10ba
Author: Dan <416947747@qq.com>
Date:   Sat May 9 06:44:19 2026 +0800

    [Perf] Optimize VoxCPM2 first-request latency via startup warmup (#3424)

    Signed-off-by: Dan250124 <416947747@qq.com>

commit b4ab37da22e77a112e6f6e085937a4ea66ed6da9
Author: rongfu.leng <lenronfu@gmail.com>
Date:   Sat May 9 06:41:59 2026 +0800

    [Bugfix] Fix crash when serving Qwen-Image with TeaCache (#3450)

    Signed-off-by: rongfu.leng <lenronfu@gmail.com>

commit c2a624bec41537a6d78454beebce58cf91764e7e
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Fri May 8 18:40:43 2026 -0400

    [Bugfix][StableAudio] Pass model_class_name to Omni() and declare audio class attrs (#3405)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit aca4b7d65c0d7925d22d055ef26c630a4b8dec82
Author: chzhang2021 <chzhang2021@gmail.com>
Date:   Fri May 8 13:08:39 2026 -0700

    Add Qwen3 TTS Model recipe (#3130)

    Signed-off-by: Chonghao Zhang <chzhang2021@gmail.com>
    Signed-off-by: chzhang2021 <chzhang2021@gmail.com>
    Signed-off-by: Chonghao Zhang <chonghaoz@meta.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: Chonghao Zhang <chonghaoz@meta.com>

commit 65bc9684659d28dff1010940f0a3a0d6258fd62e
Author: Nick Cao <ncao@redhat.com>
Date:   Fri May 8 10:16:49 2026 -0400

    [Refactor] Rename SupportsModuleOffload to SupportsComponentDiscovery (#3354)

    Signed-off-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit b968373c886618a701bb8745eb065c26e555804b
Author: Ayush Agarwal <ayushag@nvidia.com>
Date:   Fri May 8 06:37:57 2026 -0700

    enhancement: extend dmd2 to image generation + add flux, qwen image pipelines (#2974)

    Signed-off-by: ayushag <ayushag@nvidia.com>
    Signed-off-by: Ayush Agarwal <ayushag@nvidia.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 039a09a8e14bac3762cf1c7576e46f5c6a5e5c27
Author: skf <54565339+skf-1999@users.noreply.github.com>
Date:   Fri May 8 21:33:53 2026 +0800

    [Feature] online HunyuanImage-3.0 IT2I (image editing) support (#3410)

    Signed-off-by: skf1999 <13234016272@163.com>

commit c83cd4506913e97c915be3484f862d328c332e0e
Author: zdoba <daixinning@gmail.com>
Date:   Fri May 8 21:20:03 2026 +0800

    [Feat] Add Sequence Parallelism (USP) support for HunyuanVideo 1.5 transformer (#2444)

    Signed-off-by: daixinning <daixinning@163.com>
    Co-authored-by: daixinning <daixinning@163.com>

commit f8624db93a3832136189e7cc7fec57d9f5c6e076
Author: boatman <109857087+sphinxkkkbc@users.noreply.github.com>
Date:   Fri May 8 21:03:47 2026 +0800

    [BugFix]Fix default stage config path in voxcpm2 (#3447)

    Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
    Co-authored-by: sphinxkkkbc <binchengkang8@gmail.com>

commit 07fd6afb4b0cc45b7cf2dd7ef95287bd413a5c6c
Author: TaffyOfficial <2587297563@qq.com>
Date:   Fri May 8 20:55:23 2026 +0800

    [Test][HunyuanImage3] Add e2e offline I2T smoke test (#3332)

    Signed-off-by: TaffyOfficial <2324465096@qq.com>
    Co-authored-by: TaffyOfficial <2324465096@qq.com>

commit 5b61e7f1f1be0d3691a54541e3048c4bca980203
Author: dengyunyang <584797741@qq.com>
Date:   Fri May 8 20:51:46 2026 +0800

    [Feature][Hunyuan image 3.0] AR + DIT with kv reuse. (#3346)

    Signed-off-by: dengyunyang <584797741@qq.com>

commit ce8a7dfd2da31c45084bab15b867f34a6b2b1ffa
Author: Alex Brooks <albrooks@redhat.com>
Date:   Fri May 8 02:05:38 2026 -0600

    [Bugfix] Fix Dtype Crashes in SD3 (#2526)

    Signed-off-by: Alex Brooks <albrooks@redhat.com>
    Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>

commit 50fd3a3f852918a46d721ad52e241abb80457645
Author: Phi-C <chenxjhit@163.com>
Date:   Fri May 8 15:22:32 2026 +0800

    [Bugfix] Fix the issue where the seed parameter does not take effect when using the OpenAI Python client (#3436)

    Signed-off-by: Phi-C <chenxjhit@163.com>

commit 32663f21d5e760d0cfd769110d3e133a3582cfff
Author: lsyyyyy <siyuanlei37@gmail.com>
Date:   Fri May 8 15:20:42 2026 +0800

    [Feat] support hsdp for Bagel (#3150)

    Signed-off-by: siyuan.lei <siyuanlei37@gmail.com>
    Signed-off-by: lsyyyyy <siyuanlei37@gmail.com>
    Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
    Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>

commit ea4cf77f56b8a42bd900193d59739a61ff7eec73
Author: Yuchen Jiang <yuchen.yj.jiang@gmail.com>
Date:   Thu May 7 23:29:15 2026 -0700

    [Hardware] Extend diffusion engine plugin extensibility for out-of-tree hardware backends (#3239)

    Signed-off-by: Yuchen Jiang <yucjiang@amazon.com>
    Co-authored-by: Yuchen Jiang <yucjiang@amazon.com>
    Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>

commit 6f2ad7b403569ac4fa602348b5c90a8ceed15b09
Author: wangyu <53896905+yenuo26@users.noreply.github.com>
Date:   Fri May 8 14:12:23 2026 +0800

    [Test] Unify L2/L3 test layout, Buildkite steps, and test helpers (#2556)

    Signed-off-by: wangyu <410167048@qq.com>
    Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
    Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
    Co-authored-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

commit b85833eeec475497e28dd883c6436dcbfd5406de
Author: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Date:   Fri May 8 12:21:29 2026 +0800

    Update CODEOWNERS feature reviewers (#3378)

    Signed-off-by: David Chen <530634352@qq.com>

commit eeb7e698c983d97c9dff8c877376109cdabc71ca
Author: Chenguang Zheng <645327136@qq.com>
Date:   Fri May 8 09:28:52 2026 +0800

    [Clean] Remove multi-replica Bagel CI and related docs/configs  (#3407)

    Signed-off-by: Chenguang ZHENG <645327136@qq.com>

commit e6466cf0e621f51432f1c5afe7f23df908862763
Author: Nick Cao <ncao@redhat.com>
Date:   Thu May 7 11:23:59 2026 -0400

    [Refactor] Replace and ban a few torch.cuda functions in favor of torch.accelerator replacements. (#3365)

    Signed-off-by: Nick Cao <ncao@redhat.com>

commit 54277a8dd04088aaf591d3611973c4b547cc002b
Author: Gao Han <hgaoaf@connect.ust.hk>
Date:   Thu May 7 16:16:36 2026 +0800

    [chore] Update command to download dataset from huggingface-cli to hf (#3403)

    Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>

commit 4a24a517abc7769b1399ded594558a3fe8269872
Author: Canlin Guo <canlinguosdu@gmail.com>
Date:   Thu May 7 11:54:16 2026 +0800

    [BugFix] Probe __dict__ instead of hasattr when patching WanRMS_norm (#3400)

    Signed-off-by: gcanlin <canlinguosdu@gmail.com>

commit 5d4b16e6e37fdde8578c17cab1165b9ad5effb9c
Author: Peiqi Yin <60515999+yinpeiqi@users.noreply.github.com>
Date:   Wed May 6 20:00:20 2026 -0700

    [BugFix] Qwen2.5-Omni streaming code2wav input handling (#3396)

    Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com>

commit 3c85ca5536a767361c4a82b65a6d04c3a7d63258
Author: dengyunyang <584797741@qq.com>
Date:   Thu May 7 10:03:21 2026 +0800

    [bugfix][hunyuanimage] Fix parameter issue introduced during PR #3107 rebase (#3395)

    Signed-off-by: dengyunyang <584797741@qq.com>

commit c483a23debe6fadf9312c78c9c65c129791006d0
Author: Daniel Huang <daniel1.huang@intel.com>
Date:   Wed May 6 17:08:25 2026 -0700

    [CI Patch] Qwen 2.5 CI Fixes for Intel XPU (#3083)

    Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 2856ff7aeac763e88b095a47d9af503901c50035
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Wed May 6 17:45:30 2026 -0500

    [XPU][DOCKER] update dockerfile.xpu after main repo updating to pt2.11 (#3393)

    Signed-off-by: Chendi Xue <chendi.xue@intel.com>

commit 56ca5d9a8f8779336f6dcdd6f73b0ad020eb77fd
Author: Chen-Yo Sun <chenyo.sun@mistral.ai>
Date:   Wed May 6 15:43:01 2026 -0700

    [BugFix] Forward CLI --tokenizer to per-stage engine configs (#3120)

    Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

commit 81ab2f98da21817638dbf14c0b0b46e2ad6354b1
Author: Haco <923390377@qq.com>
Date:   Thu May 7 06:32:01 2026 +0800

    [Config Refactor] Remove legacy Omni CLI arg helper and align tests with nullified parser defaults (#3144)

    Signed-off-by: xiaohajiayou <923390377@qq.com>

commit b25ea13cb04c7a56b944da110bceb07a5c2bd6f7
Author: amy-why-3459 <wuhaiyan17@huawei.com>
Date:   Thu May 7 06:21:00 2026 +0800

    [BugFix] Fixed a precision issue with one-word answers. (#3385)

    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
    Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: Canlin Guo <961750412@qq.com>

commit 687a44e5c83cedf16882b188ce0b042197fe69c8
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Wed May 6 18:17:57 2026 -0400

    [Bugfix][OmniVoice] Read voice-cloning fields from OmniTextPrompt in offline path (#3392)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit 19f8f428223fa8acbeabeed9d89609d623374689
Author: Juan Pablo Zuluaga <46724788+JuanPZuluaga@users.noreply.github.com>
Date:   Thu May 7 00:16:03 2026 +0200

    [Feat][Qwen3-Omni] Add CUDA graph support for Code2Wav decoder (#2376)

    Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
    Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>

commit 0a8204cb81bf8608b21fcc3a4199e5bde6b1136c
Author: Isotr0py <mozf@mail2.sysu.edu.cn>
Date:   Thu May 7 00:37:36 2026 +0800

    [Quantization] Redo Z-Image text encoder FP8 online quantization (#3279)

    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
    Signed-off-by: Isotr0py <2037008807@qq.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>

commit b8bd75837ebf716854e40997995928128b6cf0db
Author: Lidang Jiang <119769478+Lidang-Jiang@users.noreply.github.com>
Date:   Wed May 6 23:43:19 2026 +0800

    [Bugfix] Fix missing ANSI colors in CLI logo when output is piped (#1636)

    Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>

commit 576afb6f53c3e817c1895a3790bacef2470c0fa9
Author: skf <54565339+skf-1999@users.noreply.github.com>
Date:   Wed May 6 22:37:39 2026 +0800

    [Feature] HunyuanImage-3.0 IT2I (image editing) support (#3107)

    Signed-off-by: TaffyOfficial <2324465096@qq.com>
    Signed-off-by: zuiho <2324465096@qq.com>
    Signed-off-by: skf1999 <13234016272@163.com>
    Co-authored-by: TaffyOfficial <2324465096@qq.com>
    Co-authored-by: dengyunyang <584797741@qq.com>
    Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>

commit 1e8dc841503146bcb2b5af01b36d1eca94dd8e24
Author: Haco <923390377@qq.com>
Date:   Wed May 6 22:21:29 2026 +0800

    [Bugfix] Fix default diffusion stage config generator drops runtime engine args (#2559)

    Signed-off-by: xiaohajiayou <923390377@qq.com>
    Co-authored-by: reidliu41 <reidliu41@users.noreply.github.com>

commit 28558cc37471da8258c95aa515363a4a05fce601
Author: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Date:   Wed May 6 21:11:35 2026 +0800

    [bugfix][CI] Fix qwen image performance degradation w/ vllm 0.20 & CUDA 13.0 (#3352)

    Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

commit 282e0b664231275d9a17f56880c99e084028a435
Author: amy-why-3459 <wuhaiyan17@huawei.com>
Date:   Wed May 6 20:42:56 2026 +0800

    [BugFix][CI] Change max_tokens from 150 to 2048 (#3376)

    Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
    Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>

commit 1e5f288a915494f8ffd9783a4886bbfe9929e65e
Author: Zheng Wengang <zwg0606@gmail.com>
Date:   Wed May 6 19:33:18 2026 +0800

    [FEAT] support multi-stage deployment (#2396)

    Signed-off-by: ZhengWG <zwg0606@gmail.com>
    Signed-off-by: Zheng Wengang <zwg0606@gmail.com>
    Signed-off-by: Peiqi Yin <60515999+yinpeiqi@users.noreply.github.com>
    Signed-off-by: yinpe <11810305@mail.sustech.edu.cn>
    Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com>
    Co-authored-by: Peiqi Yin <60515999+yinpeiqi@users.noreply.github.com>
    Co-authored-by: yinpe <11810305@mail.sustech.edu.cn>
    Co-authored-by: yinpeiqi <yinpeiqi809@gmail.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>
    Co-authored-by: Chenguang Zheng <645327136@qq.com>

commit e969d2e99f464044b50a59ea618ad9a8edcbfb9f
Author: fywc <hanzheli@kuaishou.com>
Date:   Wed May 6 18:11:22 2026 +0800

    [Docs] Add LTX-2-T2V and LTX-2-I2V recipes (#3294)

    Signed-off-by: hanzheli <hanzheli@kuaishou.com>
    Signed-off-by: fywc <hanzheli@kuaishou.com>

commit 6f784cbc50b2ef1489a73b7c89016f5d95c18d7c
Author: Vensen <vensenmu@gmail.com>
Date:   Wed May 6 15:35:41 2026 +0700

    [Bugfix]: skip faulty pipelines during registry iteration (#2999)

    Signed-off-by: vensen <vensenmu@gmail.com>
    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
    Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>
    Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
    Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>

commit 369a47d5a1874e2a5050c830d5a18398b52446b7
Author: dengyunyang <584797741@qq.com>
Date:   Wed May 6 15:31:11 2026 +0800

    [Hunyuanimage-3.0] Accuracy fix (#3373)

    Signed-off-by: dengyunyang <584797741@qq.com>

commit b076006c3541f1be53329bee8f7e8f91371c5ba0
Author: Haco <923390377@qq.com>
Date:   Wed May 6 14:25:20 2026 +0800

    [BugFix] Fix Whitelist optimization  CI failure (#3290)

    Signed-off-by: xiaohajiayou <923390377@qq.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
    Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>

commit 9f81a0a4b07087f72284813aebae90d2b6f076ea
Author: fywc <hanzheli@kuaishou.com>
Date:   Wed May 6 13:09:39 2026 +0800

    [Feat] support HSDP for DreamID-Omni (#3138)

    Signed-off-by: hanzheli <hanzheli@kuaishou.com>
    Signed-off-by: fywc <hanzheli@kuaishou.com>

commit f36d891ed106aa2a73710a9a706bfc1ddf1a7294
Author: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Date:   Wed May 6 12:43:08 2026 +0800

    Update WeChat QR code (#3368)

    Signed-off-by: David Chen <530634352@qq.com>

commit 354511b805f96dc2ffe8b72755af1764d2318fe1
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Tue May 5 23:32:39 2026 -0400

    [CI][Bugfix] Relax stable-audio layerwise offload determinism tolerance to 1e-2 (#3371)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit 005621ba7fd92ad0f11369ea26a5f05b56dad9af
Author: Mike Qiu <qdy220091330@gmail.com>
Date:   Wed May 6 10:56:23 2026 +0800

    Support both "voice" and "speaker" params in chat completions (#3248)

    Signed-off-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
    Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
    Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>

commit 4e41b7bc2324eed1b2af6a09a40f3d7005271001
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Tue May 5 19:03:48 2026 -0500

    Enable Wan2.2-S2V modeling to vLLM-omni (#2751)

    Signed-off-by: Chendi Xue <chendi.xue@intel.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 353ac8a402738865eb2d38fb4b26b456561a50b1
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Tue May 5 18:48:46 2026 -0500

    [Enhancement] Offload transformer after switch to transformer-2 (#3224)

    Signed-off-by: Chendi Xue <chendi.xue@intel.com>
    Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>

commit e49fbd8a2d1ec0ad2ea593dfc591091b30f42e82
Author: Ting FU <semmer@live.cn>
Date:   Wed May 6 05:47:13 2026 +0800

    [Feat] DiffusionEngine Support async batch infer  (#2729)

    Signed-off-by: Semmer <semmer@live.cn>
    Signed-off-by: jader <yjader@foxmail.com>
    Signed-off-by: asukaqaq-s <1311722138@qq.com>
    Co-authored-by: asukaqaq-s <1311722138@qq.com>
    Co-authored-by: jader <yjader@foxmail.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
    Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>

commit 2b85474c2347d360df1e245ee6e6b628041536fb
Author: Sy03 <1370724210@qq.com>
Date:   Wed May 6 02:57:55 2026 +0800

    [Bugfix] Propagate seed to Qwen3-TTS Fast AR sampler (#3350)

    Signed-off-by: Sy03 <1370724210@qq.com>

commit 5cf3f7947b84aecb0c908719c7573dcab6b00a06
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Tue May 5 13:36:16 2026 -0400

    [Docs] Consolidate moss_tts_nano + ming_flash_omni_tts into TTS hub (#3358)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit 44cde33eb5b08b09e1d7a42d320b9ef5aa87f830
Author: TaffyOfficial <2587297563@qq.com>
Date:   Wed May 6 00:31:02 2026 +0800

    [Bugfix][HunyuanImage3] Fix offline AR garbage output by switching to Instruct chat template (#3243)

    Signed-off-by: zuiho <wu15922848573@outlook.com>
    Signed-off-by: TaffyOfficial <2324465096@qq.com>
    Signed-off-by: zuiho-kai <31877877+zuiho-kai@users.noreply.github.com>
    Signed-off-by: zuiho <2324465096@qq.com>
    Co-authored-by: TaffyOfficial <2324465096@qq.com>
    Co-authored-by: zuiho-kai <31877877+zuiho-kai@users.noreply.github.com>

commit a77c56725d481fc30643dd76c176208d8bd03262
Author: TJian <tunjian.tan@embeddedllm.com>
Date:   Tue May 5 23:28:37 2026 +0800

    [ROCm] [CI] [Bugfix] 2/N Fix Qwen2.5 and Qwen3 test (#3343)

    Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

commit fde38ca54f218900507745a4da0542ff9cd60a04
Author: Nick Cao <ncao@redhat.com>
Date:   Tue May 5 11:26:30 2026 -0400

    [Refactor][Qwen3-TTS] Construct Code2Wav decoder natively (#2341)

    Signed-off-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit b64ab05d09f0f639c83b8e8a81e45f4b898d6a3f
Author: Juan Pablo Zuluaga <46724788+JuanPZuluaga@users.noreply.github.com>
Date:   Tue May 5 16:53:45 2026 +0200

    [TTS][SpeakerCacheManager] A global speaker cache manager for Voice Cloning (#2630)

    Signed-off-by: JuanPZuluaga <juanz9312@gmal.com>
    Co-authored-by: JuanPZuluaga <juanz9312@gmal.com>

commit a0918ce583985ce597d748fa223c0686204a1f5e
Author: boatman <109857087+sphinxkkkbc@users.noreply.github.com>
Date:   Tue May 5 22:49:57 2026 +0800

    [Feat]add cpu-offload/layerwise-offload for stable-audio-open & fix output inconsistency with same seed (#2909)

    Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
    Co-authored-by: sphinxkkkbc <binchengkang8@gmail.com>

commit f19891e4f1bf1f4b29dec941e722a74069a82c74
Author: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Date:   Tue May 5 20:54:39 2026 +0800

    [bugfix][CI] Diffusers backend update (#3096)

    Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
    Signed-off-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

commit d0b531520d67f6a496afd6c1495aa765e0c78e65
Author: dengyunyang <584797741@qq.com>
Date:   Tue May 5 20:05:46 2026 +0800

    [Performance CI ]Hunyuan Image 3.0 DIT bench test (#2495)

    Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
    Signed-off-by: dengyunyang <584797741@qq.com>
    Signed-off-by: TaffyOfficial <2324465096@qq.com>
    Signed-off-by: ChenZhao <bounty-hunter@users.noreply.github.com>
    Signed-off-by: zuiho <2324465096@qq.com>
    Co-authored-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
    Co-authored-by: Gao Han <hgaoaf@connect.ust.hk>
    Co-authored-by: TaffyOfficial <2324465096@qq.com>
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    Co-authored-by: TaffyOfficial <2587297563@qq.com>

commit 1a93818e54addb7c64ddce05cd53169ba895eb1d
Author: zhanqiuhu <49648934+ZhanqiuHu@users.noreply.github.com>
Date:   Tue May 5 04:13:30 2026 -0400

    [Bugfix] Add GatedRepoError Report (#1616)

    Signed-off-by: Zhanqiu Hu <zhu@redhat.com>

commit bb239fa94959932731e8962fe3a3d18ebbb33fd8
Author: Alex Brooks <albrooks@redhat.com>
Date:   Mon May 4 22:09:45 2026 -0600

    [Core] Support Async & Sync AutoRegressive Scheduling (#3306)

    Signed-off-by: Alex Brooks <albrooks@redhat.com>

commit 703e31fc470bb422bf36fbf6987707ffa6e9ffea
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Mon May 4 16:41:29 2026 -0400

    [Docs] Consolidate per-model TTS examples into a single hub (#3234)

    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

commit e1b30061b637ea37d44a0d8b4b551dbe891f2dfc
Author: Alex Brooks <albrooks@redhat.com>
Date:   Mon May 4 10:55:54 2026 -0600

    [CI] Use Logprobs Check for Flaky Prefix Cache Test (#3199)

    Signed-off-by: Alex Brooks <albrooks@redhat.com>

commit c007d40b147a6130b2cc9f8fc6721f5aaf0179ee
Author: Canlin Guo <canlinguosdu@gmail.com>
Date:   Mon May 4 22:38:43 2026 +0800

    [NPU] Upgrade to v0.20.0 & align with GPU model runner (#3325)

    Signed-off-by: gcanlin <canlinguosdu@gmail.com>

commit c708aecb18491405a80ffabcb0f8aa54062baeb0
Author: Zhang Jian <jianmusings@gmail.com>
Date:   Mon May 4 22:24:35 2026 +0800

    [Diffusion] [Model] Support AudioX (#2077)

    Signed-off-by: Zhang Jian <jianmusings@gmail.com>
    Signed-off-by: Zhang <jianmusings@gmail.com>
    Signed-off-by: Zhang <zhang.jian@u.nus.edu>
    Signed-off-by: Zhang Jian <e0322744@u.nus.edu>
    Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
    Co-authored-by: Zhang Jian <e0322744@u.nus.edu>
    Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>

commit c3065115128e5369ca51be5a9c40f68640d55a6e
Author: Lancer <maruixiang6688@gmail.com>
Date:   Mon May 4 22:22:06 2026 +0800

    [Bugfix] Fix CUBLAS_STATUS_EXECUTION_FAILED when native Flash Attention is available (Wan2.2) (#3327)

    Signed-off-by: Lancer <maruixiang6688@gmail.com>

commit 5fc0bfe01eeff89e70befd01eac046f783a71072
Author: Nick Cao <ncao@redhat.com>
Date:   Mon May 4 10:17:25 2026 -0400

    [Cleanup] Use tokens_input() for TTS prompt construction (#3227)

    Signed-off-by: Nick Cao <ncao@redhat.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit 9a5370367682da57a8d7f50f28c4753c2b7bd2b7
Author: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>
Date:   Mon May 4 15:07:53 2026 +0200

    [Bugfix] GLM-Image: route t2i requests through the multimodal processor (#3034) (#3189)

    Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>

commit bb69cbc9f1a1b0379c0a4ba6895822f1b4d2089d
Author: Lancer <maruixiang6688@gmail.com>
Date:   Mon May 4 17:34:43 2026 +0800

    [Perf] Optimize RMSNorm in Z-Image (#3304)

    Signed-off-by: Lancer <maruixiang6688@gmail.com>

commit 33586d845ce62104f68dae8da34a38c2b557618b
Author: Alex Brooks <albrooks@redhat.com>
Date:   Sun May 3 23:50:31 2026 -0600

    [Bugfix] Use get_open_ports_list for stage ports in OmniMasterServer (#3333)

    Signed-off-by: Alex Brooks <albrooks@redhat.com>

commit 3a2679950745df348d0e933acdc58f9758b204e1
Author: bjf-frz <frz123db@gmail.com>
Date:   Mon May 4 11:54:33 2026 +0800

    [Bugfix] Fix GLM-Image prior token debug logging (#3165)

    Signed-off-by: bjf-frz <frz123db@gmail.com>

commit 6bb18af119c797fae31e843748fd14fd1e6b2efb
Author: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Date:   Sun May 3 22:32:59 2026 +0800

    [Chore][Doc] Fix example arg values in Profiler doc (#3309)

    Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

commit 21a3035c101976a17fde1f8485659945acc13f9b
Author: Dan250124 <416947747@qq.com>
Date:   Sun May 3 22:29:47 2026 +0800

    Fixed memory leak and Remove dead code (#3312)

    Signed-off-by: Dan250124 <416947747@qq.com>
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

commit e50a0665485178f428e8e37ad7a8cff2fc413484
Author: 汪志鹏 <wangzhipeng628@gmail.com>
Date:   Sun May 3 20:02:22 2026 +0800

    [BugFix]: Fix async scheduer transfer exceed KV cache (#3318)

    Signed-off-by: princepride <wangzhipeng628@gmail.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 4c06bb09ed9985da1c83c72ad704eb297c989986
Author: Lancer <maruixiang6688@gmail.com>
Date:   Sun May 3 15:53:28 2026 +0800

    [Feat] ERNIE image model (T2I) (#2861)

    Signed-off-by: Lancer <maruixiang6688@gmail.com>

commit 5dabbb2f18f650bdfa3c5cc60195e25d5c87fc83
Author: GuoSheng Feng <146159551+sfiisf@users.noreply.github.com>
Date:   Sun May 3 12:36:15 2026 +0800

    [Bugfix] Map Qwen3-TTS max_new_tokens to max_tokens (#3217)

    Signed-off-by: sfiisf <sfiisf@163.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 7491b674463362a2725ef728b303c60b691bbb08
Author: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Date:   Sun May 3 00:35:35 2026 -0400

    [Bugfix][MOSS-TTS-Nano] Drop fictional voice presets, require ref_audio (#3192)

    Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
    Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit d49356e4ca95e85613ae68243c838c69eabf5a02
Author: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Date:   Sun May 3 10:43:49 2026 +0800

    [Config Refactor] Migrate Ming and Ming-TTS deploy/pipline configs (#3154)

    Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
    Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
    Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

commit 6a837849dd73cfa27a29b476ff03780ee543e41c
Author: Alex Brooks <albrooks@redhat.com>
Date:   Sat May 2 18:02:47 2026 -0600

    [CI] Fix Bad TP Initialization in Dynin-Omni Tests (#3298)

    Signed-off-by: Alex …

Labels

merge-test (label to trigger buildkite merge test CI), ready (label to trigger buildkite CI)

3 participants