
[Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False #1913

Merged
hsliuustc0106 merged 11 commits into vllm-project:main from JuanPZuluaga:feat/talker-cuda-graph-batched on Mar 20, 2026

Conversation

@JuanPZuluaga
Contributor


Purpose

In the Qwen3TTS Talker we run the CodePredictor on every decode step to generate the remaining 15 residual codebook tokens. It is currently compiled with torch.compile(mode="default", dynamic=True), which still leaves noticeable per-step overhead in high-concurrency settings.

This PR removes part of that overhead by switching to fixed-shape CUDA graph capture: mode="reduce-overhead" with dynamic=False. We pad the input to the fixed shape [bucket_bsz, 17, H] and use is_causal=True so that Inductor captures its own internal CUDA graphs; by design the Talker has a max_seq_len of 16, so the code predictor input is [talker_hidden, code_0_embed, code_1_embed, ..., code_N_embed].

We also add batch-size bucketing: the batch dimension is padded to power-of-two buckets [1, 2, 4, 8, 16], matching what vLLM already does. On top of that, we pre-allocate proj_buf for the maximum batch size and cache position_ids per bucket to avoid per-step allocations. A sketch of the bucketing and compile setup is shown below.
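
Below is a minimal, self-contained sketch of the idea. Everything here (ToyCodePredictor, bucket_bsz, decode_step, the hidden size) is a hypothetical stand-in rather than the actual implementation; only the torch.compile settings, the power-of-two buckets, and the [bucket_bsz, 17, H] padding mirror what the PR does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BUCKETS = [1, 2, 4, 8, 16]   # power-of-two batch buckets, matching vLLM
MAX_SEQ = 17                 # [talker_hidden, code_0_embed, ..., code_N_embed]
HIDDEN = 1024                # stand-in hidden size

def bucket_bsz(bsz: int) -> int:
    # Round the batch size up to the next bucket so the compiled graph
    # only ever sees a handful of fixed shapes.
    for b in BUCKETS:
        if bsz <= b:
            return b
    raise ValueError(f"batch size {bsz} exceeds largest bucket {BUCKETS[-1]}")

class ToyCodePredictor(nn.Module):
    # Stand-in for the real CodePredictor: a single causal SDPA block.
    def __init__(self, hidden: int) -> None:
        super().__init__()
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.out = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(attn)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ToyCodePredictor(HIDDEN).to(device)
# Fixed shapes + reduce-overhead let Inductor capture its own CUDA graphs.
compiled = torch.compile(model, mode="reduce-overhead", dynamic=False)

# Pre-allocate proj_buf once for the largest bucket; cache position_ids per
# bucket (unused by the toy attention, shown only to mirror the PR).
proj_buf = torch.zeros(BUCKETS[-1], MAX_SEQ, HIDDEN, device=device)
position_ids = {b: torch.arange(MAX_SEQ, device=device).expand(b, -1) for b in BUCKETS}

def decode_step(talker_hidden: torch.Tensor) -> torch.Tensor:
    bsz = talker_hidden.shape[0]
    padded = bucket_bsz(bsz)
    proj_buf[:padded].zero_()          # avoid stale embeddings in padded slots
    proj_buf[:bsz, 0] = talker_hidden  # codes would fill positions 1..16 here
    out = compiled(proj_buf[:padded])
    return out[:bsz]

print(decode_step(torch.randn(3, HIDDEN, device=device)).shape)  # torch.Size([3, 17, 1024])
```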

Test Plan

Run evaluation as in #1852 and #1797

I used this YAML config:

stage_args:
  - stage_id: 0
    stage_type: llm
    runtime:
      devices: "0"
      max_batch_size: 16

  - stage_id: 1
    stage_type: llm
    runtime:
      devices: "0"
      max_batch_size: 16

runtime:
  max_inflight: 16
  connectors:
    connector_of_shared_memory:
      codec_streaming: true
      codec_chunk_frames: 32
      codec_left_context_frames: 32

Test Result

See the results and plots below; the CUDA-graph path improves every metric at every concurrency level.

Benchmark Results

| Metric | Concurrency | cuda_graph | main |
|---|---|---|---|
| TTFP (ms) | 4 | 153.9 | 193.1 |
| TTFP (ms) | 8 | 340.7 | 423.3 |
| TTFP (ms) | 16 | 1078.3 | 1242.0 |
| E2E (ms) | 4 | 1766.4 | 2321.5 |
| E2E (ms) | 8 | 2776.9 | 3317.0 |
| E2E (ms) | 16 | 4577.7 | 5194.0 |
| RTF | 4 | 0.313 | 0.405 |
| RTF | 8 | 0.556 | 0.702 |
| RTF | 16 | 0.797 | 0.885 |
| Throughput (audio-s/s) | 4 | 12.66 | 9.79 |
| Throughput (audio-s/s) | 8 | 15.27 | 11.94 |
| Throughput (audio-s/s) | 16 | 19.26 | 17.38 |

Improvement (cuda_graph vs main)

| Metric | Concurrency | Improvement |
|---|---|---|
| TTFP | 4 | +20.3% |
| TTFP | 8 | +19.5% |
| TTFP | 16 | +13.2% |
| E2E | 4 | +23.9% |
| E2E | 8 | +16.3% |
| E2E | 16 | +11.9% |
| RTF | 4 | +22.7% |
| RTF | 8 | +20.7% |
| RTF | 16 | +9.9% |

Plot saved to vllm_omni/comparison.png

(comparison plot)

Signed-off-by: JuanPZuluaga <juanz9312@gmal.com>
@JuanPZuluaga changed the title from "[Optim][Qwen3TTS][CodePredictor] support torch.compile reduce-overhead with fixed-shapes" to "[Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False" on Mar 16, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 01ce53a615


Comment on lines +507 to +508
lm_heads = self._lm_heads_list
codec_embeds = self._codec_embeds_list


P1: Initialize cached heads in no-compile fallback

When supports_torch_inductor() is false, _setup_compile() now returns early after setting only _compiled_model_fwd, but forward() unconditionally reads _lm_heads_list and _codec_embeds_list from these cached fields. In that environment (e.g., CPU or unsupported GPU), those fields stay None, so the first decode step fails with a NoneType subscript error instead of using the previous working path. Please populate these caches in the fallback branch (or avoid relying on them when compile is disabled).
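
A self-contained toy illustrating the suggested fix. Only the attribute names (_compiled_model_fwd, _lm_heads_list, _codec_embeds_list) are taken from this comment; the class, the supports_torch_inductor() stub, and the list contents are hypothetical stand-ins.

```python
def supports_torch_inductor() -> bool:
    return False  # e.g. CPU or an unsupported GPU

class CodePredictorLike:
    def __init__(self) -> None:
        self._compiled_model_fwd = None
        self._lm_heads_list = None
        self._codec_embeds_list = None
        self._setup_compile()

    def _setup_compile(self) -> None:
        heads = [f"lm_head_{i}" for i in range(16)]      # stand-ins for real modules
        embeds = [f"codec_embed_{i}" for i in range(16)]
        if not supports_torch_inductor():
            self._compiled_model_fwd = lambda x: x       # eager fallback
            # Without these two lines, forward() hits a NoneType subscript
            # on the first decode step in the no-compile environment.
            self._lm_heads_list = heads
            self._codec_embeds_list = embeds
            return
        # Compiled path populates the same caches.
        self._lm_heads_list = heads
        self._codec_embeds_list = embeds

    def forward(self, step: int):
        return self._lm_heads_list[step], self._codec_embeds_list[step]

print(CodePredictorLike().forward(0))
```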


JuanPZuluaga added 2 commits March 16, 2026 08:17
Signed-off-by: JuanPZuluaga <juanz9312@gmal.com>
Signed-off-by: JuanPZuluaga <juanz9312@gmal.com>
@univa-HARRY

I’m following this and trying to reproduce it exactly based on the corresponding feature branch, but the server crashes when concurrency reaches around 3–4.

Is there any stage-config.yaml option I should pay attention to besides the settings you shared?

The error message is as follows.

[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514] Failed on request speech-beacc4826ddfc154: EngineCore encountered an issue. See stack trace (above) for the root cause.
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514] Traceback (most recent call last):
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514]   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/omni_stage.py", line 1507, in generation_single_request
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514]     async for res in cast(AsyncLLM, stage_engine).generate(ein, llm_sampling_params, rid):
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 564, in generate
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514]     q = await self.add_request(
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514]         ^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 309, in add_request
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514]     raise EngineDeadError()
[Stage-0] ERROR 03-16 09:43:22 [omni_stage.py:1514] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=7) ERROR 03-16 09:43:22 [async_omni.py:742] [AsyncOrchestrator] Stage 0 error on request speech-beacc4826ddfc154: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700] Streaming speech generation failed for speech-beacc4826ddfc154: {'request_id': 'speech-beacc4826ddfc154', 'stage_id': 0, 'error': 'EngineCore encountered an issue. See stack trace (above) for the root cause.'}
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700] Traceback (most recent call last):
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/openai/serving_speech.py", line 646, in _generate_audio_chunks
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]     async for res in generator:
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 437, in generate
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]     async for output in self._process_async_results(
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 597, in _process_async_results
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]     engine_outputs, finished, output_to_yield = self._process_single_result(
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 745, in _process_single_result
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700]     raise RuntimeError(result)
(APIServer pid=7) ERROR 03-16 09:43:22 [serving_speech.py:700] RuntimeError: {'request_id': 'speech-beacc4826ddfc154', 'stage_id': 0, 'error': 'EngineCore encountered an issue. See stack trace (above) for the root cause.'}
(APIServer pid=7) ERROR:    Exception in ASGI application
(APIServer pid=7)   + Exception Group Traceback (most recent call last):
(APIServer pid=7)   |   File "/usr/local/lib/python3.12/dist-packages/starlette/_utils.py", line 81, in collapse_excgroups
(APIServer pid=7)   |     yield
(APIServer pid=7)   |   File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 270, in __call__
(APIServer pid=7)   |     async with anyio.create_task_group() as task_group:
(APIServer pid=7)   |                ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   |   File "/usr/local/lib/python3.12/dist-packages/anyio/_backends/_asyncio.py", line 783, in __aexit__
(APIServer pid=7)   |     raise BaseExceptionGroup(
(APIServer pid=7)   | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
(APIServer pid=7)   +-+---------------- 1 ----------------
(APIServer pid=7)     | Traceback (most recent call last):
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 416, in run_asgi
(APIServer pid=7)     |     result = await app(  # type: ignore[func-returns-value]
(APIServer pid=7)     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
(APIServer pid=7)     |     return await self.app(scope, receive, send)
(APIServer pid=7)     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/fastapi/applications.py", line 1160, in __call__
(APIServer pid=7)     |     await super().__call__(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/applications.py", line 107, in __call__
(APIServer pid=7)     |     await self.middleware_stack(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 186, in __call__
(APIServer pid=7)     |     raise exc
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 164, in __call__
(APIServer pid=7)     |     await self.app(scope, receive, _send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/cors.py", line 87, in __call__
(APIServer pid=7)     |     await self.app(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/prometheus_fastapi_instrumentator/middleware.py", line 177, in __call__
(APIServer pid=7)     |     raise exc
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/prometheus_fastapi_instrumentator/middleware.py", line 175, in __call__
(APIServer pid=7)     |     await self.app(scope, receive, send_wrapper)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/exceptions.py", line 63, in __call__
(APIServer pid=7)     |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=7)     |     raise exc
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=7)     |     await app(scope, receive, sender)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
(APIServer pid=7)     |     await self.app(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 716, in __call__
(APIServer pid=7)     |     await self.middleware_stack(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 736, in app
(APIServer pid=7)     |     await route.handle(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 290, in handle
(APIServer pid=7)     |     await self.app(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 130, in app
(APIServer pid=7)     |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=7)     |     raise exc
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=7)     |     await app(scope, receive, sender)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 117, in app
(APIServer pid=7)     |     await response(scope, receive, send)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 269, in __call__
(APIServer pid=7)     |     with collapse_excgroups():
(APIServer pid=7)     |          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)     |   File "/usr/lib/python3.12/contextlib.py", line 158, in __exit__
(APIServer pid=7)     |     self.gen.throw(value)
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/_utils.py", line 87, in collapse_excgroups
(APIServer pid=7)     |     raise exc
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 273, in wrap
(APIServer pid=7)     |     await func()
(APIServer pid=7)     |   File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 253, in stream_response
(APIServer pid=7)     |     async for chunk in self.body_iterator:
(APIServer pid=7)     |   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/openai/serving_speech.py", line 646, in _generate_audio_chunks
(APIServer pid=7)     |     async for res in generator:
(APIServer pid=7)     |   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 437, in generate
(APIServer pid=7)     |     async for output in self._process_async_results(
(APIServer pid=7)     |   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 597, in _process_async_results
(APIServer pid=7)     |     engine_outputs, finished, output_to_yield = self._process_single_result(
(APIServer pid=7)     |                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)     |   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 745, in _process_single_result
(APIServer pid=7)     |     raise RuntimeError(result)
(APIServer pid=7)     | RuntimeError: {'request_id': 'speech-beacc4826ddfc154', 'stage_id': 0, 'error': 'EngineCore encountered an issue. See stack trace (above) for the root cause.'}
(APIServer pid=7)     +------------------------------------
(APIServer pid=7) 
(APIServer pid=7) During handling of the above exception, another exception occurred:
(APIServer pid=7) 
(APIServer pid=7) Traceback (most recent call last):
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 416, in run_asgi
(APIServer pid=7)     result = await app(  # type: ignore[func-returns-value]
(APIServer pid=7)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
(APIServer pid=7)     return await self.app(scope, receive, send)
(APIServer pid=7)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/fastapi/applications.py", line 1160, in __call__
(APIServer pid=7)     await super().__call__(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/applications.py", line 107, in __call__
(APIServer pid=7)     await self.middleware_stack(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 186, in __call__
(APIServer pid=7)     raise exc
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 164, in __call__
(APIServer pid=7)     await self.app(scope, receive, _send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/cors.py", line 87, in __call__
(APIServer pid=7)     await self.app(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/prometheus_fastapi_instrumentator/middleware.py", line 177, in __call__
(APIServer pid=7)     raise exc
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/prometheus_fastapi_instrumentator/middleware.py", line 175, in __call__
(APIServer pid=7)     await self.app(scope, receive, send_wrapper)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/exceptions.py", line 63, in __call__
(APIServer pid=7)     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=7)     raise exc
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=7)     await app(scope, receive, sender)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
(APIServer pid=7)     await self.app(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 716, in __call__
(APIServer pid=7)     await self.middleware_stack(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 736, in app
(APIServer pid=7)     await route.handle(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 290, in handle
(APIServer pid=7)     await self.app(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 130, in app
(APIServer pid=7)     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
(APIServer pid=7)     raise exc
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
(APIServer pid=7)     await app(scope, receive, sender)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 117, in app
(APIServer pid=7)     await response(scope, receive, send)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 269, in __call__
(APIServer pid=7)     with collapse_excgroups():
(APIServer pid=7)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/usr/lib/python3.12/contextlib.py", line 158, in __exit__
(APIServer pid=7)     self.gen.throw(value)
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/_utils.py", line 87, in collapse_excgroups
(APIServer pid=7)     raise exc
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 273, in wrap
(APIServer pid=7)     await func()
(APIServer pid=7)   File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 253, in stream_response
(APIServer pid=7)     async for chunk in self.body_iterator:
(APIServer pid=7)   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/openai/serving_speech.py", line 646, in _generate_audio_chunks
(APIServer pid=7)     async for res in generator:
(APIServer pid=7)   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 437, in generate
(APIServer pid=7)     async for output in self._process_async_results(
(APIServer pid=7)   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 597, in _process_async_results
(APIServer pid=7)     engine_outputs, finished, output_to_yield = self._process_single_result(
(APIServer pid=7)                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=7)   File "/vllm-workspace/vllm-omni/vllm_omni/entrypoints/async_omni.py", line 745, in _process_single_result
(APIServer pid=7)     raise RuntimeError(result)
(APIServer pid=7) RuntimeError: {'request_id': 'speech-beacc4826ddfc154', 'stage_id': 0, 'error': 'EngineCore encountered an issue. See stack trace (above) for the root cause.'}

@JuanPZuluaga
Contributor Author

Hello, my configuration is:

# Qwen3-TTS batch_size=4 config (streaming with async_chunk)
# Enables concurrent request processing with max_inflight=4
# 2-stage pipeline: Talker -> Code2Wav
async_chunk: true
stage_args:
  - stage_id: 0
    stage_type: llm
    runtime:
      devices: "0"
      max_batch_size: 16
    engine_args:
      model_stage: qwen3_tts
      model_arch: Qwen3TTSTalkerForConditionalGeneration
      hf_overrides:
        architectures: [Qwen3TTSTalkerForConditionalGeneration]
      worker_type: ar
      scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
      enforce_eager: false
      trust_remote_code: true
      async_scheduling: false
      enable_prefix_caching: false
      engine_output_type: latent
      gpu_memory_utilization: 0.3
      distributed_executor_backend: "mp"
      max_num_batched_tokens: 4096
      max_model_len: 4096
      custom_process_next_stage_input_func: vllm_omni.model_executor.stage_input_processors.qwen3_tts.talker2code2wav_async_chunk
    output_connectors:
      to_stage_1: connector_of_shared_memory
    default_sampling_params:
      temperature: 0.9
      top_k: 50
      max_tokens: 4096
      seed: 42
      detokenize: false
      repetition_penalty: 1.05
      stop_token_ids: [2150]

  - stage_id: 1
    stage_type: llm
    runtime:
      devices: "0"
      max_batch_size: 16
    engine_args:
      model_stage: code2wav
      model_arch: Qwen3TTSCode2Wav
      hf_overrides:
        architectures: [Qwen3TTSCode2Wav]
      worker_type: generation
      scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
      enforce_eager: true
      trust_remote_code: true
      async_scheduling: false
      enable_prefix_caching: false
      engine_output_type: audio
      gpu_memory_utilization: 0.2
      distributed_executor_backend: "mp"
      max_num_batched_tokens: 8192
      max_model_len: 32768
    engine_input_source: [0]
    final_output: true
    final_output_type: audio
    input_connectors:
      from_stage_0: connector_of_shared_memory
    tts_args:
      max_instructions_length: 500
    default_sampling_params:
      temperature: 0.0
      top_p: 1.0
      top_k: -1
      max_tokens: 65536
      seed: 42
      detokenize: true
      repetition_penalty: 1.0

runtime:
  enabled: true
  defaults:
    window_size: -1
    max_inflight: 16

  connectors:
    connector_of_shared_memory:
      name: SharedMemoryConnector
      extra:
        shm_threshold_bytes: 65536
        codec_streaming: true
        connector_get_sleep_s: 0.01
        connector_get_max_wait_first_chunk: 3000
        connector_get_max_wait: 300
        codec_chunk_frames: 32
        codec_left_context_frames: 32

  edges:
    - from: 0
      to: 1
      window_size: -1

Could you try this config to see whether it solves your issue? @univa-HARRY

@linyueqian linyueqian self-requested a review March 16, 2026 17:10
@univa-HARRY

@JuanPZuluaga

I guess the issue was max_num_batched_tokens in stage 0. Setting max_num_batched_tokens: 512 -> 8192 and max_model_len: 4096 -> 8192 resolves the problem. Thank you.

@Sy0307
Contributor

Sy0307 commented Mar 17, 2026

This needs more consideration: torch.compile(mode="reduce-overhead", dynamic=False) may crash with enforce_eager: false due to repeated CUDA graph capture/replay. I have tried this, so please think twice. @JuanPZuluaga

@JuanPZuluaga
Contributor Author

This needs more consideration: torch.compile(mode="reduce-overhead", dynamic=False) may crash with enforce_eager: false due to repeated CUDA graph capture/replay. I have tried this, so please think twice. @JuanPZuluaga

@Sy0307 good point, this would be an issue if the compiled module were inside the CUDAGraphWrapper scope. But the CodePredictor module is explicitly excluded from vLLM's CUDA graphs (_cudagraph_mode = CUDAGraphMode.NONE in _talker_mtp_forward), so the two graph systems are independent and should not conflict with each other. This is also borne out by the benchmarks above, which were run at concurrency 4/8/16 with enforce_eager: false on the Talker.

However, if there's a specific experiment you'd like me to try, I can run that as well.

@Sy0307
Contributor

Sy0307 commented Mar 17, 2026

This needs more consideration: torch.compile(mode="reduce-overhead", dynamic=False) may crash with enforce_eager: false due to repeated CUDA graph capture/replay. I have tried this, so please think twice. @JuanPZuluaga

@Sy0307 good point, this would be an issue if the compiled module were inside the CUDAGraphWrapper scope. But the CodePredictor module is explicitly excluded from vLLM's CUDA graphs (_cudagraph_mode = CUDAGraphMode.NONE in _talker_mtp_forward), so the two graph systems are independent and should not conflict with each other. This is also borne out by the benchmarks above, which were run at concurrency 4/8/16 with enforce_eager: false on the Talker.

However, if there's a specific experiment you'd like me to try, I can run that as well.

Makes sense. This addresses the main concern. Thanks for the improvement.

@univa-HARRY

@JuanPZuluaga
Your PR and the other two optimization PRs you mentioned are independent of each other, so if all three are applied, is it correct to expect a multiplicative improvement in latency?

Also, when do you expect these optimizations to be incorporated?

@JuanPZuluaga
Contributor Author

@JuanPZuluaga Your PR and the other two optimization PRs you mentioned are independent of each other, so if all three are applied, is it correct to expect a multiplicative improvement in latency?

Also, when do you expect these optimizations to be incorporated?

I think these optimizations should be added soon.

@linyueqian linyueqian added the ready label to trigger buildkite CI label Mar 18, 2026
@linyueqian linyueqian added this to the v0.18.0 milestone Mar 18, 2026
@linyueqian
Collaborator

I was looking at the change from growing seq_len to always passing max_seq=17, and one thing concerns me: proj_buf is pre-allocated and never zeroed between requests. The old code only passed proj_buf[:bsz, :step+1, :] so stale data was invisible. Now the full proj_buf[:padded_bsz, :max_seq, :] goes into the compiled forward every step, meaning positions step+2 through 16 still hold leftover embeddings from whatever ran in that batch slot last time. The causal mask should prevent attention to those positions, but if there's any off-by-one in the mask, the stale values would silently corrupt the output with no error, just subtly wrong audio. Might be worth adding a proj_buf[:padded_bsz].zero_() at the top of each forward call, or at least a test that verifies identical output between the old growing-window path and the new fixed-window path across consecutive requests with different batch sizes.
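
A quick sanity sketch of the mask argument, using assumed toy shapes rather than the real CodePredictor: with a correct causal mask, outputs up to the current step are unaffected by stale data in later slots, and the proposed zero_() guard makes that explicit.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
bsz, max_seq, hidden, step = 2, 17, 8, 5

buf = torch.randn(bsz, max_seq, hidden)      # pretend slots > step hold stale data
clean = buf.clone()
clean[:, step + 1:] = 0.0                    # the proposed zero_() guard

full_stale = F.scaled_dot_product_attention(buf, buf, buf, is_causal=True)
full_clean = F.scaled_dot_product_attention(clean, clean, clean, is_causal=True)
prefix = F.scaled_dot_product_attention(
    buf[:, :step + 1], buf[:, :step + 1], buf[:, :step + 1], is_causal=True
)

# With a correct causal mask, both fixed-window variants agree with the
# growing-window result for every position up to the current step.
print(torch.allclose(full_stale[:, :step + 1], prefix, atol=1e-5))  # True
print(torch.allclose(full_clean[:, :step + 1], prefix, atol=1e-5))  # True
```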

@JuanPZuluaga
Contributor Author

JuanPZuluaga commented Mar 19, 2026

I am adding the new results after merging main:

EDIT: updated the benchmark results

(updated comparison plot)

I used the same YAML config; the relevant params that changed are:

  - stage_id: 0
      max_batch_size: 16
      max_num_batched_tokens: 4096
  - stage_id: 1
      max_batch_size: 16
....
    max_inflight: 16
....
        codec_chunk_frames: 25
        codec_left_context_frames: 25

@JuanPZuluaga
Contributor Author

JuanPZuluaga commented Mar 19, 2026

Just added the proj_buf[:padded_bsz].zero_() to the code. @linyueqian

Collaborator

@linyueqian linyueqian left a comment


lgtm

@hsliuustc0106 hsliuustc0106 merged commit b4342fb into vllm-project:main Mar 20, 2026
7 checks passed
@JuanPZuluaga JuanPZuluaga deleted the feat/talker-cuda-graph-batched branch March 20, 2026 05:09
@evezhier evezhier mentioned this pull request Mar 20, 2026
hsliuustc0106 added a commit to hsliuustc0106/vllm-omni-skills that referenced this pull request Mar 22, 2026
### vllm-omni-audio-tts
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

### vllm-omni-perf
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
…erhead and dynamic False (vllm-project#1913)

Signed-off-by: JuanPZuluaga <juanz9312@gmal.com>
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
Co-authored-by: JuanPZuluaga <juanz9312@gmal.com>
