Merged
4 changes: 0 additions & 4 deletions benchmarks/qwen3-tts/vllm_omni/configs/qwen3_tts_bs1.yaml
Original file line number Diff line number Diff line change
@@ -10,8 +10,6 @@ stage_args:
 engine_args:
   model_stage: qwen3_tts
   model_arch: Qwen3TTSTalkerForConditionalGeneration
-  hf_overrides:
-    architectures: [Qwen3TTSTalkerForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: false
@@ -43,8 +41,6 @@ stage_args:
 engine_args:
   model_stage: code2wav
   model_arch: Qwen3TTSCode2Wav
-  hf_overrides:
-    architectures: [Qwen3TTSCode2Wav]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true
4 changes: 0 additions & 4 deletions benchmarks/qwen3-tts/vllm_omni/configs/qwen3_tts_bs4.yaml
@@ -11,8 +11,6 @@ stage_args:
 engine_args:
   model_stage: qwen3_tts
   model_arch: Qwen3TTSTalkerForConditionalGeneration
-  hf_overrides:
-    architectures: [Qwen3TTSTalkerForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: false
@@ -44,8 +42,6 @@ stage_args:
 engine_args:
   model_stage: code2wav
   model_arch: Qwen3TTSCode2Wav
-  hf_overrides:
-    architectures: [Qwen3TTSCode2Wav]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true
4 changes: 0 additions & 4 deletions tests/perf/stage_configs/qwen3_tts.yaml
@@ -13,8 +13,6 @@ stage_args:
 engine_args:
   model_stage: qwen3_tts
   model_arch: Qwen3TTSTalkerForConditionalGeneration
-  hf_overrides:
-    architectures: [Qwen3TTSTalkerForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: false
@@ -46,8 +44,6 @@ stage_args:
 engine_args:
   model_stage: code2wav
   model_arch: Qwen3TTSCode2Wav
-  hf_overrides:
-    architectures: [Qwen3TTSCode2Wav]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true
Original file line number Diff line number Diff line change
@@ -8,8 +8,6 @@ stage_args:
 engine_args:
   model_stage: fish_speech_slow_ar
   model_arch: FishSpeechSlowARForConditionalGeneration
-  hf_overrides:
-    architectures: [FishSpeechSlowARForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: false
@@ -43,8 +41,6 @@ stage_args:
 engine_args:
   model_stage: dac_decoder
   model_arch: FishSpeechDACDecoder
-  hf_overrides:
-    architectures: [FishSpeechDACDecoder]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true
6 changes: 0 additions & 6 deletions vllm_omni/model_executor/stage_configs/qwen3_tts.yaml
@@ -8,9 +8,6 @@ stage_args:
 engine_args:
   model_stage: qwen3_tts
   model_arch: Qwen3TTSTalkerForConditionalGeneration
-  # Force stage-specific registered architecture.
-  hf_overrides:
-    architectures: [Qwen3TTSTalkerForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: false
@@ -43,9 +40,6 @@ stage_args:
 engine_args:
   model_stage: code2wav
   model_arch: Qwen3TTSCode2Wav

P1: Restore architecture override for the Code2Wav stage

Removing hf_overrides.architectures from the Qwen3-TTS Code2Wav stage disables the config path that strips rope_parameters for this non-LLM decoder. Qwen3TTSConfig.get_text_config() only performs that strip when architectures contains Code2Wav (vllm_omni/model_executor/models/qwen3_tts/configuration_qwen3_tts.py). However, OmniEngineArgs.create_model_config() sets hf_config.architectures only after OmniModelConfig initialization (vllm_omni/engine/arg_utils.py), whereas OmniModelConfig.__post_init__ has already computed hf_text_config by then (vllm_omni/config/model.py). As a result, Code2Wav can run with mRoPE still enabled, which regresses stage-1 runtime behavior and latency for all qwen3_tts configs touched here.

Collaborator Author

The statement in this review is inaccurate. All changes can be kept, and there is no need to revert any of them. Below is a detailed analysis:

The Reviewer's Argument

The reviewer states that removing hf_overrides.architectures will cause Qwen3TTSConfig.get_text_config() to fail to strip rope_parameters, thereby causing Code2Wav to mistakenly enable mRoPE. They cited three points in the execution timeline:

  1. get_text_config() only strips rope_parameters when architectures includes "Code2Wav".
  2. create_model_config() sets hf_config.architectures after the initialization of OmniModelConfig.
  3. OmniModelConfig.__post_init__ has already computed hf_text_config during initialization.

Why This Argument is Invalid

Key Point 1: uses_mrope is a lazy property, not cached during __post_init__

    @property
    def uses_mrope(self) -> bool:
        return uses_mrope(self.hf_config)

This is a standard @property (not a @cached_property), meaning it is re-evaluated every time it is accessed.
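
The difference can be seen in a minimal, self-contained sketch. The class names `LazyFlags` and `CachedFlags` are hypothetical and not part of vLLM-Omni; they only contrast a plain property with `functools.cached_property` from the standard library.

```python
from functools import cached_property


class LazyFlags:
    """Hypothetical stand-in for a config wrapper; not vLLM-Omni code."""

    def __init__(self, rope_parameters):
        self.rope_parameters = rope_parameters

    @property
    def uses_mrope(self) -> bool:
        # Re-evaluated on every access, so later mutations are visible.
        return "mrope_section" in (self.rope_parameters or {})


class CachedFlags:
    """Same check, but the first result is stored on the instance."""

    def __init__(self, rope_parameters):
        self.rope_parameters = rope_parameters

    @cached_property
    def uses_mrope(self) -> bool:
        # Evaluated once; later mutations of rope_parameters are ignored.
        return "mrope_section" in (self.rope_parameters or {})


# The mrope_section values are placeholders chosen for illustration.
lazy = LazyFlags({"mrope_section": [1, 1, 2]})
cached = CachedFlags({"mrope_section": [1, 1, 2]})
first = (lazy.uses_mrope, cached.uses_mrope)

# Mutate both objects after the first access.
lazy.rope_parameters = None
cached.rope_parameters = None
second = (lazy.uses_mrope, cached.uses_mrope)
```

Only the plain property picks up the mutation, which is the behavior the argument here relies on.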

Key Point 2: The uses_mrope() function calls config.get_text_config() every time

    def _uses_mrope(config: PretrainedConfig) -> bool:
        rope_parameters = getattr(config, "rope_parameters", None)
        if rope_parameters is None:
            return False
        return "mrope_section" in rope_parameters

    def uses_mrope(config: PretrainedConfig) -> bool:
        """Detect if the model with this config uses M-ROPE."""
        return (
            _uses_mrope(config)
            or _uses_mrope(config.get_text_config())
            or thinker_uses_mrope(config)
        )

Key Point 3: hf_config.architectures is correctly set before the model runner accesses uses_mrope

In create_model_config() within arg_utils.py, at line 209:

        omni_config.hf_config.architectures = omni_config.architectures

And the OmniModelConfig.architectures property returns [self.model_arch]:

    @property
    def architectures(self) -> list[str]:
        if self.model_arch is not None:
            return [self.model_arch]
        return super().architectures

Since model_arch: Qwen3TTSCode2Wav is preserved in the YAML, omni_config.architectures evaluates to ["Qwen3TTSCode2Wav"]. This is assigned to hf_config.architectures before create_model_config() returns.

Complete Timeline Analysis

  1. During __post_init__: hf_config.architectures holds its original value (from config.json), which does not include "Code2Wav". When hf_text_config is computed, rope_parameters is not yet stripped.
  2. Before create_model_config() returns (Line 209): hf_config.architectures = ["Qwen3TTSCode2Wav"] is set.
  3. During Model Runner initialization: model_config.uses_mrope is accessed → calls uses_mrope(hf_config) → calls hf_config.get_text_config(). By this point, self.architectures already contains "Code2Wav" → rope_parameters is correctly stripped → uses_mrope returns False.
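
The three steps above can be replayed with toy classes. This is a sketch under stated assumptions, not the vLLM-Omni implementation: `ToyTTSConfig`, `ToyModelConfig`, the placeholder architecture name and the `mrope_section` values are all hypothetical, and only the text-config branch of uses_mrope() is modelled.

```python
from types import SimpleNamespace


class ToyTTSConfig:
    """Hypothetical stand-in for Qwen3TTSConfig (assumed behaviour only)."""

    def __init__(self):
        # Placeholder for whatever config.json lists; the real value is not shown here.
        self.architectures = ["PlaceholderTopLevelArch"]
        self.talker = SimpleNamespace(rope_parameters={"mrope_section": [1, 1, 2]})

    def get_text_config(self):
        text = SimpleNamespace(**vars(self.talker))
        # Strip rope_parameters only when a Code2Wav architecture is selected.
        if any("Code2Wav" in arch for arch in (self.architectures or [])):
            text.rope_parameters = None
        return text


class ToyModelConfig:
    """Hypothetical stand-in for OmniModelConfig."""

    def __init__(self, hf_config, model_arch=None):
        self.hf_config = hf_config
        self.model_arch = model_arch
        # Step 1: eager snapshot, analogous to hf_text_config in __post_init__.
        self.hf_text_config = hf_config.get_text_config()

    @property
    def architectures(self):
        if self.model_arch is not None:
            return [self.model_arch]
        return self.hf_config.architectures

    @property
    def uses_mrope(self) -> bool:
        # Step 3: lazy check that calls get_text_config() again on each access.
        rope = self.hf_config.get_text_config().rope_parameters
        return rope is not None and "mrope_section" in rope


def create_model_config(model_arch):
    cfg = ToyModelConfig(ToyTTSConfig(), model_arch=model_arch)
    # Step 2: architectures are written back after construction.
    cfg.hf_config.architectures = cfg.architectures
    return cfg


cfg = create_model_config("Qwen3TTSCode2Wav")
snapshot_has_rope = cfg.hf_text_config.rope_parameters is not None
lazy_uses_mrope = cfg.uses_mrope
```

In this toy model the eager snapshot still carries rope_parameters, while the lazy check reports no mRoPE, which mirrors the distinction drawn between step 1 and step 3.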

The Impact of rope_parameters During __post_init__

The only place where hf_text_config.rope_parameters is used during __post_init__ is in _get_and_verify_max_len (to calculate max_model_len). However, the Code2Wav YAML explicitly specifies max_model_len: 32768, and the max_position_embeddings in the talker config is also 32768, so this validation will not fail.

The Case of FishSpeech

FishSpeechS2ProConfig.get_text_config() does not have a similar stripping logic for rope_parameters:

    def get_text_config(self, **kwargs) -> FishSpeechSlowARConfig:
        return self.text_config

Therefore, the removal of hf_overrides for FishSpeech has absolutely no impact.
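
As a small illustration of why a pass-through get_text_config() is insensitive to the architecture list, consider the sketch below. `PassThroughConfig` is a hypothetical stand-in, and the `rope_theta` value is a placeholder rather than a value taken from any real checkpoint.

```python
from types import SimpleNamespace


class PassThroughConfig:
    """Hypothetical config whose get_text_config() has no architecture check."""

    def __init__(self):
        self.architectures = None
        # Placeholder rope settings, for illustration only.
        self.text_config = SimpleNamespace(rope_parameters={"rope_theta": 10000.0})

    def get_text_config(self, **kwargs):
        # Returned unchanged, regardless of self.architectures.
        return self.text_config


cfg = PassThroughConfig()
before = cfg.get_text_config()
cfg.architectures = ["FishSpeechDACDecoder"]
after = cfg.get_text_config()
same_object = before is after
```

The sketch only covers the text-config path; it makes no claim about other code that may read architectures.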

Conclusion

All changes can be kept, and there is no need to revert any of them. While the timing issue pointed out by the reviewer does indeed exist during the __post_init__ phase (architectures has not been set at that point), it will not cause Code2Wav to mistakenly use mRoPE at runtime. This is because uses_mrope is lazily evaluated. When it is actually used by the model runner, hf_config.architectures has already been correctly set to ["Qwen3TTSCode2Wav"].

Collaborator Author

I used Claude-4.6-Opus-high to review the review.

-  # Force stage-specific registered architecture.
-  hf_overrides:
-    architectures: [Qwen3TTSCode2Wav]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true
6 changes: 0 additions & 6 deletions vllm_omni/model_executor/stage_configs/qwen3_tts_batch.yaml
@@ -12,9 +12,6 @@ stage_args:
 engine_args:
   model_stage: qwen3_tts
   model_arch: Qwen3TTSTalkerForConditionalGeneration
-  # Force stage-specific registered architecture.
-  hf_overrides:
-    architectures: [Qwen3TTSTalkerForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: false
@@ -47,9 +44,6 @@ stage_args:
 engine_args:
   model_stage: code2wav
   model_arch: Qwen3TTSCode2Wav
-  # Force stage-specific registered architecture.
-  hf_overrides:
-    architectures: [Qwen3TTSCode2Wav]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true
Original file line number Diff line number Diff line change
@@ -8,8 +8,6 @@ stage_args:
 engine_args:
   model_stage: qwen3_tts
   model_arch: Qwen3TTSTalkerForConditionalGeneration
-  hf_overrides:
-    architectures: [Qwen3TTSTalkerForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: false
@@ -38,8 +36,6 @@ stage_args:
 engine_args:
   model_stage: code2wav
   model_arch: Qwen3TTSCode2Wav
-  hf_overrides:
-    architectures: [Qwen3TTSCode2Wav]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true
6 changes: 0 additions & 6 deletions vllm_omni/platforms/npu/stage_configs/qwen3_tts.yaml
@@ -8,9 +8,6 @@ stage_args:
 engine_args:
   model_stage: qwen3_tts
   model_arch: Qwen3TTSTalkerForConditionalGeneration
-  # Force stage-specific registered architecture.
-  hf_overrides:
-    architectures: [Qwen3TTSTalkerForConditionalGeneration]
   worker_type: ar
   scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
   enforce_eager: true
@@ -43,9 +40,6 @@ stage_args:
 engine_args:
   model_stage: code2wav
   model_arch: Qwen3TTSCode2Wav
-  # Force stage-specific registered architecture.
-  hf_overrides:
-    architectures: [Qwen3TTSCode2Wav]
   worker_type: generation
   scheduler_cls: vllm_omni.core.sched.omni_generation_scheduler.OmniGenerationScheduler
   enforce_eager: true