
RFC: Qwen3 TTS optimization plan #1

Closed
marksverdhei wants to merge 1 commit into ht from docs/qwen3-tts-optimization-plan

Conversation

@marksverdhei


Summary

  • Design document proposing a multi-phase optimization plan for Qwen3 TTS inference
  • Analyzes the current single-stage monolithic architecture and its bottlenecks
  • Proposes a phased approach: code predictor optimization, multi-stage decomposition, streaming audio output, and batched inference

Key Bottleneck

The code predictor runs 31 sequential transformer forward passes per generated token, one for each of codebooks 2-32. At 12.5 Hz, a 10-second utterance is 125 tokens, so it requires 125 × 31 = 3,875 sequential forward passes through the 5-layer code predictor.
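The pass count above follows from simple arithmetic. Here is a minimal sketch. It assumes one talker token per frame, and it assumes codebook 1 comes from the talker so the predictor fills only the remaining 31. The function name `code_predictor_passes` is hypothetical:

```python
def code_predictor_passes(duration_s: float, frame_rate_hz: float = 12.5,
                          num_codebooks: int = 32) -> int:
    """Count sequential code-predictor forward passes for one utterance.

    Assumption: the talker emits one token per frame, and the code predictor
    runs one forward pass for each of codebooks 2..num_codebooks.
    """
    frames = round(duration_s * frame_rate_hz)
    passes_per_frame = num_codebooks - 1  # assumed: codebook 1 comes from the talker
    return frames * passes_per_frame


print(code_predictor_passes(10.0))  # 3875
```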

Proposed Phases

| Phase | Description | Expected impact |
|---|---|---|
| 1a | CUDA graphs for the code predictor | 15-30% latency reduction |
| 1b | KV cache in the code predictor inner loop | 20-40% latency reduction |
| 2 | Multi-stage decomposition (talker + decoder) | Structural prerequisite for streaming and batching |
| 3 | Streaming audio output via async chunks | Lower first-chunk latency; progressive delivery |
| 4 | Batched inference and continuous batching | Higher throughput under concurrent load |
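The following toy illustrates why phase 1b targets the inner loop. It is a single-head attention step in pure Python that compares two approaches over the 31 inner-loop steps of one frame: recomputing keys and values at every step, and keeping an incremental KV cache. Everything here is hypothetical, including the weights, `KVCache`, `step_no_cache`, and the random inputs. It assumes the current inner loop re-projects its growing prefix on each step, which profiling should confirm. It also ignores MLPs, norms, multiple heads, and the real predictor's 5 layers:

```python
import math
import random

DIM = 4
rng = random.Random(0)


def rand_matrix(n):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


# Hypothetical weights for one attention head (toy stand-in for a predictor layer).
W_Q, W_K, W_V = rand_matrix(DIM), rand_matrix(DIM), rand_matrix(DIM)


def matvec(w, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def attend(q, keys, values):
    """Softmax attention of one query over all positions seen so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(DIM)]


def step_no_cache(inputs):
    """Re-project K and V for every position on each step (no cache)."""
    keys = [matvec(W_K, x) for x in inputs]
    values = [matvec(W_V, x) for x in inputs]
    out = attend(matvec(W_Q, inputs[-1]), keys, values)
    return out, 2 * len(inputs)  # number of K/V projections performed


class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x):
        """Project only the newest input and reuse cached K and V."""
        self.keys.append(matvec(W_K, x))
        self.values.append(matvec(W_V, x))
        out = attend(matvec(W_Q, x), self.keys, self.values)
        return out, 2


def run(num_steps=31):
    # Random vectors stand in for the embeddings fed at each inner-loop step.
    inputs = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)]
              for _ in range(num_steps)]
    cache = KVCache()
    cost_no_cache = cost_cache = 0
    max_diff = 0.0
    for t in range(num_steps):
        ref, c1 = step_no_cache(inputs[: t + 1])
        got, c2 = cache.step(inputs[t])
        cost_no_cache += c1
        cost_cache += c2
        max_diff = max(max_diff, max(abs(a - b) for a, b in zip(ref, got)))
    return cost_no_cache, cost_cache, max_diff


print(run())  # (992, 62, 0.0)
```

In this toy, caching reduces K/V projection work per frame from quadratic to linear in the number of codebooks: 992 projections without the cache and 62 with it. The outputs are identical because the cached vectors are exactly what recomputation would produce. Attention over the cache still grows with each step, so a real speedup will be smaller than the projection ratio suggests.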

Test plan

  • Review design document for completeness and feasibility
  • Validate bottleneck analysis against profiling data
  • Confirm CUDA graph compatibility with the code predictor loop
  • Assess the quality impact of chunked streaming decoding
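The following is a minimal asyncio sketch of chunked streaming decode for phase 3. It assumes the talker and decoder have been split into separate stages (phase 2). It also assumes the decoder can run on a chunk of frames plus a few frames of left context. `talker`, `toy_decode`, `decode_chunk`, `stream_audio`, `chunk_size`, `left_context`, and `SAMPLES_PER_FRAME` are all hypothetical names and values:

```python
import asyncio

SAMPLES_PER_FRAME = 4  # toy value; the real codec's samples per frame are not assumed


def toy_decode(frames):
    """Hypothetical stand-in for the codec decoder: frame IDs -> samples."""
    return [f * 10 + i for f in frames for i in range(SAMPLES_PER_FRAME)]


async def talker(num_frames):
    """Toy talker stage: yields one frame ID per generation step."""
    for frame in range(num_frames):
        await asyncio.sleep(0)  # yield to the event loop between steps
        yield frame


def decode_chunk(history, pending, left_context):
    """Decode new frames with earlier frames as context; keep only new audio."""
    context = history[-left_context:] if left_context else []
    samples = toy_decode(context + pending)
    return samples[len(context) * SAMPLES_PER_FRAME:]


async def stream_audio(frames, chunk_size=4, left_context=2):
    """Yield an audio chunk every `chunk_size` frames instead of at the end."""
    history, pending = [], []
    async for frame in frames:
        pending.append(frame)
        if len(pending) == chunk_size:
            yield decode_chunk(history, pending, left_context)
            history.extend(pending)
            pending = []
    if pending:  # flush the final partial chunk
        yield decode_chunk(history, pending, left_context)


async def main():
    return [chunk async for chunk in stream_audio(talker(10))]


chunks = asyncio.run(main())
streamed = [s for chunk in chunks for s in chunk]
assert streamed == toy_decode(list(range(10)))
print([len(chunk) for chunk in chunks])  # [16, 16, 8]
```

The toy decoder is frame-local, so streamed output matches a full decode exactly. A real neural codec decoder is generally not frame-local. That is why the left-context overlap exists, and why quality at chunk boundaries needs to be measured. The first chunk becomes available after `chunk_size` frames rather than after the whole utterance.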

🤖 Generated with Claude Code

Design document proposing multi-phase optimizations for Qwen3 TTS
inference: CUDA graphs, code predictor KV caching, multi-stage
decomposition, streaming audio output, and batched inference.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
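The gain from continuous batching can be illustrated with a step-count simulation. This sketch assumes each scheduler step is one batched talker step whose cost does not depend on how many slots are filled, which real hardware only approximates. The function names and the request mix are illustrative. The mix is two 10-second and six 2-second utterances at 12.5 Hz, that is, 125 and 25 frames:

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each fixed batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Free slots are refilled from the queue after every decode step."""
    queue = deque(lengths)
    active = []  # remaining frames per in-flight request
    steps = 0
    while queue or active:
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1  # one batched forward pass over all active requests
        active = [r - 1 for r in active if r > 1]  # drop requests that just finished
    return steps


lengths = [125, 25, 25, 25, 125, 25, 25, 25]
print(static_batching_steps(lengths, 4), continuous_batching_steps(lengths, 4))  # 250 150
```

Static batching leaves the short requests' slots idle until the longest request in their batch finishes. Continuous batching refills those slots immediately, so the second long request starts at step 26 instead of step 126.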
@marksverdhei changed the base branch from main to ht on January 29, 2026, 16:28
@marksverdhei

Author

Merged into ht branch.

@marksverdhei deleted the docs/qwen3-tts-optimization-plan branch on February 12, 2026, 21:25