Skip to content

Fix vllm-qwen35: switch to cu130-nightly for SM 12.1 compat#10

Merged
toku345 merged 2 commits into
mainfrom
fix/vllm-qwen35-cu130-nightly
Mar 2, 2026
Merged

Fix vllm-qwen35: switch to cu130-nightly for SM 12.1 compat#10
toku345 merged 2 commits into
mainfrom
fix/vllm-qwen35-cu130-nightly

Conversation

@toku345
Copy link
Copy Markdown
Owner

@toku345 toku345 commented Mar 1, 2026

Summary

  • Switch vllm-qwen35 image from vllm/vllm-openai:qwen3_5-cu130 to vllm/vllm-openai:cu130-nightly
  • The previous qwen3_5-cu130 image (02-23 build) crashed with a Triton kernel error on GB10 (SM 12.1)
  • The cu130-nightly image (03-01 build, commit afd089f2) includes fixes for both the Triton issue and the RMSNormGated bug (vllm-project/vllm#35423)
  • Update CLAUDE.md to reflect the image change and SM 12.1 MoE backend info

Verification

  • docker compose --profile qwen35 up — started without crash
  • /v1/chat/completions POST — correct response with reasoning separation
  • Model loaded: 33.38 GiB (FP8), KV cache: 71.96 GiB (943K tokens)
  • MoE backend: TRITON Fp8 (auto-selected for SM 12.1)

🤖 Generated with Claude Code

toku345 and others added 2 commits March 1, 2026 18:11
The previous qwen3_5-cu130 image (02-23 build) crashed with a Triton
kernel error on GB10 (SM 12.1). The cu130-nightly image (03-01 build,
commit afd089f2) includes fixes for both the Triton issue and the
RMSNormGated bug (vllm-project/vllm#35423).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 1, 2026

📝 Walkthrough

Walkthrough

vLLMのQwen3.5-35B-A3B-FP8イメージリファレンスをvllm/vllm-openai:cu130-nightly に更新しました。NG C 26.01互換性、cu130ビルド問題、TRITON Fp8 MoEバックエンド自動選択、ツール呼び出し機能、メモリ利用率設定などに関する説明を追加しました。

Changes

Cohort / File(s) Summary
vLLM Qwen3.5イメージ更新
CLAUDE.md, backends/vllm/compose.yml
vllm-qwen35サービスイメージをvllm/vllm-openai:cu130-nightlyに更新。NG C 26.01互換性、Triton/RMSNormGatedバグ対策、cu130ビルド問題に関するコメントと設定パラメータ(--gpu-memory-utilization 0.9、--max-model-len 32768)を追加。Forward Compatibility制約(ドライバー580でCUDA 13.1コンテナを同時実行不可)の注記を含む。

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: switching the vllm-qwen35 image to cu130-nightly for SM 12.1 compatibility, which directly aligns with the changeset modifications.
Description check ✅ Passed The description comprehensively details the image switch, the reason for the change (Triton kernel error), the specific fixes in the new image, and verification steps performed, all directly related to the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/vllm-qwen35-cu130-nightly

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@toku345
Copy link
Copy Markdown
Owner Author

toku345 commented Mar 1, 2026

@coderabbitai review please

@toku345 toku345 self-assigned this Mar 1, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 1, 2026

@toku345: I'll review the changes for you. I'll examine the image switch from qwen3_5-cu130 to cu130-nightly for SM 12.1 compatibility and the CLAUDE.md updates.


🧠 Learnings used
Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/vllm/**/*qwen35*.{yml,yaml} : vLLM Qwen3.5-35B-A3B-FP8 では `qwen35` プロファイルを使用し、vLLM upstream イメージを使用する(NGC 26.01 は `qwen3_5_moe` 未対応)

Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/trtllm/**/docker-compose*.{yml,yaml} : TensorRT-LLM イメージは `1.3.0rc3` (ARM64 対応) を使用し、SM120 `cudaErrorIllegalInstruction` エラーに対応済みであることを確認する

Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/vllm/**/*multi*.{yml,yaml,json} : vLLM multi プロファイルで Qwen (25.11) と Nemotron (26.01) を使用する場合、異なるイメージを使用し、ツール呼び出しは無効化される

Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/vllm/**/*26.01*.{yml,yaml,json} : ドライバ 580 環境では 26.01 (CUDA 13.1) コンテナは同時に 1 つまでしか起動できない。26.01 × 2 の並行起動は不可(Forward Compat 制約)
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backends/vllm/compose.yml`:
- Line 68: Replace the mutable image tag "vllm/vllm-openai:cu130-nightly" with
the immutable digest form to ensure reproducibility; update the image reference
in compose.yml from "vllm/vllm-openai:cu130-nightly" to
"vllm/vllm-openai@sha256:44486041d89ec04a2184e6136e9472dd2d8f19794f47a98e0ea7aed71d90a6c3"
so Docker Compose pulls the exact image digest rather than a moving tag.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a067758 and 3722b64.

📒 Files selected for processing (2)
  • CLAUDE.md
  • backends/vllm/compose.yml

Comment thread backends/vllm/compose.yml
@toku345 toku345 merged commit 99d0842 into main Mar 2, 2026
3 checks passed
@toku345 toku345 deleted the fix/vllm-qwen35-cu130-nightly branch March 2, 2026 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant