Fix vllm-qwen35: switch to cu130-nightly for SM 12.1 compat by toku345 · Pull Request #10 · toku345/dgx-llm-serve

toku345 · 2026-03-01T09:23:07Z

Summary

Switch vllm-qwen35 image from vllm/vllm-openai:qwen3_5-cu130 to vllm/vllm-openai:cu130-nightly
The previous qwen3_5-cu130 image (02-23 build) crashed with a Triton kernel error on GB10 (SM 12.1)
The cu130-nightly image (03-01 build, commit afd089f2) includes fixes for both the Triton issue and the RMSNormGated bug (vllm-project/vllm#35423)
Update CLAUDE.md to reflect the image change and SM 12.1 MoE backend info

Verification

docker compose --profile qwen35 up — started without crash
/v1/chat/completions POST — correct response with reasoning separation
Model loaded: 33.38 GiB (FP8), KV cache: 71.96 GiB (943K tokens)
MoE backend: TRITON Fp8 (auto-selected for SM 12.1)

🤖 Generated with Claude Code

The previous qwen3_5-cu130 image (02-23 build) crashed with a Triton kernel error on GB10 (SM 12.1). The cu130-nightly image (03-01 build, commit afd089f2) includes fixes for both the Triton issue and the RMSNormGated bug (vllm-project/vllm#35423). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-03-01T09:23:18Z

📝 Walkthrough

Walkthrough

vLLMのQwen3.5-35B-A3B-FP8イメージリファレンスをvllm/vllm-openai:cu130-nightly に更新しました。NG C 26.01互換性、cu130ビルド問題、TRITON Fp8 MoEバックエンド自動選択、ツール呼び出し機能、メモリ利用率設定などに関する説明を追加しました。

Changes

Cohort / File(s)	Summary
vLLM Qwen3.5イメージ更新 `CLAUDE.md`, `backends/vllm/compose.yml`	vllm-qwen35サービスイメージをvllm/vllm-openai:cu130-nightlyに更新。NG C 26.01互換性、Triton/RMSNormGatedバグ対策、cu130ビルド問題に関するコメントと設定パラメータ（--gpu-memory-utilization 0.9、--max-model-len 32768）を追加。Forward Compatibility制約（ドライバー580でCUDA 13.1コンテナを同時実行不可）の注記を含む。

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

Add Qwen3.5-35B-A3B-FP8 (qwen35) profile to vLLM backend #8 - vllm Qwen3.5プロフィールの更新とvllmイメージ参照の変更を行っており、本PRと共通の変更領域がある。

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: switching the vllm-qwen35 image to cu130-nightly for SM 12.1 compatibility, which directly aligns with the changeset modifications.
Description check	✅ Passed	The description comprehensively details the image switch, the reason for the change (Triton kernel error), the specific fixes in the new image, and verification steps performed, all directly related to the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/vllm-qwen35-cu130-nightly

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

toku345 · 2026-03-01T11:27:38Z

@coderabbitai review please

coderabbitai · 2026-03-01T11:28:03Z

@toku345: I'll review the changes for you. I'll examine the image switch from qwen3_5-cu130 to cu130-nightly for SM 12.1 compatibility and the CLAUDE.md updates.

🧠 Learnings used

Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/vllm/**/*qwen35*.{yml,yaml} : vLLM Qwen3.5-35B-A3B-FP8 では `qwen35` プロファイルを使用し、vLLM upstream イメージを使用する（NGC 26.01 は `qwen3_5_moe` 未対応）

Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/trtllm/**/docker-compose*.{yml,yaml} : TensorRT-LLM イメージは `1.3.0rc3` (ARM64 対応) を使用し、SM120 `cudaErrorIllegalInstruction` エラーに対応済みであることを確認する

Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/vllm/**/*multi*.{yml,yaml,json} : vLLM multi プロファイルで Qwen (25.11) と Nemotron (26.01) を使用する場合、異なるイメージを使用し、ツール呼び出しは無効化される

Learnt from: CR
Repo: toku345/dgx-llm-serve PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T04:03:19.782Z
Learning: Applies to backends/vllm/**/*26.01*.{yml,yaml,json} : ドライバ 580 環境では 26.01 (CUDA 13.1) コンテナは同時に 1 つまでしか起動できない。26.01 × 2 の並行起動は不可（Forward Compat 制約）

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backends/vllm/compose.yml`:
- Line 68: Replace the mutable image tag "vllm/vllm-openai:cu130-nightly" with
the immutable digest form to ensure reproducibility; update the image reference
in compose.yml from "vllm/vllm-openai:cu130-nightly" to
"vllm/vllm-openai@sha256:44486041d89ec04a2184e6136e9472dd2d8f19794f47a98e0ea7aed71d90a6c3"
so Docker Compose pulls the exact image digest rather than a moving tag.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a067758 and 3722b64.

📒 Files selected for processing (2)

CLAUDE.md
backends/vllm/compose.yml

toku345 and others added 2 commits March 1, 2026 18:11

Update CLAUDE.md: reflect cu130-nightly image and MoE backend info

3722b64

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

toku345 self-assigned this Mar 1, 2026

coderabbitai Bot reviewed Mar 1, 2026

View reviewed changes

Comment thread backends/vllm/compose.yml

toku345 merged commit 99d0842 into main Mar 2, 2026
3 checks passed

toku345 deleted the fix/vllm-qwen35-cu130-nightly branch March 2, 2026 07:49

toku345 mentioned this pull request Mar 2, 2026

Pin vllm-qwen35 image to a stable release (replace cu130-nightly) #11

Closed

3 tasks

coderabbitai Bot mentioned this pull request Apr 24, 2026

refactor: curate to single vLLM qwen36 profile (Qwen3.6-35B-A3B-FP8) #40

Open

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix vllm-qwen35: switch to cu130-nightly for SM 12.1 compat#10

Fix vllm-qwen35: switch to cu130-nightly for SM 12.1 compat#10
toku345 merged 2 commits into
mainfrom
fix/vllm-qwen35-cu130-nightly

toku345 commented Mar 1, 2026

Uh oh!

coderabbitai Bot commented Mar 1, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

toku345 commented Mar 1, 2026

Uh oh!

coderabbitai Bot commented Mar 1, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

toku345 commented Mar 1, 2026

Summary

Verification

Uh oh!

coderabbitai Bot commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

toku345 commented Mar 1, 2026

Uh oh!

coderabbitai Bot commented Mar 1, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Mar 1, 2026 •

edited

Loading