[NVIDIA] Add Kimi K2.5 NVFP4 GB200 disaggregated vLLM benchmarks via Dynamo #1008
Merged
Claude / Claude Code Review
completed
Apr 6, 2026 in 10m 8s
Code review found 2 potential issues
Found 5 candidates, confirmed 2. See review comments for details.
Details
| Severity | Count |
|---|---|
| 🔴 Important | 0 |
| 🟡 Nit | 2 |
| 🟣 Pre-existing | 0 |

| Severity | File:Line | Issue |
|---|---|---|
| 🟡 Nit | runners/launch_gb200-nv.sh:42-43 | Misleading error message omits precision constraint for dynamo-vllm |
| 🟡 Nit | .github/configs/nvidia-master.yaml:6694 | Missing concurrency 64 in 8k1k 1P4D sweep |
Annotations
Check warning on line 43 in runners/launch_gb200-nv.sh
claude / Claude Code Review
Misleading error message omits precision constraint for dynamo-vllm
The error message on line 42 says "Supported prefixes for dynamo-vllm: kimik2.5", but the guard condition requires both MODEL_PREFIX=="kimik2.5" and PRECISION=="fp4". A user who passes MODEL_PREFIX=kimik2.5 with PRECISION=fp8 therefore sees a self-contradictory message that reports kimik2.5 as unsupported while also listing it as supported, which hides the real problem (unsupported precision). The message should instead say "Supported combinations for dynamo-vllm: kimik2.5/fp4" so that the constraint is stated clearly.
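One way the suggested fix could look is sketched below. This is a minimal illustration, not the actual contents of `runners/launch_gb200-nv.sh`: the function name `check_dynamo_vllm_combo`, its argument order, and the wording of the first half of the message are assumptions. Only the `kimik2.5`/`fp4` condition and the "Supported combinations" text come from the review comment.

```shell
# Hypothetical helper; the real guard in runners/launch_gb200-nv.sh may be
# written inline and structured differently.
check_dynamo_vllm_combo() {
    model_prefix="$1"
    precision="$2"

    # The guard accepts only the kimik2.5 + fp4 pair.
    if [ "$model_prefix" = "kimik2.5" ] && [ "$precision" = "fp4" ]; then
        return 0
    fi

    # Report the rejected pair and the supported pair together, so a wrong
    # precision is not mistaken for a wrong model prefix.
    echo "Unsupported model prefix/precision for dynamo-vllm: ${model_prefix}/${precision}. Supported combinations for dynamo-vllm: kimik2.5/fp4" >&2
    return 1
}
```

With this shape, calling `check_dynamo_vllm_combo kimik2.5 fp8` fails with a message that names both the rejected pair and the supported pair, so the user can see that the precision is what needs to change.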
Check warning on line 6694 in .github/configs/nvidia-master.yaml
claude / Claude Code Review
Missing concurrency 64 in 8k1k 1P4D sweep
The 8k1k 1P4D search-space has `conc-list: [4, 8, 16, 32, 128]`, skipping 64, while the structurally identical 1k1k 1P4D config has `[4, 8, 16, 32, 64, 128]`. This creates a gap in the performance curve at concurrency=64 for the low-latency 8k1k configuration and appears to be a copy-paste oversight.
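A gap of this kind can be spotted mechanically by comparing the two lists. The sketch below is a hypothetical helper (`missing_concurrencies` is not part of the repository) that takes the lists as space-separated strings; it does not parse `nvidia-master.yaml`, and the values are copied by hand from the two configs quoted above.

```shell
# Hypothetical helper: print each value that appears in the reference list
# but not in the candidate list. Both lists are space-separated strings.
missing_concurrencies() {
    reference="$1"
    candidate="$2"
    for value in $reference; do
        # Pad with spaces so that "4" does not match inside "64".
        case " $candidate " in
            *" $value "*) ;;
            *) printf '%s\n' "$value" ;;
        esac
    done
}

# 1k1k 1P4D list as the reference, 8k1k 1P4D list as the candidate.
missing_concurrencies "4 8 16 32 64 128" "4 8 16 32 128"
# prints: 64
```

Whether the 8k1k sweep should actually include 64 is still a decision for the PR author; the comparison only shows where the two lists diverge.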