Fix FP8 CUTLASS crash on SM12.1 (DGX Spark) #151

Open

ageev wants to merge 1 commit into eugr:main from ageev:fix-sm121-fp8-cutlass

Conversation


@ageev ageev commented Mar 29, 2026

Summary

  • Fixes "This kernel only supports sm120." crash when running FP8 models (e.g. Qwen3.5-35B-FP8) on DGX Spark (SM12.1)
  • Replaces enable_sm120_only with enable_sm120_family in two CUTLASS kernel files, allowing SM12.1 (__CUDA_ARCH__ == 1210) to pass the architecture guard
  • The change is applied as a one-line sed at build time, before final compilation
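The build-time patch can be sketched as below. The real targets are the two CUTLASS wrapper files in vLLM's csrc tree (not named here), so the filename is a stand-in:

```shell
# Illustrative only: "kernel.cuh" stands in for the two vLLM CUTLASS
# wrapper files; the substitution itself is the one the PR describes.
printf 'enable_sm120_only(fn);\n' > kernel.cuh
sed -i 's/enable_sm120_only/enable_sm120_family/g' kernel.cuh
cat kernel.cuh   # -> enable_sm120_family(fn);
```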

This is a temporary workaround until the upstream fix is merged: vllm-project/vllm#35568

Details

vLLM's FP8 CUTLASS kernel wrappers use enable_sm120_only, which checks __CUDA_ARCH__ == 1200 and traps on any other arch. On SM12.1 (GB10, DGX Spark), __CUDA_ARCH__ is 1210, triggering cudaErrorLaunchFailure on every FP8 GEMM call.

enable_sm120_family already exists in the codebase (>= 1200 && < 1300) and is used by the blockwise dispatch path. This fix applies it to the two remaining files that still use enable_sm120_only.

See: https://github.com/saifgithub/vllm-gb10-sm121

Rebuild notes

Requires --rebuild-vllm since the fix is compiled into the binary. If upgrading from a previous build, clean Docker build cache first:

docker builder prune
./build-and-copy.sh --rebuild-vllm [--tf5] [-t vllm-node-tf5]

Note: --tf5 and -t vllm-node-tf5 are optional — the container name depends on your setup (default is vllm-node).

Test plan

  • Verified on DGX Spark with Qwen3.5-35B-A3B-FP8 — model loads and serves without crash
  • Confirmed enable_sm120_family symbols present in compiled _C.abi3.so

Fixes "This kernel only supports sm120." error seen when launching
Qwen3.5-35B-FP8 model on the recent container build.

vLLM's FP8 CUTLASS kernels use enable_sm120_only, which checks
__CUDA_ARCH__ == 1200 and traps on SM12.1 where __CUDA_ARCH__ is 1210.
Replace with enable_sm120_family (>= 1200 && < 1300) which already
exists in the codebase.

See: https://github.com/saifgithub/vllm-gb10-sm121

Note: after applying this fix, a full rebuild with --rebuild-vllm is
required. If upgrading from a previous build, clean Docker build cache
first to ensure stale compiled objects are not reused:

  docker builder prune

(a more radical approach that also worked: docker system prune --all --volumes)

  ./build-and-copy.sh --rebuild-vllm --tf5 -t vllm-node-tf5
