[Bugfix] Auto-configure TRITON_PTXAS_PATH for new GPU architectures #32704
danielostrow wants to merge 1 commit into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of it. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Force-pushed from 6e0d8d4 to 2edfbec
Hi @danielostrow, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch. For future commits, the installed hook will run automatically.
Force-pushed from 2edfbec to 641bf35
Code Review
This pull request introduces a helpful auto-configuration mechanism for TRITON_PTXAS_PATH to support new GPU architectures. The implementation is sound and the accompanying tests are thorough. My review focuses on improving the debuggability of the new logic by adding logging to error-handling paths that currently fail silently. These changes will make it easier to diagnose issues if the auto-configuration does not behave as expected.
Triton bundles a ptxas binary from CUDA 12.8 that does not support
GPU architectures sm_110a (Jetson Thor) or sm_121a (DGX Spark GB10).
This causes Triton kernel compilation to fail with:
ptxas fatal: Value 'sm_121a' is not defined for option 'gpu-name'
This change adds automatic detection of new GPU architectures using
Triton's native backend detection and configures TRITON_PTXAS_PATH
to use the system CUDA toolkit's ptxas when needed.
The fix:
- Uses triton.backends.backends to detect GPU architecture
- Sets TRITON_PTXAS_PATH for GPUs with arch >= 110 (CC 11.0+)
- Respects user-configured TRITON_PTXAS_PATH if already set
- Fails gracefully if detection is unavailable
Tested on NVIDIA GB10 (DGX Spark) with CUDA 13.0 and Triton 3.5.1.
Related issues: vllm-project#31269, vllm-project#29469, vllm-project#32093
Signed-off-by: Daniel Ostrow <daniel@neuralintellect.com>
Force-pushed from 641bf35 to ad16030
How are we looking on this? Is it still relevant?
I'm experiencing the same problem and hope this PR can be merged as soon as possible.
Please note: this script acts as a handoff for when Triton updates lag behind CUDA updates. In the future, this handoff may become unnecessary once Triton itself is updated.
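That handoff could be gated on the Triton version so it disables itself once Triton ships a capable ptxas; a hedged sketch (the 3.7.0 cutoff is taken from the discussion in this thread and is an assumption, as is the function name):

```python
def _parse(version: str) -> tuple[int, ...]:
    # Minimal version parse; real code should also handle
    # pre-release tags like "3.7.0rc1".
    return tuple(int(part) for part in version.split("."))


def ptxas_handoff_needed(triton_version: str,
                         fixed_in: str = "3.7.0") -> bool:
    """True if this Triton is older than the release expected to
    bundle a ptxas that knows sm_110a/sm_121a (cutoff assumed)."""
    return _parse(triton_version) < _parse(fixed_in)


print(ptxas_handoff_needed("3.5.1"))  # True: still needs the handoff
print(ptxas_handoff_needed("3.7.0"))  # False: bundled ptxas suffices
```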
Cherry-pick from PR vllm-project#32704 - auto-detects GPU arch >= 110 and configures TRITON_PTXAS_PATH to use the system CUDA toolkit's ptxas instead of Triton's bundled version (CUDA 12.8), which doesn't support sm_121a. This ensures Triton kernels compile correctly on DGX Spark GB10.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: seli-equinix <seli@equinix.com>
I think it is likely that this will be resolved by
Because that is only solved in the future Triton 3.7.0; until then, build vLLM from upstream... I had Nemotron NVFP4 running yesterday on Jetson AGX Thor and DGX Spark.
Summary
Triton bundles a ptxas binary from CUDA 12.8 that does not support GPU architectures sm_110a (Jetson Thor) or sm_121a (DGX Spark GB10). This causes Triton kernel compilation to fail with:
ptxas fatal: Value 'sm_121a' is not defined for option 'gpu-name'
This PR adds automatic detection of new GPU architectures and configures TRITON_PTXAS_PATH to use the system CUDA toolkit's ptxas when needed.
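Until such a fix is in a release, the same effect can be achieved manually, since Triton honors TRITON_PTXAS_PATH as an override for its bundled binary. A sketch (the toolkit path is illustrative and depends on your CUDA install):

```shell
# Triton's bundled ptxas (CUDA 12.8) rejects sm_121a; a CUDA 13.x
# toolkit's ptxas accepts it. Adjust the path to your install.
SYSTEM_PTXAS=/usr/local/cuda/bin/ptxas

# Point Triton at the system ptxas before starting vLLM.
export TRITON_PTXAS_PATH="$SYSTEM_PTXAS"
echo "$TRITON_PTXAS_PATH"
```

The variable must be exported in the environment that launches vLLM, so Triton's JIT picks it up at kernel-compile time.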
Changes
- Adds a _configure_triton_ptxas_for_new_gpus() function to vllm/triton_utils/importing.py
- Uses Triton's native backend detection (triton.backends.backends) to get the GPU architecture

Testing
Tested on NVIDIA GB10 (DGX Spark) with:
- CUDA 13.0
- Triton 3.5.1
Verified that Triton kernels compile correctly and that a pre-existing user-configured TRITON_PTXAS_PATH is left untouched.
Related Issues
Fixes #31269
Fixes #32093
Related to #29469