
[Bugfix] Auto-configure TRITON_PTXAS_PATH for new GPU architectures#32704

Open
danielostrow wants to merge 1 commit into vllm-project:main from danielostrow:fix-triton-ptxas-new-gpus

Conversation

@danielostrow

@danielostrow danielostrow commented Jan 20, 2026

Summary

Triton bundles a ptxas binary from CUDA 12.8 that does not support GPU architectures sm_110a (Jetson Thor) or sm_121a (DGX Spark GB10). This causes Triton kernel compilation to fail with:

ptxas fatal: Value 'sm_121a' is not defined for option 'gpu-name'

This PR adds automatic detection of new GPU architectures and configures TRITON_PTXAS_PATH to use the system CUDA toolkit's ptxas when needed.

Changes

  • Add _configure_triton_ptxas_for_new_gpus() function to vllm/triton_utils/importing.py
  • Uses Triton's native backend detection (triton.backends.backends) to get GPU architecture
  • Sets TRITON_PTXAS_PATH for GPUs with arch >= 110 (compute capability 11.0+)
  • Respects user-configured TRITON_PTXAS_PATH if already set
  • Fails gracefully if detection is unavailable
  • Add unit tests for the new functionality

Testing

Tested on NVIDIA GB10 (DGX Spark) with:

  • CUDA 13.0 (V13.0.88)
  • Triton 3.5.1
  • PyTorch 2.9.1+cu130

Verified that:

  • Triton kernels compile and execute successfully with the fix
  • Triton kernels fail without the fix (expected ptxas error)

Related Issues

Fixes #31269
Fixes #32093

Related to #29469

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which starts a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the bug Something isn't working label Jan 20, 2026
@danielostrow danielostrow force-pushed the fix-triton-ptxas-new-gpus branch from 6e0d8d4 to 2edfbec Compare January 20, 2026 17:13
@mergify

mergify bot commented Jan 20, 2026

Hi @danielostrow, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@danielostrow danielostrow force-pushed the fix-triton-ptxas-new-gpus branch from 2edfbec to 641bf35 Compare January 20, 2026 17:27
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a helpful auto-configuration mechanism for TRITON_PTXAS_PATH to support new GPU architectures. The implementation is sound and the accompanying tests are thorough. My review focuses on improving the debuggability of the new logic by adding logging to error-handling paths that currently fail silently. These changes will make it easier to diagnose issues if the auto-configuration does not behave as expected.

Triton bundles a ptxas binary from CUDA 12.8 that does not support
GPU architectures sm_110a (Jetson Thor) or sm_121a (DGX Spark GB10).
This causes Triton kernel compilation to fail with:

    ptxas fatal: Value 'sm_121a' is not defined for option 'gpu-name'

This change adds automatic detection of new GPU architectures using
Triton's native backend detection and configures TRITON_PTXAS_PATH
to use the system CUDA toolkit's ptxas when needed.

The fix:
- Uses triton.backends.backends to detect GPU architecture
- Sets TRITON_PTXAS_PATH for GPUs with arch >= 110 (CC 11.0+)
- Respects user-configured TRITON_PTXAS_PATH if already set
- Fails gracefully if detection is unavailable

Tested on NVIDIA GB10 (DGX Spark) with CUDA 13.0 and Triton 3.5.1.

Related issues: vllm-project#31269, vllm-project#29469, vllm-project#32093

Signed-off-by: Daniel Ostrow <daniel@neuralintellect.com>
@danielostrow danielostrow force-pushed the fix-triton-ptxas-new-gpus branch from 641bf35 to ad16030 Compare January 20, 2026 17:30
@danielostrow
Author

How are we looking on this? Is it still relevant?

@changqingla

I'm experiencing the same problem and hope this PR can be merged as soon as possible.

Author


Please note: this change acts as a stopgap for when Triton updates lag behind CUDA updates. In the future, once Triton catches up, it may become unnecessary.

seli-equinix added a commit to seli-equinix/vllm that referenced this pull request Jan 29, 2026
Cherry-pick from PR vllm-project#32704 - auto-detects GPU arch >= 110 and
configures TRITON_PTXAS_PATH to use system CUDA toolkit's ptxas
instead of Triton's bundled version (CUDA 12.8) which doesn't
support sm_121a.

This ensures Triton kernels compile correctly on DGX Spark GB10.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: seli-equinix <seli@equinix.com>
seli-equinix added a commit to seli-equinix/vllm that referenced this pull request Feb 3, 2026
@mgoin
Member

mgoin commented Feb 4, 2026

I think it is likely that this will be resolved by the torch==2.10.0 update in #30525, since that pins triton==3.6.0

@Kaweees

Kaweees commented Feb 12, 2026

I think it is likely that this will be resolved by torch==2.10.0 update here #30525 since that pins to triton==3.6.0

@mgoin I could be completely wrong, but my configuration in #34470 uses torch>=2.10.0 and triton>=3.6.0 but still has issues

seli-equinix added a commit to seli-equinix/vllm that referenced this pull request Feb 16, 2026
seli-equinix added a commit to seli-equinix/vllm that referenced this pull request Feb 16, 2026
@johnnynunez
Contributor

johnnynunez commented Feb 24, 2026

I think it is likely that this will be resolved by torch==2.10.0 update here #30525 since that pins to triton==3.6.0

@mgoin I could be completely wrong, but my configuration in #34470 uses torch>=2.10.0 and triton>=3.6.0 but still has issues

Because that is only fixed in the upcoming Triton 3.7.0. You have to build Triton from main (the release branch has not been cut yet) and use https://dev-discuss.pytorch.org/t/pytorch-2-11-rc1-produced-for-pytorch-torchvision/3316

Then build vLLM from upstream.

Yesterday I had Nemotron NVFP4 running on Jetson AGX Thor and DGX Spark.

seli-equinix added a commit to seli-equinix/vllm that referenced this pull request Mar 5, 2026
seli-equinix added a commit to seli-equinix/vllm that referenced this pull request Mar 11, 2026
@mergify mergify bot added the intel-gpu Related to Intel GPU label Mar 31, 2026

Labels

bug (Something isn't working), intel-gpu (Related to Intel GPU)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Nemotron Nano V3 FP16 on Jetson THOR
[Feature]: Add support for NVIDIA Jetson AGX Thor

5 participants