[Build] Allow shipping PTX on a per-file basis #18155
simon-mo merged 7 commits into vllm-project:main
Conversation
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
```diff
  endif()
- list(SORT SRC_CUDA_ARCHS COMPARE NATURAL ORDER ASCENDING)
+ list(SORT _SRC_CUDA_ARCHS COMPARE NATURAL ORDER ASCENDING)
```
Do the _TGT_CUDA_ARCHS need to be sorted too?
For a more general utility you're right, they should be! But the target arches come from `extract_unique_cuda_archs_ascending`, so they are already sorted. I can open a new PR to refactor some of this, though; I want to preserve the current behavior as much as possible in this one.
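As a side note, the "natural ascending" ordering assumed here (CMake's `COMPARE NATURAL`) compares version strings numerically rather than lexicographically, so `10.0` sorts after `9.0`. A small Python sketch of that behavior (illustrative only; `natural_arch_sort` is a hypothetical helper, not vLLM's actual code):

```python
# Hypothetical sketch: sort CUDA arch version strings in natural
# (numeric) ascending order, the way CMake's COMPARE NATURAL does.
def natural_arch_sort(archs):
    def key(a):
        major, _, minor = a.partition(".")
        # strip trailing letter suffixes like the "a" in "9.0a"
        minor_num = "".join(ch for ch in minor if ch.isdigit()) or "0"
        return (int(major), int(minor_num))
    return sorted(archs, key=key)

print(natural_arch_sort(["9.0", "8.0", "10.0", "8.9"]))
# -> ['8.0', '8.9', '9.0', '10.0']  (lexicographic sort would put '10.0' first)
```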
bnellnm
left a comment
Looks reasonable to me.
Seems relatively safe to me. There might be a regression for marlin because of slow bf16 convert on A100 (IIRC) that might transfer to newer hardware, but also might not. Ultimately shouldn't be that big of a deal. @jinzhen-lin please step in if you have concerns with this since you refactored marlin most recently.
mgoin
left a comment
The failures look closely related
Currently, I believe that the performance of
Failures resolved
mgoin
left a comment
This is a nice middle ground, thanks for getting it working!
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Wheel size: 323.20 MB -> 324.36 MB
To help with the growing wheel size due to Blackwell, allow shipping PTX on a per-file basis for heavy kernels that don't take advantage of new hardware features. There are enough different gencodes now for certain kernels that it makes sense to ship a single PTX implementation instead of multiple SASS binaries. This currently grows the wheel size mildly, but it should help keep the size capped as the Blackwell gencodes are added.
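The trade-off above can be sketched as follows: with `-gencode arch=compute_XY,code=sm_XY`, nvcc embeds one SASS binary per target architecture, whereas `code=compute_XY` embeds a single PTX image that the driver JIT-compiles for whatever GPU is present. A hedged Python illustration of the idea (the `gencode_flags` helper and its parameters are hypothetical, not the PR's CMake logic):

```python
# Hypothetical illustration of per-file PTX shipping: for "heavy" kernels
# that don't rely on new hardware features, emit one PTX gencode for the
# lowest supported arch instead of one SASS binary per architecture.
def gencode_flags(archs, ship_ptx_only):
    flags = []
    if ship_ptx_only:
        # a single PTX image for the lowest arch; the driver JIT-compiles
        # it for newer GPUs at load time (forward compatible)
        low = min(archs, key=lambda a: tuple(int(x) for x in a.split(".")))
        vnum = low.replace(".", "")
        flags.append(f"-gencode arch=compute_{vnum},code=compute_{vnum}")
    else:
        # one SASS image per target architecture (larger, no JIT at load)
        for a in archs:
            vnum = a.replace(".", "")
            flags.append(f"-gencode arch=compute_{vnum},code=sm_{vnum}")
    return flags

print(gencode_flags(["8.0", "9.0", "10.0"], ship_ptx_only=True))
# -> ['-gencode arch=compute_80,code=compute_80']
```

The cost is a one-time JIT compile on first load for architectures without SASS; the benefit is that the fat binary stops growing linearly with each new gencode.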