[Bugfix] Fix PyTorch stable ABI compatibility for permute_cols #1

Closed
kilork wants to merge 1 commit into main from fix/stable-abi-tensor-reference

@kilork kilork commented Mar 21, 2026

Purpose

Fix a build failure that occurs when PR vllm-project#37491 (CUTLASS upgrade to v4.4.2) is combined with commit 8b10e4f (Migrate permute_cols to libtorch stable ABI). The build fails with:

static_assert(std::is_trivially_copyable_v<T>);
error: non-static data member 'torch::stable::detail::ToImpl<const torch::stable::Tensor&>::call(...)::Result::t' in a union may not have reference type

Root Cause

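The PyTorch Stable ABI (introduced in PyTorch 2.10) converts operator arguments to and from a plain value representation when they cross the C shim. As the error output shows, the generic conversion torch::stable::detail::ToImpl<T> asserts that T is trivially copyable and stores a T in a union. A reference type such as const Tensor& satisfies neither condition: it is not trivially copyable, and a union cannot have a reference member. Reference types therefore cannot be used as function parameters in STABLE_TORCH_LIBRARY registrations.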
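
The failure mode can be reproduced in isolation. The sketch below is not PyTorch code: FakeTensor, pack, and unpack are hypothetical names, and the helper only imitates the pattern visible in the error output (a static_assert on trivial copyability plus a union member of type T).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a value that fits in a 64-bit slot.
// This is NOT the real torch::stable::Tensor.
struct FakeTensor {
  void* handle;
};

// Hypothetical packing helper: copy the bytes of a value into a 64-bit slot.
template <typename T>
std::uint64_t pack(T value) {
  static_assert(std::is_trivially_copyable_v<T>);
  static_assert(sizeof(T) <= sizeof(std::uint64_t));
  std::uint64_t raw = 0;
  std::memcpy(&raw, &value, sizeof(T));
  return raw;
}

// Hypothetical unpacking helper, shaped like the code in the error output:
// a static_assert followed by a union that holds a T.
template <typename T>
T unpack(std::uint64_t raw) {
  static_assert(std::is_trivially_copyable_v<T>);
  static_assert(sizeof(T) <= sizeof(std::uint64_t));
  union Result {
    Result() {}
    T t;  // ill-formed if T is a reference type
  };
  Result r;
  std::memcpy(&r.t, &raw, sizeof(T));
  return r.t;
}

// A value type passes the check; a reference to it does not.
static_assert(std::is_trivially_copyable_v<FakeTensor>);
static_assert(!std::is_trivially_copyable_v<const FakeTensor&>);

// Round-trips a value type through the 64-bit slot.
void check_round_trip() {
  int x = 0;
  FakeTensor t{&x};
  assert(unpack<FakeTensor>(pack<FakeTensor>(t)).handle == &x);
  assert(unpack<int>(pack<int>(42)) == 42);
  // unpack<const FakeTensor&>(...) would not compile: the static_assert
  // fires and the union member would have reference type.
}
```

Instantiating unpack with a reference type triggers both diagnostics that appear in the build log, which is why the parameter types have to be plain values.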

The code in csrc/libtorch_stable/ops.h and csrc/libtorch_stable/permute_cols.cu uses:

torch::stable::Tensor permute_cols(torch::stable::Tensor const& A,
                                   torch::stable::Tensor const& perm);

This is incompatible with the Stable ABI requirement documented at https://pytorch.org/docs/stable/library.html (see "Stable C++ ABI" / torch::stable namespace).
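
One way to catch this class of mistake earlier is a compile-time check on the function signature. The following is a minimal sketch under stated assumptions: has_no_reference_params, FakeTensor, op_by_reference, and op_by_value are hypothetical names introduced for illustration and are not part of PyTorch or vLLM.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a tensor type; not the real torch::stable::Tensor.
struct FakeTensor {
  void* handle;
};

// Primary template, left undefined for anything that is not a plain
// function pointer.
template <typename F>
struct has_no_reference_params;

// True only when no parameter of the function is a reference type.
template <typename R, typename... Args>
struct has_no_reference_params<R (*)(Args...)>
    : std::bool_constant<(!std::is_reference_v<Args> && ...)> {};

template <typename F>
inline constexpr bool has_no_reference_params_v =
    has_no_reference_params<F>::value;

// Signature shaped like the old declaration (pass-by-reference).
FakeTensor op_by_reference(FakeTensor const& a, FakeTensor const& /*perm*/) {
  return a;
}

// Signature shaped like the new declaration (pass-by-value).
FakeTensor op_by_value(FakeTensor a, FakeTensor /*perm*/) {
  return a;
}

static_assert(!has_no_reference_params_v<decltype(&op_by_reference)>);
static_assert(has_no_reference_params_v<decltype(&op_by_value)>);
```

A static_assert of this kind placed next to a registration would report the offending signature directly instead of failing deep inside the conversion templates.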

Fix

Change function signatures from pass-by-reference to pass-by-value:

csrc/libtorch_stable/ops.h:7-8

// Before:
torch::stable::Tensor permute_cols(torch::stable::Tensor const& A,
                                   torch::stable::Tensor const& perm);

// After:
torch::stable::Tensor permute_cols(torch::stable::Tensor A,
                                   torch::stable::Tensor perm);

csrc/libtorch_stable/permute_cols.cu:70-71

// Before:
torch::stable::Tensor permute_cols(torch::stable::Tensor const& A,
                                   torch::stable::Tensor const& perm) {

// After:
torch::stable::Tensor permute_cols(torch::stable::Tensor A,
                                   torch::stable::Tensor perm) {
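
Passing a tensor by value is not expected to copy the underlying data, on the assumption that torch::stable::Tensor is a reference-counted handle. The sketch below illustrates that assumption with a hypothetical HandleTensor class built on std::shared_ptr. It is not the real PyTorch type, and all names in it are invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical backing storage for the illustration.
struct Storage {
  std::vector<float> data;
};

// Hypothetical handle-style tensor: copying it copies a shared_ptr,
// not the underlying buffer.
class HandleTensor {
 public:
  explicit HandleTensor(std::vector<float> values)
      : storage_(std::make_shared<Storage>(Storage{std::move(values)})) {}

  const float* data() const { return storage_->data.data(); }
  long use_count() const { return storage_.use_count(); }

 private:
  std::shared_ptr<Storage> storage_;
};

// By-value parameter, mirroring the shape of the new permute_cols signature.
// Returns true if the copy still points at the caller's buffer.
bool shares_storage_with(HandleTensor copy, const float* original_data) {
  return copy.data() == original_data;
}

// During the call the handle is shared by the caller and the parameter.
long use_count_inside_call(HandleTensor copy) { return copy.use_count(); }
```

Under that assumption, the by-value signature costs a reference-count increment per argument rather than a buffer copy.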

Additional Fix

Also add the missing CUTLASS include directories to the _C_stable_libtorch target in CMakeLists.txt:1015-1016 (matching what the _C target already has).

Test Plan

Build vLLM with both PR vllm-project#37491 applied and commit 8b10e4f present:

VLLM_USE_PRECOMPILED=1 uv pip install -e .

Test Result

Build completes successfully on CUDA.


Co-authored-by: Claude <noreply@anthropic.com>

The PyTorch Stable ABI requires all types to be trivially copyable.
Reference types (const Tensor&) are not trivially copyable and cannot
be used in STABLE_TORCH_LIBRARY registrations.

This fixes build failure when combining PR vllm-project#37491 (CUTLASS upgrade to
v4.4.2) with the libtorch stable ABI migration.

Also adds missing CUTLASS include directories to _C_stable_libtorch
target in CMakeLists.txt.
@kilork kilork force-pushed the fix/stable-abi-tensor-reference branch from 0e25ff9 to dbda8eb Compare March 21, 2026 08:51
@kilork kilork closed this Mar 21, 2026