
Fix PyTorch 2.10 ABI compatibility and build logging#2256

Closed
ussoewwin wants to merge 2 commits into Dao-AILab:main from ussoewwin:fix-pytorch-2.10-abi

Conversation

@ussoewwin

FlashAttention PyTorch 2.10+ Compatibility Fixes

This Pull Request addresses build and runtime compatibility issues with PyTorch 2.10 (and upcoming 2.11) on Windows (and potentially Linux).

Problem Description

Building FlashAttention with PyTorch 2.10+ (specifically 2.11.0a0 development builds) typically succeeds, but results in runtime errors or DLL load failures when importing the extension.

Key issues identified:

  1. Header Inclusion: The original csrc/flash_attn/flash_api.cpp deliberately includes <torch/python.h> instead of <torch/extension.h> to reduce compilation time. However, with newer PyTorch versions (2.10+) this no longer appears sufficient for full extension ABI compatibility, leading to missing symbols or mismatched definitions at runtime.
  2. Build Environment (Windows): setup.py and Windows build scripts lacked robustness against certain environment variable configurations.

Recommended Changes for PR

1. C++ Extension Header (csrc/flash_attn/flash_api.cpp)

  • Change: Replaced #include <torch/python.h> with #include <torch/extension.h>.
  • Reason: While the original code avoided this header to save compilation time, <torch/extension.h> is the standard and recommended header for C++ extensions to ensure ABI compatibility (_GLIBCXX_USE_CXX11_ABI, etc.). The runtime stability gained for PyTorch 2.10+ outweighs the minor increase in compilation time.

2. Build Script Improvements (setup.py)

  • Change: Added build-time logging for:
    • PyTorch Version
    • CUDA Version
    • _GLIBCXX_USE_CXX11_ABI status
  • Reason: This provides critical context in build logs without affecting the build process itself (a sketch of the logging is shown below).
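For context, here is a minimal sketch of this kind of logging, assuming it sits near the top of setup.py after torch is imported; the exact wording and placement in the PR may differ:

import torch

# Print the toolchain details that most often explain ABI mismatches.
# Informational only; it does not change the build configuration.
print(f"torch.__version__: {torch.__version__}")
print(f"torch.version.cuda: {torch.version.cuda}")
print(f"_GLIBCXX_USE_CXX11_ABI: {torch._C._GLIBCXX_USE_CXX11_ABI}")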

Verification

The fixes were verified on the following environment:

  • OS: Windows (x64)
  • PyTorch: 2.10.0+cu130 (Nightly/Dev 2.11.0a0)
  • CUDA: 12.x
  • FlashAttention: v2.8.3

Verification Script (test_flash_attn.py)

A simple forward pass test was performed to confirm stability:

import torch
import flash_attn

print("Testing FlashAttention forward pass...")
# (batch, seqlen, nheads, headdim) tensors in fp16 on the GPU.
q = torch.randn(2, 128, 8, 64, device='cuda', dtype=torch.float16)
k = torch.randn(2, 128, 8, 64, device='cuda', dtype=torch.float16)
v = torch.randn(2, 128, 8, 64, device='cuda', dtype=torch.float16)

try:
    out = flash_attn.flash_attn_func(q, k, v)
    print("Success: Flash Attention forward pass executed.")
    print(f"Output shape: {out.shape}")
    # Guard against silent numerical corruption from an ABI mismatch.
    assert not torch.isnan(out).any(), "Output contains NaNs!"
except Exception as e:
    print(f"Failed: {e}")

@janeyx99
Contributor

janeyx99 commented Feb 12, 2026

@ussoewwin Ideally FA2 can also follow the steps of Hopper FA3 and be ABI-stable with both CPython and libtorch

relevant PRs: #1662 and #1791

@ussoewwin
Author

ussoewwin commented Feb 13, 2026

@janeyx99 Thanks for your technical advice.

I have amended the code based on the two PRs.

Summary of Changes:

  1. Migrated to TORCH_LIBRARY (Stable ABI):

    • In csrc/flash_attn/flash_api.cpp, I replaced PYBIND11_MODULE with TORCH_LIBRARY and TORCH_LIBRARY_IMPL.
    • This aligns with PyTorch's native operator registration mechanism and ensures better ABI compatibility.
  2. Operator Registration:

    • Defined operators (fwd, varlen_fwd, etc.) using the torch::library schema, matching the style of the referenced PRs.
  3. Python Interface Update:

    • Updated flash_attn/flash_attn_interface.py to invoke kernels via torch.ops.flash_attn_2_cuda and utilize torch.library.custom_op (for PyTorch 2.4+); a sketch of this wrapping is shown below.

This implementation should now fully comply with the Stable ABI requirements.
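As an illustration only, here is a minimal sketch of how the Python interface can wrap the registered kernel with torch.library.custom_op. The torch.ops.flash_attn_2_cuda namespace comes from this PR, but the flash_attn_sketch namespace and the three-argument signature below are placeholders; the real fwd schema takes additional parameters (dropout probability, softmax scale, causal flag, etc.), so this is not the exact code in the branch:

import torch

@torch.library.custom_op("flash_attn_sketch::fwd", mutates_args=())
def _flash_attn_forward(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Dispatch to the C++ operator registered via TORCH_LIBRARY.
    # Simplified: the real torch.ops.flash_attn_2_cuda.fwd takes more arguments.
    return torch.ops.flash_attn_2_cuda.fwd(q, k, v)

@_flash_attn_forward.register_fake
def _(q, k, v):
    # Shape/dtype propagation so torch.compile can trace the op without running it.
    return torch.empty_like(q)

A wrapper like this lets torch.compile and fake-tensor tracing see the op through the stable-ABI boundary; gradient support would additionally require registering a backward via torch.library.register_autograd.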

ussoewwin closed this Feb 13, 2026
ussoewwin deleted the fix-pytorch-2.10-abi branch February 13, 2026 17:29
