xformers v0.0.31 and rebuild for pytorch 2.7 by jeongseok-meta · Pull Request #50 · conda-forge/xformers-feedstock

jeongseok-meta · 2025-06-04T21:45:35Z

Checklist

Used a personal fork of the feedstock to propose changes
Bumped the build number (if the version is unchanged)
Reset the build number to 0 (if the version changed)
Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
Ensured the license file is being packaged.

Closes #43
Closes #44
Closes #47
Closes #49
Closes #51

jeongseok-meta · 2025-06-04T21:45:40Z

conda-forge-admin · 2025-06-04T21:47:01Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/recipe.yaml:

ℹ️ Jinja2 variable references are suggested to take a ${{<one space><variable name><one space>}} form. See lines [5].
ℹ️ PyPI default URL is now pypi.org, and not pypi.io. You may want to update the default source url.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/16443451998. Examine the logs at this URL for more detail.

jeongseok-meta · 2025-06-04T21:58:18Z

@conda-forge-admin, please rerender

jeongseok-meta · 2025-06-04T22:05:19Z

@conda-forge-admin, please rerender

jeongseok-meta · 2025-06-25T18:42:22Z

@conda-forge-admin, please rerender

export CUTLASS_NVCC_ARCHS="80 86 89"

jeongseok-meta · 2025-06-26T03:01:38Z

I had to restrict the CUDA archs to 8.0/8.6/8.9 because compiling for sm90 crashes the xformers v0.0.31 build (nvcc exits with code 255). It might not be the ideal fix, but otherwise the build fails with:

2025-06-25T22:33:51.2676675Z  │ │   ptxas info    : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80'
2025-06-25T22:33:51.2718272Z  │ │   ptxas info    : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE
2025-06-25T22:33:51.2719629Z  │ │       0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
2025-06-25T22:33:51.2731864Z  │ │   ptxas info    : Used 4 registers, used 0 barriers, 2752 bytes cmem[0]
2025-06-25T22:33:51.2743472Z  │ │   ptxas info    : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80'
2025-06-25T22:33:51.2749600Z  │ │   ptxas info    : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE
2025-06-25T22:33:51.2757423Z  │ │       0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
2025-06-25T22:33:51.2762864Z  │ │   ptxas info    : Used 4 registers, used 0 barriers, 2752 bytes cmem[0]
2025-06-25T22:35:29.9645558Z  │ │   error: command '$BUILD_PREFIX/bin/nvcc' failed with exit code 255
2025-06-25T22:35:37.0044164Z  │ │   error: subprocess-exited-with-error
2025-06-25T22:35:37.0050647Z  │ │   
2025-06-25T22:35:37.0054650Z  │ │   × Building wheel for xformers (pyproject.toml) did not run successfully.
2025-06-25T22:35:37.0057410Z  │ │   │ exit code: 1
2025-06-25T22:35:37.0062593Z  │ │   ╰─> See above for output.
2025-06-25T22:35:37.0065314Z  │ │   
2025-06-25T22:35:37.0067782Z  │ │   note: This error originates from a subprocess, and is likely not a problem with pip.
2025-06-25T22:35:37.0136068Z  │ │   full command: $PREFIX/bin/python $PREFIX/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpn3l0dn3w
2025-06-25T22:35:37.0139993Z  │ │   cwd: $SRC_DIR
2025-06-25T22:35:37.0194032Z  │ │   Building wheel for xformers (pyproject.toml): finished with status 'error'
2025-06-25T22:35:37.0386697Z  │ │   ERROR: Failed building wheel for xformers
2025-06-25T22:35:37.0507329Z  │ │ Failed to build xformers
2025-06-25T22:35:37.0644944Z  │ │ ERROR: Failed to build installable wheels for some pyproject.toml based projects (xformers)
2025-06-25T22:35:37.0897235Z  │ │ Exception information:
2025-06-25T22:35:37.0904226Z  │ │ Traceback (most recent call last):
2025-06-25T22:35:37.0912375Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/cli/base_command.py", line 105, in _run_wrapper
2025-06-25T22:35:37.0920158Z  │ │     status = _inner_run()
2025-06-25T22:35:37.0928197Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/cli/base_command.py", line 96, in _inner_run
2025-06-25T22:35:37.0933286Z  │ │     return self.run(options, args)
2025-06-25T22:35:37.0940102Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/cli/req_command.py", line 68, in wrapper
2025-06-25T22:35:37.0944587Z  │ │     return func(self, options, args)
2025-06-25T22:35:37.0950364Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/commands/install.py", line 436, in run
2025-06-25T22:35:37.0954404Z  │ │     raise InstallationError(
2025-06-25T22:35:37.0958743Z  │ │ pip._internal.exceptions.InstallationError: Failed to build installable wheels for some pyproject.toml based projects (xformers)
2025-06-25T22:35:37.0982968Z  │ │ Removed build tracker: '/tmp/pip-build-tracker-9v06rv7n'
2025-06-25T22:35:37.9395783Z  │ │ × error Script failed with status 1
2025-06-25T22:35:37.9477305Z  │ │ × error 
2025-06-25T22:35:37.9480310Z  │ │ × error Script execution failed.
2025-06-25T22:35:37.9491081Z  │ │ × error 
2025-06-25T22:35:37.9512840Z  │ │ × error   Work directory: /home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/work
2025-06-25T22:35:37.9519364Z  │ │ × error   Prefix: /home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/host_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place
2025-06-25T22:35:37.9524361Z  │ │ × error   Build prefix: /home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/build_env
2025-06-25T22:35:37.9528368Z  │ │ × error 
2025-06-25T22:35:37.9532744Z  │ │ × error To run the script manually, use the following command:
2025-06-25T22:35:37.9536627Z  │ │ × error 
2025-06-25T22:35:37.9541034Z  │ │ × error   cd "/home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/work" && ./conda_build.sh
2025-06-25T22:35:37.9545109Z  │ │ × error 
2025-06-25T22:35:37.9545498Z  │ │ × error To run commands interactively in the build environment:
2025-06-25T22:35:37.9548978Z  │ │ × error 
2025-06-25T22:35:37.9552712Z  │ │ × error   cd "/home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/work" && source build_env.sh
2025-06-25T22:35:37.9676730Z  │ │
2025-06-25T22:35:37.9681300Z  │ ╰─────────────────── (took 4 hours)
2025-06-25T22:35:38.2768372Z  │
2025-06-25T22:35:38.2772656Z  ╰─────────────────── (took 4 hours)
2025-06-25T22:35:39.8596601Z Error:   × Script failed to execute
2025-06-25T22:35:39.8600986Z 
2025-06-25T22:35:40.5902124Z 
2025-06-25T22:35:40.6959550Z ##[error]Bash exited with code '1'.
2025-06-25T22:35:40.7637084Z ##[section]Finishing: Run docker build
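
Most of the log above is per-kernel `ptxas info` output, which buries the actual failure. A minimal sketch for pulling out only the failure lines, assuming the CI output has been saved to a local file; the function name `filter_build_log` and the file name in the usage comment are hypothetical and not part of the feedstock:

```shell
# Hypothetical helper: show only the failure lines from a saved build log.
# Assumes the CI output was downloaded to a local text file.
filter_build_log() {
    # Drop the verbose ptxas resource reports, then keep lines that
    # mention an error or a failed command (case-insensitive).
    grep -vE 'ptxas info|bytes stack frame' "$1" \
        | grep -iE 'error|failed'
}

# Usage (hypothetical file name):
#   filter_build_log build.log
```

Run against the log above, this should surface the `nvcc` exit code 255 line and the pip wheel-build errors while hiding the register and spill statistics.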

h-vetinari · 2025-07-02T00:33:13Z

-        export TORCH_CUDA_ARCH_LIST="5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.9;9.0+PTX"
+    if [[ ${cuda_compiler_version} == 12.6 ]]; then
+        export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9"
+        export CUTLASS_NVCC_ARCHS="80 86 89"


I don't think this is acceptable TBH. We need to keep support for more architectures: we shouldn't drop 9.0, and we also shouldn't drop the +PTX. Where does CUTLASS_NVCC_ARCHS come into play?
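
For context on why the `+PTX` suffix matters, here is a rough sketch of how a `TORCH_CUDA_ARCH_LIST` value maps to `nvcc` `-gencode` flags. It reflects my understanding of how PyTorch's extension builder expands the variable, so treat the exact flags as an assumption; the function name `arch_list_to_gencode` is hypothetical, and xformers may add its own flags for some kernels:

```shell
# Hypothetical helper: expand a TORCH_CUDA_ARCH_LIST-style string into
# the -gencode flags it is assumed to correspond to.
arch_list_to_gencode() {
    # Split the list on ';' (one entry per line) and expand each entry.
    printf '%s\n' "$1" | tr ';' '\n' | while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        # "9.0+PTX" -> "90"
        num=$(printf '%s' "${entry%+PTX}" | tr -d '.')
        # Native machine code (SASS) for this architecture.
        printf '%s\n' "-gencode=arch=compute_${num},code=sm_${num}"
        case "$entry" in
            # +PTX additionally embeds PTX, which newer GPUs can JIT-compile.
            *+PTX) printf '%s\n' "-gencode=arch=compute_${num},code=compute_${num}" ;;
        esac
    done
}

arch_list_to_gencode "8.0;8.6;8.9;9.0+PTX"
```

Under this reading, dropping `+PTX` removes the only forward-compatible code path: GPUs newer than the highest listed architecture would have no embedded PTX to fall back on.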

FYI: It builds with export TORCH_CUDA_ARCH_LIST="5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.9", without defining CUTLASS_NVCC_ARCHS.

Not sure if we need CUDA 12.8+ for sm90 builds, but I also got an error once I added 9.0+PTX back in.

Yes, it appears that CUTLASS_NVCC_ARCHS is not necessary. This was the only green path I could find, but I welcome any suggestions or fixes as long as they work.

I'm trying a build with CUDA 12.8 and adding 9.0+PTX back in, but assuming that nothing crashes, we'll need to wait ~5 hours before we see the results.

@shermansiu Thanks for taking a look at this. ~~Feel free to directly modify this PR, by the way.~~

Oh, you are not a maintainer. Feel free to create another PR if you prefer then.

Also, I think we can add 5.3;6.0;6.1;7.0;7.5; back into TORCH_CUDA_ARCH_LIST?

I could, but I think we're almost done? I think we can drop 9.0+PTX for the CUDA 12.6 build (this PR) and add it back in once we add the CUDA 12.8 one, in a different PR (assuming it compiles).

Update: Using CUDA 12.8 did not work for 9.0+PTX. For debugging purposes, I'm now targeting only that arch so I can figure out why it's throwing an error.

jeongseok-meta · 2025-07-21T18:06:09Z

@conda-forge-admin, please rerender

h-vetinari

Thanks for the work here @jeongseok-meta!

jeongseok-meta · 2025-07-23T01:01:40Z

@h-vetinari Thank you for completing and merging this PR! Also, many thanks to @shermansiu for your valuable input!

h-vetinari · 2025-07-24T06:11:34Z

Good news: sm_90 works again when compiled with CUDA 12.9. It's been re-added in #52, along with new architectures sm_100 and sm_120. The CUDA builds now need ~9 hours, but OK, that's what we got the server for. 🙃
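
The outcome of this thread (no sm_90 on the CUDA 12.6 build, sm_90 restored with CUDA 12.9) could be expressed as a version-gated arch list. This is only a sketch under stated assumptions: the helper `pick_arch_list` is hypothetical, the base list is the one reported above as building successfully, the 12.9 threshold comes from the comment above, and `cuda_compiler_version` is assumed to be a numeric version string. It does not reproduce the actual script in #52 and leaves out sm_100 and sm_120.

```shell
# Hypothetical helper: pick TORCH_CUDA_ARCH_LIST from the CUDA compiler version.
# Assumption: 9.0+PTX is only added for CUDA 12.9 or newer.
# The body runs in a subshell, so its variables do not leak into the caller.
pick_arch_list() (
    cuda_version="$1"
    # Architectures reported in this thread to build without sm_90.
    base="5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.9"
    major="${cuda_version%%.*}"
    case "$cuda_version" in
        *.*) minor="${cuda_version#*.}"; minor="${minor%%.*}" ;;
        *)   minor=0 ;;
    esac
    if [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 9 ]; }; then
        printf '%s\n' "${base};9.0+PTX"
    else
        printf '%s\n' "$base"
    fi
)

# Falls back to 12.6 (the version used in this PR) if the variable is unset.
TORCH_CUDA_ARCH_LIST="$(pick_arch_list "${cuda_compiler_version:-12.6}")"
export TORCH_CUDA_ARCH_LIST
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"
```

Gating on the compiler version keeps a single build script for all CUDA variants. A newer CUDA major version may also need a different base list, which this sketch does not attempt to handle.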

v0.0.30 is present in the git history, but was never published, since #50 was merged after bumping to v0.0.31.

Squash of all the commits on main branch since then

Co-authored-by: Jeongseok Lee <jeongseok@meta.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>

xformer v0.0.30

bbcdaae

conda-forge-webservices[bot] and others added 2 commits June 4, 2025 21:47

MNT: Re-rendered with conda-build 25.5.0, conda-smithy 3.50.0, and co…

944fb34

…nda-forge-pinning 2025.06.04.10.18.16

c_stdlib_version: 10.15

54f5c44

conda-forge-webservices[bot] and others added 2 commits June 4, 2025 22:00

MNT: Re-rendered with conda-build 25.5.0, conda-smithy 3.50.0, and co…

9764ba6

…nda-forge-pinning 2025.06.04.10.18.16

pytorch 2.7

24437df

MNT: Re-rendered with conda-build 25.5.0, conda-smithy 3.50.0, and co…

264863e

…nda-forge-pinning 2025.06.04.10.18.16

jeongseok-meta changed the title ~~xformer v0.0.30~~ xformers v0.0.30 Jun 4, 2025

jeongseok-meta changed the title ~~xformers v0.0.30~~ xformers v0.0.30 and rebuild for pytorch 2.7 Jun 4, 2025

jeongseok-meta added 5 commits June 4, 2025 16:58

XFORMERS_DISABLE_FLASH_ATTN: "1"

f355f2b

Match TORCH_CUDA_ARCH_LIST to pytorch

0804a29

Set FORCE_CUDA=1 and enable flash-attn

2a5256d

Add cuda-driver-dev to req.run

2ba78f7

xformers v0.0.31

dd29978

conda-forge-webservices[bot] and others added 2 commits June 25, 2025 18:44

MNT: Re-rendered with conda-build 25.5.0, conda-smithy 3.50.1, and co…

09c1a82

…nda-forge-pinning 2025.06.25.17.20.38

export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9"

63c3907

export CUTLASS_NVCC_ARCHS="80 86 89"

jeongseok-meta marked this pull request as ready for review June 26, 2025 03:01

jeongseok-meta requested review from h-vetinari and jan-janssen as code owners June 26, 2025 03:01

jeongseok-meta changed the title ~~xformers v0.0.30 and rebuild for pytorch 2.7~~ xformers v0.0.31 and rebuild for pytorch 2.7 Jun 26, 2025

h-vetinari mentioned this pull request Jul 1, 2025

Rebuild for CUDA 12.9 conda-forge/pytorch-cpu-feedstock#393

Merged

h-vetinari reviewed Jul 2, 2025

Remove CUTLASS_NVCC_ARCHS

8999efa

MNT: Re-rendered with conda-build 25.5.0, conda-smithy 3.51.1, and co…

493226a

…nda-forge-pinning 2025.07.21.12.24.54

jeongseok-meta and others added 6 commits July 21, 2025 11:22

Add back 5.3;6.0;6.1;7.0;7.5; to TORCH_CUDA_ARCH_LIST

f187ba5

Reapply "Enable cirun-openstack-cpu-xlarge using Cirun"

a74aed2

This reverts commit dc711be.

MNT: Re-rendered with conda-build 25.5.0, conda-smithy 3.51.1, and co…

e940346

…nda-forge-pinning 2025.07.21.12.24.54

add back cuda arch 9.0 & PTX

4d2f258

simplify lower end of TORCH_CUDA_ARCH_LIST

b779cb7

remove cuda arch 9.0 again

8029c6c

h-vetinari approved these changes Jul 22, 2025

h-vetinari merged commit ea38755 into conda-forge:main Jul 22, 2025
24 checks passed

h-vetinari mentioned this pull request Jul 22, 2025

Upgrade to CUDA 12.9; v0.0.31.post1 #52

Merged

jeongseok-meta deleted the v0.0.30 branch July 23, 2025 01:00

h-vetinari mentioned this pull request Jul 24, 2025

Auto-merge does not work since 0.0.29.post1 #48

Closed

1 task

h-vetinari mentioned this pull request Aug 31, 2025

Actually publish v0.0.30 #59

Merged

5 tasks

4 participants
