xformers v0.0.31 and rebuild for pytorch 2.7 #50

Merged
h-vetinari merged 21 commits into conda-forge:main from jeongseok-meta:v0.0.30
Jul 22, 2025

Conversation

@jeongseok-meta
Contributor

@jeongseok-meta jeongseok-meta commented Jun 4, 2025

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Closes #43
Closes #44
Closes #47
Closes #49
Closes #51

@jeongseok-meta
Contributor Author

@conda-forge-admin, please rerender

@conda-forge-admin
Contributor

conda-forge-admin commented Jun 4, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/recipe.yaml:

  • ℹ️ Jinja2 variable references are suggested to take a ${{<one space><variable name><one space>}} form. See lines [5].
  • ℹ️ PyPI default URL is now pypi.org, and not pypi.io. You may want to update the default source url.
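The second suggestion could be applied with a small substitution like the one below. This is only a sketch: `update_pypi_url` is a hypothetical helper name, and it assumes the recipe's source URL actually points at `https://pypi.io/`, which this conversation does not show.

```shell
# Hypothetical helper: rewrite pypi.io source URLs to pypi.org in the file
# given as $1. The result is staged in a temporary file so the recipe is only
# overwritten if sed succeeds.
update_pypi_url() {
    tmp="$(mktemp)" || return 1
    # The dot is escaped so that only the literal host "pypi.io" matches.
    if sed 's#https://pypi\.io/#https://pypi.org/#g' "$1" > "${tmp}"; then
        cat "${tmp}" > "$1"
    fi
    rm -f "${tmp}"
}

# Possible usage from the feedstock root:
#   update_pypi_url recipe/recipe.yaml
```

Writing back with `cat` rather than `mv` keeps the recipe file's original permissions.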

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/16443451998. Examine the logs at this URL for more detail.

conda-forge-webservices[bot] and others added 2 commits June 4, 2025 21:47
@jeongseok-meta
Contributor Author

@conda-forge-admin, please rerender

conda-forge-webservices[bot] and others added 2 commits June 4, 2025 22:00
@jeongseok-meta
Contributor Author

@conda-forge-admin, please rerender

@jeongseok-meta jeongseok-meta changed the title from "xformer v0.0.30" to "xformers v0.0.30" Jun 4, 2025
@jeongseok-meta jeongseok-meta changed the title from "xformers v0.0.30" to "xformers v0.0.30 and rebuild for pytorch 2.7" Jun 4, 2025
@jeongseok-meta
Contributor Author

@conda-forge-admin, please rerender

conda-forge-webservices[bot] and others added 2 commits June 25, 2025 18:44
export CUTLASS_NVCC_ARCHS="80 86 89"
@jeongseok-meta
Contributor Author

I had to restrict the CUDA archs to 80/86/89 because compiling xformers v0.0.31 for sm90 makes NVCC fail (exit code 255). It might not be the ideal fix, but otherwise the build fails with:

2025-06-25T22:33:51.2676675Z  │ │   ptxas info    : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80'
2025-06-25T22:33:51.2718272Z  │ │   ptxas info    : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE
2025-06-25T22:33:51.2719629Z  │ │       0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
2025-06-25T22:33:51.2731864Z  │ │   ptxas info    : Used 4 registers, used 0 barriers, 2752 bytes cmem[0]
2025-06-25T22:33:51.2743472Z  │ │   ptxas info    : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80'
2025-06-25T22:33:51.2749600Z  │ │   ptxas info    : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE
2025-06-25T22:33:51.2757423Z  │ │       0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
2025-06-25T22:33:51.2762864Z  │ │   ptxas info    : Used 4 registers, used 0 barriers, 2752 bytes cmem[0]
2025-06-25T22:35:29.9645558Z  │ │   error: command '$BUILD_PREFIX/bin/nvcc' failed with exit code 255
2025-06-25T22:35:37.0044164Z  │ │   error: subprocess-exited-with-error
2025-06-25T22:35:37.0050647Z  │ │   
2025-06-25T22:35:37.0054650Z  │ │   × Building wheel for xformers (pyproject.toml) did not run successfully.
2025-06-25T22:35:37.0057410Z  │ │   │ exit code: 1
2025-06-25T22:35:37.0062593Z  │ │   ╰─> See above for output.
2025-06-25T22:35:37.0065314Z  │ │   
2025-06-25T22:35:37.0067782Z  │ │   note: This error originates from a subprocess, and is likely not a problem with pip.
2025-06-25T22:35:37.0136068Z  │ │   full command: $PREFIX/bin/python $PREFIX/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpn3l0dn3w
2025-06-25T22:35:37.0139993Z  │ │   cwd: $SRC_DIR
2025-06-25T22:35:37.0194032Z  │ │   Building wheel for xformers (pyproject.toml): finished with status 'error'
2025-06-25T22:35:37.0386697Z  │ │   ERROR: Failed building wheel for xformers
2025-06-25T22:35:37.0507329Z  │ │ Failed to build xformers
2025-06-25T22:35:37.0644944Z  │ │ ERROR: Failed to build installable wheels for some pyproject.toml based projects (xformers)
2025-06-25T22:35:37.0897235Z  │ │ Exception information:
2025-06-25T22:35:37.0904226Z  │ │ Traceback (most recent call last):
2025-06-25T22:35:37.0912375Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/cli/base_command.py", line 105, in _run_wrapper
2025-06-25T22:35:37.0920158Z  │ │     status = _inner_run()
2025-06-25T22:35:37.0928197Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/cli/base_command.py", line 96, in _inner_run
2025-06-25T22:35:37.0933286Z  │ │     return self.run(options, args)
2025-06-25T22:35:37.0940102Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/cli/req_command.py", line 68, in wrapper
2025-06-25T22:35:37.0944587Z  │ │     return func(self, options, args)
2025-06-25T22:35:37.0950364Z  │ │   File "$PREFIX/lib/python3.10/site-packages/pip/_internal/commands/install.py", line 436, in run
2025-06-25T22:35:37.0954404Z  │ │     raise InstallationError(
2025-06-25T22:35:37.0958743Z  │ │ pip._internal.exceptions.InstallationError: Failed to build installable wheels for some pyproject.toml based projects (xformers)
2025-06-25T22:35:37.0982968Z  │ │ Removed build tracker: '/tmp/pip-build-tracker-9v06rv7n'
2025-06-25T22:35:37.9395783Z  │ │ × error Script failed with status 1
2025-06-25T22:35:37.9477305Z  │ │ × error 
2025-06-25T22:35:37.9480310Z  │ │ × error Script execution failed.
2025-06-25T22:35:37.9491081Z  │ │ × error 
2025-06-25T22:35:37.9512840Z  │ │ × error   Work directory: /home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/work
2025-06-25T22:35:37.9519364Z  │ │ × error   Prefix: /home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/host_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place
2025-06-25T22:35:37.9524361Z  │ │ × error   Build prefix: /home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/build_env
2025-06-25T22:35:37.9528368Z  │ │ × error 
2025-06-25T22:35:37.9532744Z  │ │ × error To run the script manually, use the following command:
2025-06-25T22:35:37.9536627Z  │ │ × error 
2025-06-25T22:35:37.9541034Z  │ │ × error   cd "/home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/work" && ./conda_build.sh
2025-06-25T22:35:37.9545109Z  │ │ × error 
2025-06-25T22:35:37.9545498Z  │ │ × error To run commands interactively in the build environment:
2025-06-25T22:35:37.9548978Z  │ │ × error 
2025-06-25T22:35:37.9552712Z  │ │ × error   cd "/home/conda/feedstock_root/build_artifacts/bld/rattler-build_xformers_1750877301/work" && source build_env.sh
2025-06-25T22:35:37.9676730Z  │ │
2025-06-25T22:35:37.9681300Z  │ ╰─────────────────── (took 4 hours)
2025-06-25T22:35:38.2768372Z  │
2025-06-25T22:35:38.2772656Z  ╰─────────────────── (took 4 hours)
2025-06-25T22:35:39.8596601Z Error:   × Script failed to execute
2025-06-25T22:35:39.8600986Z 
2025-06-25T22:35:40.5902124Z 
2025-06-25T22:35:40.6959550Z ##[error]Bash exited with code '1'.
2025-06-25T22:35:40.7637084Z ##[section]Finishing: Run docker build
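In a log of this size, most of the trailing output is pip and rattler-build reporting the same failure again; the line that matters is the first failed compiler command. A minimal sketch for locating it follows. `first_compiler_error` is a hypothetical helper name, and the pattern is an assumption modelled on the `error: command '...' failed with exit code 255` line above.

```shell
# Hypothetical helper: print the first log line (prefixed with its line
# number) that reports a failed compiler command. $1 is the log file.
first_compiler_error() {
    grep -n -m 1 -E "error: command '.*' failed with exit code [0-9]+" "$1"
}

# Demo on a short excerpt modelled on the log above. The heredoc delimiter is
# quoted so that $BUILD_PREFIX stays literal, as it appears in the real log.
log_file="$(mktemp)"
cat > "${log_file}" <<'EOF'
ptxas info    : Used 4 registers, used 0 barriers, 2752 bytes cmem[0]
error: command '$BUILD_PREFIX/bin/nvcc' failed with exit code 255
error: subprocess-exited-with-error
EOF
first_compiler_error "${log_file}"
rm -f "${log_file}"
```

The `ptxas info` lines just before the match then show which kernel was being compiled when NVCC exited.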

@jeongseok-meta jeongseok-meta marked this pull request as ready for review June 26, 2025 03:01
@jeongseok-meta jeongseok-meta changed the title from "xformers v0.0.30 and rebuild for pytorch 2.7" to "xformers v0.0.31 and rebuild for pytorch 2.7" Jun 26, 2025
Comment thread recipe/build.sh Outdated
export TORCH_CUDA_ARCH_LIST="5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.9;9.0+PTX"
if [[ ${cuda_compiler_version} == 12.6 ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9"
export CUTLASS_NVCC_ARCHS="80 86 89"
Member

I don't think this is acceptable TBH. We need to keep support for more architectures, we shouldn't drop 9.0, and we also shouldn't drop the +PTX. Where does CUTLASS_NVCC_ARCHS come into play?

Copy link
Copy Markdown

FYI: It builds with export TORCH_CUDA_ARCH_LIST="5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.9", without defining CUTLASS_NVCC_ARCHS.

Not sure if we need CUDA 12.8+ for sm90 builds, but I also got an error once I added 9.0+PTX back in.

Contributor Author

Yes, it appears that CUTLASS_NVCC_ARCHS is not necessary. This was the only green path I could find, but I welcome any suggestions or fixes as long as they work.

Copy link
Copy Markdown

I'm trying a build with CUDA 12.8 and adding 9.0+PTX back in, but assuming that nothing crashes, we'll need to wait ~5 hours before we see the results.

Contributor Author

@jeongseok-meta jeongseok-meta Jul 21, 2025

@shermansiu Thanks for taking a look at this. Feel free to directly modify this PR, by the way.

Oh, you are not a maintainer. Feel free to create another PR if you prefer then.

Copy link
Copy Markdown

Also, I think we can add 5.3;6.0;6.1;7.0;7.5; back into TORCH_CUDA_ARCH_LIST?

Copy link
Copy Markdown

I could, but I think we're almost done? I think we can drop 9.0+PTX for the CUDA 12.6 build (this PR) and add it back in once we add the CUDA 12.8 one, in a different PR (assuming it compiles).

Copy link
Copy Markdown

Update: Using CUDA 12.8 did not work for 9.0+PTX. For debugging purposes, I'm now targeting only that arch so I can figure out why it's throwing an error.

@jeongseok-meta
Contributor Author

@conda-forge-admin, please rerender

Member

@h-vetinari h-vetinari left a comment

Thanks for the work here @jeongseok-meta!

@h-vetinari h-vetinari merged commit ea38755 into conda-forge:main Jul 22, 2025
24 checks passed
@jeongseok-meta jeongseok-meta deleted the v0.0.30 branch July 23, 2025 01:00
@jeongseok-meta
Contributor Author

@h-vetinari Thank you for completing and merging this PR! Also, many thanks to @shermansiu for your valuable input!

@h-vetinari
Member

Good news: sm_90 works again when compiled with CUDA 12.9. It's been re-added in #52, along with new architectures sm_100 and sm_120. The CUDA builds now need ~9 hours, but OK, that's what we got the server for. 🙃
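The outcome of this thread (sm_90 failing with CUDA 12.6 and 12.8, building again with 12.9) suggests gating the arch list on the CUDA compiler version. The sketch below illustrates that idea only; it is not the build.sh from this PR or from #52. The helper names `version_ge` and `select_arch_list` are hypothetical, and both arch lists are assumptions chosen for illustration (`9.0`, `10.0` and `12.0` stand for sm_90, sm_100 and sm_120).

```shell
# Sketch only: the 12.9 threshold follows the comment above; the exact arch
# lists are assumptions and are not taken from #52.

# Succeed if version $1 >= version $2. Both must be "major.minor" strings,
# so this is only meant for CUDA builds (not cuda_compiler_version=None).
version_ge() {
    a_major="${1%%.*}"; a_minor="${1#*.}"
    b_major="${2%%.*}"; b_minor="${2#*.}"
    [ "${a_major}" -gt "${b_major}" ] || {
        [ "${a_major}" -eq "${b_major}" ] && [ "${a_minor}" -ge "${b_minor}" ]
    }
}

# Print a TORCH_CUDA_ARCH_LIST value for the CUDA version given as $1.
select_arch_list() {
    if version_ge "$1" "12.9"; then
        # sm_90 compiled again from 12.9 on; sm_100 and sm_120 were added too.
        echo "8.0;8.6;8.9;9.0;10.0;12.0+PTX"
    else
        # Older toolkits: the restricted list from the review diff earlier in
        # this thread, without 9.0.
        echo "8.0;8.6;8.9"
    fi
}

# cuda_compiler_version normally comes from the conda-forge build matrix; the
# fallback value here only makes the sketch runnable on its own.
TORCH_CUDA_ARCH_LIST="$(select_arch_list "${cuda_compiler_version:-12.9}")"
export TORCH_CUDA_ARCH_LIST
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"
```

Keeping the selection in one function means a later toolkit change only needs a new threshold, not another nested `if` in the build script.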

@h-vetinari h-vetinari mentioned this pull request Aug 31, 2025
5 tasks
h-vetinari added a commit that referenced this pull request Aug 31, 2025
v0.0.30 is present in the git history, but was never published, since #50 was merged after bumping to v0.0.31.

Squash of all the commits on main branch since then
---------

Co-authored-by: Jeongseok Lee <jeongseok@meta.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>

4 participants