Update to 1.22.2 and CUDA 12.9 -- use cirun for compilation #149
Conversation
…penstack-cpu-xlarge using Cirun
…nda-forge-pinning 2025.05.20.06.20.57
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition. I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/19783971603. Examine the logs at this URL for more detail.
I'm a bit lost on how to move this forward or which documentation to follow. Do all maintainers have to accept the ToS? Is there an obvious next step to take here, @hmaarrfk?
The 1.22.0 builds fail due to a hash issue with some of the external dependencies pulled in by CMake. The issue is reportedly fixed in 1.22.1. I changed this PR to target that version and rerendered.
The Cirun build fails with:
It would be great if those still missing could create the appropriate PRs so that we can finally get this over the finish line ❤️.
h-vetinari left a comment
@h-vetinari stated here that all maintainers need to agree to the ToS by going through the motions described here.
Not all listed maintainers need to accept, only those who want to trigger CI. Things ran fine here already, but you've misconfigured things, and jobs with unknown labels fail. This must be fixed: as it stands, you're wasting an enormous amount of resources by running each job three times.
CUDA 12.8 added support for architectures `sm_100`, `sm_101` and `sm_120`, while CUDA 12.9 further added `sm_103` and `sm_121`. To build for these, maintainers will need to modify the list of architectures their package specifies (e.g. via `CMAKE_CUDA_ARCHITECTURES`, `TORCH_CUDA_ARCH_LIST`, etc.). A good balance between broad support and storage footprint (and compilation time) is to add `sm_100` and `sm_120`. Since CUDA 12.8, the conda-forge nvcc package sets `CUDAARCHS` and `TORCH_CUDA_ARCH_LIST` in its activation script to a string containing all supported real architectures plus the virtual architecture of the latest one. Recipes that use these variables to control their build but do not want to build for all supported architectures will need to override them in their build script. ref: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#new-features
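For illustration, a minimal build-script sketch of such an override, assuming a CMake-based package; the concrete architecture list here is a hypothetical per-package choice, not something this PR prescribes:

```bash
#!/bin/bash
# Sketch: override the architecture lists exported by the conda-forge
# nvcc activation script so that only a curated subset gets built.
set -euxo pipefail

if [[ "${cuda_compiler_version:-None}" != "None" ]]; then
  # CMake >= 3.20 initializes CMAKE_CUDA_ARCHITECTURES from CUDAARCHS.
  # The "-real" suffix emits SASS only; the bare last entry also embeds
  # PTX, giving forward compatibility with newer GPUs.
  export CUDAARCHS="60-real;70-real;80-real;90-real;100-real;120"
  # PyTorch-style builds read TORCH_CUDA_ARCH_LIST instead (dotted
  # "major.minor" spelling, "+PTX" marking the virtual architecture).
  export TORCH_CUDA_ARCH_LIST="6.0 7.0 8.0 9.0 10.0 12.0+PTX"
fi

cmake ${CMAKE_ARGS} -GNinja -S . -B build
cmake --build build
```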
…5.08.31.03.07.54 Other tools: - conda-build 25.7.0 - rattler-build 0.46.0 - rattler-build-conda-compat 1.4.5
force-pushed from 9ff191d to df40b0e
@h-vetinari, do you see a way to unblock this PR? I'm a bit lost here...
It seems we are cross-compiling here? Any chance the wrong libcudadevrt is fed to the linker?
Hey Leo, thanks for the quick response! We're cross-compiling aarch+CUDA in many places (including on this feedstock with CUDA 12.6), so I'd be surprised if a more or less standard setup (that we've been using for years) would suddenly stop working. We'd be seeing this in a lot more places.
It seems like we might still be OOM'ing, does it not? https://github.com/conda-forge/onnxruntime-feedstock/actions/runs/19094250500/job/54550686105?pr=149#step:3:3757
Indeed. But d14dff9 was only experimental, to see if it would change anything (it didn't...), and the commit before that ran fine on linux-64. We can however try 2xlarge; just send a PR for using it.
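As an aside, and not the approach this PR takes (the thread moves to a larger runner instead): a generic mitigation for OOM during CUDA compilation is to cap build parallelism, since each nvcc invocation can use several GB of RAM. A hypothetical build-script sketch:

```bash
# Hypothetical sketch, not from this PR: cap parallel compile jobs to
# reduce peak memory use. conda-build exports CPU_COUNT with the number
# of available cores; limit ourselves to at most 4 jobs.
NJOBS=$(( ${CPU_COUNT:-4} < 4 ? ${CPU_COUNT:-4} : 4 ))
cmake --build build --parallel "${NJOBS}"
```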
…5.11.14.11.30.28 Other tools: - conda-build 25.9.0 - rattler-build 0.49.0 - rattler-build-conda-compat 1.4.9
…5.11.15.23.15.53 Other tools: - conda-build 25.9.0 - rattler-build 0.49.0 - rattler-build-conda-compat 1.4.9
Should we consider skipping CUDA builds on linux-aarch64 for now?
Fine by me. I don't know what's wrong with it. I've tried dropping the older arches, but the same error remains even for stuff that's definitely still supported, and I don't know how to debug this.
After you drop things, ping … Perhaps he can help troubleshoot things.
I skipped the aarch64 CUDA builds and the CUDA+novec builds. On the other hand, I re-added the Windows novec builds. I think this is finally ready to go 😍!
cbourjau left a comment
I'll merge by the end of today if there are no objections. Thanks to everybody involved already!
I updated the title a bit.
@dslarm I'm afraid we had to remove CUDA builds from linux-aarch64 to get this PR over the finish line. Are you interested in taking a look at this at some point?
Note that only builds triggered by maintainers of the feedstock (and core) who have accepted the terms of service and privacy policy at https://github.com/Quansight/open-gpu-server will run on GitHub Actions via Cirun.
Closes #150
Closes #151
Closes #152
Closes #158