Add linux-aarch64 and cuda 12.8 #438
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/16682556960. Examine the logs at this URL for more detail.
Thank you for troubleshooting some GCC + CUDA incompatibilities.
I simply cannot review anything that doesn't address the broken status of this recipe for other platforms. Conda-forge is where it is today because we strive for compatibility with many platforms. The Linux x86 and macOS builds must be addressed first, before I can spend time thinking about aarch64. My recommendation remains the same:
...
The Linux x86 builds work: I have been through every single one of them manually with build-locally.py, and done the same for linux-aarch64. The problem is that these builds are taking 24 hours (FWIW, they take 30-ish minutes on a 192-core box) and every one of them is timing out, e.g.: [..] I haven't tried to build osx manually; I don't have the right machine to do the x86 versions, and a local osx-arm64 build is just not the same as what the build system uses.
Thank you for that clarification. Isuru and I merged a "cleanup" PR to help slim down the build matrix. Do you want to try to rebase on that? It should also speed up your ARM builds, allowing you to build the big stuff effectively "once", which is as close to a 4x speedup as we can get.
Rebasing might be hard for the build script. I agree that it might be easier to just "redo" the changes to that one manually.
I'll take a look; my changes to the build script were, in the end, fairly minor.
a7da5db to e253ee2
Hi! This is the friendly automated conda-forge-linting service. I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict. Please ping the 'conda-forge/core' team (using the
e253ee2 to ffb5957
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17578969045. Examine the logs at this URL for more detail.
@conda-forge-admin, please rerender
* obvious changes to build.sh (noting NVIDIA's aarch64 is 'sbsa')
* replace custom bazel toolchain with the gen-bazel-toolchain package
* add patches:
  - 0031-bump-h5py-req.patch - binary packages of h5py and psutil are not available for
    the versions in tensorflow 2.18's spec, and the build system
    won't try to build them. Bump the versions so we get ones that
    do exist.
  - 0032-gpu_prim-error.patch - as per openxla's pull 16095, but extended to also fix the Store methods; needed for cuda 12.8/12.9
…pull 393)
* The vendored xnnpack in tensorflow 2.18 is incompatible with gcc14, so we pin to gcc13
* Hold back the aarch64 compiler to gcc-11 when building with cuda: cuda 12.8 only handles aarch64 NEON if gcc is < 12
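The platform rules in these commit messages boil down to a small lookup. Below is a minimal Python sketch, not the recipe's actual code (which lives in build.sh and the conda build config): the function names are hypothetical, and the `<arch>-linux` directory names assume the usual CUDA toolkit `targets/` layout, where NVIDIA calls the aarch64 server target `sbsa`.

```python
# Hypothetical sketch summarizing the platform rules from the commit messages.
# Not the recipe's actual code; all names here are illustrative.

# NVIDIA names its server-class aarch64 CUDA target "sbsa", not "aarch64".
NVIDIA_TARGET_ARCH = {
    "linux-64": "x86_64",
    "linux-aarch64": "sbsa",
}


def nvidia_target_dir(target_platform: str) -> str:
    """Return the assumed CUDA 'targets/<arch>-linux' name for a conda subdir."""
    return f"{NVIDIA_TARGET_ARCH[target_platform]}-linux"


def gcc_major(target_platform: str, cuda: bool) -> int:
    """Pick a GCC major version following the pins described in the commits.

    - The vendored xnnpack in tensorflow 2.18 fails with gcc 14, so pin to 13.
    - CUDA 12.8 only handles aarch64 NEON with gcc < 12, so aarch64+CUDA uses 11.
    """
    if target_platform == "linux-aarch64" and cuda:
        return 11
    return 13


if __name__ == "__main__":
    for platform in ("linux-64", "linux-aarch64"):
        for cuda in (False, True):
            variant = "cuda" if cuda else "cpu"
            print(platform, variant, nvidia_target_dir(platform),
                  f"gcc{gcc_major(platform, cuda)}")
```

The point of the sketch is only that aarch64 needs two separate adjustments (the `sbsa` naming and the GCC 11 hold-back for CUDA), while every other Linux variant stays on GCC 13.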
0020c69 to f7b55fb
Could someone review this, please? It is ready for that. AIUI, CI failures are normal for this package. Everything that built before now builds via build-locally.py, along with CUDA 12.8 (as this is the earliest version that will build TensorFlow for Arm) and the Arm builds. The new aarch64 support uses cross-compilation.
…5.09.04.02.19.31
Other tools:
- conda-build 25.7.0
- rattler-build 0.47.0
- rattler-build-conda-compat 1.4.6
@dslarm, unfortunately your odyssey isn't finished yet. There was no CUDA 12.8 job for aarch here yet, and now that I've added it, it runs into:
…5.09.07.20.31.15
Other tools:
- conda-build 25.7.0
- rattler-build 0.47.0
- rattler-build-conda-compat 1.4.6
OK, the intention seems to have been to downgrade to GCC 11 on aarch; I've done that now. But it still runs into [..], which looks like bazel not understanding the cross-compiler setup, and is related to rebuilding the vendored LLVM. Interestingly, this doesn't happen in the CPU build, which likely explains why we haven't seen this for [..]
Sigh, reminds me of conda/conda-build#5571. Edit: unrelated, though.
[..]
Thanks, that must be where I lost all hope last time with cross-compiling the CUDA side. I'll try again. There's already one dirty patch that was needed for the CPU side, so another dirty patch may be in order.
It is currently failing early(ish) with: [..] This seems at odds with what CUDA 12.8 claims (it should support sm_100).
Yeah, but 80c6778 was mostly for completeness (i.e. if we bump the CUDA version, we should match the capabilities). Feel free to remove the newer arches for now (or revert that commit); it's not the key part of this PR.
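Removing the newer arches for now amounts to filtering the compute-capability list before it reaches the build. A hypothetical Python sketch: the list below is illustrative rather than the recipe's actual one, entries are assumed to be plain `sm_XY`/`compute_XY` strings (suffixed forms such as `sm_90a` are not handled), and how the recipe passes the result to bazel is not shown.

```python
# Hypothetical sketch: drop compute capabilities newer than a cutoff, e.g. to
# temporarily remove the sm_100+ arches added when bumping to CUDA 12.8.
# The capability list is illustrative, not the recipe's actual list.


def parse_cc(entry: str) -> tuple[int, int]:
    """Turn 'sm_90' / 'compute_120' into (major, minor), e.g. (9, 0) / (12, 0)."""
    digits = entry.split("_", 1)[1]
    # The last digit is the minor version; everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def drop_newer(capabilities: list[str], max_cc: tuple[int, int]) -> list[str]:
    """Keep only entries whose compute capability is <= max_cc, preserving order."""
    return [c for c in capabilities if parse_cc(c) <= max_cc]


if __name__ == "__main__":
    requested = ["sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_89", "sm_90",
                 "sm_100", "sm_120", "compute_120"]
    print(",".join(drop_newer(requested, (9, 0))))
```

Comparing `(major, minor)` tuples rather than the raw digit strings avoids the trap where `"100"` sorts before `"90"` lexically.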
So, this now passes on everything except aarch+CUDA. We could merge the aarch+CPU support as an intermediate win that the other PRs can build on top of. Thoughts, @conda-forge/tensorflow?
Does it pass with native aarch compilation? dslarm was willing to do that and invoke CFEP03. I'm personally too burned out to care about anything that doesn't "unpin" abseil on tensorflow. It's causing me real solver issues, where a lot of the stack gets downgraded just to add tensorflow to an environment. But you two got this to a good place, so maybe this is good to merge.
I'd take the win of linux-aarch64, without CUDA and using cross-compilation. Can we move to merge this (any outstanding reviews etc.)? I'm happy to run the native case ('CFEP03' to get everything), but I think I may be able to get cross-compiled CUDA done when I can give it a bit more time, hence taking the win on cross-compiled non-CUDA for now. FWIW, I think there's an issue with the bazel-toolchain for cross-compiling with CUDA, and it's just painful to debug (I'm not a bazel expert). I assume (but haven't done the debugging yet) that the Rust.* compile that fails when cross-compiling for cuda+aarch64 is also done when plain aarch64 cross-compiles. If that's the case, the problem must surely be in the CUDA scripts ('crosstool_wrapper_driver_is_not_gcc' etc.), which appear to get invoked only in the CUDA case and would be missing the aarch64 system include directories.
Add linux-aarch64 and cuda 12.8
OK. Feel free to upload the logs here for CFEP03.
Thanks all!!!
FWIW, since the macOS builds didn't change appreciably here, I'm not planning to do any CFEP 03 builds for this. If @dslarm can get the aarch+CUDA builds unblocked, I'd prefer a separate PR for that, even if it's built locally.
Thanks both for your help; I'll continue trying more on CUDA shortly.
Hm, I definitely thought this but apparently didn't write it down; better late than never, I hope: thanks so much @dslarm for the patience and persistence in shepherding this PR to completion! This one was particularly tricky and took a very long time, sorry about that. 🙃
This PR is a cleaned-up and updated branch that brings in both aarch64 and CUDA 12.8. It replaces PR #426.
Nutshell of #426 learnings:
Recent learnings:
Next:
Outstanding questions, which may or may not need me to do something: