Migrate for CUDA 12.9 #7476
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
|
Draft until we fix smithy to update correctly |
|
The smithy changes have landed and were released in 3.50.1 (thanks @beckermr!), so this is ready for review! PTAL especially @conda-forge/cuda. Perhaps relevant: there are some weird linker errors in conda-forge/pytorch-cpu-feedstock#393 that seem to be due to the CUDA 12.9 toolchain (or some interaction with it). |
hmaarrfk
left a comment
I believe you have addressed all the technical blockers correctly.
|
Blockers from the infrastructure side should all be resolved, but we haven't got a passing 12.9 build for pytorch yet, and I'd like to understand what's going wrong in the toolchain there (aside from some input or green light from @conda-forge/cuda on this in general). Feedstocks that want to build with 12.9 can do so of course (and feedback would be welcome!): simply copy the migrator from this PR, add |
|
Perfect. I guess my approval is on the structure of this PR; once the PyTorch build is ready, this can be merged without further input from me. |
- 12.9  # [((linux and (x86_64 or aarch64)) or win64) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

c_compiler_version:  # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- 13  # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
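The bracketed comments in this diff are conda-build selectors, i.e. Python expressions evaluated per target platform. As a rough sketch of how the two selectors differ, assuming a much-simplified namespace (the helper `selector_applies` and its platform-string handling are hypothetical and not conda-build's actual implementation):

```python
from types import SimpleNamespace

# Selector text copied from the diff context.
CUDA_SELECTOR = (
    '((linux and (x86_64 or aarch64)) or win64) '
    'and os.environ.get("CF_CUDA_ENABLED", "False") == "True"'
)
COMPILER_SELECTOR = (
    '(linux and (x86_64 or aarch64)) '
    'and os.environ.get("CF_CUDA_ENABLED", "False") == "True"'
)


def selector_applies(selector, platform, env):
    """Evaluate a selector for a platform string such as 'linux-64'.

    Hypothetical helper: only the names used in the selectors above are
    defined, and `env` stands in for the real process environment.
    """
    namespace = {
        "linux": platform.startswith("linux-"),
        "win64": platform == "win-64",
        "x86_64": platform.endswith("-64"),
        "aarch64": platform.endswith("-aarch64"),
        "os": SimpleNamespace(environ=env),
    }
    return bool(eval(selector, {"__builtins__": {}}, namespace))


cuda_on = {"CF_CUDA_ENABLED": "True"}
print(selector_applies(CUDA_SELECTOR, "win-64", cuda_on))      # True
print(selector_applies(COMPILER_SELECTOR, "win-64", cuda_on))  # False
```

Read this way, the 12.9 entry applies on linux-64, linux-aarch64 and win-64 whenever `CF_CUDA_ENABLED` is `"True"`, while the `c_compiler_version` pin only takes effect on Linux.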
CUDA 12.8 and 12.9 both support GCC 14. I haven't tracked the GCC 14 migration elsewhere on conda-forge enough to know if this should be bumped to 14 or not.
I know, and I'm planning to make use of this. If #7421 gets merged first, I'll update to 14 here. Or if this PR gets merged first, I'll bump the pin in the cuda129.yaml file in the other PR.
On the topic of GCC 14 (which we'll bump to in a few days), it seems that we maybe should stay on GCC 13 for CUDA 12.9 for now. At least on the pytorch side, this combination ran into issues, namely
Looks like GCC 14 might be premature, at least for pytorch (or at least without turning off -Wincompatible-pointer-types):

    $SRC_DIR/third_party/XNNPACK/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c:53:62: error: passing argument 1 of 'vld1_dup_u16' from incompatible pointer type [-Wincompatible-pointer-types]
       53 |   const float16x4_t vmax = vreinterpret_f16_u16(vld1_dup_u16(&params->scalar.max));
          |                                                              ^~~~~~~~~~~~~~~~~~~
          |                                                              |
          |                                                              const xnn_float16 * {aka const _Float16 *}
    In file included from $SRC_DIR/third_party/XNNPACK/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c:8:
    $BUILD_PREFIX/lib/gcc/aarch64-conda-linux-gnu/14.3.0/include/arm_neon.h:13130:31: note: expected 'const uint16_t *' {aka 'const short unsigned int *'} but argument is of type 'const xnn_float16 *' {aka 'const _Float16 *'}
    13130 | vld1_dup_u16 (const uint16_t* __a)
          |               ~~~~~~~~~~~~~~~~^~~

Curious also that this doesn't seem to be an issue on x64, only on aarch64.
At first glance it appears that the type of params->scalar.max gets messed up, because casting from _Float16 to uint16 sounds very risky, and the vld1_dup_u16 in GCC really is about integers (so I can't see how it'd be a case of picking the wrong overload).
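For context on the `_Float16` vs. `uint16` question: the failing line passes the loaded value straight into `vreinterpret_f16_u16`, which suggests the 16 bits are meant to be reinterpreted rather than numerically converted. The following is only a Python analogue of that distinction using the standard `struct` module, not the NEON code itself; the helper names are made up for illustration.

```python
import struct


def f16_bits_as_u16(value):
    """Return the raw 16-bit pattern of a half-precision float as an int."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def u16_bits_as_f16(bits):
    """Interpret a 16-bit integer pattern as a half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


bits = f16_bits_as_u16(6.0)
print(hex(bits))              # 0x4600 (bit pattern, unrelated to the value 6)
print(u16_bits_as_f16(bits))  # 6.0 (round trip through the bits is lossless)
print(int(6.0))               # 6 (a numeric conversion, which is a different operation)
```

A bit-level round trip like this preserves the value, whereas a numeric conversion to an integer type would not in general. The compiler diagnostic is about the pointer types not matching, which GCC 14 reports as an error where earlier versions only warned.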
So #7421 has been merged now. But for now I'm leaving CUDA 12.9 on GCC 13, until the above issue gets fixed or someone tells me that the issue is somehow specific to pytorch.
[...] or someone tells me that the issue is somehow specific to pytorch.
FYI, since it turns out that the problems were specific to pytorch, I'm bumping the CUDA 12.9 migrator to GCC 14 now, to match the rest of the pinning: #7563
|
We finally merged conda-forge/pytorch-cpu-feedstock#393, though on windows we had to downgrade to 12.8 because 12.9 was OOM-ing even on the largest possible machine. I'm fine with keeping this specific to pytorch (which is a beast to build anyway), as long as we're reasonably confident that there are no big unresolved issues with 12.9 on windows. It does seem like the toolchain has a problem (or a regression) there though. |
|
@conda-forge/cuda, can someone please comment on whether this is good to go from your end? Several feedstocks are waiting to support the new architectures. I think the remaining open points encountered specifically on the pytorch feedstock (win+12.9 OOMs but works with 12.8; linux compilation errors when using GCC 14) aren't big enough to be blockers for getting this started. |
bdice
left a comment
This looks fine to me. Maybe @jakirkham or @carterbox can take a quick peek before merging?
- 12.9  # [((linux and (x86_64 or aarch64)) or win64) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

c_compiler_version:  # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- 13  # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
Co-authored-by: Daniel Ching <9604511+carterbox@users.noreply.github.com>
|
Alright, thanks for the inputs @bdice @carterbox. I'll merge this in 72h unless there are other comments. |
Builds on top of #7005 after the problems there were rendered obsolete by dropping CUDA 11.8 (cf. #7404, #7431).
As a demo, I've opened conda-forge/pytorch-cpu-feedstock#393
though this currently needs a smithy PR (conda-forge/conda-smithy#2335) due to an issue with the variant algebra for exactly the case we want to do here: conda-forge/conda-smithy#2331
Closes #7005
Closes #6980