Migrate for CUDA 12.9 #7476

Merged: h-vetinari merged 10 commits into conda-forge:main from h-vetinari:cuda129 on Jul 14, 2025

@h-vetinari (Member) commented Jun 13, 2025

Builds on top of #7005 after the problems there were rendered obsolete by dropping CUDA 11.8 (cf. #7404, #7431).

As a demo, I've opened conda-forge/pytorch-cpu-feedstock#393, though this currently needs a smithy PR (conda-forge/conda-smithy#2335) due to an issue with the variant algebra for exactly the case we need here: conda-forge/conda-smithy#2331

Closes #7005
Closes #6980

@h-vetinari requested a review from a team as a code owner June 13, 2025 04:23

@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@h-vetinari marked this pull request as draft June 13, 2025 06:08

@h-vetinari (Member, Author)

Draft until we fix smithy to update correctly

@h-vetinari marked this pull request as ready for review June 14, 2025 20:33

@h-vetinari (Member, Author)

The smithy changes have landed and were released in 3.50.1 (thanks @beckermr!), so this is ready for review!

PTAL especially @conda-forge/cuda.

Perhaps relevant: there are some odd linker errors in conda-forge/pytorch-cpu-feedstock#393 that seem to be due to the CUDA 12.9 toolchain (or some interaction with it).

@hmaarrfk (Contributor) left a comment

I believe you have addressed all the technical blockers, correct?

@h-vetinari (Member, Author)

Blockers from the infrastructure side should all be resolved, but we haven't got a passing 12.9 build for pytorch yet, and I'd like to understand what's going wrong in the toolchain there (aside from some input or green light from @conda-forge/cuda on this in general).

Feedstocks that want to build with 12.9 can do so of course (and feedback would be welcome!): simply copy the migrator from this PR, add use_local: true and then rerender.
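
For feedstock maintainers, the steps above might look roughly like the following Python sketch. It assumes the migrator file has a top-level `__migrator:` key under which `use_local: true` is placed; the sample text and the helper `add_use_local` are hypothetical stand-ins, not the actual contents of cuda129.yaml. The patched file would then go into the feedstock's `.ci_support/migrations/` directory before running `conda smithy rerender`.

```python
def add_use_local(migrator_text: str) -> str:
    """Insert `use_local: true` under the `__migrator:` key if it is missing.

    Hypothetical helper: assumes two-space indentation and a top-level
    `__migrator:` line, which may not match every migrator file.
    """
    lines = migrator_text.splitlines()
    # Leave the text untouched if the key is already present.
    if any(line.strip().startswith("use_local:") for line in lines):
        return migrator_text
    out = []
    for line in lines:
        out.append(line)
        if line.rstrip() == "__migrator:":
            out.append("  use_local: true")
    return "\n".join(out) + "\n"


# Hypothetical stand-in for a migrator file; NOT the real cuda129.yaml.
sample = """\
__migrator:
  kind: version
  migration_number: 1
cuda_compiler_version:
  - 12.9
"""

patched = add_use_local(sample)
print(patched)
```

Editing the file by hand works just as well; the helper only makes explicit where the key is assumed to live.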

@hmaarrfk (Contributor)

Perfect. My approval covers the structure of this PR; once the PyTorch build is ready, this can be merged without further input from me.

Comment thread recipe/migrations/cuda129.yaml Outdated
- 12.9 # [((linux and (x86_64 or aarch64)) or win64) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

c_compiler_version: # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- 13 # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
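
The bracketed comments on these lines are selectors: Python expressions that decide whether a line applies to a given build. Below is a minimal sketch of that evaluation, assuming each platform name is a boolean and that `os.environ` is visible to the expression. The helper `selector_matches` is hypothetical and is not conda-build's actual implementation.

```python
from types import SimpleNamespace

# Selector copied from the `- 12.9` line of the migrator.
CUDA_SELECTOR = (
    '((linux and (x86_64 or aarch64)) or win64) '
    'and os.environ.get("CF_CUDA_ENABLED", "False") == "True"'
)
# Selector copied from the `c_compiler_version` lines (Linux only).
GCC_SELECTOR = (
    '(linux and (x86_64 or aarch64)) '
    'and os.environ.get("CF_CUDA_ENABLED", "False") == "True"'
)


def selector_matches(selector: str, flags: dict, environ: dict) -> bool:
    """Evaluate a selector against platform flags and an environment mapping."""
    namespace = dict(flags)
    # Stand-in for the `os` module so the sketch does not depend on the real environment.
    namespace["os"] = SimpleNamespace(environ=environ)
    return bool(eval(selector, {"__builtins__": {}}, namespace))


linux_x64 = dict(linux=True, x86_64=True, aarch64=False, win64=False)
windows = dict(linux=False, x86_64=True, aarch64=False, win64=True)
cuda_on = {"CF_CUDA_ENABLED": "True"}

print(selector_matches(CUDA_SELECTOR, linux_x64, cuda_on))  # True
print(selector_matches(CUDA_SELECTOR, linux_x64, {}))       # False
print(selector_matches(CUDA_SELECTOR, windows, cuda_on))    # True
print(selector_matches(GCC_SELECTOR, windows, cuda_on))     # False
```

Under these assumptions the CUDA version line applies on Linux and Windows CUDA builds, while the compiler pin applies only on Linux.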

Contributor

CUDA 12.8 and 12.9 both support GCC 14. I haven't tracked the GCC 14 migration elsewhere on conda-forge enough to know if this should be bumped to 14 or not.

Member Author

I know, and I'm planning to make use of this. If #7421 gets merged first, I'll update to 14 here. Or if this PR gets merged first, I'll bump the pin in the cuda129.yaml file in the other PR.

Member Author

On the topic of GCC 14 (which we'll bump to in a few days), it seems we should perhaps stay on GCC 13 for CUDA 12.9 for now. At least on the pytorch side, this combination ran into issues, namely:

Looks like GCC 14 might be premature, at least for pytorch (or at least without turning off -Wincompatible-pointer-types):

 $SRC_DIR/third_party/XNNPACK/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c:53:62: error: passing argument 1 of 'vld1_dup_u16' from incompatible pointer type [-Wincompatible-pointer-types]
   53 |   const float16x4_t vmax = vreinterpret_f16_u16(vld1_dup_u16(&params->scalar.max));
      |                                                              ^~~~~~~~~~~~~~~~~~~
      |                                                              |
      |                                                              const xnn_float16 * {aka const _Float16 *}
In file included from $SRC_DIR/third_party/XNNPACK/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c:8:
$BUILD_PREFIX/lib/gcc/aarch64-conda-linux-gnu/14.3.0/include/arm_neon.h:13130:31: note: expected 'const uint16_t *' {aka 'const short unsigned int *'} but argument is of type 'const xnn_float16 *' {aka 'const _Float16 *'}
13130 | vld1_dup_u16 (const uint16_t* __a)
      |               ~~~~~~~~~~~~~~~~^~~

Curious also that this doesn't seem to be an issue on x64, only on aarch64.

At first glance it appears that the type of params->scalar.max gets messed up, because casting from _Float16 to uint16 sounds very risky, and the vld1_dup_u16 in GCC really is about integers (so I can't see how it'd be a case of picking the wrong overload).
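
One possible reading, offered as an assumption rather than a diagnosis: the intrinsic pair in the failing line loads the 16 raw bits through an integer pointer and then reinterprets them as half-precision floats, so no value conversion between `_Float16` and `uint16_t` takes place. What the compiler objects to is the pointer type, and GCC 14 treats `-Wincompatible-pointer-types` as an error by default. The Python sketch below only illustrates the difference between reinterpreting bits and converting a value; the helper names are hypothetical.

```python
import struct


def f16_bits(value: float) -> int:
    """Return the raw 16-bit pattern of `value` stored as IEEE 754 half precision."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def bits_to_f16(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


one_bits = f16_bits(1.0)
print(hex(one_bits))          # 0x3c00: the bit pattern, not the number 1
print(bits_to_f16(one_bits))  # 1.0: reinterpreting the bits round-trips exactly
print(int(1.0))               # 1: a value conversion gives a different integer
```

If this reading holds, an explicit pointer cast in the source would be the kind of change that silences the diagnostic, though whether that is the right fix is for the XNNPACK and pytorch maintainers to judge.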

Member Author

So #7421 has been merged now. But for now I'm leaving CUDA 12.9 on GCC 13, until the above issue gets fixed or someone tells me that the issue is somehow specific to pytorch.

Contributor

Sounds good to me!

@h-vetinari (Member, Author), Jul 15, 2025

[...] or someone tells me that the issue is somehow specific to pytorch.

FYI, since it turns out that the problems were specific to pytorch, I'm bumping the CUDA 12.9 migrator to GCC 14 now, to match the rest of the pinning: #7563

@h-vetinari mentioned this pull request Jun 29, 2025
19 tasks

@h-vetinari (Member, Author)

We finally merged conda-forge/pytorch-cpu-feedstock#393, though on Windows we had to downgrade to 12.8 because 12.9 was running out of memory even on the largest possible machine. I'm fine with keeping this specific to pytorch (which is a beast to build anyway), as long as we're reasonably confident that there are no big unresolved issues with 12.9 on Windows. It does seem like the toolchain has a problem (or a regression) there though.

@h-vetinari (Member, Author) commented Jul 8, 2025

@conda-forge/cuda could someone please comment on whether this is good to go from your end? Several feedstocks are waiting on this to support the new architectures.

I think the remaining open points encountered specifically on the pytorch feedstock (win+12.9 OOMs but works with 12.8; linux compilation errors when using GCC 14) aren't big enough to be blockers for getting this started.

@bdice (Contributor) left a comment

This looks fine to me. Maybe @jakirkham or @carterbox can take a quick peek before merging?

Comment thread recipe/migrations/cuda129.yaml
Co-authored-by: Daniel Ching <9604511+carterbox@users.noreply.github.com>
@h-vetinari (Member, Author)

Alright, thanks for the input @bdice @carterbox. I'll merge this in 72h unless there are other comments.

leofang added a commit to regro-cf-autotick-bot/cupy-feedstock that referenced this pull request Jul 13, 2025
@h-vetinari merged commit cab1800 into conda-forge:main Jul 14, 2025
3 checks passed
@h-vetinari deleted the cuda129 branch July 14, 2025 21:48
@h-vetinari mentioned this pull request Aug 18, 2025
11 tasks
Linked issue: Adding CUDA 12.9 migration

6 participants