54 changes: 54 additions & 0 deletions recipe/migrations/cuda129.yaml
@@ -0,0 +1,54 @@
migrator_ts: 1738229377
__migrator:
  kind:
    version
  migration_number:
    1
  build_number:
    1
  paused: false
  override_cbc_keys:
    - cuda_compiler_stub
  check_solvable: false
  primary_key: cuda_compiler_version
  ordering:
    cuda_compiler_version:
      - 12.4
      - 12.6
      - 12.8
      - None
      - 12.9
      # to allow manual opt-in for CUDA 11.8, see
      # https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/7472
      # must be last due to how cuda_compiler ordering in that migrator works
      - 11.8
  commit_message: |
    Upgrade to CUDA 12.9

    CUDA 12.8 added support for architectures `sm_100`, `sm_101` and `sm_120`,
    while CUDA 12.9 further added `sm_103` and `sm_121`. To build for these,
    maintainers will need to modify their existing list of specified architectures
    (e.g. `CMAKE_CUDA_ARCHITECTURES`, `TORCH_CUDA_ARCH_LIST`, etc.)
    for their package. A good balance between broad support and storage
    footprint (resp. compilation time) is to add `sm_100` and `sm_120`.

    Since CUDA 12.8, the conda-forge nvcc package sets `CUDAARCHS` and
    `TORCH_CUDA_ARCH_LIST` in its activation script to a string containing all
    of the supported real architectures plus the virtual architecture of the
    latest. Recipes for packages that use these variables to control their build
    but do not want to build for all supported architectures will need to override
    these variables in their build script.

    ref: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#new-features

cuda_compiler_version: # [((linux and (x86_64 or aarch64)) or win64) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- 12.9 # [((linux and (x86_64 or aarch64)) or win64) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

c_compiler_version: # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- 13 # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

Contributor

CUDA 12.8 and 12.9 both support GCC 14. I haven't tracked the GCC 14 migration elsewhere on conda-forge enough to know if this should be bumped to 14 or not.

Member Author

I know, and I'm planning to make use of this. If #7421 gets merged first, I'll update to 14 here. Or if this PR gets merged first, I'll bump the pin in the cuda129.yaml file in the other PR.

Member Author

On the topic of GCC 14 (which we'll bump to in a few days), it seems we should probably stay on GCC 13 for CUDA 12.9 for now. At least on the pytorch side, this combination ran into issues, namely:

Looks like GCC 14 might be premature, at least for pytorch (or at least without turning off -Wincompatible-pointer-types):

 $SRC_DIR/third_party/XNNPACK/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c:53:62: error: passing argument 1 of 'vld1_dup_u16' from incompatible pointer type [-Wincompatible-pointer-types]
   53 |   const float16x4_t vmax = vreinterpret_f16_u16(vld1_dup_u16(&params->scalar.max));
      |                                                              ^~~~~~~~~~~~~~~~~~~
      |                                                              |
      |                                                              const xnn_float16 * {aka const _Float16 *}
In file included from $SRC_DIR/third_party/XNNPACK/src/f16-conv-hwc2chw/f16-conv-hwc2chw-3x3s2p1c3x4-neonfp16arith-2x2.c:8:
$BUILD_PREFIX/lib/gcc/aarch64-conda-linux-gnu/14.3.0/include/arm_neon.h:13130:31: note: expected 'const uint16_t *' {aka 'const short unsigned int *'} but argument is of type 'const xnn_float16 *' {aka 'const _Float16 *'}
13130 | vld1_dup_u16 (const uint16_t* __a)
      |               ~~~~~~~~~~~~~~~~^~~

Curious also that this doesn't seem to be an issue on x64, only on aarch64.

At first glance it appears that the type of params->scalar.max gets messed up, because casting from _Float16 to uint16 sounds very risky, and the vld1_dup_u16 in GCC really is about integers (so I can't see how it'd be a case of picking the wrong overload).
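
For recipes that hit the same diagnostic, one possible stopgap (not something this PR applies) is to downgrade the error back to a warning for C sources, as hinted at above. A minimal sketch of a build-script fragment, assuming the package's build system honours `CFLAGS`; whether this is sufficient for pytorch's vendored XNNPACK is not established in this thread:

```shell
# Hypothetical build.sh fragment, not part of this PR.
# GCC 14 treats -Wincompatible-pointer-types as an error by default for C code;
# -Wno-error=<name> asks the compiler to report it as a warning again.
export CFLAGS="${CFLAGS:-} -Wno-error=incompatible-pointer-types"

# Print the result so the flag is visible in the build log.
echo "CFLAGS: ${CFLAGS}"
```

This only hides the symptom; the pointer type mismatch between `_Float16` and `uint16_t` would still need a fix in the affected sources.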

Member Author

So #7421 has been merged now. But for now I'm leaving CUDA 12.9 on GCC 13, until the above issue gets fixed or someone tells me that the issue is somehow specific to pytorch.

Contributor

Sounds good to me!

@h-vetinari h-vetinari Jul 15, 2025

Member Author

> [...] or someone tells me that the issue is somehow specific to pytorch.

FYI, since it turns out that the problems were specific to pytorch, I'm bumping the CUDA 12.9 migrator to GCC 14 now, to match the rest of the pinning: #7563


cxx_compiler_version: # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- 13 # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

fortran_compiler_version: # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- 13 # [(linux and (x86_64 or aarch64)) and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
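
The commit message in this migrator says that recipes which do not want to build for every supported architecture need to override `CUDAARCHS` and `TORCH_CUDA_ARCH_LIST` in their build script. A minimal sketch of such an override follows. The baseline architectures (`80`, `90`) are an illustrative assumption, not a recommendation from this PR; only the addition of `sm_100` and `sm_120` follows the commit message.

```shell
# Hypothetical build.sh fragment; the architecture selection is illustrative only.

# Show what the nvcc activation script provided (unset outside a CUDA build).
echo "activation CUDAARCHS: ${CUDAARCHS:-<unset>}"
echo "activation TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-<unset>}"

# CMake reads CUDAARCHS as the default for CMAKE_CUDA_ARCHITECTURES.
# "NN-real" emits device code only; a bare "NN" emits device code plus PTX.
export CUDAARCHS="80-real;90-real;100-real;120"

# PyTorch-style list: the "+PTX" suffix additionally embeds PTX for that entry.
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0+PTX"

echo "override CUDAARCHS: ${CUDAARCHS}"
echo "override TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST}"
```

Keeping PTX only for the newest entry mirrors the activation script's approach of real architectures plus the virtual architecture of the latest one, while the shorter list trades coverage for package size and compile time.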