
doc: add CUDA changes to 25.11 release notes#456510

Open
ConnorBaker wants to merge 1 commit intoNixOS:masterfrom
ConnorBaker:feat/cuda-25-11-release-notes

Conversation

@ConnorBaker
Contributor

@ConnorBaker ConnorBaker commented Oct 28, 2025

Important

Needs to be updated to reflect changes made before release.

As title.

Things done

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

@ConnorBaker ConnorBaker self-assigned this Oct 28, 2025
@nixpkgs-ci nixpkgs-ci bot added the 10.rebuild-linux: 1-10, 10.rebuild-darwin: 1-10, 10.rebuild-darwin: 1, 8.has: changelog, and 8.has: documentation labels Oct 28, 2025
@ConnorBaker ConnorBaker added the 6.topic: cuda Parallel computing platform and API label Oct 28, 2025
@nixpkgs-ci nixpkgs-ci bot added the 9.needs: reviewer This PR currently has no reviewers requested and needs attention. label Oct 28, 2025
- Package expressions are being updated to support building against older releases (where appropriate) to support out-of-tree consumers like [cuda-legacy](https://github.com/nixos-cuda/cuda-legacy).
- `_cuda.lib.licenses` was created to introduce several NVIDIA licenses new to Nixpkgs and used almost entirely within the CUDA package set.
- `_cuda.lib.getRedistSystem` had an API change to accommodate a new dependency on CUDA version.
- Many members of the CUDA package set are now marked as broken if `config.cudaSupport` is not set. This change was made to prevent the common mistake of using CUDA tooling to build a CUDA application but forgetting to enable `config.cudaSupport`, resulting in an application which supports CUDA and dependencies which do not.
Contributor

We discussed this with @SomeoneSerge during the meeting today. He implied that it would be better to fall back to the previous behavior. I'll let him elaborate.

@nixpkgs-ci nixpkgs-ci bot removed the 9.needs: reviewer This PR currently has no reviewers requested and needs attention. label Oct 28, 2025
@ConnorBaker
Contributor Author

ConnorBaker commented Oct 29, 2025

@SomeoneSerge following up on #437723 (comment):

One thing that came up on the weekly today is the new meta.broken behaviour: the PR makes the change to define all cudaPackages.* as "broken" unless config.cudaSupport is enabled globally. The motivation is understood to be to prevent the common mistake of mixing non- and CUDA-enabled dependencies, such as e.g. trying to use cpu-only torch in a CUDA-on torchvision build, or accidentally consuming cpu-only MPI for CUDA torch.

Yep! That's the goal -- unfortunately, the only way I could think of to achieve that currently is through the meta.broken mechanism. (Important to note: the change doesn't prevent people from using CPU-only builds of things -- it just makes it an error to use CUDA things without indicating CUDA should be enabled, under the assumption including CUDA things means they intend to use CUDA.)
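The gating described above can be sketched roughly as follows. This is a minimal illustration, not the Nixpkgs implementation: the package name `example-cuda-tool` and its build steps are hypothetical, and the real change goes through `_cuda.lib.mkMetaBroken`, whose signature is not shown in this thread. The sketch assumes the expression is instantiated with `callPackage`, so that `config` is the Nixpkgs config.

```nix
# Hypothetical package expression; every name here is illustrative.
{
  lib,
  config,
  stdenv,
}:

stdenv.mkDerivation {
  pname = "example-cuda-tool"; # hypothetical package name
  version = "0.0.0";

  # No sources: this derivation only exists to demonstrate the meta check.
  dontUnpack = true;
  installPhase = ''
    mkdir -p "$out"
  '';

  # Refuse to evaluate unless the consumer enabled CUDA support globally.
  # `or false` guards against configs that do not define the attribute.
  meta.broken = !(config.cudaSupport or false);
}
```

With a check like this, evaluating the package from a Nixpkgs instance without `config.cudaSupport = true` fails early instead of producing a build that silently mixes CUDA and non-CUDA dependencies.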

As a litmus test, we'd like the torchWithCuda expression in vanilla/non-CUDA nixpkgs instances to produce an early error, warning of the invalid combination. The present PR achieves this by refusing to evaluate cudaPackages in the first place. To be clear, this is a breaking change in the interface, which is fine per se. The drawback of this approach is we refuse to evaluate even the valid derivations, i.e. cudaPackages.* and their immediate (i.e. non-transitive, distance-1) reverse dependencies.

Ideally we'd have a mechanism which allows a derivation to be modified by its inputs -- like setup hooks, but restricted to evaluation. With such a mechanism we could say "cuda_nvcc itself doesn't require CUDA support, but including it as a dependency means your derivation will".

But right now all I have is the sledgehammer of meta.broken -- so yes, cuda_nvcc being marked as broken when config.cudaSupport is not true is inaccurate but necessary to prevent things which consume it from carrying on without config.cudaSupport being set to true.

To me, this is very much a desirable trade-off: I've lost several weeks of my life at this point to helping people troubleshoot runtime failures and trace dependencies only to find out they've not configured support for CUDA correctly. I want it to fail fast and save me the effort -- it's a small bonus that using meta.broken in this way with _cuda.lib.mkMetaBroken means I can see failed assertions when evaluation is done with --show-trace --trace-verbose.

I think the correct thing to do would be:

  • Introduce options.cuda.allowMixing = mkEnableOption "..." // { default = true; } (any more descriptive name)
  • Restore old meta.broken behaviour, conditioned on config.cuda.allowMixing
  • Change release-cuda.nix and pkgsCuda variant to set allowMixing = false
  • Lower the config.cudaSupport check into the leaves like torchWithCuda

That said, the motivation for the change is quite relatable, and I'm still wondering if I'm being too conservative...
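A minimal sketch of what the proposed option might look like if it were declared as a module option. Only the name `cuda.allowMixing` and the `default = true` come from the proposal above; the description string and where the declaration would live are assumptions, and no such option exists as of this thread.

```nix
# Hypothetical option declaration; not part of Nixpkgs.
{ lib, ... }:
{
  options.cuda.allowMixing =
    lib.mkEnableOption "mixing CUDA-enabled and CPU-only packages (hypothetical option)"
    // {
      # mkEnableOption defaults to false; the proposal overrides that to true
      # so that the old behaviour is kept unless a consumer opts out.
      default = true;
    };
}
```

Under the proposal, the `meta.broken` check would then only fire when both `config.cudaSupport` and `config.cuda.allowMixing` are false.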

I'm wary of the top-level config options -- types and merging are well supported for Nixpkgs imports because config is processed by the module system, but when providing config.nixpkgs.config to a NixOS configuration we only get recursive attribute set updates (merge = args: lib.foldr (def: mergeConfig def.value) { };), so it's last entry wins.
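The "last entry wins" behaviour can be illustrated with a small standalone expression. This is a sketch under assumptions: `mergeConfig` here is a simplified stand-in built on `lib.recursiveUpdate` (the real function in the NixOS module also handles function-valued configs and package overrides), and the two definitions are made up for the example.

```nix
let
  lib = import <nixpkgs/lib>;

  # Simplified stand-in for the NixOS mergeConfig (assumption):
  # the right-hand side overrides the left-hand side, attribute by attribute.
  mergeConfig = lhs: rhs: lib.recursiveUpdate lhs rhs;

  # Same shape as the merge function quoted above: the first argument is
  # ignored and the result is a function over the list of definitions.
  merge = args: lib.foldr (def: mergeConfig def.value) { };
in
merge [ ] [
  {
    value = {
      allowUnfree = true;
      cudaSupport = true;
    };
  }
  {
    # A later definition silently overrides the earlier value.
    value = {
      cudaSupport = false;
    };
  }
]
```

Evaluated with `nix-instantiate --eval --strict`, this should yield an attribute set in which `allowUnfree` is still `true` but `cudaSupport` is `false`: the second definition replaces the first without any conflict being reported.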

I'm open to discussing mixing in the future, but as Nixpkgs exists now it would create a great deal more pain for me to allow it and I really don't want to have to deal with that. I feel that, given the unfree licenses needed to build CUDA stuff, it's not too much to ask for users to ensure they also set config.cudaSupport (or even use pkgsCuda!).
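For reference, a minimal sketch of what setting `config.cudaSupport` looks like for a consumer. It assumes a `<nixpkgs>` channel on `NIX_PATH` and uses `allowUnfree = true` as the simplest way to accept the unfree CUDA licenses; the choice of `cuda_nvcc` in a `mkShell` is only illustrative.

```nix
let
  pkgs = import <nixpkgs> {
    config = {
      # CUDA redistributables are unfree, so they must be explicitly allowed.
      allowUnfree = true;
      # Enable CUDA globally so CUDA packages and their dependents agree.
      cudaSupport = true;
    };
  };
in
pkgs.mkShell {
  # With cudaSupport enabled, members of cudaPackages are no longer marked broken.
  packages = [ pkgs.cudaPackages.cuda_nvcc ];
}
```

The `pkgsCuda` variant mentioned above is an alternative way to get a CUDA-enabled package set from an existing Nixpkgs instance.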

EDIT: Clarified about config.nixpkgs.config as it pertains to NixOS system configurations.

@ConnorBaker ConnorBaker moved this from New to 👀 Awaits reviews in CUDA Team Oct 29, 2025
@SomeoneSerge
Contributor

To me, this is very much a desirable trade-off

NGL, I'm leaning towards the same conclusion...

we only get dumb attribute set update behavior, so it's last entry wins (and no recursive merging)

H'm, well that's a detail of top-level/variants.nix implementation, which only exists to group aliases waiting for removal in the first place

@ConnorBaker
Contributor Author

H'm, well that's a detail of top-level/variants.nix implementation, which only exists to group aliases waiting for removal in the first place

Edited my previous post to clarify — the complaints about merging are specifically with respect to NixOS configurations.

@SomeoneSerge
Contributor

but when providing config.nixpkgs.config to a NixOS configuration we only get recursive attribute set updates (merge = args: lib.foldr (def: mergeConfig def.value) { };), so it's last entry wins.

😩 😩 😩 god yes true

Contributor

@SomeoneSerge SomeoneSerge left a comment

Aside from the nits above, changelog 👍🏿 !

@@ -16,6 +16,16 @@
If a newer C++ library feature is not available on the default deployment target, you will need to increase the deployment target.
See the Darwin platform documentation for more details.

- A great number of changes to CUDA packaging:
- CUDA 13 is now available.
- Creation of CUDA package sets has been rewritten to remove module system evaluation.
Contributor

I'm mildly skeptical about modules fully going away within the release cycle, tbh. The complete item would be "separate data and package set layers". The current progress towards that I'd maybe describe as "phase out the multiplex.nix builder in favour of [more explicit DI-style passing stuff from manifests to the builder]", but I don't claim to have the full picture as I'm still reading the diffs. Fwiw, seems like much of the logic of multiplex is still in place, moved to buildRedist/

Contributor Author

Modules can always come back -- when there's a need for them :)

Building directly from the manifests cleans up and simplifies a lot of the logic, at the expense of essentially removing all validation of manifest structure.

buildRedist is essentially an updated version of the builder from generic, just with more checks and a healthy dose of offloading of various stages to a build script.

The multiplexing logic is gone entirely -- arguably it was a mistake to allow multiple versions of a single CUDA package within the package set.

Contributor

The multiplexing logic is gone entirely -- arguably it was a mistake to allow multiple versions of a single CUDA package within the package set.

Yes, absolutely. Now this would be a good item for the changelog: it's an intentional change, it's likely persistent, it's high-level, it says something about the direction, etc, etc. Modules are an internal detail and they are not a problem in themselves, it's the incompatibility with DI that was the reason we wanted to get rid of the previous implementation...

Comment on lines +25 to +26
- `_cuda.lib.licenses` was created to introduce several NVIDIA licenses new to Nixpkgs and used almost entirely within the CUDA package set.
- `_cuda.lib.getRedistSystem` had an API change to accommodate a new dependency on CUDA version.
Contributor

Add a disclaimer that these are internals and subject to change

Contributor Author

That's already documented in the Nixpkgs reference manual so I'm comfortable leaving it out.

Contributor

I don't see the disclaimers in the manual either, and I'd like to see them here as well anyway

- Package expressions are being updated to support building against older releases (where appropriate) to support out-of-tree consumers like [cuda-legacy](https://github.com/nixos-cuda/cuda-legacy).
- `_cuda.lib.licenses` was created to introduce several NVIDIA licenses new to Nixpkgs and used almost entirely within the CUDA package set.
- `_cuda.lib.getRedistSystem` had an API change to accommodate a new dependency on CUDA version.
- Many members of the CUDA package set are now marked as broken if `config.cudaSupport` is not set. This change was made to prevent the common mistake of using CUDA tooling to build a CUDA application but forgetting to enable `config.cudaSupport`, resulting in an application which supports CUDA and dependencies which do not.
Contributor

Disclaimer about the breaking change, and a reference to the config option for recovering the unsafe behaviour

Contributor Author

I added a reference to the config.cudaSupport option, but I'm not introducing a way to re-arm the foot gun (at least not now, and not in this PR).

Contributor

not introducing a way to re-arm the foot gun (at least not now, and not in this PR).

OK, I'll open a separate one

Contributor

I'm thinking... in principle, the unsafe behaviour can be achieved OOT using overlays instead of the extra config 🤔

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
@ConnorBaker ConnorBaker force-pushed the feat/cuda-25-11-release-notes branch from 7777ce2 to 4390bef Compare October 29, 2025 18:33
@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 1 This PR was reviewed and approved by one person. label Oct 29, 2025
- Package expressions are being updated to support building against older releases (where appropriate) to support out-of-tree consumers like [cuda-legacy](https://github.com/nixos-cuda/cuda-legacy).
- `_cuda.lib.licenses` was created to introduce several NVIDIA licenses new to Nixpkgs and used almost entirely within the CUDA package set.
- `_cuda.lib.getRedistSystem` had an API change to accommodate a new dependency on CUDA version.
- Many members of the CUDA package set are now marked as broken if [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport) is not set. This change was made to prevent the common mistake of using CUDA tooling to build a CUDA application but forgetting to enable [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport), resulting in an application which supports CUDA and dependencies which do not.
Contributor

Suggested change
- Many members of the CUDA package set are now marked as broken if [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport) is not set. This change was made to prevent the common mistake of using CUDA tooling to build a CUDA application but forgetting to enable [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport), resulting in an application which supports CUDA and dependencies which do not.
- Breaking: CUDA without [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport) is no longer supported. Building packages with CUDA dependencies without enabling `cudaSupport` globally is unsafe and often leads to inconsistent deployments, such as mixing CUDA-enabled `torchvision` with CPU-only `torch`. Starting with this release, the only supported way to use CUDA in Nixpkgs is `import <nixpkgs> { config.cudaSupport = true; }`.

And mv to the top


- `_cuda.lib.getRedistSystem` had an API change to accommodate a new dependency on CUDA version.

- Many members of the CUDA package set are now marked as broken if [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport) is not set. This change was made to prevent the common mistake of using CUDA tooling to build a CUDA application but forgetting to enable [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport), resulting in an application which supports CUDA and dependencies which do not.
Contributor

Suggested change
- Many members of the CUDA package set are now marked as broken if [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport) is not set. This change was made to prevent the common mistake of using CUDA tooling to build a CUDA application but forgetting to enable [`config.cudaSupport`](https://nixos.org/manual/nixpkgs/stable/#opt-cudaSupport), resulting in an application which supports CUDA and dependencies which do not.
- CUDA without global `config.cudaSupport` is no longer supported, cf. the ["Highlights"](#sec-nixpkgs-release-25.11-highlights) section.

@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Nov 29, 2025
@GaetanLepage GaetanLepage added the backport release-25.11 Backport PR automatically label Dec 9, 2025
