check-meta: custom remediation messages #456908

Closed
SomeoneSerge wants to merge 2 commits into NixOS:master from SomeoneSerge:feat/broken-why

@SomeoneSerge
Contributor

Things done

Allows packages to define meta.problems ahead of a complete RFC 127 "Problems" implementation (#177272). Uses meta.problems to customize remediation messages for broken and unsupported packages. Migrates cudaPackages assertions to meta.problems.

Cf. discussion in #437723 (comment) and #456510 (comment). TL;DR: it's highly undesirable to revert the changes from the CUDA 13 PR, and it is, in fact, high time to rip the band-aid off the local/unsafe cudaSupport. Ripping it off while displaying the NIXPKGS_ALLOW_BROKEN message would be an absolute disaster for SEO and documentation, as in totally and hopelessly ruinous. The sought compromise is to rip it off, but to add the minimum possible extra data to meta so that guiding the user becomes possible.

Example:

$ # BEFORE:
$ nix-instantiate --arg config '{allowUnfree = true;}' -A python3Packages.torchWithCuda '<nixpkgs>'
...
       error: Package ‘cuda12.8-cuda_cccl-12.8.90’ in /nix/store/hdccjmai76wac9xh25xciw6wmg24qkdk-source/pkgs/development/cuda-modules/packages/cuda_cccl.nix:27 is marked as broken, refusing to evaluate.
...
       a) To temporarily allow broken packages, you can use an environment variable
          for a single invocation of the nix tools.

            $ export NIXPKGS_ALLOW_BROKEN=1
...
$ # AFTER:
$ nix-instantiate --arg config '{allowUnfree = true;}' -A python3Packages.torchWithCuda
...
       error: Package ‘cuda12.8-cuda_cccl-12.8.90’ in /home/else/s/nixpkgses/broken-why/pkgs/development/cuda-modules/packages/cuda_cccl.nix:27 is not available on the requested hostPlatform:
...
       Known problems:

       CUDA without global `config.cudaSupport` is unsafe and unsupported.
       Cf. NixOS 25.11 Release Notes.

       a) Use `import <nixpkgs> { config.cudaSupport = true; }`.
       b) For `nixos-rebuild`, set
         { nixpkgs.config.cudaSupport = true; }
       in `configuration.nix`.
       c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
         { cudaSupport = true; }
       to ~/.config/nixpkgs/config.nix.
  • Evaluated on platform:
    • x86_64-linux
  • Nixpkgs Release Notes
    • Not applicable (meant to stay private; not a complete RFC 127 implementation)
  • Fits [CONTRIBUTING.md], [pkgs/README.md], [maintainers/README.md] and other READMEs.

CC @NixOS/nixpkgs-core because check-meta.nix and trying to sneak in a relatively ad hoc fix into the release.
CC @piegamesde because RFC 127.
CC @ConnorBaker.


Pending a more comprehensive RFC 127 "Problems" implementation
(NixOS#177272), proactively allow packages to specify `meta.problems` and use
`problems` of kinds `unsupported` and `broken` to display context-aware
remediation messages.

This change is motivated by the merging of the CUDA13 PR, which included
denying in-tree support for using CUDA without enabling it Nixpkgs-wide.
Implementation-wise, unsupported packages were marked as `broken` (TBD:
make "unsupported"), and the reasons are visible with `--trace-verbose`,
but obscured by the long and unhelpful NIXPKGS_ALLOW_BROKEN message.

Instead of reverting the CUDA13 changes, which are blocking for a number
of updates in the SciComp ecosystem, it seems better to allow
customization of remediation messages.

In `cudaPackages` (and dependent `pythonXPackages`), we used to rely on
an ad hoc `brokenConditions` passthru attribute. Instead of moving that
to `meta`, rewrite it in a form compatible with the future RFC127
implementation.
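
A minimal sketch of what such a rewrite could look like. The shape of `brokenConditions` (reason string mapped to a boolean) and the `kind`/`message` fields of each problem are assumptions made for illustration, not the code from this PR:

```nix
# Sketch only: attribute shapes below are assumptions, not the PR's actual code.
{ lib, config }:
let
  # Before: ad hoc passthru attribute mapping a reason to whether it applies.
  brokenConditions = {
    "CUDA without global `config.cudaSupport` is unsafe and unsupported." =
      !(config.cudaSupport or false);
  };

  # After: the same information as a list of problem records,
  # keeping only the conditions that actually apply.
  problems = lib.mapAttrsToList (message: _: {
    kind = "broken";
    inherit message;
  }) (lib.filterAttrs (_: applies: applies) brokenConditions);
in
{
  meta = {
    inherit problems;
    # The boolean stays the single source of truth for the check itself.
    broken = problems != [ ];
  };
}
```

Saved to a hypothetical `sketch.nix`, it can be evaluated with `nix-instantiate --eval --strict -E 'import ./sketch.nix { lib = (import <nixpkgs> { }).lib; config = { }; }'`.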
@ConnorBaker
Contributor

I’ll look at this tomorrow (hopefully), but from what I remember the original problems RFC implementation was superseded by #338267, which was blocked due to performance concerns. I’ve not looked at this PR, but that may be a concern raised here.

@SomeoneSerge
Contributor Author

performance concerns

That's the reason I restricted myself to `listOf any` and didn't add any handler support: all the errors are still generated exclusively by looking at the boolean `meta.broken` &c, and `meta.problems` is only ever accessed from the remediation branch. There's some logic to scan the list in cudaPackages, but that's a cost we have already been paying anyway.
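
A minimal sketch of that evaluation order, relying on Nix's laziness. This is a simplified stand-in rather than the code in check-meta.nix, and the `kind` and `message` field names are assumed:

```nix
# Hypothetical, simplified stand-in for the check in check-meta.nix.
let
  # Illustrative package metadata; `kind` and `message` are assumed field names.
  meta = {
    broken = true;
    problems = [
      {
        kind = "broken";
        message = "CUDA without global `config.cudaSupport` is unsafe and unsupported.";
      }
    ];
  };

  # Lazily bound: the list is only scanned once the package has been rejected.
  remediation = builtins.concatStringsSep "\n" (
    map (p: p.message) (builtins.filter (p: p.kind == "broken") meta.problems)
  );

  # The decision itself reads nothing but the boolean.
  check = if meta.broken then throw "Known problems:\n${remediation}" else "ok";
in
check
```

With `broken = false` the expression evaluates to `"ok"` and `meta.problems` is never forced, so packages that pass the check pay nothing for the extra attribute.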

A couple Python/SciComp and CUDA packages previously relied on making
`meta.broken` and `meta.unsupported` conditions more reliable by listing
the reasons in `passthru.brokenConditions` (&c), used in conjunction
with `addErrorContext` or ad hoc `assert`. Now that the need is more
urgent (Cf. 25.11 config.cudaSupport requirement), use an
RFC127-compatible scheme to list problems and display meaningful errors.

cudaPackages: moved assertions from `backendStdenv` (keep it a tiny
shim, hopefully eventually remove) to `cuda_nvcc`.
@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 6.topic: python Python is a high-level, general-purpose programming language. 6.topic: stdenv Standard environment 6.topic: cuda Parallel computing platform and API labels Oct 30, 2025
@nixpkgs-ci nixpkgs-ci bot added the 8.has: documentation This PR adds or changes documentation label Oct 30, 2025
@emilazy
Member

emilazy commented Oct 31, 2025

Although I fully support proper meta.problems functionality, I don’t feel comfortable rushing review and merge of such a significant change at this stage in the release and think we ought to reconsider the timing of the CUDA changes leading to it; I left a comment at #437723 (comment).

@SomeoneSerge
Contributor Author

SomeoneSerge commented Oct 31, 2025

RFC implementation ... superseded by #338267

Thanks, wasn't aware of the rework and didn't go too far searching.
I see there's a proper implementation there, with tests, and with Adam calling "use concatMap" at people :)

The config.cudaSupport change and the release aside, I'm very much of the opinion that we do not need to finish all of that upfront before we start actually replacing ad hoc warns and asserts with more future-proof and rfc-compatible meta.problems. We just can't advertise them or call them public, but that's about it. That's why the diff for check-meta here is -300 +400, of which 300 are indent changes coming from nixfmt.

[Accidentally clicked C-Return]

That said, I agree about the timing, and also appreciate that Connor's hope was to merge the PR before October...

@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Nov 1, 2025
@LuNeder
Contributor

LuNeder commented Nov 1, 2025

       b) For `nixos-rebuild`, set
         { nixpkgs.config.cudaSupport = true; }
       in `configuration.nix`.

As someone who uses this on her main system, this is an awful recommendation. This triggers massive rebuilds for packages that the user probably doesn't even want to use with CUDA.

This might be too big to add to the error message, but it's probably about time the wiki page about CUDA teaches users to make an overlay for using with the packages they want (instead of only mentioning nix shells), such as:

  nixpkgs.overlays = [
    (final: prev: {
      pkgsCu = import inputs.nixpkgs {
        config.allowUnfree = config.nixpkgs.config.allowUnfree;
        localSystem.system = final.stdenv.hostPlatform.system;
        localSystem.config = "${final.stdenv.hostPlatform.config}";
        config.cudaSupport = true;
        config.cudaVersion = "12";
      };
    })
  ];

I could even make a PR adding something like that to the wiki page.

Not sure what the guidelines are for this kind of error message, but maybe it could then link to the wiki page on the last line ("For more information check the wiki page: https://wiki.nixos.org/wiki/CUDA" or something)?

While I do have it the other way around in my config, this is honestly an awful experience and the only reason I have nixpkgs.config.cudaSupport = true; for the entire system on my PC is a non-nixed program that I couldn't get to run with cuda otherwise.

@SomeoneSerge
Contributor Author

@LuNeder hi, yes the idea precisely is to have two different instances of Nixpkgs, one CPU-only and one CUDA-only, this way achieving consistency while skipping most of the false-positive rebuilds triggered by early binding. You're probably right that the documentation around this sucks, and in particular that the suggestion about the NixOS option should be removed (I copy-pasted it from the original remediation template)
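
A minimal sketch of the two-instance setup, assuming a channel-style `<nixpkgs>`; the name `pkgsWithCuda` and the choice of `torch` are illustrative only:

```nix
# Sketch: two independent Nixpkgs instances side by side.
let
  # CPU-only instance with the default configuration.
  pkgs = import <nixpkgs> { };

  # CUDA instance: cudaSupport is enabled consistently for everything in it.
  # `pkgsWithCuda` is a hypothetical name chosen for this example.
  pkgsWithCuda = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  };
in
pkgs.mkShell {
  packages = [
    # Ordinary tools come from the CPU-only instance.
    pkgs.git
    # Only the packages that need CUDA come from the CUDA instance.
    (pkgsWithCuda.python3.withPackages (ps: [ ps.torch ]))
  ];
}
```

Only the packages taken from the CUDA instance are affected by `cudaSupport`; the rest of the environment keeps the default configuration.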

@ConnorBaker
Contributor

       b) For `nixos-rebuild`, set
         { nixpkgs.config.cudaSupport = true; }
       in `configuration.nix`.

As someone who uses this on her main system, this is an awful recommendation. This triggers massive rebuilds for packages that the user probably doesn't even want to use with CUDA.

This might be too big to add to the error message, but it's probably about time the wiki page about CUDA teaches users to make an overlay for using with the packages they want (instead of only mentioning nix shells), such as:

  nixpkgs.overlays = [
    (final: prev: {
      pkgsCu = import inputs.nixpkgs {
        config.allowUnfree = config.nixpkgs.config.allowUnfree;
        localSystem.system = final.stdenv.hostPlatform.system;
        localSystem.config = "${final.stdenv.hostPlatform.config}";
        config.cudaSupport = true;
        config.cudaVersion = "12";
      };
    })
  ];

I could even make a PR adding something like that to the wiki page.

Not sure what the guidelines are for this kind of error message, but maybe it could then link to the wiki page on the last line ("For more information check the wiki page: https://wiki.nixos.org/wiki/CUDA" or something)?

While I do have it the other way around in my config, this is honestly an awful experience and the only reason I have nixpkgs.config.cudaSupport = true; for the entire system on my PC is a non-nixed program that I couldn't get to run with cuda otherwise.

Have you checked the Nixpkgs docs for CUDA? Seems like what you want is covered by pkgsCuda and cudaPackages.pkgs.

@ruffsl
Contributor

ruffsl commented Nov 2, 2025

Have you checked the Nixpkgs docs for CUDA? Seems like what you want is covered by pkgsCuda and cudaPackages.pkgs.

I think I found the one sentence that references this, but are there any examples one can read? I think I'm only finding 5 examples at the moment in the wild:

https://github.com/search?q=language%3Anix+%2Fpkgs%5C.pkgsCuda%2F+-path%3Apkgs%2Ftop-level%2F*&type=code

@SomeoneSerge
Contributor Author

I'ma close this for now: we're postponing the "mandatory consistency" change, which was the reason for urgency.

@LuNeder @ruffsl you bring up important points, but they were offtopic for this PR; I encourage you to open a tracking issue with the documentation and cuda labels, and/or prepare respective PRs.

I hope to revisit this at the start of the next release cycle, because I feel quite skeptical about the idea of rolling out a full-blown and performance-optimized RFC implementation in a single step, which is where #338267 might have stalled.

@github-project-automation github-project-automation bot moved this to Done in Stdenv Nov 13, 2025
@github-project-automation github-project-automation bot moved this from New to ✅ Done in CUDA Team Nov 13, 2025

5 participants