cudaPackages: build redists from manifests and add CUDA 13#437723
cudaPackages: build redists from manifests and add CUDA 13#437723ConnorBaker merged 99 commits intoNixOS:masterfrom
Conversation
b6dcba3 to
c49efbc
Compare
|
I can't seem to build anything in this PR without allowing broken packages. I have |
8569221 to
f0ec09c
Compare
|
What do you mean by "system-wide"? |
f0ec09c to
217ba0f
Compare
|
I've set |
|
Do you have an example flake you can post? Also, have you set With this PR, I can do where {
allowUnfree = true;
cudaCapabilities = [ "8.9" ];
cudaSupport = true;
} |
3175ff8 to
78e5800
Compare
|
I’m worried about the number of non‐trivial breaking changes that seem to have been caused by this PR merged ten days after the freeze, that have led to proposals for further breaking changes like #457120 or last‐minute implementations of complex features in core Nixpkgs components like #456908. I think we should reconsider shipping a change this drastic at this stage in the release unless the regressions can be fixed in a more simple manner. cc @leona-ya @jopejoe1 Edit: Actually #457120 isn’t a breaking change in itself, just a way to work around the |
| allowlist = config.allowlistedLicenses or config.whitelistedLicenses or [ ]; | ||
| blocklist = config.blocklistedLicenses or config.blacklistedLicenses or [ ]; | ||
| allowlist = config.allowlistedLicenses; | ||
| blocklist = config.blocklistedLicenses; |
There was a problem hiding this comment.
This is a breaking change, right? Is there any error message for someone who previously set config.blacklistedLicenses? I don’t mind removing it with a throw, but it seems like this will silently start permitting licences a user was specifically trying to opt out of.
To be honest I am not sure why release managers were not pinged in advance for a 99‐commit rework PR with many breaking changes merged this late after the freeze…
There was a problem hiding this comment.
Yes thats indeed breaking, the more i look at it the more i think we should just revert it, and reintroduce it after branch of.
There was a problem hiding this comment.
reintroduce it after branch off
With an mkRenamedOption for {black,white}listedLicenses
There was a problem hiding this comment.
In hindsight, I should have used mkRenamedOption. I did not do so because they are undocumented attributes.
There was a problem hiding this comment.
Is there any error message for someone who previously set
config.blacklistedLicenses? I don’t mind removing it with a throw, but it seems like this will silently start permitting licences a user was specifically trying to opt out of.
Since this possibly has legal consequences for those using this option, this should be dealt with ASAP.
|
This indeed looks to me like a pr that should not have been merged this late into the release cycle. |
For the rest of the PR, my review is still in progress, but I feel fairly confident there are no other "breakers" or other impact on other parts of Nixpkgs. The CUDA-specific change is somewhat inherently massive, due to schema updates upstream and due to our tech debt. OTOH it was necessitated by the python stuff. Splitting up into smaller PRs would've been possible, but surely at the cost of another burnout. I suggest that we choose from options 1 or 2 |
|
Right now I don't really care any more, sorry for any trouble this caused. In particular I should have been more careful with the changes to @SomeoneSerge you have my blessing to revert or change whatever you see fit. I see fundamental issues with the way several things are architected in Nixpkgs and have been carrying out the fool's errand of trying to fix them in a localized manner to avoid the absurd cost of addressing the root problem. I apologize for the work that created for you. |
|
@ConnorBaker, I haven't been making it easier for you lately either, and didn't keep myself from being annoying in the few reviews that I sent. I can't quite put my finger at why, but I find just not being toxic is already hard enough.
We know, we all do...
If I had to name just one, I couldn't. I hope you can soon care again |
@emilazy In case it helps at all, I've just updated the release note for that PR to suggest a change to users' configuration that allows the same configuration to work in both stable and unstable. For my particular concern, #456994, I don't have any particular preference between backing out the relevant changes from this PR and revisiting them in 26.05, taking #457038 as a safe fix and considering #457120 in 26.05, or taking #457120 now. I'll leave that decision to folks much better versed in the release processes and cycles and preferences! |
|
Collecting everything that was reported as a breaking change or negative impact:
This is a huge PR: 99 commits, 226 files changed, 9k lines changed. It was merged without approvals, while review was still on going. It was merged 10 days after the feature freeze. The main addition is CUDA 13, but the description mentions:
That means that CUDA 13 will possibly stay unusable on the stable release? I guess the CUDA 13 part is not required at this time. Everything else looks mostly like refactoring. I assume the biggest problem down the road will be, that this massive changeset will make backports near impossible, if this thing is only merged after branch-off? This was not given as a reason, though, the reason for merge was:
Frankly, merging this massive thing at once is not iterating. Splitting this up, especially the changes to non-cuda packages (by-name etc.), looking at the impact for much smaller parts of this is the right way to do it, I think. Unfortunately, due to the sheer size and me being entirely unfamiliar with all the CUDA packaging, I can't judge whether it's reasonable to fix this quickly. The sensible option from my perspective is to revert now and reapply after branch-off. The longer we wait with the revert, the more likely we are going to have to deal with conflicts on the revert itself. |
|
@wolfgangwalther I can later prepare a more detailed post-mortem explaining the context behind the PR, and in particular the back'n'forth we've been having in the weekly meetings which complement the review process, but my bottom line is: the process was justified-enough for CUDA-internal changes, the other changes are SciComp leaf packages and the changes to |
|
The @NixOS/cuda-maintainers have decided to revert the changes surrounding
@emilazy @wolfgangwalther @jopejoe1 if you have any questions or concerns about the CUDA packaging, we’re more than happy to schedule time to hear them! |
|
Thank you! |
|
I suspect 7122833 caused this minor eval of an obscure attribute: |
|
I could not find any reference to |
Important
The addition of CUDA 13 does not mean packages will suddenly work with CUDA 13. Expect breakages.
Largely based on work done in https://github.com/ConnorBaker/cuda-packages.
Big changes:
_cuda.manifests_cuda.fixupsinto full package expressions which exist as files, in-tree_cuda.lib.licenses_cuda.lib.allowUnfreeCudaPredicateto use_cuda.lib.licenses_cuda.lib.allowUnfreeCudaPredicateby havingconfigtake an attribute set containingpkgsas an argumentEnablingconfig.cudaSupportautomatically adds NVIDIA's licenses toconfig.allowlistedLicenses, avoiding the need to setconfig.allowUnfreebuildRedisthelper functionoverrideAttrstests.redists-unpackedandtests.redists-installedto verify downloading all sources and unpacking and installing all redistributables work as intended_cuda.lib.{_mkMetaBroken,_mkMetaBadPlatforms}have been updated to usebuiltins.traceVerboseto avoid polluting output when doing whole-tree evaluations (e.g., duringnixpkgs-reviewruns) and now take one argument instead of twocudaPackages.tests.{cmake,cudnn-frontend,onnx-tensorrt}In the long run, I absolutely think that a database-backed approach like #406740 is the right way to handle package creation and generation. Unfortunately, it's not quite ready. In the interim, an expression per-redistributable package enables easily finding the relevant file in-tree (no flattening of deeply nested JSON), consistent package set members across versions (instead of adding attributes on the fly for supported systems), and avoiding runtime generation of package expressions.
Things done
passthru.tests.nixpkgs-reviewon this PR. See nixpkgs-review usage../result/bin/.As always with
nixpkgs-review, part of the problem is the unfree packages aren't built or cached and so they're typically rebuilt. UsingallowUnfreeenables many more packages to be built than just the CUDA packages.Add a 👍 reaction to pull requests you find important.