Allow missing binary caches by default#7188
Conversation
|
Friendly ping. I am constantly seeing this come up, most recently in #7424 (comment). |
|
Oh yes, things like this should not languish. |
|
@edolstra based on our very brief convo this morning, I think there is a misunderstanding. The goal is not to fallback on building from source, but just to try all the substituters before giving up. |
|
Thanks @Ericson2314 :) Just for completeness: From #7424:
If we're ok with the changes in this PR, then rewriting this codepath to check substituters in parallel looks fairly straightforward. I'd be willing to rewrite this PR to include fetching the substituters in parallel if that's desired. Also happy to wait and make that change separately (or at least log it for later), which is probably the better option. |
FWIW the same is true IMHO for remote builders. |
|
@patricksjackson thank you very much for that offer, but I would say let's leave that for a follow-up PR? This is good a a minimal thing to discuss what behavior we want (which I am not yet sure is uncontroversial, in case I am misunderstanding what @edolstra was saying), after which doing things in parallel is mostly an optimization. |
|
You're free to work on PR concurrently if you want, though :) |
|
Cachix was down 10 days ago for 30min, half of the time was due to this not being fixed in Nix yet. Ironically, users of Cachix had the same issue because of which Cachix was down. Hope that helps motivate merging/fixing this finally. |
|
@roberth @Ericson2314 could you cover this one for the next Nix team meeting? |
|
How can there be no test change and no test failure? |
|
It's likely that @patricksjackson is a first time contributor and needs a bit of guidance. |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-01-02-nix-team-meeting-minutes-20/24403/1 |
|
My understanding of the semantics: Ignoring Backgroundquery path from substitutior results:
TodayThe semantics on Nix Note that this means that if one subsitutor is down ( New semanticsThese are presented in a a more "declarative" and order-independent way. New definitions for querying on sets of substitutors (note that sets are order-independent!)
Note the "any" vs "all" duality.
New policy based on the above defs: |
|
Discussed in Nix team meeting on 2023-01-09: @patricksjackson thanks for your contribution! We agree that we want to fix this, but haven't decided yet what the overall process of picking substituters should look like. Therefore no decision to merge, since we don't know if that would incur changing behavior down the line, or other redundant work. Complete discussion
|
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-01-09-nix-team-meeting-minutes-22/24577/1 |
|
The thing I wrote above is the proposal. We should be able to resume progress on this. |
|
@fricklerhandwerk Can you move this back to "in discussion" since I wrote the above proposal? I no longer have the perms to do so. |
|
Discussed in the Nix team meeting:
|
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-03-06-nix-team-meeting-minutes-38/26056/1 |
|
It should be obvious, but just to be clear about my feelings as the original PR author, and since I wasn't there to discuss things with you: I don't feel any ownership over this change, and I expect it's easier for any one of you to handle this PR yourself anyways. Feel free to steal/close/whatever you want with this PR, since it looks like @Ericson2314 is planning to write the agreed solution. Thanks for thinking about a holistic solution! |
|
@patricksjackson So I was requested to make this proposal by the rest of the Nix Team to help us clarify our own thoughts. It wasn't supposed to be in contrast to the PR -- quite on the contrary it was your PR description that helped me write that post! I think this PR is close to doing that (by "today" I meant not your PR but Nix Sorry again this got stuck in the team backlog for so long before we returned to it. |
|
I'll take a look soon. |
|
Glad to hear it! |
|
Alright, I've now had a chance to come back and reload the context here. I have a couple of questions before I can make progress again: @Ericson2314 First, it's unclear to me if your design proposal is describing how you want this code to work at a high level or if you are describing what you want the actual code flow to look like.
Lastly, as I mentioned in the original comment, I'm unable to find how to test this in the current codebase. There appear to be no unit tests covering these modules, so do you want me to write the unit test infrastructure to test this? There are a few integration tests but even those I'm not sure how to trigger a network error within them, especially to trigger all the paths that this change would use. I suspect I'm missing something here, but I'm not very familiar with the codebase still. |
I said screw it and wrote an MVP for querying them all async, since I had this problem on my mind. It's in a separate branch on my fork: https://github.com/patricksjackson/nix/tree/substitution-set I won't be offended if it's not needed/wanted/feasible, I felt like writing it anyways. |
|
@patricksjackson Sorry I missed this!! :( Feel free -- encoraged -- to ping me on matrix if I am not responding to a PR to which I am assigned.
Just at a high level :). It is supposed to be a specification which admits many compliant implementations.
I believe I have described your intent, but the code doesn't quite do what it says. I think the divergence is just an accident.
Oh that's very cool! I think we can wrap this up, but then try to do that after. How's that sound? |
| return; | ||
| } | ||
| throw; | ||
| tryNext(); |
There was a problem hiding this comment.
So I think what we need to do is stash the error (in a new substitution goal variable) here. And then at the end of the function, we re-throw the error instead of returning if nothing is found.
Does it make sense how that brings us in alignment with the spec I wrote?
| return; | ||
| } | ||
| throw; | ||
| logError(e.info()); |
There was a problem hiding this comment.
ditto, not sure if we should log here, or log at the end if we don't end up re-throwing.
| logError(e.info()); | ||
| else | ||
| throw; | ||
| logError(e.info()); |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/a-common-public-nix-cache/26998/8 |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
|
bump |
|
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-07-24-nix-team-meeting-minutes-75/31112/1 |
|
I'm picking up this change at #8983 |
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/announcing-ncps-a-nix-cache-proxy-server-for-faster-builds/58166/10 |
… are still enabled (#13301) ## Motivation Nix currently hard fails if a substituter is inaccessible, even when they are other substituters available, unless `fallback = true`. This breaks nix build, run, shell et al entirely. This would modify the default behaviour so that nix would actually use the other available substituters and not hard error. Here is an example before vs after when using dotenv where I have manually stopped my own cache to trigger this issue, before and after the patch. The initial error is really frustrating because there is other caches available.   ## Context #3514 (comment) is the earliest issue I could find, but there are many duplicates. There is an initial PR at #7188, but this appears to have been abandoned - over 2 years with no activity, then a no comment review in jan. There was a subsequent PR at #8983 but this was closed without merge - over a year without activity. <!-- Non-trivial change: Briefly outline the implementation strategy. --> I have visualised the current and proposed flows. I believe my logic flows line up with what is suggested in #7188 (comment) but correct me if I am wrong. Current behaviour:  Proposed behaviour:  [Charts in lucid](https://lucid.app/lucidchart/1b51b08d-6c4f-40e0-bf54-480df322cccf/view) <!-- Invasive change: Discuss alternative designs or approaches you considered. --> Possible issues to think about: - I could not figure out where the curl error is created... I can't figure out how to swallow it and turn it into a warn or better yet, a debug log. - Unfortunately, in contrast with the previous point, I'm not sure how verbose we want to warns/traces to be - personally I think that the warn that a substituter has been disabled (when it happens) is sufficient, and that the next one is being used, but this is personal preference.
… are still enabled (NixOS#13301) Nix currently hard fails if a substituter is inaccessible, even when they are other substituters available, unless `fallback = true`. This breaks nix build, run, shell et al entirely. This would modify the default behaviour so that nix would actually use the other available substituters and not hard error. Here is an example before vs after when using dotenv where I have manually stopped my own cache to trigger this issue, before and after the patch. The initial error is really frustrating because there is other caches available.   There is an initial PR at NixOS#7188, but this appears to have been abandoned - over 2 years with no activity, then a no comment review in jan. There was a subsequent PR at NixOS#8983 but this was closed without merge - over a year without activity. <!-- Non-trivial change: Briefly outline the implementation strategy. --> I have visualised the current and proposed flows. I believe my logic flows line up with what is suggested in NixOS#7188 (comment) but correct me if I am wrong. Current behaviour:  Proposed behaviour:  [Charts in lucid](https://lucid.app/lucidchart/1b51b08d-6c4f-40e0-bf54-480df322cccf/view) <!-- Invasive change: Discuss alternative designs or approaches you considered. --> Possible issues to think about: - I could not figure out where the curl error is created... I can't figure out how to swallow it and turn it into a warn or better yet, a debug log. - Unfortunately, in contrast with the previous point, I'm not sure how verbose we want to warns/traces to be - personally I think that the warn that a substituter has been disabled (when it happens) is sufficient, and that the next one is being used, but this is personal preference.
|
Closed because replaced by #13301 |
This change allows for individual substituters to be unavailable even when
fallback = false(default). Prior to this change, any unavailable substituters would cause a build failure even if others were reachable. This causes issues when people try to setup occasionally accessible binary caches.The fallback setting states that it allows Nix to fallback to building from source if a binary substitute fails, and this change makes the behavior now match the description. Prior to this change it actually did two things:
Allowing unreachable binary substituters should probably be the expected behavior with or without the
fallbackflag, and is what this change allows.I've tested simple scenarios locally, but it's not clear to me how to write a test for this. I think it needs a network failure to trigger the new paths, and I'm not sure how to set that up in the tests.
Closes #6901, Closes #7127, Closes #4383