
Figure out how to make IFD work properly #954

Open
copumpkin opened this issue Jul 1, 2016 · 34 comments
copumpkin commented Jul 1, 2016

See the whole thread at NixOS/nixpkgs#16130 plus the ensuing IRC discussion.

As a quick recap: I and several other people see import from derivation (IFD) as a pretty important prerequisite for Nix's continued growth, but there are a few technical issues standing in its way right now that we don't have clean answers for.

What we'd like to be able to do:

  1. Write a Nix codegen that takes in locked-down inputs from (presumably) a fixed-output derivation
  2. Incorporate the output of that codegen into nixpkgs
  3. Be able to use nixpkgs sensibly in that state, and have it lazily reach out to a cached output of the autogenerated nix expressions with little-to-no degradation of user experience

What hinders it right now:

  1. Restricted mode prevents network access which stops this from working on Hydra today
  2. Full understanding of the ramifications of this approach:
    i. Is the UI unfriendly when we have to build stdenv or similar during evaluation?
    ii. Should channel bundles include subchannels, or allow fetching them on the fly?
    iii. Is it now harder to understand what a nixpkgs evaluation involves, and whether a rebuild will be necessary? Definitely, but we could write tooling...?
  3. Do substitutes work properly during evaluation?
  4. What happens with multiple levels of evaluation->building->evaluation->building? This is unlikely to be a real scenario but it might happen by accident if someone's codegen depends on someone else's ecosystem.
  5. What happens when a stdenv itself depends on one of these things? The darwin stdenv uses llvm, which depends on python. It currently doesn't use any pythonPackages but who knows what might happen in future?
  6. Policy: when do we want this? When do we not want it? How do we make sure it's used in reasonable ways?
  7. [More things I haven't thought of]

vcunat commented Jul 1, 2016

I'd prefer to use substitutes instead of sub-channels, i.e. solve 3. and thereby bypass 2.ii.

Different people will be interested in different autogenerated parts. The least intrusive mode might simply abort during evaluation if the autogenerated data isn't present locally (with a useful message); users could then regenerate it explicitly for particular nixpkgs revisions, and/or there could be hooks for doing so during nix-channel --update. Of course, the regeneration would be implemented as an ordinary nix build and typically simply substituted.

For my personal usage, it's acceptable for nix to fetch not-too-big data during evaluation but not to auto-run some complex generator.
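The abort-if-absent mode described above could look roughly like this (a hedged sketch; the path and error message are made up, nothing like this exists in nixpkgs today):

```nix
let
  # Hypothetical location of the autogenerated expressions
  generatedPath = ./generated/hackage-packages.nix;
in
  if builtins.pathExists generatedPath
  then import generatedPath
  else throw ''
    Autogenerated expressions are missing locally.
    Regenerate them for this nixpkgs revision (or run your channel's update hook) and retry.
  ''
```

The key property is that evaluation fails loudly instead of silently running an expensive generator, while the regeneration step remains an ordinary, substitutable build.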

@Ericson2314

If @shlevy's work is hard to rebase or review, then yes, I would do the abort-and-regenerate thing in a heartbeat to unstick this.


shlevy commented Jul 12, 2016

Note that unless we are OK with pegging a specific version of hackage/npm/whatever (and updating it manually when we want to), we'll have to go beyond just improving IFD and abandon the idea that Nix evaluation is deterministic (while keeping Nix building deterministic). One possible way to do that is to have some way of specifying non-deterministic inputs to the top-level eval: Hydra would extract those as inputs, command-line Nix would fetch the latest, and arbitrary callers could pass in fixed revs or fetch the latest as they wish.


shlevy commented Jul 12, 2016

(I have some ideas on how to do that well, if I get a pre-approval from @edolstra I can get started on that after the perl stuff is done)

@copumpkin

I was proposing to peg specifically to an exact version, since I don't think we should abandon the determinism (except in my #520 thing which should behave quite differently). My ideal would be as follows:

  1. We lock down whatever minimal inputs are for our expression producer (like that git repo containing the hackage dump, or something similar for other package systems)
  2. We write a process that can take those minimal inputs from step 1 and produce Nix expressions from it
  3. We run the process in step 2 over the fixed-output derivation that produces step 1 to get an exact and cacheable set of expressions
  4. We import the result of step 3, allowing caching to work properly (so not everyone needs to build expressions from scratch) and expose it as a Nix value
  5. We then build the actual packages we care about
  6. ???
  7. Profit?
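Steps 1–5 above could be sketched in Nix roughly as follows (all URLs, hashes, and the generator invocation are placeholders, not a working pipeline):

```nix
let
  pkgs = import <nixpkgs> { };

  # 1. Minimal, locked-down input: e.g. the repo containing the Hackage dump,
  #    as a fixed-output derivation (rev and sha256 are placeholders)
  hackageSrc = pkgs.fetchgit {
    url = "https://github.com/commercialhaskell/all-cabal-hashes";
    rev = "0000000000000000000000000000000000000000";
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };

  # 2+3. A deterministic generator run over the fixed-output input, so the
  #      generated expression set is exact and cacheable/substitutable
  #      (the command line here is hypothetical)
  exprs = pkgs.runCommand "hackage-exprs" { } ''
    hackage2nix ${hackageSrc} --output $out
  '';

  # 4. Importing the result is the IFD step: evaluation blocks until
  #    `exprs` is built locally or fetched from a binary cache
  hackagePackages = import exprs;
in
  # 5. Build the packages we actually care about
  hackagePackages
```

Because the generator's only input is fixed-output, the generated expressions are fully determined by the pinned rev and sha256, which is what makes substitution of step 3 possible.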


copumpkin commented Jul 12, 2016

To clarify, the periodic updates that @peti makes to haskellPackages today would then consist of either bumping the fixed-output derivation that produces the input to cabal2nix/hackage2nix (updating the git rev and sha256), bumping the source code of cabal2nix/hackage2nix, or possibly just changing some of the overrides. That would turn the massive diffs into tiny diffs that explain in fairly minimal terms what changed.


peti commented Jul 12, 2016

Please note that we have a concrete use-case in Nixpkgs master today that can serve as an example of the problem we need to solve. Users can generate Nix expressions for any Haskell package on the fly using the callHackage function. For example:

$ nix-shell -p 'haskellPackages.ghcWithPackages (p: [
  (p.callHackage "hsdns" "1.6.1" {})
])' --run "ghc-pkg list hsdns" 
/nix/store/zp1j6fz2nk7g07qvizx6lzym6lnhn7l2-ghc-8.0.1/lib/ghc-8.0.1/package.conf.d
    hsdns-1.6.1

The expression used to build that hsdns library is generated automatically and imported at evaluation time:

$ cat /nix/store/gz6qlvwdhm1b9v64i3xkzhzfd0g6r3qb-cabal2nix-hsdns-1.6.1/default.nix 
{ mkDerivation, adns, base, containers, network, stdenv }:
mkDerivation {
  pname = "hsdns";
  version = "1.6.1";
  sha256 = "64c1475d7625733c9fafe804ae809d459156f6a96a922adf99e5d8e02553c368";
  libraryHaskellDepends = [ base containers network ];
  librarySystemDepends = [ adns ];
  homepage = "https://github.com/peti/hsdns";
  description = "Asynchronous DNS Resolver";
  license = stdenv.lib.licenses.lgpl3;
}

This feature allows us to support essentially all of Hackage without having to check all of Hackage into the Nixpkgs repository. The only drawback is that -- as of now -- Hydra won't build a single binary that depends on this feature.


PierreR commented Jul 23, 2016

[aside question]
@peti Does callHackage look at the package's stack.yaml file? I have just tried

nix-shell -p 'haskellPackages.ghcWithPackages (p: [ (p.callHackage "language-puppet" "1.3" {}) ])' --run "ghc-pkg list language-puppet"

and get:

Setup: Encountered missing dependencies:
http-client ==0.5.*, servant ==0.8.*, servant-client ==0.8.*

This is correct because these deps are not available in stackage nightly yet, but they are in hackage.
(PS: building `language-puppet-1.2` works flawlessly, which is quite impressive.)


peti commented Jul 25, 2016

@PierreR, I'd rather not get into discussions here that are off-topic for the issue since I'm worried it might derail the thread.

@edolstra

@peti AFAIK, Hydra will build such packages. The Hydra evaluator does not prevent import-from-derivation, in fact it's used by the RPM/Debian closure generation functions. However, it's probably not a good idea to use this "feature", since the build (including its dependencies) will be done by the evaluator rather than the queue runner.

@Ericson2314

Sounds like the next step is sending such derivations to the queue runner?


vcunat commented Aug 6, 2016

@edolstra: it's been reported NOT to work due to restricted mode: NixOS/nixpkgs#16130 (comment)


FRidh commented Aug 30, 2016

The problem I encountered (NixOS/nixpkgs#15480) is that it doesn't allow network access during evaluation, and therefore cannot retrieve the repo with hashes.


Ericson2314 commented Sep 18, 2016

I'd really like to see this worked out. From reading around the associated issues, the most egregious problem with restricted mode is that while network access is allowed in fixed-output derivations used normally, network access is not allowed in fixed-output derivations being imported, right?
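The asymmetry described here, in miniature (a sketch only; the URL and hash are placeholders, and `my-gen` is a hypothetical generator):

```nix
let
  pkgs = import <nixpkgs> { };

  # A fixed-output derivation: restricted mode permits network access here
  # when it is built as an ordinary build input, because the output hash
  # pins the result.
  src = pkgs.fetchurl {
    url = "https://example.org/package-index.tar.gz";
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };
in
  # But when the same fixed-output fetch is needed *during evaluation*,
  # because an expression generated from it is being imported, restricted
  # mode rejects it -- even though the fetch is equally content-addressed.
  import (pkgs.runCommand "generated" { } "my-gen ${src} > $out")
```

Since the fixed-output hash makes the fetch deterministic either way, allowing it during import should not compromise purity, which is the argument being made above.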

I consider removing this restriction priority number 1 here because there is no impact to purity or other such downside.

@Ericson2314

(@peti I fixed the typos if that was what's confusing—sorry there were so many in the first place.)


peti commented Sep 18, 2016

I reacted with confusion because I felt your summary of the situation does not represent very well what's been discussed in the thread.

@Ericson2314

@peti Oh! Well, I wrote my summary because I hadn't seen one so far, and this was the best I could come up with. Is the restricted-mode issue that prevents callHackage binaries from being built something different?

@timbertson

Putting in a big 👍 for this. opam2nix is probably not yet in a state to be merged into nixpkgs proper, but this issue is one of my big worries: reworking all of my code to live inside nixpkgs and managing it there instead of in its own tree is a big switch to make, and it's not clear how I'd maintain both going forward.

I assume any big self-contained work on bulk-importing third party language dependencies could benefit from this approach, instead of having to be managed in-tree.

@michalrus

Garbage collection seems to delete import-from-derivation sources. 😕

Are there any workarounds applicable right now or do we have to wait for a fix in Nix?


shlevy commented Feb 13, 2018

@michalrus They get gc'd during an evaluation?


michalrus commented Feb 13, 2018

@shlevy, no, no. I have a few `import (import (import …))` chains in a Haskell project (built by Cabal in a nix-shell) and also, in configuration.nix of that developer notebook:

{
    nix.gc = {
      automatic = true;
      dates = "daily";
      options = "--delete-older-than 30d";
    };
}

After I added this auto GC, each morning, when starting nix-shell for that project, I have to redownload all Nixpkgs versions that it’s pinning (they got GC’d during the night) and some other sources that we add by callCabal2Nix (fetchFromGitHub …).

That nix-shell does create its own GC root, being called like nix-shell --add-root /home/m/thatProject/dist/nix/shell.drv --indirect --pure --run $cabalCommand, which is noticeable, because Nix is only re-downloading sources, and not rebuilding the deps. So the final deps in binary form are indeed cached. I also set gc-keep-outputs = true in /etc/nix/nix.conf.

This feels a bit awful. :C What if I want to go for a semi-vacation to a place where there’s no or very limited Internet, and I forget to turn off that auto GC the day before? 😭

Please, these are real use cases, however grotesque they might seem. 🙏


shlevy commented Feb 13, 2018

I see. There's currently no easy way to do this, sadly; you'll need to manually lift up all the imported derivations and add them as roots somewhere. It would be a good feature, though.

@michalrus

Okay, thank you for this idea. 😢

@Warbo

Warbo commented Feb 13, 2018

I also make heavy use of import-from-derivation, so this would be very nice to solve/fix (I hadn't noticed this myself since I garbage collect so infrequently).

There's no easy way currently to do this, sadly, you'll need to manually lift up all the imported derivations to add them as roots somewhere. Would be a good feature though.

It seems like there are two conflicting desires here:

Importing from a derivation lets us avoid depending on the expression used to define something

For example, we can use import (runCommand ...) to generate a .nix file and import it. The resulting Nix value will not depend on that runCommand invocation, i.e. it will not affect any subsequent hashes. This is useful if multiple runCommand derivations produce the same output.

I personally use this to check for the latest commit in a git repo: I put builtins.currentTime in the environment of the runCommand derivation so that a new derivation gets built each time. Of course, most of the time there have been no new commits, so the generated .nix file stays the same, and any derivations which use that git repo will have the same hash as before and be taken from the cache.

If imported values did depend on the derivations that generated them, then my projects would keep getting rebuilt all the time, despite having no changes, due to the currentTime propagating through the hashes.
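The latest-commit trick described above might be sketched as follows (the repo URL is a placeholder, and note this relies on a non-fixed-output derivation having network access, i.e. it only works where builds are not sandboxed):

```nix
let
  pkgs = import <nixpkgs> { };

  # Impure on purpose: currentTime changes every evaluation, so this
  # derivation is rebuilt each time it is evaluated...
  latest = pkgs.runCommand "latest-rev.nix"
    {
      dummy = builtins.currentTime;
      nativeBuildInputs = [ pkgs.git pkgs.cacert ];
    }
    ''
      # ...but the *output* only changes when upstream actually moved.
      rev=$(git ls-remote https://example.org/repo.git HEAD | cut -f1)
      echo "{ rev = \"$rev\"; }" > $out
    '';
in
  # Importing discards the string context, so downstream derivations depend
  # only on the pinned rev, not on `latest` itself; their hashes stay stable
  # until the rev changes, and caches keep working.
  (import latest).rev
```

This is exactly the property in tension with GC rooting: because the import severs the dependency on `latest`, nothing keeps the generator or its output alive.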

Importing from a derivation should cause the generated files (and their dependents) to last as long as dependencies do, in terms of garbage collection

This is the issue @michalrus has just described. Note that there are two parts to consider: surviving garbage collection when a resulting derivation is referenced by a GC root; and getting garbage collected when those derivations aren't being kept any more.

I don't know too much about the GC mechanism, but I wouldn't want old derivations to build up in the store due to "stale" auto-generated GC roots keeping them alive.

For example, if my user profile depends on a package built using this latest-git-revision function, and I then update to a new profile and GC the old one, I would want that old .nix revision and its dependencies (e.g. old versions of git, etc.) to be GCed at the same time.


michalrus commented Feb 13, 2018

@Warbo ++

@shlevy, it also seems that neither overrideCabal nor callCabal2nix cause their inputs to survive GC. E.g. if I have:

haskellPackages.override {
    overrides = self: super: {
      cryptonite = haskell.lib.overrideCabal super.cryptonite (drv: { });
    };
}

… then super.cryptonite will be GC’d.

Similarly for:

haskellPackages.override {
    overrides = self: super: {
      steeloverseer = self.callCabal2nix "steeloverseer" sources.steeloverseer {};
    };
}

… here, after GC, sources.steeloverseer won’t be re-downloaded (because I add everything in sources.* to a GC root), but all build deps of SteelOverseer will be re-downloaded. 😭

@copumpkin

Perhaps IFD could "infect" .drvs derived from them, without changing the output hashes, somehow?


shlevy commented Feb 13, 2018

#1052 ;)

@copumpkin

@shlevy I think that's a slightly different sort of infection that I also wouldn't mind contracting 😄


shlevy commented Feb 13, 2018

But the same mechanism could be used.

@Warbo

Warbo commented Feb 13, 2018

I like the look of 1052. Note that it can be useful to generate-and-import more than just derivations, e.g. we might generate an attrset of names/hashes from some other tool (cabal, npm, etc.), or we might have a script which checks for some condition and outputs a boolean; etc.

The ability to "poison" arbitrary values would presumably require trickier changes in Nix. Maybe an easier approach would be to only support attrsets: this would include derivations "for free", and for other values we can just wrap them up, e.g. to get a boolean we could wrap it up as { my-condition = true; } and the import might "poison" this set to get something like { my-condition = true; auto-generated-poison-attrs = { imported-from-dependencies = [ /nix/store/... ]; ... }; }
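The attrset-wrapping idea might look something like this (a sketch only; the `auto-generated-poison-attrs` attribute is the hypothetical mechanism proposed above and does not exist in Nix):

```nix
let
  pkgs = import <nixpkgs> { };

  # A generated file that evaluates to a bare boolean
  # (imagine a real condition check in place of the echo)
  conditionDrv = pkgs.runCommand "check.nix" { } ''
    echo "true" > $out
  '';
in
{
  # The boolean, wrapped in an attrset so a future Nix could attach
  # provenance without changing the value's meaning:
  my-condition = import conditionDrv;

  # Hypothetical: Nix itself would add something like
  #   auto-generated-poison-attrs = {
  #     imported-from-dependencies = [ conditionDrv ];
  #   };
  # so the GC could trace the import back to its generator.
}
```

Supporting only attrsets keeps the change small: derivations are attrsets already, and any other value can be wrapped as shown.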


srghma commented Dec 23, 2018

Re

Importing from a derivation should cause the generated files (and their dependents) to last as long as dependencies do, in terms of garbage collection

here is the fix for this problem

{ pkgs, ... }:

{
  # This function makes a copy of package and adds `.nix_runtime_deps_references` file containing links to other packages

  # This function can be used to prevent garbage-collection of packages that were generated/downloaded during import from derivation (IFD), making them last as long as the imported package does.

  # Check https://github.com/NixOS/nix/issues/954#issuecomment-365281661 for more.
  # Check https://stackoverflow.com/questions/34769296/build-versus-runtime-dependencies-in-nix how runtime dependencies work.

  # Example:

  # {
  #   environment.systemPackages =
  #     let
  #       packageSrc = fetchFromGitHub { owner = ..., .... };
  #       package = import "${packageSrc}/release.nix";
  #     in
  #       [
  #         package
  #       ]
  # }

  # Here `package` will not be garbage-collected on the next `sudo nix-collect-garbage -d`, but `packageSrc` will be

  # But with

  # {
  #   environment.systemPackages =
  #     let
  #       packageSrc = fetchFromGitHub { owner = ..., .... };
  #       package = import "${packageSrc}/release.nix";
  #       improvedPackage = addAsRuntimeDeps [packageSrc] package;
  #     in
  #       [
  #         improvedPackage
  #       ]
  # }

  # neither `package` nor `packageSrc` will be garbage-collected

  # TODO:
  # make addAsRuntimeDeps composable with itself by appending links if drv already contains nix_runtime_deps_references
  # E.g. addAsRuntimeDeps [src2] (addAsRuntimeDeps [src1] drv)

  addAsRuntimeDeps = deps: drv:
    let
      fileWithLinks = pkgs.writeText "fileWithLinks" (
        pkgs.lib.concatMapStringsSep "\n" toString deps + "\n"
      );

      drvName = drv: builtins.unsafeDiscardStringContext (pkgs.lib.substring 33 (pkgs.lib.stringLength (builtins.baseNameOf drv)) (builtins.baseNameOf drv));
    in
      pkgs.runCommand (drvName drv) { } ''
        ${pkgs.coreutils}/bin/mkdir -p $out
        ${pkgs.coreutils}/bin/cp ${fileWithLinks} $out/.nix_runtime_deps_references
        ${pkgs.rsync}/bin/rsync -a ${drv}/ $out/
      '';
}

Usage example

# it's utils, not lib, because nixpkgs lib doesn't depend on pkgs

pkgs: pkgsOld:

let
  callUtil = file: import file { inherit pkgs; };
in

(callUtil ./addAsRuntimeDeps.nix)

…and the overlay is then enabled in configuration.nix:

  nixpkgs = {
    overlays = [
      (import ../utils/overlay.nix)
    ];
  };

stale bot commented Feb 15, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Feb 15, 2021
stale bot commented May 1, 2022

I closed this issue due to inactivity. → More info

@Ericson2314

Given the emphasis of the original issue on Nixpkgs, I think it is worth mentioning NixOS/rfcs#109

@stale stale bot removed the stale label Mar 10, 2023