Merged
103 changes: 103 additions & 0 deletions rfcs/0089-collect-non-source-package-meta.md
@@ -0,0 +1,103 @@
---
feature: collect-non-source-package-meta
start-date: 2021-03-14
author: Robert Scott
co-authors: (find a buddy later to help out with the RFC)
shepherd-team: (names, to be nominated and accepted by RFC steering committee)
shepherd-leader: (name to be appointed by RFC steering committee)
related-issues: (will contain links to implementation PRs)
---

# Summary
[summary]: #summary

Collect and maintain a new `meta` attribute in packages allowing users to easily
identify and manage their preference for binary (more broadly "non-source")
packages.

# Motivation
[motivation]: #motivation

Different users have different expectations from a software distribution. We
acknowledge as much with the collection of license information and the
existence of the `allowUnfree` nixpkgs option, much as Debian maintains a
separate `non-free` archive area.

Similarly, there are a number of different reasons users may have to disfavour
those packages not built-from-source:

- Transparency: an ever-growing concern with more focus than ever on
supply-chain attacks.
- Malleability: being able to conveniently override packages with patches or an
altered build process is a key advantage of Nix, and for nixpkgs maintainers
it's not generally possible to backport security fixes to binary packages.

For some users, these concerns are enough to deter them from using Nix entirely.

# Detailed design
[design]: #detailed-design

Add a new `meta` attribute to non-source-built packages, `fromSource = false`.

@dotlambda dotlambda Mar 14, 2021

Member

This doesn't seem flexible enough:

  • Along the lines of the unfreeRedistributableFirmware license, there should be a way to label binary packages used in bootstrapping as such. Most users who don't want binary packages will be okay with their use in bootstrapping.
  • Binary fonts (mentioned below) or graphics might also be acceptable.

Do you think allowNonSourcePredicate is enough for the latter?

Contributor Author

Well, this hints slightly towards making it non-boolean, but if so, what should the options be? I think I considered finer classifications but saw it as something this could evolve into if/once we come to understand the problem better.

@samueldr samueldr Mar 14, 2021

Member

A list of "types"? Because a package might contain different types!

{
  meta.sourceProvenance = with lib.sourceTypes; [
    sourceCode
    sourceFirmware
  ];
}

For a hypothetical driver with a code part and a binary firmware part.

{
  meta.sourceProvenance = with lib.sourceTypes; [
    sourcePrebuiltFont
  ];
}

For a pre-built font.

(The naming here is clunky, I hope it gets the point across.)


EDIT: as a bonus, collecting what provenances a closure uses is a matter of adding all the lists together, and then getting the unique elements. And it is closer to the mechanisms used for licenses.
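
To make that EDIT concrete, here is a rough sketch in plain Nix (builtins only). Everything in it is hypothetical: `sourceTypes` stands in for a `lib.sourceTypes` that does not exist yet, the types are modelled as plain strings, and the three packages are stubs rather than real derivations.

```nix
let
  # Hypothetical stand-in for lib.sourceTypes; plain strings for simplicity.
  sourceTypes = {
    sourceCode = "sourceCode";
    sourceFirmware = "sourceFirmware";
    sourcePrebuiltFont = "sourcePrebuiltFont";
  };

  # Stub packages; only meta.sourceProvenance is modelled.
  driver = { meta.sourceProvenance = with sourceTypes; [ sourceCode sourceFirmware ]; };
  font = { meta.sourceProvenance = with sourceTypes; [ sourcePrebuiltFont ]; };
  app = { meta.sourceProvenance = with sourceTypes; [ sourceCode ]; };

  # Keep the first occurrence of each element, preserving order.
  unique = builtins.foldl' (acc: x: if builtins.elem x acc then acc else acc ++ [ x ]) [ ];

  # Add all the lists together, then keep the unique elements.
  closureProvenance = pkgs:
    unique (builtins.concatMap (p: p.meta.sourceProvenance) pkgs);
in
closureProvenance [ driver font app ]
```

Evaluated strictly, this should give `[ "sourceCode" "sourceFirmware" "sourcePrebuiltFont" ]`, with the duplicate `sourceCode` from `app` dropped.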

Member

I think a list of types is usually too complex. Hence I suggest usually not using them, even though some packages might require them.
We sometimes have lists of licenses, though without clear semantics. For this attribute, a list should be interpreted as being built from all of those types combined.

I would call the attribute meta.builtFrom.
A normal xyz-bin package would, even if parts of it are built from source, have

meta.builtFrom = lib.builtFrom.binary

A binary font could have

meta.builtFrom = lib.builtFrom.binaryAsset

And a bootstrapping package would have

meta.builtFrom = lib.builtFrom.binaryBootstrap

The default would be

meta.builtFrom = lib.builtFrom.source

Member

While I agree that they are more complex, I don't think it's an issue.

Otherwise we're pushing the complexity into pre-declaring all possible combinations as part of the library representing the different types.

We need granularity here. Otherwise we'll have issues deciding which descriptor to use, and end up using the wrong generic "it's a binary", which will make filtering against or for a specific descriptor needlessly harder.

Contributor

We can do something "weird" and have lib.builtFrom.source = ["source"] so that you can easily use one on its own or ++ two together. Or maybe better is lib.builtFrom.source = {source = true;}, so that you can // them together; that also makes it a bit easier for a predicate to filter the ones that you are okay with (and removes the irrelevant ordering).

Also I think instead of builtFrom it probably makes sense to call it nonSourceComponents and list the types of things that were not built from source where "types" could be things like code, assets, docs and similar.

I see this is somewhat against the "Why not isBinary?" below but I think it kinda agrees. nonSourceComponents = {} is the "purest" form and well understood. Then you just have to document the deviations.
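
A minimal sketch of the attribute-set variant, in plain Nix. The `builtFrom` set here is a hypothetical stand-in for `lib.builtFrom`, and `acceptable` / `isAcceptable` are made-up names for a user-side predicate:

```nix
let
  # Hypothetical stand-in for lib.builtFrom, one singleton set per type.
  builtFrom = {
    source = { source = true; };
    binary = { binary = true; };
    binaryAsset = { binaryAsset = true; };
  };

  # Stub package: source-built code plus a prebuilt asset, merged with //.
  example = { meta.builtFrom = builtFrom.source // builtFrom.binaryAsset; };

  # User-side predicate: every type present must be one the user accepts.
  acceptable = [ "source" "binaryAsset" ];
  isAcceptable = pkg:
    builtins.all (t: builtins.elem t acceptable)
      (builtins.attrNames pkg.meta.builtFrom);
in
isAcceptable example
```

This evaluates to `true`; swapping `builtFrom.binaryAsset` for `builtFrom.binary` would make it `false`. Because the merge produces a set, the order in which the types are combined does not matter.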

Member

I really like samueldr's idea here. Another benefit of it would be that I'd be able to say "I don't want proprietary software on my computer, but it's okay if data files are CC-BY-ND" or something.

Contributor Author

It's nice; my worry is just what we'd actually end up with, humans being humans. Most people are probably not invested enough in understanding a complex tagging scheme to perfectly represent the situation for a particular package.

This concern probably comes from my background in OpenStreetMap and the long, heated discussions that take place on the "tagging" mailing list, debating a scheme that can perfectly represent almost every situation, yet which ends up bearing little relation to what people actually map with in reality, because it's too complex and verbose.

I certainly think it's important to make the common cases have very concise representations, and also allow both coarse and fine granularity. If only fine granularities are allowed, it will deter most people from bothering to add an annotation at all. Coarse data is better than nothing and can always be refined.

Contributor

I tend to agree with @risicle and @dotlambda here, in that I think we should at least start with a rougher schema, and wait until we've seen if we actually need a more refined one. Otherwise, I share @risicle's concern about where a more complex scheme would lead.

@dotlambda's proposal matches what I would do:

  • If a package has both binary and source, it is marked as binary (i.e. we prefer a rougher schema, and err on the side of overapplying the "binary" label).
  • It special-cases bootstrap binaries and fonts.
  • It adds an allowNonSourcePredicate.

I think this would be a great first step, and it would address the 2 points raised in the RFC:

  • supply chain attacks
  • patching

Other concerns, like wanting to only run free software + special casing some data files based on their license, I would leave out of this RFC, and potentially address in a follow-up one.

Leave other packages as-is with the assumption of a missing attribute meaning
`true`.

Add a mechanism to allow `.nixpkgs/config.nix` to specify
`allowNonSource = false` to prevent use of these packages in a similar manner
to `allowUnfree`.
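
As a rough illustration of the intended behaviour (a sketch only, not the implementation), the check could look something like this in plain Nix. The `allowed` and `isFromSource` helpers and the two stub packages are made up for this example; real packages would be derivations and the check would live inside nixpkgs.

```nix
let
  # Stand-in for the user's nixpkgs config, using the option name proposed above.
  config = { allowNonSource = false; };

  # A missing attribute is assumed to mean `true`.
  isFromSource = pkg: pkg.meta.fromSource or true;

  # Hypothetical helper: a package is usable if it is built from source,
  # or if the user has not opted out of non-source packages.
  allowed = pkg: isFromSource pkg || config.allowNonSource;

  # Stub packages; only the attributes relevant here are modelled.
  hello = { pname = "hello"; meta = { }; };
  helloBin = { pname = "hello-bin"; meta = { fromSource = false; }; };
in
map (pkg: { inherit (pkg) pname; allowed = allowed pkg; }) [ hello helloBin ]
```

With `allowNonSource = false`, `hello` comes out as allowed and `hello-bin` does not; unmarked packages are unaffected.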

Member

There should also be allowNonSourcePredicate.
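
For illustration only, a sketch of how such a predicate could combine with `allowNonSource`, modelled on how `allowUnfreePredicate` complements `allowUnfree`. The package names and the `allowed`, `isFromSource` and `stub` helpers are hypothetical:

```nix
let
  config = {
    allowNonSource = false;
    # Hypothetical predicate: permit specific non-source packages by name.
    allowNonSourcePredicate = pkg: builtins.elem pkg.pname [ "example-font" ];
  };

  # A missing attribute is assumed to mean `true`.
  isFromSource = pkg: pkg.meta.fromSource or true;

  # Allowed if built from source, globally permitted, or accepted by the predicate.
  allowed = pkg:
    isFromSource pkg
    || config.allowNonSource
    || config.allowNonSourcePredicate pkg;

  # Build a stub package; only pname and meta.fromSource are modelled.
  stub = pname: fromSource: { inherit pname; meta = { inherit fromSource; }; };
in
builtins.filter allowed [
  (stub "example-app" true)
  (stub "example-font" false)
  (stub "example-app-bin" false)
]
```

Here `example-app` and `example-font` pass the filter, while `example-app-bin` is rejected.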


# Alternatives
[alternatives]: #alternatives

I might have been tempted to collect the inverse, i.e. `isBinary = true`, but
this runs into problems with clunky terminology. In my mind, the kind of package
that fails the transparency/malleability tests goes beyond what many people
would argue is "a binary". For instance, many (most?) Java packages in nixpkgs
simply pull opaque `.jar`s: if not for their own app, then for their
dependencies from Maven. These are not transparent or malleable, but calling
them "binary" is quite an obtuse and disputable use of the term.

I decided that those packages which _did_ pass these transparency/malleability
tests had more in common than those that didn't: that they are "from source", a
form where users have as much ability to inspect and alter the result as the
original author did.

There already exists a rather informally-applied convention of adding a `-bin`
suffix to the package names of "binary packages". This is non-ideal because:

- It doesn't allow a user to filter the use of these packages in a better way
than simply not requesting a package with a `-bin` suffix. Binary-package
_dependencies_ of non-`-bin` packages will still be installed regardless.
- It falls into the terminology trap over the term "binary", and if we expanded
the definition of what a "binary" package is, *very many* packages in nixpkgs
would have to be renamed, causing not only visual clutter but possible
breakage and churn.

If we _don't_ do anything about this, then I think we continue to signal to
users who have such concerns over the source of their software that
nixpkgs/NixOS isn't for them. Far from being a concern just for obscure
extremists, most Debian users would probably balk at our appetite for binary
packages.

# Drawbacks
[drawbacks]: #drawbacks

- Some maintainers may be upset by having their packages marked as
`fromSource = false`.
- It could spur us to disappear into endless navel-gazing conversations about
what really counts as "from source" and what doesn't.
- On the other hand, _not_ discussing where the line stands thoroughly enough
could cause the flag to be over-applied and thus become useless. Should we be
compiling all our fonts where e.g. FontForge files are available? If all of
these got marked as `fromSource = false`, all of a sudden users with
`allowNonSource = false` set may end up with no installable desktop.

# Unresolved questions
[unresolved]: #unresolved-questions

Exact attribute names are open for debate.

# Future work
[future]: #future-work

The author is willing to spend a significant amount of time finding and marking
non-source packages in nixpkgs.