Fix three zero days #13

oxij · 2018-10-02T05:08:46Z

... and I think I know about the fourth one, similar to the third, but I'm too tired for now already.

First 5 commits are more or less cleanup. Then there are 3 pairs with each pair first adding a failing test that should have worked and then the next commit fixing git-signatures so that it would actually work.

I also strongly suggest:

rewriting this for bash -e, I frequently find errors this way,
using four spaces for indent :)

lrvick · 2018-10-02T08:45:37Z

Looks solid at first glance. Will look over in more detail tomorrow.

Also:

using four spaces for indent :)

Tab indentation is actually "standard" in bash because of heredocs. Indented heredocs are only valid with leading tabs, but not leading spaces. Switching between tabs and spaces mid code is just a mess, so I just resolve that bash is like Go and mandates tabs. In python I use 4 spaces though because PEP8 ;)

rewriting this for bash -e, I frequently find errors this way,

I would welcome a PR for that after we get this and the patch-id PR merged

oxij · 2018-10-02T21:41:54Z

I got a bit tired regenerating keys by hand while continuing this branch. Added 0f816f8 in the middle of history here and changed later commits to respect it.

> rewriting this for bash -e, I frequently find errors this way, I would welcome a PR for that after we get this and the patch-id PR merged

Ok.

> using four spaces for indent :) Tab indentation is actually "standard" in bash because of heredocs. Indented heredocs are only valid with leading tabs, but not leading spaces. Switching between tabs and spaces mid code is just a mess, so I just resolve that bash is like Go and mandates tabs. In python I use 4 spaces though because PEP8 ;)

I'd prefer `usage` with heredocs being unindented than tabs everywhere, tbh.

oxij · 2018-10-02T22:00:34Z

Added another test that shows that my fourth zero day didn't work out. Give this another 24h before merging just in case but I think this is now LGTM.

oxij · 2018-10-03T18:35:52Z

Refactored tests, rebased to make the order beautiful, added two more failing tests and a fix. Judging by GPG's docs there are two more non-tested cases: BADSIG and EXPSIG (EXPKEYSIG is tested now). EXPSIG I'll manage later, but I tried and failed making it to produce BADSIG status no matter what I do. (I tried to break a signature in a PGP packet, maybe trying to make a fake key would be easier, but I have no idea how to generate those cheaply enough for tests).

lrvick · 2018-10-03T21:40:30Z

@oxij So a lot of work was going on in the sign-patch-id branch which just merged. Most of it should not touch your work, but can you rebase?

lrvick · 2018-10-03T21:41:55Z

Also most of us involved in the project are on irc.hashbang.sh/#! 6697 if you want to keep in sync at any point.

lrvick · 2018-10-03T21:51:18Z

I got a bit tired regenerating keys by hand while continuing this branch. Added 0f816f8 in the middle of history here and changed later commits to respect it.

I actually went this route initially, but realized it makes most CI systems hang until timeout as they become entropy starved. If you want to go this route that is totally cool, but you may have to update the CI jobs to setup haveged or something to keep the entropy pool full, or stub in something to spam to /dev/random to fill the pool before generating keys.

lrvick · 2018-10-03T21:44:42Z

Makefile

 prefix=$$HOME/.local
 bindir=$(prefix)/bin

 all: test

 test:
-	bats test/test.bats
+	make -C test


What is the reasoning for breaking out test to have its own makefile?

oxij · 2018-10-03T23:43:10Z

What is the reasoning for breaking out test to have its own makefile?

Tradition, simplicity, I guess. I made the tests directory mostly self-sufficient and it became easier to add a Makefile there than `cd tests` everywhere else.

CI and randomness

Yeah, I'll add writes to /dev/random after I hit the issue, works for me now.

So a lot of work was going on in the sign-patch-id branch which just merged. Most of it should not touch your work, but can you rebase?

I understand that this now got awkward, and all, but I read the changes and I disagree with that direction. I just fixed a bunch of security issues here (the replay and "silly names" tests were the funniest) and that branch introduced yet another one by allowing out-of-order application of patches =/. As I pointed out (albeit in question form) in NixOS rfcs thread, IMO patch-id signing should, if at all, be implemented on top of revhash signing in opt-in manner (that is, revhash signature should permit reapplying it to a patch with the same patch-id), not vice versa. I want revhash-based semantics 99.999% of the time and I don't see any elegant solutions for going from patch-id-based system to revhash based system. Hence, I disagree. I also don't like the "JSON" direction you outlined in NixOS thread. I think that JSON is cancer and that signed message format should be fixed in stone, and changes to it should be versioned on per-notes-branch basis, not per-revhash basis. Again, for security reasons. I'm sorry, but I'm very opinionated on such topics and I can't make myself see any point in rebasing this on top of current master now. I'm going to continue by myself, let the community judge the results, I guess. However, feel free to take anything you like from here, and, later, from the rest of my fork.

IRC

Believe me, you don't want me on your IRC. :)

RyanSquared · 2018-10-04T01:26:14Z

Believe me, you don't want me on your IRC. :)

We've seen worse. Feel free to join us: ircs://irc.hashbang.sh:6697/#!,#!social - TLS required.

lrvick · 2018-10-04T01:42:40Z

@oxij I don't think forking is required, or that JSON is required. If there is another simple to parse key/value format you have in mind, let's hear it. I would like to see an alternative suggestion rather than just declaring the current suggested format cancer :)

I also agree that we need revhash on top of patch-id. I was going to address that in my next revision, but wanted to get patch-id in first now that we proves that out as a standalone solution. It also has very clear security issues (can merge things out of order) thus me wanting to get your changes in that don't pertain to the actual signature format, then we can address that in a follow up PR where we have a key value map that we sign that contains the master head ref, the patch-id, and the current ref, with room to easily add more future key/value pairs for some future use cases that are out of scope atm.

I really value your input here. If you can elaborate on a specific attack, then lets solve for it. Otherwise we are going to end up with 2 basically identical projects with slightly different signature formats, which is still.

lrvick · 2018-10-04T02:20:45Z

To elaborate further, lets say we sign and base64 key/value map that contains:

patch_id: patch-id
head_hash: HEAD hash of this branch
target_hash: HEAD hash of the branch we want to merge on top of

This map could be msgpack, json, some tupl thing. I don't actually care that much as long as it is easy to parse in a wide range of programming languages.

Wiith that assumption we end up with multiple possible verification modes with different tradeoffs, and I think we can easily support all of them in future git signatures verify flags if the above set of 3 items are signed all the time.

Commit Ref

Verify every commit ref is signed by m-of-n signers

Pros

Most simple/secure and where we started

Cons

Rebases invalidate signatures
Merge commits invalidate signatures

Patch-id

Verify the patch-id for a given set of changes is signed

Pros

Signatures can survive a rebase/squash/merge as long as no LOC changes happen

Cons

security issue: patches could be merged out of order or summoned from the past

Patch-id + master HEAD

Verify the patch-id is signed -and- that it was merged on the designated master HEAD

Pros

Signatures can survive a rebase/squash/merge as long as no LOC changes happen

Cons

Only one set of changes can be signed at a time. Once one merges the others are invalidated.

Hybrid Verification

Reviewers sign the lastest LOC change
Reviewer signatures are only verified in patch-id mode
CI system signatures are created certifying tests passed etc
CI system only has -read- access to repo
CI automatically re-signs afer all cahnges if tests pass etc
CI automatically re-signs if the master HEAD ref changes
CI signatures are verified in patch-id + master head mode

Pros

Signatures can survive a rebase/squash/merge as long as no LOC changes happen
Redordering attack only possible if CI signing key -and- write access to repo are obtained

Cons

More complex to implement and setup initially

Ekleog · 2018-10-04T02:31:39Z

@oxij Just in case because I was the one who pointed out JSON, using the JSON format for serialization doesn't mean the format cannot be fixed in stone. It just means we can (/must) use a JSON parser for handling it, in exchange for a few bytes of storage (basically, the {} and "", which are the only superfluous characters here). Actually, I think base64 storage would lose much more storage space than JSON, so even the storage argument isn't that noticeable (but if comparing with eg. key1:value1;key2:value2 we could indeed optimize for space)

So it's mostly a “do we want to be able to use a JSON library in exchange for being more or less forced to use a JSON library?” question. I personally think that if deciding that the format must be an object with only string values without escapes and no superfluous spaces, then it's not harder to parse than git objects: if I can allow myself to mix syntaxes for hopeful clarity, it gives this parser:

(\x -> x[1..-1]) |> split ',' |> map (split ':' |> map (\x -> x[1..-1]))

So to sum up, my comparison between base64-ed git objects (hereafter b64go) vs. restricted JSON (hereafter rjson) formats yields:

b64go needs a library for sane base64 decoding, rjson can be decoded with the same amount of processing than after base64-decoding of b64go without any library (well, yeah, with adding the two \x -> x[1..-1], I deem this cost negligible though)
rjson can be (encoded, decoded) easily with a library, b64go cannot (still require parsing git objects by hand)
rjson takes less storage space than b64go

All in all, I don't see a single reason to prefer b64go over rjson, apart from the fact that I agree it's very annoying that people use JSON everywhere without thinking of it. Do I miss a point? :)

(sorry, I managed to restrict myself on the RFC thread, but it looks like it actually got out here… well, at least here it's somehow on-topic :))

Ekleog · 2018-10-04T02:48:27Z

@lrvick Patch-id, in my opinion, brings much more complexity (and thus occasions for vulnerabilities) for almost no benefit. From your list:

The commit-ref is already possible without it
The patch-id is flawed as you yourself mention, so unusable for any security purpose, and I can't see any point that's not security-related
The patch-id + master HEAD doesn't bring anything over commit-ref : if you just consider people will sign after merging (and not before), then all that is handled by patch-id + master HEAD is also handled by commit-ref
The hybrid verification system is the only case where there is a marginal benefit from using patch-id + master HEAD

I personally think that supporting a highly counter-intuitive signature scheme is not worth the minor benefit from hybrid verification systems, as anyway it's not more secure than commit-ref (actually, it's less) and the convenience advantages are limited (anyway, a committer will likely merge around the same time as they sign, and multiple-signature schemes need to be handled at the fetching end anyway so there's no TOCTOU)

Ekleog · 2018-10-04T02:55:24Z

Actually, an additional concern about hybrid signing mode: it can be used by a maintainer to put the blame on another, by using TOCTOU.

Example:

Maintainer A signs patch A for master, that adds a free
Maintainer B notices and sees patch A isn't merged yet. Maintainer B signs patch B for master, prepones signature to earlier than patch A, and patch B adds another free at a place that doesn't conflict
Maintainer B tricks CI into accepting their patch before maintainer A's
Now there's a double-free and maintainer A's patch introduced it

So even it is not secure, so I cannot see a single advantage in using patch-id signing.

oxij · 2018-10-04T03:33:44Z

@lrvick @Ekleog

JSON

I'm not like super-biased against JSON, I am a bit biased. But my thinking here goes like this: when in Rome do like Romans. git objects are "<type> <length>\0" header followed by "[<key> <value>]*" fields. I see no reason to use something different here (except "<type> <length>\0" is not needed as the first is obvious and the second is provided by the PGP packet), parsing those in bash is easy enough. @lrvick

patch-id

@Ekleog pretty much wrote my thoughts on the subject. - Counter-intuitive for git users. - Simple stuff gets complicated. Your scheme with two more hashes feels like you made git-signatures for darcs VCS and now trying to adapt it to git. Ugly. - Computation of patch-ids themselves is complicated, merged PR itself changed diff-tree options several times. - I see no good uses for them. Signed patch-id literally has the following semantics: "I sign any repository state that can be made with this applied". I.e. an infinite number of repository states. This is ridiculous for any sufficiently large code base. - It interacts badly with lots of other stuff. E.g., I can make a commit with renames, pick a patch-id-signed commit from any other place in history, revert the renames. I just applied your change with your signature to a random similar-enough file. Fill in a bunch of out-of-context commits in between and you got a nice way to introduce backdoors, all you need to do is find a series of patches in history that can be chained to make a change you want and make maintainers sign some trivial renames. In a long enough history like nixpkgs you could then find a series of changes for anything. I'm sure there are more examples like this. If you need that feature, it's much simpler to just make a separate signature type for it. But I don't think I would ever need it. Backports need to be checked and signed independently anyway.

daurnimator · 2018-10-04T03:37:09Z

@oxij we had a chat on IRC: I think @Ekleog and I can agree that at least a minimum that we sign the tree object. I think @lrvick is now on board with this.
i.e. signing patch ids is no longer to be a thing.

Whether the parents should be signed is what myself and @Ekleog disagree on: we could make that configurable.

Need `bash -e` ASAP.

Also produce cleaner status codes.

This only adds overhead as git compresses its objects anyway.

Ekleog · 2018-10-04T17:43:14Z

So I guess we now only have to agree on a data format, so that all our tools end up being interoperable :)

First, the non-controversial parts: what fields should we include? I can think of (also taking ideas from crev, which has the difference that it's attempting to review the whole state of the repo instead of split it by commit):

commit-id
tree-id (optional)
date (optional)
serial number (required if no date)
review comment (optional, multiline UTF-8)
review result (+, -, neutral -- equivalent to “trust” from crev)
review thoroughness (0-100% ? low-medium-high ? -- indicates understanding of the diff)
context understanding (0-100% ? low-medium-high ? -- indicates understanding of the context of the diff)

I wonder whether we should have some default values for review thoroughness and context understanding, or whether it's reasonable to force them being present in the signed blob.

And the second likely non-controversial choice: where should we put the signatures? I would say each signature on one line, sorted in a file named commit-id (potentially copied to tree-id if defined) in a refs/signatures branch (ie. like git-notes, as I understand it)

Once we get that agreed, we will be able to reasonably go to the question of rfc822 vs git-object-like vs JSON, which will be very happy I'm sure :D

oxij · 2018-10-05T00:33:11Z

So I guess we now only have to agree on a data format, so that all our tools end up being interoperable :)

Yes. So, I've read crev source and I agree with their idea that trust level in the key and trust level in the judgment are different things. So, I guess, we also need that. The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust". I think there should be no arguments against using `--export-ownertrust` format for that, right? We would also need to agree on `git config` options for those `--trustdb-name` and related things. So to verify a repo in nixpkgs case you would do something like ``` cd nixpkgs-keys for key in keys/*.key; do gpg --import "$key" done # trust in keys belonging to whom they claim cat bootstrap | sed 's/$/:6:/' | gpg --trustdb-name nixpkgs-keys --import-ownertrust # trust in those people understanding what they are doing cat bootstrap | sed 's/$/:6:/' | gpg --trustdb-name nixpkgs-judgments --import-ownertrust cat trusted-contributors | sed 's/$/:6:/' | gpg --trustdb-name nixpkgs-keys --import-ownertrust cat trusted-contributors | sed 's/$/:4:/' | gpg --trustdb-name nixpkgs-judgments --import-ownertrust # and etc ... cd nixpkgs # wotr for "WoT reviews", subject to change, but I like it git config wotr.keys-trustdb nixpkgs-keys git config wotr.judgments-trustdb nixpkgs-judgments git config wotr.trust-model pgp # the default, can also be tofu or whatever git config wotr.completes-needed 2 git config wotr.marginals-needed 4 ``` Hydra only does the above, users may add more keys and more/less trust as they wish.

First, the non-controversial parts: what fields should we include?

Btw, `crev-data/src/proof/review/mod.rs` has "thoroughness", "understanding", "trust", and "distrust". I'm like "WTF"? The "distrust" is clearly an artifact of their "low", "medium", "high" enum, but what does "trust" in my own review even mean? I disagree with that terminology. I agree, however, that splitting my "level" into "thoroughness" and "understanding" might be useful, though. So, here comes a draft of my design RFC:

---- BEGIN RFC ---- # Definitions and data formats The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here. The plain-text data thing in yet undecided format described below is a "review", review is signed by gpg and turned into a "packet" (PGP packet of the review and signature together), that is then base64 encoded and is appended to "packet-list", which is stored in a separate git branch called "review-branch" indexed by commit-id (before patch-id changes were merged, currently managed by git-notes, but that might change). All date values are UNIX timestamps. # Verification Process Commits in review-branch MUST be signed, ALL commits MUST be validated before interpreting, invalid signature on review-branch commit MUST cause a critical error. Commits in review-branch MUST monotonically increase their "author" and "commiter" dates by alternating the two types. I.e. the following MUST be observed: 0 <= commit_1_author_date <= commit_1_commiter_date <= commit2_author_date <= commit2_commiter_date <= ... Any other sequence MUST cause a critical error. Packets MUST always appended to the end of the packet-list, insertion anywhere except at the end of the packet-list MUST cause a critical error. Packets with signatures by the same key in review-list MUST monotonically increase their packet's "date" field, failure to observe that MUST cause a critical error. The "packet's-commit-date" is the earliest of "commiter" or "author" dates of the commit introducing the given packet into the review-branch. Packet's "date" field MUST be <= than its "packet's-commit-date", failure to observe that MUST cause a critical error. Note that the above still allows packet's "date" field to be interpreted as a "serial-number" as long as the packet itself was not signed with "expire date" field set, which is discussed below. When interpreting packet-list only the packet with largest "date" field (serial-number) MUST be considered for each signing key, all other packets for the same key MUST be ignored. When verifying a packet "GOODSIG" gpg status means the packet SHOULD be accepted as authoritative, "EXPKEYSIG" is explained below, "EXPSIG", "REVKEYSIG" and "ERRSIG with NO_PUBKEY" means the packet MUST be ignored, all other statuses, including "BADSIG" and "ERRSIG without NO_PUBKEY", MUST cause a critical error (can be made into a warning, in case gpg guys break something, I guess). "REVKEYSIG" MAY cause a warning. Note that by using "expire date" in the packet and "EXPSIG" gpg status code you can have reviews that automatically rescind themselves. I doubt this would be used much, but it can be useful in some use cases, no need to prohibit that, I think. "EXPKEYSIG" gpg status code means the key's own signature has expired, it SHOULD be interpreted as "GOODSIG" if packet's-commit-date is <= than key's expiration date, MUST be interpreted as "ERRSIG with NO_PUBKEY" otherwise. To reiterate, the following (among other things, i.e. not "iff") MUST be observed for packet to be accepted as authoritative: packet's "date" field <= packet's-commit-date <= author's key own "expire date" Failure to observe the first comparison causes a critical error, failure of the second only causes the packet to be ignored. (Because expired key can be extended after the fact by re-self-signing.) Failure to interpret the contents of "GOODSIG" review MUST cause a warning and MUST cause such a review to be ignored. Side note: If we allow a review encoded in the packet to have its own "date" field (which I don't like for the following), it MUST be <= packet's-commit-date and >= packet's "date" field, failure to observe that MUST cause a critical error. I.e. the following MUST be observed: packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date" The problem with allowing that is you would have to interpret non-authoritative packets too to keep the above semantics of "everything that looks like MITM causes a critical error so that everyone would immediately notice". Hence, I'd rather not have these dates there. ## Review format (given in git-object-like form) Things after "#" and whitespace before are not in the grammar (comments) ``` <header> <metadata fields> <comment> ``` Should be read as "By signing this text I certify that I validated the following ...". ### Header Either ``` commit <commit-id> diff [with <commit-id> diff with <commit-id> diff with <commit-id> diff] ``` meaning that the reviewer certifies the "diff" of this "commit" when applied to the parent of the first "<commit-id>" together "with" commit "diff"s of the later "commit-id"s or ``` commit <commit-id> state [of <filepath relative to repository root> of <filepath> ... of <filepath>] # empty list means "the whole thing" ``` meaning that the reviewer certifies the "state" "of" the following "filepaths" in "<commit-id>" tree. Can be similarly extended for `tree <tree-id> diff`, `tree <tree-id> state`, `patch <patch-id> diff`. As you see, the difference between "diff" and "patch" is intentional here, "diff" is "change" in the abstract sense (I can agree to replace "diff" with "change" even), while "patch" is "exactly this patch file". ## Metadata fields ``` # "while having archived the following" when reading "... I certify ..." [context-understanding (medium|high|author)] # low by default, indicates understanding of the context of the diff [diff-understanding (medium|high)] # low by default, indicates understanding of the diff itself [thoroughness (medium|high)] # low by default, indicates the effort spent checking it does what it claims to do # "with the following" when reading "... I certify ..." [result (!|-|+)] # 0 by default, this rough equivalent of "trust" of crev [result-otherwise (!|-|+)] # "!" by default with header conditions, "0" without header conditions, this line MUST not be present without header conditions, this is the "result" when not all header conditions are met # "and I think this should be assigned the following" [priority (medium|high|panic)] # low by default, doesn't influence anything, should be used for grabbing attention of other reviewers ``` "context-understanding": - "low" = "I might have seen this subsystem before.", - "medium" = "I looked into this subsystem intentionally and maybe made little changes there before.", - "high" = "I made big changes there before.", - "author" = "I wrote a good chunk of this subsystem myself, I know it in and out". "diff-understanding": self-explanatory. "thoroughness": - "low" = "glanced over", - "medium" = "read, evaluated/tested", - "high" = "spent a bunch of time playing with it". "result": - "!" = "has a security issue!", - "-" = "I disapprove", - "0" = "FYI", - "+" = "I approve". "priority": self-explanatory. ## Examples "Nothing looks bad, but don't trust me" would be ``` <header> #context-understanding low #diff-understanding low #thoroughness low result + LGTM. ``` Drive-by LGTM in a system you know well would be ``` <header> context-understanding high diff-understanding high #thoroughness low result + LGTM. ``` Thorough LGTM in a your own subsystem would be ``` <header> context-understanding author diff-understanding high thoroughness high result + Running on top of this for a month. ``` "Nothing looks bad, but please can someone else review this" would be ``` <header> #context-understanding low #diff-understanding low #thoroughness low #result 0 priority high Please, can somebody else review this? ``` "This looks very bad!" could be ``` <header> #context-understanding low #or something else #diff-understanding low #or something else #thoroughness low result ! priority panic Backdoor? ``` "This looks a bit bad" could be ``` <header> #context-understanding low #or something else #diff-understanding low #or something else #thoroughness low result ! #priority low Combined with X can lead to a low-impact security issue Y. ``` "WTF is this? This should be reconsidered!" could be ``` <header> context-understanding high diff-understanding medium #thoroughness low result - priority panic Please, revert! I think this will break X, which would be very high-impact! ``` CVE: ``` commit <commit-id> diff with <commit-id with a fix> context-understanding high diff-understanding high thoroughness high result + # if "with" commit is applied #result-otherwise ! priority panic This commit has a very high-impact open CVE! Fixed in <commit-id with a fix>, apply that immediately! ``` "I disapprove, here is a fix in my repo": ``` commit <commit-id> diff with <commit-id with a fix> context-understanding high diff-understanding high thoroughness high result + result-otherwise - priority high I think this should be reverted and done via another approach. See <commit-id with a fix> in my repo. ``` ---- END RFC ----

I wonder whether we should have some default values for review thoroughness and context understanding, or whether it's reasonable to force them being present in the signed blob.

I opted in for "low". But this can be reconsidered, indeed.

Ekleog · 2018-10-05T05:08:46Z

Whoops, looks like message went out too early… please disregard previous message if you received it by email. Comments about the RFC coming in later when I've finished writing them.

The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust".

Hmm… Actually, I'm not really sure we need keys trust? If judgement trust is associated to a public key anyway, then there have been manual vetting of said public key (to attribute it a judgement trust), so we wouldn't need to also verify the key (actually, I may decide to trust your public key if I see you doing good reviews, without ever having checked your identity and thus not having signed your key, and even less given it ownertrust-in-the-WoT-sense)

I think there should be no arguments against using --export-ownertrust format for that, right?

Actually… GnuPG is not the only OpenPGP client, and sequoia-pgp is getting closer to being a usable OpenPGP client that may end up sucking less, so I'm not really sure about that.

Also, I wonder: anyway we won't be able to re-use GnuPG's WoT computation code for signatures… would we? If we won't, then why not just store them in a fingerprint:judgement trust: format? This way it's both trivial to parse and yet be trivial to import into an ownertrust db if need be (we could also decide to remove the trailing :, potentially)

wotr for "WoT reviews"

I first read w-OTR (off-the-record)… though I don't have any better idea, apart from just “wot” which would be potentially ambiguous.

lrvick · 2018-10-05T06:00:03Z

So while I don't intend on implementing this for a few weeks or want to further derail things, I want to point out that I do have one strong use case for hierarchical data: projects that yeild compiled artifact or artifacts.

I would want to support an optional flag to include a hash map of a directory full of output artifacts.

If multiple CI systems or reviewers agrees on the output binary hashes we now can attest 2 things strongly:

this code builds deterministically
these binaries came from this code

Those signatures could be detached with the repo and exported to allow the binaries to be verified independantly downstream and anchored back to the repo ref they were compiled from.

I experimented with the following format today as a single json line incorperating ideas from @oxij with mine:

{
  "body":{
    "commit":"d27e15a09812241167a61c3222cb6e97c5b0d6b0",
    "tree":"58215f173d6ef6892a07c9073d55145f387d1968",
    "patch":"068a37e54586de8339f13ec980d31f6c30b6f6e7",
    "date": "2018-10-05T03:01:46Z",
    "review": {
      "context-understanding": "author",
      "diff-understanding": "high",
      "thoroughness":"medium",
      "result":"+"
    },
    "artifacts": {
      "out.tar.gz":"2ad6f470b5398251018a4c25f5eb35686481681f"
    }
  },
"sig":"iQJFBAABCAAvFiEEZ1U/vaRrtxq9LgsLjkeh7DWhVR0FAlu2h3IRHGxhbmNlQGxydmljay5uZXQACgkQjkeh7DWhVR0K9Q//T+UleZFWlbiIFoUBp1hX0xzg/eEaTFnnlCdWWaa5f1E+TI80Pg/wXhpEUd98ghZThXwaWJwnPp34BpZ55GU5Qr1cXjY00Yt7I0p3a5TUsdk5FP8tiUWNQ9kA6npUOIVixa8HFkhz0SvHvXNAvWsasqw6nb+SNfA+aQYIP/wAY71ZpMLkJsMQssaFWsjL/hsSmOpi7VyLEFEuUmEM1CA6VPiMvp4rvP75Bt4DnEouyFPxk4WbVIqX4DIeG8+5W8jQCLlxV6HDH4g+POdZbD7/Yg6XjDKEfD2vM/OYrBczPZX0Tu1WhVyYOtQ/WsAr9wzZjgl50/9UkP6E4pt9ft42jdNhGcO1sEwzlC5W1efC2izjL4medEiLe00zIM0L42X10HebH22j4gKyb4EErSHo0H5eLisf1xAwAHiOE2wJou1SR5dVmKydXicVN3wuLqhzc52awmoOaEts1Q3zFyB8MfDIi39CACwAvOTrKw54raoGgzxDEFEP3sKWF1FuZQJreG1Ufg57Oxv5f1lOqvaoMN7ZNjcQkPUAqj7G7e/fwyXGCG2mvnz++D5/2JAMeeO4k8iwxGXEsF0qD3QveEZPFoIpPZX+uOGqM9iIPDJd9bGu0xI5yWeqLpbdXdm4ClsiRWOmavUvjm8RK99o7I9EvctCzrIrGnwlySZ0+Po856I="
}

I suggest of the above only commit and date be included by default via "git signatures add" with support for the rest with future flags and verification modes.

Simple and high security by default with flexibility and some degree of future-proofness for real world use cases.

Now for the most controversial part of the above, is my re-introduction of (optional) patch id support.

@tylerlevine made some strong points on the #! IRC today. It made me realize we actually have 3 verification levels possible which each have different use cases and tradeoffs. I have taken to calling this the "anchor" status. The signature will be valid, but the anchor status will tell you at what security level it is anchored to the current tree we are validating it in.

Stick with me as I try to lay down some context...

So there are three descending levels of verification possible and only the highest level needs to be reported.

Trust Anchor levels

commit

Signature valid against the commit hash of the ref we are verifying
Signature invalid after squashing
Signature invalid after changing commit messages
Signature invalid after rebasing
Signature invalid if changeset is moved to a new location in this repo or another repo

tree

Signature valid against tree hash of the ref we are verifying
Signature valid after squashing
Signature valid after changing commit messages
Signature invalid after rebasing
Signature invalid if changeset is moved to a new location in this repo or another repo

patch

Signature valid against patch id hash of the changeset we are verifying
Signature valid after squashing
Signature valid after changing commit messages
Signature valid after rebasing
Signature valid if changeset is moved to a new location in this repo or another repo

Use Cases

Small team willing to allow only one "ready to merge" changeset at a time

Use "commit" level anchoring and truck on. It is the default and you need
special flags to verify any other way.

Above small team that also wants to be able to squash and amend comments

Consider "tree" level anchoring which offers greater flexibility.

Understand the tradeoff here is that a bad actor can squash all commits into
one and change the commit message to offensive ascii art, but can't actually
change the code.

An organization with many in-flight diffs and reviews merging out of order

Consider "patch" level anchoring for authors/reviewers.

Understand that this can only certify that a given changeset was authored and
reviewed in isolation.

You -must- pair this with a CI system or maintainer(s) that will do one of:

Do only signed merge commits to master branch
Sign trusted HEADs/tag with "commit" or "tree" level anchoring post-merge

If you fail to do one of the latter steps to prevent history mutation you are
knowingly creating a security hole and are a terrible person.

lrvick · 2018-10-05T06:10:53Z

Re Licensing:
Most companies in the US avoid GPL like a cancer. Legal teams panic. Seen it many times. I don't think that panic is justified but GPL suit FUD makes it really hard to get GPL tools approved in companies.

Apache2/MIT are near universally automatically okayed so I favor that approach. Can just copy/pasta from Rust on this one imo.

Ekleog · 2018-10-05T06:25:38Z

RFC

Packets MUST always appended to the end of the packet-list, insertion anywhere except at the end of the packet-list MUST cause a critical error.

I… don't really like this, because it means we can't, by design, do the cat_uniq_sort merge -- which would likely turn out to be useful for simplifying merges. Unless we forbid merge commits on “review-branch” (which seems consistent with the date requirement above?) and have the tool handle the merge itself, which would start to complexify either the use or the implementation… potentially more than just saying the list is unordered and must be ordered by rfc4880 signature creation time (or date, see below).

About dates

Packets with signatures by the same key in review-list MUST monotonically increase their packet's "date" field, failure to observe that MUST cause a critical error.

Suggestion: make monotony strict. I'm not 100% sure whether the english usage for “monotony” already is strict, but we lose nothing by mentioning it explicitly.

"EXPKEYSIG" gpg status code means the key's own signature has expired, it SHOULD be interpreted as "GOODSIG" if packet's-commit-date is <= than key's expiration date, MUST be interpreted as "ERRSIG with NO_PUBKEY" otherwise.

Suggestion: put this at the top. While reading I had started writing a long paragraph about dates and how your scheme was uselessly complex, until I came upon this and understood the reason of it all.

Also, I think the date handling paragraph could be split off and made explicit with the 4 dates that would end up present: git commit date, git author date, RFC4880 creation date, date inside the signed blob. This is a non-trivial part of the RFC.

If we allow a review encoded in the packet to have its own "date" field (which I don't like for the following), it MUST be <= packet's-commit-date and >= packet's "date" field, failure to observe that MUST cause a critical error.

This part makes me think you only want 3 dates, ie. remove the date inside the data. Is that so? Sounds good to me. But TBH I'm not sure I understand all of the date setup. It sounds like a part you've really well thought out, but it's interspersed with the rest of the text and I can't easily imagine the big picture (and IMO the RFC should be easily understandable, otherwise we're bound to have interoperability or security issues).

packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date"

If we're going to keep the signed-data-date-field (assuming you mean “review's date field” = signed-data-date-field and “packet's date” = RFC4880 creation date), then I think we should swap the first two members: first one would create the review, and only after sign, so the naive and simplest behaviour would be incorrect as per your specification.

Signature verification

When interpreting packet-list only the packet with largest "date" field (serial-number) MUST be considered for each signing key, all other packets for the same key MUST be ignored.

Hmm… there may be ambiguity here for a masterkey with two signing subkeys. I think we agree we want to allow only one signature per masterkey, so maybe something like “When interpreting packet-list, only the packet with the highest packet date MUST be considered for each RFC4880 master key. All other packets by the same key MUST be ignored. In addition, packets by keys for which no judgement trust has been assigned SHOULD be ignored.”

GOODSIG, EXPSIG, …

This is GnuPG jargon. Maybe use RFC4880 terms instead? Then it's kind of obvious what these means wrt. RFC4880, so I guess that's not an important change.

all other statuses, including "BADSIG" and "ERRSIG without NO_PUBKEY", MUST cause a critical error (can be made into a warning, in case gpg guys break something, I guess).

-> SHOULD and remove the parenthesis?

Failure to interpret the contents of "GOODSIG" review MUST cause a warning and MUST cause such a review to be ignored.

-> SHOULD for the first MUST: we may want fully automated setups where no one would be there to check the warning… and failure to interpret the contents of GOODSIG is not a security issue by itself, just proof of a bug in either the sender or the recipient.

Format

meaning that the reviewer certifies the "diff" of this "commit" when applied to the parent of the first "" together "with" commit "diff"s of the later "commit-id"s

OK, so after a few minutes of thought, I think I understand what you mean (undertone: this needs clarification :p). It's the “trust this commit iff. you're evaluating the security of a commit that already includes these other commits”, right?

meaning that the reviewer certifies the "state" "of" the following "filepaths" in "" tree.

I… am not sure <filepath> is a route we want to go. It's quite hard to evaluate a file all by itself, and if I have eg. this history:

commit 1 (filepath A state-signed)
commit 2 (touches B only) (filepath B state-signed)

then I'm not sure it would make sense to consider the security of commit 2 as equivalent to “filepaths A and B state-signed”.

Also, handling these in a non-counter-intuitive way different from just ignoring them sounds like it'd add quite a lot of complexity.

In my mind, we're better off leaving this to crev and concentrating on reviewing commits only, and not try to merge the two concerns into a single tool… but maybe I'm missing a use case?

That said, we do need state signatures that sign the whole repository at least.

[context-understanding (medium|high|author)] # low by default, indicates understanding of the context of the diff

I think we should allow an explicit low (same for all other fields)

[result-otherwise (!|-|+)] # "!" by default with header conditions, "0" without header conditions, this line MUST not be present without header conditions, this is the "result" when not all header conditions are met

s/MUST not/MUST NOT/

Also, are header conditions the with <commit-id> tags? (I don't see what of <filepath> would integrate here) If so, it'd likely be better to make it explicit :)

[about priorities] I opted in for "low". But this can be reconsidered, indeed.

Well, I know for my tool, if I get around to doing it (not really good at doing stuff nowadays, too much things flying around…), I'll just output the values all the time. IMO default values would be a UI question that would thus be left implementation-defined.

Ekleog · 2018-10-05T06:42:43Z

@lrvick

hierarchical data for artifacts

For signing artifacts, you need a setup that permits reproducible builds (and not things that will depend on eg. the version of the headers installed on your system). For this you need nix or a similar reproducible build tool. I will talk about nix because it's the one I know.

Nix builds by taking the nix expression, building a .drv of it, and then building the package. The point is, you only need to verify the .drv <-> built package association to have complete commit <-> package security, because the .drv is derived from data present in the commit and is enough to ensure that the whole build setup is the same. In particular, .drv's already contain the repository commit. (when they're written to be reproducible, it's also possible to not write them this way but then it'll likely not be reproducible)

tl;dr: I think signing build artifacts is orthogonal to signing the code.

patch-id again

I have another solution for the organization with many in-flight diffs (described on IRC, but I don't think I've described it here). That's the setup I'm seeing for nixpkgs:

Committer merges the commit and pushes it to the pending branch
Committer signs the (now-on-pending) commit ref
CI advances master by following pending once both tests and signature checks pass

This is, actually, similar to what hydra is doing with tests. Obviously, CI could also run before the merge on pending, because a test breaking on pending would be a PITA: every committer who would have pushed a commit after the failing one would likely have to re-sign, if the commit is backed-out and not fixed.

An alternative setup, a bit more complex, that would avoid this issue:

Committer merges the commit and pushes it to the pending-test branch
CI job 1 advances pending-signature by fast-forwarding to the last commit of pending-test that passes tests
Committers sign the commits on pending-signature
CI job 2 advances master by fast-forwarding to the last commit of pending-signature for which all commits have been signed

Ekleog · 2018-10-05T06:57:24Z

@oxij Now, an old concern coming back: performance.

How do we handle verification? My understanding is we would start from the latest commit .* state-signed commit that's trusted according to the judgement trust database (without of <filename> restrictions), and then advance commit by commit checking each commit. (BTW, given each discovered CVE would potentially require re-signing the state commits, I don't think we would actually have many state commits)

First question: How do we find the latest trusted state-signed commit? git log --oneline on nixpkgs already takes ~2s on my machine, if we need to search all of these files, removing a layer of base64 and parsing for state commits, I fear already this will take relatively long.

But the important question is the second question: once we have the latest trust-signed commit, how do we check all commits until the latest? It's basically running a state-machine from the latest trust-signed commit to the latest commit. Which isn't reasonable if my guess is correct and the only state-signed commit we see in practice is trust initialization. So we would need a way to cache results between calls (at least finding the latest state-signed commit and maybe a dump of the state of the state machine along with the commit on the refs/signatures branch at which it was computed). This would make the implementation… very complex, I fear :/

That said, apart from completely removing the with <commit-id> possibility, I don't see any way of reducing complexity, so… (and even then, having to store the state across invocations would be required to avoid having to re-check all signatures all the time)

oxij · 2018-10-05T06:58:38Z

@lrvick

strong use case for hierarchical data: projects that yeild compiled artifact or artifacts.

IMHO if you want to consider artifacts as part of the signature then you would have to make the tool verify said artifacts... For something like this I would just stick JSON/YAML/etc into the comment field.

I would want to support an optional flag to include a hash map of a directory full of output artifacts.

Similarly.

If multiple CI systems or reviewers agrees on the output binary hashes we now can attest 2 things strongly: * this code builds deterministically * these binaries came from this code

Yes, but that has almost nothing to do with code signatures. So, similarly.

we actually have 3 verification levels possible which each have different use cases and tradeoffs

Correct, but my argument is that 2 out of 3 make no real sense. I strongly prefer to start with commit-id-only system and keep it there as long as possible (it doesn't mean we should not consider it, I'm just saying it a very low priority thing).

## Use Cases ### Single contributor willing to mere one changeset at a time Use "commit" level anchoring and truck on. It is the default and you need special flags to verify any other way.

Sounds good.

### Single contributor that wants to be able to squash and ammend comments Consider "tree" level anchoring which offers greater flexibility.

Why? If you are the author, you can squash and resign.

### An organization with many in-flight diffs and reviews merging out of order Consider "patch" level anchoring for authors/reviewers.

Why? Original patches would be signed by the original author, merge commits would be signed by maintainers (and, later, maybe by the original author).

Apache2/MIT are near universally automatically okayed so I favor that approach. Can just copy/pasta from Rust on this one imo.

But this is not a library, this is a tool.

Most companies avoid GPL like a cancer. Legal teams panic. Seen it many times. I don't think that panic is justified but GPL suit FUD makes it really hard to get GPL tools approved in companies.

Then why do they even use GCC, coreutils, bash, grep, utillinux, ..., Linux, GPG, git, etc? They are always welcome to the BSDs.

oxij · 2018-10-05T07:55:59Z

> Packets MUST always appended to the end of the packet-list, insertion anywhere except at the end of the packet-list MUST cause a critical error. Unless we forbid merge commits on “review-branch” (which seems consistent with the date requirement above?) and have the tool handle the merge itself, which would start to complexify either the use or the implementation… potentially more than just saying the list is unordered and must be ordered by rfc4880 signature creation time (or date, see below).

Yes, my idea here is that the history would be linear and the tool would do the merges locally, similarly to subversion. As I said before, I want `git wotr add` to add a review like `git add` adds a file. You would then pull, `git wotr commit` (which just appends packets) and push. Something like `git wotr sync` could do all three together. This can be relaxed to a less ordered thing you want, but then you would have to verify the correctness and dates of merge commits... (see the note on "blockchain" thing below)

> Packets with signatures by the same key in review-list MUST monotonically increase their packet's "date" field, failure to observe that MUST cause a critical error. Suggestion: make monotony strict.

Right, fixed.

> "EXPKEYSIG" gpg status code means the key's own signature has expired, it SHOULD be interpreted as "GOODSIG" if packet's-commit-date is <= than key's expiration date, MUST be interpreted as "ERRSIG with NO_PUBKEY" otherwise. Suggestion: put this at the top. While reading I had started writing a long paragraph about dates and how your scheme was uselessly complex, until I came upon this and understood the reason of it all.

Hm, I'll think about it.

Also, I think the date handling paragraph could be split off and made explicit with the 4 dates that would end up present: git commit date, git author date, RFC4880 creation date, date inside the signed blob. This is a non-trivial part of the RFC.

Okay.

> If we allow a review encoded in the packet to have its own "date" field (which I don't like for the following), it MUST be <= packet's-commit-date and >= packet's "date" field, failure to observe that MUST cause a critical error. This part makes me think you only want 3 dates, ie. remove the date inside the data. Is that so?

Yes, I see no point in it, and ...

But TBH I'm not sure I understand all of the date setup. It sounds like a part you've really well thought out, but it's interspersed with the rest of the text and I can't easily imagine the big picture (and IMO the RFC should be easily understandable, otherwise we're bound to have interoperability or security issues). > packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date" If we're going to keep the signed-data-date-field (assuming you mean “review's date field” = signed-data-date-field and “packet's date” = RFC4880 creation date), then I think we should swap the first two members: first one would create the review, and only after sign, so the naive and simplest behaviour would be incorrect as per your specification.

... that's exactly the problem. If you want to interpret packet dates as "serial numbers" you need

> packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date"

to meet the monotonicity requirements, but semantically

> review's "date" field <= packet's "date" field <= packet's-commit-date <= author's key own "expire date"

makes more sense because you first review, then sign. (Which, of course just means that you actually want `review's "date" field == packet's "date" field`, hence no need for two.) So, yes, let's just drop the second date.

Sounds good to me.

Good. :) Btw, the alternative to the described setup is to do review's "date" field <= packet's "date" field <= author's key own "expire date" and ignore the commits, but then we would get problems with agreeing on the set of reviews. All that linearity of the history is to make sure everyone agrees on the state of the common state, i.e. literally the "blockchain". (I should do an ICO and ask for VC funding somewhere around now, yes.) The key point is that by commiting to a common chain with the outlined rules of verification you can be sure that nobody censors your reviews, which is kinda the point for "!" reviews. The same result can be archived by verifying that merge commits don't add or drop any packets and preserve the partial ordering on packets in packet-lists and dates, but that's way too hard for a first version, especially when it's written in bash.

> When interpreting packet-list only the packet with largest "date" field (serial-number) MUST be considered for each signing key, all other packets for the same key MUST be ignored. Hmm… there may be ambiguity here for a masterkey with two signing subkeys. I think we agree we want to allow only one signature per masterkey, so maybe something like “When interpreting packet-list, only the packet with the highest packet date MUST be considered for each RFC4880 master key. All other packets by the same key MUST be ignored. In addition, packets by keys for which no judgement trust has been assigned SHOULD be ignored.”

Agreed.

> GOODSIG, EXPSIG, … This is GnuPG jargon. Maybe use RFC4880 terms instead? Then it's kind of obvious what these means wrt. RFC4880, so I guess that's not an important change.

Okay.

> all other statuses, including "BADSIG" and "ERRSIG without NO_PUBKEY", MUST cause a critical error (can be made into a warning, in case gpg guys break something, I guess). -> SHOULD and remove the parenthesis?

Something like

SHOULD cause a critical error which MAY be turned into a warning, in case gpg guys break something, I guess

?

> Failure to interpret the contents of "GOODSIG" review MUST cause a warning and MUST cause such a review to be ignored. -> SHOULD for the first MUST: we may want fully automated setups where no one would be there to check the warning… and failure to interpret the contents of GOODSIG is not a security issue by itself, just proof of a bug in either the sender or the recipient.

Yeah, but I'd rather see those warnings if that's the case so that I could fix the tool immediately. But okay.

> meaning that the reviewer certifies the "diff" of this "commit" when applied to the parent of the first "<commit-id>" together "with" commit "diff"s of the later "commit-id"s OK, so after a few minutes of thought, I think I understand what you mean (undertone: this needs clarification :p). It's the “trust this commit iff. you're evaluating the security of a commit that already includes these other commits”, right?

Yes. Though I read it as "trust this commit iff (with) those later commits are applied too".

> meaning that the reviewer certifies the "state" "of" the following "filepaths" in "<commit-id>" tree. I… am not sure `<filepath>` is a route we want to go. It's quite hard to evaluate a file all by itself, ...

Hm, agreed. But I kinda want a way to specify "how much" of the context did you check.

That said, we do need state signatures that sign the whole repository at least.

Agreed. In that case the following grammar also makes sense ``` commit <commit-id> (state|diff) [with <commit-id> diff with <commit-id> diff with <commit-id> diff] ``` I.e. now you can sign the state, assuming the diffs are applied. Which might be useful.

> [context-understanding (medium|high|author)] # low by default, indicates understanding of the context of the diff I think we should allow an explicit `low` (same for all other fields)

Okay. We can also make all of them required for simplicity, git will compress them away anyway when packing into deltas.

> [result-otherwise (!|-|+)] # "!" by default with header conditions, "0" without header conditions, this line MUST not be present without header conditions, this is the "result" when not all header conditions are met s/MUST not/MUST NOT/

Fixed.

Also, are header conditions the `with <commit-id>` tags? (I don't see what `of <filepath>` would integrate here) If so, it'd likely be better to make it explicit :)

Yes, they didn't integrate together.

> [about priorities] I opted in for "low". But this can be reconsidered, indeed. Well, I know for my tool, if I get around to doing it (not really good at doing stuff nowadays, too much things flying around…), I'll just output the values all the time. IMO default values would be a UI question that would thus be left implementation-defined.

I see, okay. Thanks for a review. I'll massage the RFC some more and publish the new version when I'll have a big enough slot of free time.

oxij · 2018-10-05T08:13:10Z

> The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust". Hmm… Actually, I'm not really sure we need keys trust? If judgement trust is associated to a public key anyway, then there have been manual vetting of said public key (to attribute it a judgement trust), so we wouldn't need to also verify the key (actually, I may decide to trust your public key if I see you doing good reviews, without ever having checked your identity and thus not having signed your key, and even less given it ownertrust-in-the-WoT-sense)

Makes sense. Agreed.

How do we handle verification? My understanding is we would start from the latest `commit .* state`-signed commit that's trusted according to the judgement trust database (without `of <filename>` restrictions), and then advance commit by commit checking each commit.

Yes.

(BTW, given each discovered CVE would potentially require re-signing the state commits, I don't think we would actually have many state commits) First question: How do we find the latest trusted state-signed commit? `git log --oneline` on nixpkgs already takes ~2s on my machine, if we need to search all of these files, removing a layer of base64 and parsing for state commits, I fear already this will take relatively long.

I guess you can try verifying all commits from HEAD down, making holes for all commits with "with <commit-id>", until you hit a required depth or encounter a "state", then you would do a second pass in the opposite direction to cover the "with" holes.

But the important question is the second question: once we have the latest trust-signed commit, how do we check all commits until the latest? It's basically running a state-machine from the latest trust-signed commit to the latest commit. Which isn't reasonable if my guess is correct and the only state-signed commit we see in practice is trust initialization. So we would need a way to cache results between calls (at least finding the latest state-signed commit and maybe a dump of the state of the state machine along with the commit on the refs/signatures branch at which it was computed). This would make the implementation… very complex, I fear :/

I have no complete ideas yet.

That said, apart from completely removing the `with <commit-id>` possibility, I don't see any way of reducing complexity, so… (and even then, having to store the state across invocations would be required to avoid having to re-check all signatures all the time)

True.

Ekleog · 2018-10-05T08:40:49Z

Something like

SHOULD cause a critical error which MAY be turned into a warning, in case gpg guys break something, I guess

?

Agreed :)

Yes, my idea here is that the history would be linear and the tool would do the merges locally, similarly to subversion. As I said before, I want git wotr add to add a review like git add adds a file. You would then pull, git wotr commit (which just appends packets) and push. Something like git wotr sync could do all three together. This can be relaxed to a less ordered thing you want, but then you would have to verify the correctness and dates of merge commits... (see the note on "blockchain" thing below)

Hmm… This works well in a centralized system, where everyone pushes the reviews on the same branch. This works medium-well in a fully decentralized system (because the blockchain benefit is quite reduced, given that blocking a ! review only requires blocking reviews from one user -- in exchange it's likely harder to MitM people). But I wonder how that'd work for a semi-centralized system like nixpkgs.

Actually, thinking it over, there likely won't be that many issues. The biggest question will likely be thinking of how to handle multiple remotes, so long as we agree that each public repository would have its independent refs/signature branch (and I think we do)… but anyway we'd have to.

So, OK for me :)

In that case the following grammar also makes sense
commit <commit-id> (state|diff)
[with <commit-id> diff
with <commit-id> diff
with <commit-id> diff]
I.e. now you can sign the state, assuming the diffs are applied. Which might be useful.

Sounds good to me! indeed, signing the state assuming diffs are applied would likely be useful for when we find a serious CVE in the trust-initialization state commit, so as not to have to re-do the trust-initialization from scratch or live with an unsafe signed state commit :)

Okay. We can also make all of them required for simplicity, git will compress them away anyway when packing into deltas.

SGTM

I'll massage the RFC some more and publish the new version when I'll have a big enough slot of free time.

Great, thanks! Also, maybe consider giving it a git repository or whatever so that we can both easily refer to it henceforth and discuss on specific patches? :)

I guess you can try verifying all commits from HEAD down, making holes for all commits with "with ", until you hit a required depth or encounter a "state", then you would do a second pass in the opposite direction to cover the "with" holes.

Good idea, indeed! and actually the with <commit-id> could be checked by keeping in memory the list of already-met commit IDs as we'd be going down we'd have seen the <commit-id> commits before seeing the with <commit-id> commits.

Ekleog · 2018-10-05T08:59:04Z

OK, so now that, I think, we agree on almost everything (except the question of whether tree-id or patch-id should be supported or not, but I guess that can be considered an extension by git-signatures so long as it's compatible with the message format), there is the major point (:D) to solve: what technical format?

Currently, three proposals have been raised:

JSON
Git commit-like objects
RFC822

I personally see no reason for RFC822: it's like the git commit-like objects, except with random whitespaces to make parsing harder.

This leaves us with either JSON and git-like commit objects.

To give an example of what the result would look like with the two tools:

commit 1234567890123456789012345678901234567890 diff
with 0987654321098765432109876543210987654321 diff
with 0987654321123456789009876543211234567890 diff
context-understanding high
diff-understanding high
thoroughness high
result +
result-otherwise -
priority high

My comment.

vs. JSON (which would be minified in practice)

{
    "type": "commit-diff",
    "oid": "1234567890123456789012345678901234567890",
    "with": [
        "1234567890123456789012345678901234567890",
        "0987654321123456789009876543211234567890"
    ],
    "context-understanding": "high",
    "diff-understanding": "high",
    "thoroughness": "high",
    "result": "+",
    "result-otherwise": "-",
    "priority": "high",
    "message": "My comment."
}

I… don't really care which format we use. Rust (the language I love nowadays) allows both to write a parser and to parse from JSON quite easily, so…

My “elegance” internal barometer prefers the git commit-like object. My “ease of use” internal barometer tells me most languages would find it much easier to just parse from JSON using a library than to write a full parser from scratch -- and if the language is not memory-safe, parsers are a well-known source of security issues. Also, JSON has the advantage of being more easily extensible should we want to extend the signature format later… but I don't really see why we would.

So… meh. Both ways work for me. But I'll note that we're already using nested structures with the with keyword, that is implicitly a list.

lrvick · 2018-10-05T16:50:00Z

For signing artifacts, you need a setup that permits reproducible builds (and not things that will depend on eg. the version of the headers installed on your system). For this you need nix or a similar reproducible build tool. I will talk about nix because it's the one I know.

I have a number of repos I have deterministic building on. I type "make bin" and get the same binary from the same code every time. Then other people build those binaries and confirm they get the same hashes. We then have to certify those hashes came from that code by signing the code commit hashes with the binary hashes. This is imo a pretty good fit as extra metadata to put in the review signature for these use cases to do this as part of the review, as that is the stage that we do it.

Thanks for a review. I'll massage the RFC some more and publish the new version when I'll have a big enough slot of free time.

My comments were mostly the same as @Ekleog but overall this is a fantastic start. Thanks for that. Please feel free to make a new PR with this to maybe the "docs" folder in this repo so we can isolate spec discussions and review away from implementation concerns in this PR.

I am happy to create a "git-signature-spec" repo if we really want another repo this early. I granted you write permissions to this one for now.

lrvick · 2018-10-05T17:01:56Z

Actually, thinking it over, there likely won't be that many issues. The biggest question will likely be thinking of how to handle multiple remotes, so long as we agree that each public repository would have its independent refs/signature branch (and I think we do)… but anyway we'd have to.

The 'patch' level anchored author/review + maintiainer certifying order with 'commit' level anchoring per my last use case under anchoring, would be a good fit for cherry picking across origins where contributors can only put changes in tree A but maintainers control tree B and certify patch ordering there and how changes interact with each other. I actually know a number of companies that do this for strong isolation of security domains and repo hosting with one public and one internal.

lrvick · 2018-10-05T17:24:53Z

IMHO if you want to consider artifacts as part of the signature then you would have to make the tool verify said artifacts... For something like this I would just stick JSON/YAML/etc into the comment field.

Verifying the artifacts is trivial if you assume deterministic builds. (IMO any non deterministic builds are a security bug and one I spend weeks fixing for some projects).

If m-of-n reviewers (or CI servers) agree on the same hashes, then the given hashes are considered reviewed/trusted. Any downstream systems now know that is what was reviewed and if a new hash pops up saying it was for that given release/tag, it should be regarded as an imposter.

I have another solution for the organization with many in-flight diffs (described on IRC, but I don't think I've described it here). That's the setup I'm seeing for nixpkgs:

If I am following you: committer signs in a PR, reviewer signs, merge to a pending branch (that everyone somehow has access to without that being a problem?), commiter -re-signs-, then later maintainer signs as a part of a release tag etc. This seems much more complex than the workflow I presented without any security gain other than avoiding different anchoring levels for as far as I can tell no gain. In effect it seems like you are re-creating the same security levels at the branch level vs signature level which I am sure there are use cases for. That certianly can't meet mine or that of most orgs I am aware of in silicon valley.

Asking companies to totally abandon the git-flow it took them yaers to adopt is pretty rough. Asking them to add signatures at their regular steps in their usual flow I expect can be more widely and readily adopted and with my mitigations I don't think adds any attack surface over your presented flow.

I am however totally in favor of documenting all these potential workflows and security assumptions they make so people can make the right decision for their project as there are clearly multiple workflows that will end with the same security level. We do seem in strong agreement that 'commit' level signing is strongest and should be the thing that downstreams should trust be it on master branch, tags, etc.

Ekleog · 2018-10-05T19:51:49Z

I have a number of repos I have deterministic building on. I type "make bin" and get the same binary from the same code every time. Then other people build those binaries and confirm they get the same hashes. We then have to certify those hashes came from that code by signing the code commit hashes with the binary hashes. This is imo a pretty good fit as extra metadata to put in the review signature for these use cases to do this as part of the review, as that is the stage that we do it.

So, your make bin downloads a fixed gcc (changes in versions of gcc will likely change optimized output code) and you vendor all your dependencies? If you don't do this, there are chances your builds have been reproducible only by chance. Then, this is only a side-track in the discussion.

Overall, the git commit signatures are designed to sign the contents of the git repository. I'm really not convinced that artifacts can be reviewed: signing the two in the same step is conflating the “a human verified this code” and “a computer generated this artifact from this code” concerns. Also, you're anyways going to provide a CONTENTS.digest{,.asc}, right? If so, you can just include the hash of the commit from which the build was made in this file, and now the signature for the artifact lives along the artifact, which IMO makes much more sense than it living along the source code.

Please feel free to make a new PR with this to maybe the "docs" folder in this repo so we can isolate spec discussions and review away from implementation concerns in this PR.

That's a good idea :) Though in order to have a vendor-neutral repository, I've created a repository at a created-for-this-purpose org: git-wotr/spec. @oxij, feel free to use this repository to push the RFC (or not), or to tell me you'd rather use the git-wotr name for your bash project, in which case I'll hand over the org to you (you came up with the name after all :p). I'm sending you two invitations to the org.

For the time being, I've initialized the repository with the GFDL 1.3 (which will require a header at the top of the RFC), and have restricted to only signed commits and only through reviewed PRs. All this is subject to change.

Actually, thinking it over, there likely won't be that many issues. The biggest question will likely be thinking of how to handle multiple remotes, so long as we agree that each public repository would have its independent refs/signature branch (and I think we do)… but anyway we'd have to.

The 'patch' level anchored author/review + maintiainer certifying order with 'commit' level anchoring per my last use case under anchoring, would be a good fit for cherry picking across origins where contributors can only put changes in tree A but maintainers control tree B and certify patch ordering there and how changes interact with each other. I actually know a number of companies that do this for strong isolation of security domains and repo hosting with one public and one internal.

I was thinking of having multiple remotes for the refs/signatures branch (ie. not have all the signatures centralized in one repo but be able to have non-committers users push reviews to their own repos, and other users to use them), so that's unrelated with patch-id signing :)

If m-of-n reviewers (or CI servers) agree on the same hashes, then the given hashes are considered reviewed/trusted.

This sentence shows clearly the conflation of concerns. Reviews are done by reviewers. Artifact signing is done by CI servers. Having reviews include artifacts requires people willing to sign an artifact to also sign a review, which is potentially false.

Also, there is a problem of trust. The fact that a review-including-artifacts hash is signed by enough people means:

For the review, that the diff between the previous commit and this commit is correct
For the artifact, that the state of the repository at this commit produces this artefact.

Overall, I can't find any similarity in the handling of the two things, so I don't understand why they should be put in the same signature: it just limits flexibility for everyone.

If I am following you: committer signs in a PR, reviewer signs, merge to a pending branch (that everyone somehow has access to without that being a problem?), commiter -re-signs-, then later maintainer signs as a part of a release tag etc

You're making it much more complex than it needs be. My setup is exclusively the three bullet points, preceded by an implicit “contributor sends PR”.

The contributor doesn't need to sign the PR, because no one would trust their key anyway, so it doesn't matter. And the pending branch can be accessed by any committer without any problem, because anyway it's not used by anyone and master will move forward only when the review signatures will be checked.

Asking companies to totally abandon the git-flow it took them yaers to adopt is pretty rough. Asking them to add signatures at their regular steps in their usual flow I expect can be more widely and readily adopted and with my mitigations I don't think adds any attack surface over your presented flow.

… if you read my flow, it's exactly the git-flow, except after the merge to pending the user adds in a signature to the commit. And sorry but the patch-id method without anchoring is flawed and I haven't seen any mitigation to prevent it yet.

oxij · 2018-10-05T20:32:32Z

@Ekleog Yay, GFDL! I pushed the original RFC to git-wotr/spec#1.

lrvick · 2018-10-05T21:13:20Z

There are a lot of interesting subthreads going on here so I am going to file issues on git-wotr so we can discuss them in their own threads, because this is getting silly and important stuff gets lost :)

oxij mentioned this pull request Oct 2, 2018

[RFC 0034] Expression Integrity NixOS/rfcs#34

Closed

lrvick requested review from KellerFuchs and lrvick October 2, 2018 08:41

oxij force-pushed the three-zero-days branch 2 times, most recently from d601e33 to a5ec96c Compare October 2, 2018 21:34

oxij force-pushed the three-zero-days branch 2 times, most recently from 6ac3023 to ecbf2d2 Compare October 2, 2018 21:57

oxij force-pushed the three-zero-days branch 5 times, most recently from fa74809 to 8ab4140 Compare October 3, 2018 18:24

oxij force-pushed the three-zero-days branch from 8ab4140 to 391dc3c Compare October 3, 2018 18:43

lrvick reviewed Oct 3, 2018

View reviewed changes

oxij added 8 commits October 4, 2018 14:37

fix loss of said error status codes in a couple of cases

27f3223

Need `bash -e` ASAP.

refactor generate_files.bash

0fbc19c

add expired key test (will fail)

5d667df

add unknown key test (will fail)

2b09c52

add broken signature test (will fail)

3537831

fix tests by properly parsing all of gpg's status-fd output

8222125

Also produce cleaner status codes.

stop compressing signatures

5d93c41

This only adds overhead as git compresses its objects anyway.

check status codes in tests where possible

135cdbc

oxij force-pushed the three-zero-days branch from e58f7d0 to 135cdbc Compare October 4, 2018 14:37

lrvick requested a review from daurnimator October 5, 2018 06:06

lrvick mentioned this pull request Oct 6, 2018

Reproducible build support git-wotr/spec#2

Open

Fix three zero days #13

Are you sure you want to change the base?

Fix three zero days #13

Conversation

oxij commented Oct 2, 2018 • edited Loading

lrvick commented Oct 2, 2018 • edited Loading

oxij commented Oct 2, 2018 via email

oxij commented Oct 2, 2018 via email

oxij commented Oct 3, 2018 via email

lrvick commented Oct 3, 2018

lrvick commented Oct 3, 2018

lrvick commented Oct 3, 2018

lrvick Oct 3, 2018

Choose a reason for hiding this comment

oxij commented Oct 3, 2018 via email

RyanSquared commented Oct 4, 2018

lrvick commented Oct 4, 2018

lrvick commented Oct 4, 2018 • edited Loading

Commit Ref

Pros

Cons

Patch-id

Pros

Cons

Patch-id + master HEAD

Pros

Cons

Hybrid Verification

Pros

Cons

Ekleog commented Oct 4, 2018

Ekleog commented Oct 4, 2018

Ekleog commented Oct 4, 2018

oxij commented Oct 4, 2018 via email

daurnimator commented Oct 4, 2018

Ekleog commented Oct 4, 2018

oxij commented Oct 5, 2018 via email

Ekleog commented Oct 5, 2018

lrvick commented Oct 5, 2018 • edited Loading

Trust Anchor levels

commit

tree

patch

Use Cases

Small team willing to allow only one "ready to merge" changeset at a time

Above small team that also wants to be able to squash and amend comments

An organization with many in-flight diffs and reviews merging out of order

lrvick commented Oct 5, 2018 • edited Loading

Ekleog commented Oct 5, 2018

RFC

About dates

Signature verification

Format

Ekleog commented Oct 5, 2018

Ekleog commented Oct 5, 2018

oxij commented Oct 5, 2018 via email

oxij commented Oct 5, 2018 via email

oxij commented Oct 5, 2018 via email

Ekleog commented Oct 5, 2018

Ekleog commented Oct 5, 2018

lrvick commented Oct 5, 2018 • edited Loading

lrvick commented Oct 5, 2018

lrvick commented Oct 5, 2018

Ekleog commented Oct 5, 2018

oxij commented Oct 5, 2018

lrvick commented Oct 5, 2018

oxij commented Oct 2, 2018 •

edited

Loading

lrvick commented Oct 2, 2018 •

edited

Loading

lrvick commented Oct 4, 2018 •

edited

Loading

lrvick commented Oct 5, 2018 •

edited

Loading

lrvick commented Oct 5, 2018 •

edited

Loading

lrvick commented Oct 5, 2018 •

edited

Loading