Skip to content
This repository has been archived by the owner on Feb 16, 2022. It is now read-only.

Fix three zero days #13

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

Fix three zero days #13

wants to merge 22 commits into from

Conversation

oxij
Copy link

@oxij oxij commented Oct 2, 2018

... and I think I know about the fourth one, similar to the third, but I'm too tired for now already.

First 5 commits are more or less cleanup. Then there are 3 pairs with each pair first adding a failing test that should have worked and then the next commit fixing git-signatures so that it would actually work.

I also strongly suggest:

  • rewriting this for bash -e, I frequently find errors this way,
  • using four spaces for indent :)

@lrvick
Copy link
Member

lrvick commented Oct 2, 2018

Looks solid at first glance. Will look over in more detail tomorrow.

Also:

using four spaces for indent :)

Tab indentation is actually "standard" in bash because of heredocs. Indented heredocs are only valid with leading tabs, but not leading spaces. Switching between tabs and spaces mid code is just a mess, so I just resolve that bash is like Go and mandates tabs. In python I use 4 spaces though because PEP8 ;)

rewriting this for bash -e, I frequently find errors this way,

I would welcome a PR for that after we get this and the patch-id PR merged

@oxij oxij force-pushed the three-zero-days branch 2 times, most recently from d601e33 to a5ec96c Compare October 2, 2018 21:34
@oxij
Copy link
Author

oxij commented Oct 2, 2018 via email

@oxij oxij force-pushed the three-zero-days branch 2 times, most recently from 6ac3023 to ecbf2d2 Compare October 2, 2018 21:57
@oxij
Copy link
Author

oxij commented Oct 2, 2018 via email

@oxij oxij force-pushed the three-zero-days branch 5 times, most recently from fa74809 to 8ab4140 Compare October 3, 2018 18:24
@oxij
Copy link
Author

oxij commented Oct 3, 2018 via email

@oxij oxij force-pushed the three-zero-days branch from 8ab4140 to 391dc3c Compare October 3, 2018 18:43
@lrvick
Copy link
Member

lrvick commented Oct 3, 2018

@oxij So a lot of work was going on in the sign-patch-id branch which just merged. Most of it should not touch your work, but can you rebase?

@lrvick
Copy link
Member

lrvick commented Oct 3, 2018

Also most of us involved in the project are on irc.hashbang.sh/#! 6697 if you want to keep in sync at any point.

@lrvick
Copy link
Member

lrvick commented Oct 3, 2018

I got a bit tired regenerating keys by hand while continuing this branch. Added 0f816f8 in the middle of history here and changed later commits to respect it.

I actually went this route initially, but realized it makes most CI systems hang until timeout as they become entropy starved. If you want to go this route that is totally cool, but you may have to update the CI jobs to setup haveged or something to keep the entropy pool full, or stub in something to spam to /dev/random to fill the pool before generating keys.

prefix=$$HOME/.local
bindir=$(prefix)/bin

all: test

test:
bats test/test.bats
make -C test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reasoning for breaking out test to have its own makefile?

@oxij
Copy link
Author

oxij commented Oct 3, 2018 via email

@RyanSquared
Copy link
Member

Believe me, you don't want me on your IRC. :)

We've seen worse. Feel free to join us: ircs://irc.hashbang.sh:6697/#!,#!social - TLS required.

@lrvick
Copy link
Member

lrvick commented Oct 4, 2018

@oxij I don't think forking is required, or that JSON is required. If there is another simple to parse key/value format you have in mind, let's hear it. I would like to see an alternative suggestion rather than just declaring the current suggested format cancer :)

I also agree that we need revhash on top of patch-id. I was going to address that in my next revision, but wanted to get patch-id in first now that we proves that out as a standalone solution. It also has very clear security issues (can merge things out of order) thus me wanting to get your changes in that don't pertain to the actual signature format, then we can address that in a follow up PR where we have a key value map that we sign that contains the master head ref, the patch-id, and the current ref, with room to easily add more future key/value pairs for some future use cases that are out of scope atm.

I really value your input here. If you can elaborate on a specific attack, then lets solve for it. Otherwise we are going to end up with 2 basically identical projects with slightly different signature formats, which is still.

@lrvick
Copy link
Member

lrvick commented Oct 4, 2018

To elaborate further, lets say we sign and base64 key/value map that contains:

  • patch_id: patch-id
  • head_hash: HEAD hash of this branch
  • target_hash: HEAD hash of the branch we want to merge on top of

This map could be msgpack, json, some tupl thing. I don't actually care that much as long as it is easy to parse in a wide range of programming languages.

Wiith that assumption we end up with multiple possible verification modes with different tradeoffs, and I think we can easily support all of them in future git signatures verify flags if the above set of 3 items are signed all the time.

Commit Ref

  • Verify every commit ref is signed by m-of-n signers

Pros

  • Most simple/secure and where we started

Cons

  • Rebases invalidate signatures
  • Merge commits invalidate signatures

Patch-id

  • Verify the patch-id for a given set of changes is signed

Pros

  • Signatures can survive a rebase/squash/merge as long as no LOC changes happen

Cons

  • security issue: patches could be merged out of order or summoned from the past

Patch-id + master HEAD

  • Verify the patch-id is signed -and- that it was merged on the designated master HEAD

Pros

  • Signatures can survive a rebase/squash/merge as long as no LOC changes happen

Cons

  • Only one set of changes can be signed at a time. Once one merges the others are invalidated.

Hybrid Verification

  • Reviewers sign the lastest LOC change
  • Reviewer signatures are only verified in patch-id mode
  • CI system signatures are created certifying tests passed etc
  • CI system only has -read- access to repo
  • CI automatically re-signs afer all cahnges if tests pass etc
  • CI automatically re-signs if the master HEAD ref changes
  • CI signatures are verified in patch-id + master head mode

Pros

  • Signatures can survive a rebase/squash/merge as long as no LOC changes happen
  • Redordering attack only possible if CI signing key -and- write access to repo are obtained

Cons

  • More complex to implement and setup initially

@Ekleog
Copy link

Ekleog commented Oct 4, 2018

@oxij Just in case because I was the one who pointed out JSON, using the JSON format for serialization doesn't mean the format cannot be fixed in stone. It just means we can (/must) use a JSON parser for handling it, in exchange for a few bytes of storage (basically, the {} and "", which are the only superfluous characters here). Actually, I think base64 storage would lose much more storage space than JSON, so even the storage argument isn't that noticeable (but if comparing with eg. key1:value1;key2:value2 we could indeed optimize for space)

So it's mostly a “do we want to be able to use a JSON library in exchange for being more or less forced to use a JSON library?” question. I personally think that if deciding that the format must be an object with only string values without escapes and no superfluous spaces, then it's not harder to parse than git objects: if I can allow myself to mix syntaxes for hopeful clarity, it gives this parser:

(\x -> x[1..-1]) |> split ',' |> map (split ':' |> map (\x -> x[1..-1]))

So to sum up, my comparison between base64-ed git objects (hereafter b64go) vs. restricted JSON (hereafter rjson) formats yields:

  • b64go needs a library for sane base64 decoding, rjson can be decoded with the same amount of processing than after base64-decoding of b64go without any library (well, yeah, with adding the two \x -> x[1..-1], I deem this cost negligible though)
  • rjson can be (encoded, decoded) easily with a library, b64go cannot (still require parsing git objects by hand)
  • rjson takes less storage space than b64go

All in all, I don't see a single reason to prefer b64go over rjson, apart from the fact that I agree it's very annoying that people use JSON everywhere without thinking of it. Do I miss a point? :)

(sorry, I managed to restrict myself on the RFC thread, but it looks like it actually got out here… well, at least here it's somehow on-topic :))

@Ekleog
Copy link

Ekleog commented Oct 4, 2018

@lrvick Patch-id, in my opinion, brings much more complexity (and thus occasions for vulnerabilities) for almost no benefit. From your list:

  • The commit-ref is already possible without it
  • The patch-id is flawed as you yourself mention, so unusable for any security purpose, and I can't see any point that's not security-related
  • The patch-id + master HEAD doesn't bring anything over commit-ref : if you just consider people will sign after merging (and not before), then all that is handled by patch-id + master HEAD is also handled by commit-ref
  • The hybrid verification system is the only case where there is a marginal benefit from using patch-id + master HEAD

I personally think that supporting a highly counter-intuitive signature scheme is not worth the minor benefit from hybrid verification systems, as anyway it's not more secure than commit-ref (actually, it's less) and the convenience advantages are limited (anyway, a committer will likely merge around the same time as they sign, and multiple-signature schemes need to be handled at the fetching end anyway so there's no TOCTOU)

@Ekleog
Copy link

Ekleog commented Oct 4, 2018

Actually, an additional concern about hybrid signing mode: it can be used by a maintainer to put the blame on another, by using TOCTOU.

Example:

  • Maintainer A signs patch A for master, that adds a free
  • Maintainer B notices and sees patch A isn't merged yet. Maintainer B signs patch B for master, prepones signature to earlier than patch A, and patch B adds another free at a place that doesn't conflict
  • Maintainer B tricks CI into accepting their patch before maintainer A's
  • Now there's a double-free and maintainer A's patch introduced it

So even it is not secure, so I cannot see a single advantage in using patch-id signing.

@oxij
Copy link
Author

oxij commented Oct 4, 2018 via email

@daurnimator
Copy link
Member

@oxij we had a chat on IRC: I think @Ekleog and I can agree that at least a minimum that we sign the tree object. I think @lrvick is now on board with this.
i.e. signing patch ids is no longer to be a thing.

Whether the parents should be signed is what myself and @Ekleog disagree on: we could make that configurable.

@oxij oxij force-pushed the three-zero-days branch from e58f7d0 to 135cdbc Compare October 4, 2018 14:37
@Ekleog
Copy link

Ekleog commented Oct 4, 2018

So I guess we now only have to agree on a data format, so that all our tools end up being interoperable :)

First, the non-controversial parts: what fields should we include? I can think of (also taking ideas from crev, which has the difference that it's attempting to review the whole state of the repo instead of split it by commit):

  • commit-id
  • tree-id (optional)
  • date (optional)
  • serial number (required if no date)
  • review comment (optional, multiline UTF-8)
  • review result (+, -, neutral -- equivalent to “trust” from crev)
  • review thoroughness (0-100% ? low-medium-high ? -- indicates understanding of the diff)
  • context understanding (0-100% ? low-medium-high ? -- indicates understanding of the context of the diff)

I wonder whether we should have some default values for review thoroughness and context understanding, or whether it's reasonable to force them being present in the signed blob.

And the second likely non-controversial choice: where should we put the signatures? I would say each signature on one line, sorted in a file named commit-id (potentially copied to tree-id if defined) in a refs/signatures branch (ie. like git-notes, as I understand it)

Once we get that agreed, we will be able to reasonably go to the question of rfc822 vs git-object-like vs JSON, which will be very happy I'm sure :D

@oxij
Copy link
Author

oxij commented Oct 5, 2018 via email

@Ekleog
Copy link

Ekleog commented Oct 5, 2018

Whoops, looks like message went out too early… please disregard previous message if you received it by email. Comments about the RFC coming in later when I've finished writing them.

The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust".

Hmm… Actually, I'm not really sure we need keys trust? If judgement trust is associated to a public key anyway, then there have been manual vetting of said public key (to attribute it a judgement trust), so we wouldn't need to also verify the key (actually, I may decide to trust your public key if I see you doing good reviews, without ever having checked your identity and thus not having signed your key, and even less given it ownertrust-in-the-WoT-sense)

I think there should be no arguments against using --export-ownertrust format for that, right?

Actually… GnuPG is not the only OpenPGP client, and sequoia-pgp is getting closer to being a usable OpenPGP client that may end up sucking less, so I'm not really sure about that.

Also, I wonder: anyway we won't be able to re-use GnuPG's WoT computation code for signatures… would we? If we won't, then why not just store them in a fingerprint:judgement trust: format? This way it's both trivial to parse and yet be trivial to import into an ownertrust db if need be (we could also decide to remove the trailing :, potentially)

wotr for "WoT reviews"

I first read w-OTR (off-the-record)… though I don't have any better idea, apart from just “wot” which would be potentially ambiguous.

@lrvick
Copy link
Member

lrvick commented Oct 5, 2018

So while I don't intend on implementing this for a few weeks or want to further derail things, I want to point out that I do have one strong use case for hierarchical data: projects that yeild compiled artifact or artifacts.

I would want to support an optional flag to include a hash map of a directory full of output artifacts.

If multiple CI systems or reviewers agrees on the output binary hashes we now can attest 2 things strongly:

  • this code builds deterministically
  • these binaries came from this code

Those signatures could be detached with the repo and exported to allow the binaries to be verified independantly downstream and anchored back to the repo ref they were compiled from.

I experimented with the following format today as a single json line incorperating ideas from @oxij with mine:

{
  "body":{
    "commit":"d27e15a09812241167a61c3222cb6e97c5b0d6b0",
    "tree":"58215f173d6ef6892a07c9073d55145f387d1968",
    "patch":"068a37e54586de8339f13ec980d31f6c30b6f6e7",
    "date": "2018-10-05T03:01:46Z",
    "review": {
      "context-understanding": "author",
      "diff-understanding": "high",
      "thoroughness":"medium",
      "result":"+"
    },
    "artifacts": {
      "out.tar.gz":"2ad6f470b5398251018a4c25f5eb35686481681f"
    }
  },
"sig":"iQJFBAABCAAvFiEEZ1U/vaRrtxq9LgsLjkeh7DWhVR0FAlu2h3IRHGxhbmNlQGxydmljay5uZXQACgkQjkeh7DWhVR0K9Q//T+UleZFWlbiIFoUBp1hX0xzg/eEaTFnnlCdWWaa5f1E+TI80Pg/wXhpEUd98ghZThXwaWJwnPp34BpZ55GU5Qr1cXjY00Yt7I0p3a5TUsdk5FP8tiUWNQ9kA6npUOIVixa8HFkhz0SvHvXNAvWsasqw6nb+SNfA+aQYIP/wAY71ZpMLkJsMQssaFWsjL/hsSmOpi7VyLEFEuUmEM1CA6VPiMvp4rvP75Bt4DnEouyFPxk4WbVIqX4DIeG8+5W8jQCLlxV6HDH4g+POdZbD7/Yg6XjDKEfD2vM/OYrBczPZX0Tu1WhVyYOtQ/WsAr9wzZjgl50/9UkP6E4pt9ft42jdNhGcO1sEwzlC5W1efC2izjL4medEiLe00zIM0L42X10HebH22j4gKyb4EErSHo0H5eLisf1xAwAHiOE2wJou1SR5dVmKydXicVN3wuLqhzc52awmoOaEts1Q3zFyB8MfDIi39CACwAvOTrKw54raoGgzxDEFEP3sKWF1FuZQJreG1Ufg57Oxv5f1lOqvaoMN7ZNjcQkPUAqj7G7e/fwyXGCG2mvnz++D5/2JAMeeO4k8iwxGXEsF0qD3QveEZPFoIpPZX+uOGqM9iIPDJd9bGu0xI5yWeqLpbdXdm4ClsiRWOmavUvjm8RK99o7I9EvctCzrIrGnwlySZ0+Po856I="
}

I suggest of the above only commit and date be included by default via "git signatures add" with support for the rest with future flags and verification modes.

Simple and high security by default with flexibility and some degree of future-proofness for real world use cases.

Now for the most controversial part of the above, is my re-introduction of (optional) patch id support.

@tylerlevine made some strong points on the #! IRC today. It made me realize we actually have 3 verification levels possible which each have different use cases and tradeoffs. I have taken to calling this the "anchor" status. The signature will be valid, but the anchor status will tell you at what security level it is anchored to the current tree we are validating it in.

Stick with me as I try to lay down some context...

So there are three descending levels of verification possible and only the highest level needs to be reported.

Trust Anchor levels

commit

  • Signature valid against the commit hash of the ref we are verifying
  • Signature invalid after squashing
  • Signature invalid after changing commit messages
  • Signature invalid after rebasing
  • Signature invalid if changeset is moved to a new location in this repo or another repo

tree

  • Signature valid against tree hash of the ref we are verifying
  • Signature valid after squashing
  • Signature valid after changing commit messages
  • Signature invalid after rebasing
  • Signature invalid if changeset is moved to a new location in this repo or another repo

patch

  • Signature valid against patch id hash of the changeset we are verifying
  • Signature valid after squashing
  • Signature valid after changing commit messages
  • Signature valid after rebasing
  • Signature valid if changeset is moved to a new location in this repo or another repo

Use Cases

Small team willing to allow only one "ready to merge" changeset at a time

Use "commit" level anchoring and truck on. It is the default and you need
special flags to verify any other way.

Above small team that also wants to be able to squash and amend comments

Consider "tree" level anchoring which offers greater flexibility.

Understand the tradeoff here is that a bad actor can squash all commits into
one and change the commit message to offensive ascii art, but can't actually
change the code.

An organization with many in-flight diffs and reviews merging out of order

Consider "patch" level anchoring for authors/reviewers.

Understand that this can only certify that a given changeset was authored and
reviewed in isolation.

You -must- pair this with a CI system or maintainer(s) that will do one of:

  • Do only signed merge commits to master branch
  • Sign trusted HEADs/tag with "commit" or "tree" level anchoring post-merge

If you fail to do one of the latter steps to prevent history mutation you are
knowingly creating a security hole and are a terrible person.

@lrvick lrvick requested a review from daurnimator October 5, 2018 06:06
@lrvick
Copy link
Member

lrvick commented Oct 5, 2018

Re Licensing:
Most companies in the US avoid GPL like a cancer. Legal teams panic. Seen it many times. I don't think that panic is justified but GPL suit FUD makes it really hard to get GPL tools approved in companies.

Apache2/MIT are near universally automatically okayed so I favor that approach. Can just copy/pasta from Rust on this one imo.

@Ekleog
Copy link

Ekleog commented Oct 5, 2018

RFC

Packets MUST always appended to the end of the packet-list, insertion anywhere except at the end of the packet-list MUST cause a critical error.

I… don't really like this, because it means we can't, by design, do the cat_uniq_sort merge -- which would likely turn out to be useful for simplifying merges. Unless we forbid merge commits on “review-branch” (which seems consistent with the date requirement above?) and have the tool handle the merge itself, which would start to complexify either the use or the implementation… potentially more than just saying the list is unordered and must be ordered by rfc4880 signature creation time (or date, see below).

About dates

Packets with signatures by the same key in review-list MUST monotonically increase their packet's "date" field, failure to observe that MUST cause a critical error.

Suggestion: make monotony strict. I'm not 100% sure whether the english usage for “monotony” already is strict, but we lose nothing by mentioning it explicitly.

"EXPKEYSIG" gpg status code means the key's own signature has expired, it SHOULD be interpreted as "GOODSIG" if packet's-commit-date is <= than key's expiration date, MUST be interpreted as "ERRSIG with NO_PUBKEY" otherwise.

Suggestion: put this at the top. While reading I had started writing a long paragraph about dates and how your scheme was uselessly complex, until I came upon this and understood the reason of it all.

Also, I think the date handling paragraph could be split off and made explicit with the 4 dates that would end up present: git commit date, git author date, RFC4880 creation date, date inside the signed blob. This is a non-trivial part of the RFC.

If we allow a review encoded in the packet to have its own "date" field (which I don't like for the following), it MUST be <= packet's-commit-date and >= packet's "date" field, failure to observe that MUST cause a critical error.

This part makes me think you only want 3 dates, ie. remove the date inside the data. Is that so? Sounds good to me. But TBH I'm not sure I understand all of the date setup. It sounds like a part you've really well thought out, but it's interspersed with the rest of the text and I can't easily imagine the big picture (and IMO the RFC should be easily understandable, otherwise we're bound to have interoperability or security issues).

packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date"

If we're going to keep the signed-data-date-field (assuming you mean “review's date field” = signed-data-date-field and “packet's date” = RFC4880 creation date), then I think we should swap the first two members: first one would create the review, and only after sign, so the naive and simplest behaviour would be incorrect as per your specification.

Signature verification

When interpreting packet-list only the packet with largest "date" field (serial-number) MUST be considered for each signing key, all other packets for the same key MUST be ignored.

Hmm… there may be ambiguity here for a masterkey with two signing subkeys. I think we agree we want to allow only one signature per masterkey, so maybe something like “When interpreting packet-list, only the packet with the highest packet date MUST be considered for each RFC4880 master key. All other packets by the same key MUST be ignored. In addition, packets by keys for which no judgement trust has been assigned SHOULD be ignored.”

GOODSIG, EXPSIG, …

This is GnuPG jargon. Maybe use RFC4880 terms instead? Then it's kind of obvious what these means wrt. RFC4880, so I guess that's not an important change.

all other statuses, including "BADSIG" and "ERRSIG without NO_PUBKEY", MUST cause a critical error (can be made into a warning, in case gpg guys break something, I guess).

-> SHOULD and remove the parenthesis?

Failure to interpret the contents of "GOODSIG" review MUST cause a warning and MUST cause such a review to be ignored.

-> SHOULD for the first MUST: we may want fully automated setups where no one would be there to check the warning… and failure to interpret the contents of GOODSIG is not a security issue by itself, just proof of a bug in either the sender or the recipient.

Format

meaning that the reviewer certifies the "diff" of this "commit" when applied to the parent of the first "" together "with" commit "diff"s of the later "commit-id"s

OK, so after a few minutes of thought, I think I understand what you mean (undertone: this needs clarification :p). It's the “trust this commit iff. you're evaluating the security of a commit that already includes these other commits”, right?

meaning that the reviewer certifies the "state" "of" the following "filepaths" in "" tree.

I… am not sure <filepath> is a route we want to go. It's quite hard to evaluate a file all by itself, and if I have eg. this history:

commit 1 (filepath A state-signed)
commit 2 (touches B only) (filepath B state-signed)

then I'm not sure it would make sense to consider the security of commit 2 as equivalent to “filepaths A and B state-signed”.

Also, handling these in a non-counter-intuitive way different from just ignoring them sounds like it'd add quite a lot of complexity.

In my mind, we're better off leaving this to crev and concentrating on reviewing commits only, and not try to merge the two concerns into a single tool… but maybe I'm missing a use case?

That said, we do need state signatures that sign the whole repository at least.

[context-understanding (medium|high|author)] # low by default, indicates understanding of the context of the diff

I think we should allow an explicit low (same for all other fields)

[result-otherwise (!|-|+)] # "!" by default with header conditions, "0" without header conditions, this line MUST not be present without header conditions, this is the "result" when not all header conditions are met

s/MUST not/MUST NOT/

Also, are header conditions the with <commit-id> tags? (I don't see what of <filepath> would integrate here) If so, it'd likely be better to make it explicit :)

[about priorities] I opted in for "low". But this can be reconsidered, indeed.

Well, I know for my tool, if I get around to doing it (not really good at doing stuff nowadays, too much things flying around…), I'll just output the values all the time. IMO default values would be a UI question that would thus be left implementation-defined.

@Ekleog
Copy link

Ekleog commented Oct 5, 2018

@lrvick

hierarchical data for artifacts

For signing artifacts, you need a setup that permits reproducible builds (and not things that will depend on eg. the version of the headers installed on your system). For this you need nix or a similar reproducible build tool. I will talk about nix because it's the one I know.

Nix builds by taking the nix expression, building a .drv of it, and then building the package. The point is, you only need to verify the .drv <-> built package association to have complete commit <-> package security, because the .drv is derived from data present in the commit and is enough to ensure that the whole build setup is the same. In particular, .drv's already contain the repository commit. (when they're written to be reproducible, it's also possible to not write them this way but then it'll likely not be reproducible)

tl;dr: I think signing build artifacts is orthogonal to signing the code.

patch-id again

I have another solution for the organization with many in-flight diffs (described on IRC, but I don't think I've described it here). That's the setup I'm seeing for nixpkgs:

  • Committer merges the commit and pushes it to the pending branch
  • Committer signs the (now-on-pending) commit ref
  • CI advances master by following pending once both tests and signature checks pass

This is, actually, similar to what hydra is doing with tests. Obviously, CI could also run before the merge on pending, because a test breaking on pending would be a PITA: every committer who would have pushed a commit after the failing one would likely have to re-sign, if the commit is backed-out and not fixed.

An alternative setup, a bit more complex, that would avoid this issue:

  • Committer merges the commit and pushes it to the pending-test branch
  • CI job 1 advances pending-signature by fast-forwarding to the last commit of pending-test that passes tests
  • Committers sign the commits on pending-signature
  • CI job 2 advances master by fast-forwarding to the last commit of pending-signature for which all commits have been signed

@Ekleog
Copy link

Ekleog commented Oct 5, 2018

@oxij Now, an old concern coming back: performance.

How do we handle verification? My understanding is we would start from the latest commit .* state-signed commit that's trusted according to the judgement trust database (without of <filename> restrictions), and then advance commit by commit checking each commit. (BTW, given each discovered CVE would potentially require re-signing the state commits, I don't think we would actually have many state commits)

First question: How do we find the latest trusted state-signed commit? git log --oneline on nixpkgs already takes ~2s on my machine, if we need to search all of these files, removing a layer of base64 and parsing for state commits, I fear already this will take relatively long.

But the important question is the second question: once we have the latest trust-signed commit, how do we check all commits until the latest? It's basically running a state-machine from the latest trust-signed commit to the latest commit. Which isn't reasonable if my guess is correct and the only state-signed commit we see in practice is trust initialization. So we would need a way to cache results between calls (at least finding the latest state-signed commit and maybe a dump of the state of the state machine along with the commit on the refs/signatures branch at which it was computed). This would make the implementation… very complex, I fear :/

That said, apart from completely removing the with <commit-id> possibility, I don't see any way of reducing complexity, so… (and even then, having to store the state across invocations would be required to avoid having to re-check all signatures all the time)

@oxij
Copy link
Author

oxij commented Oct 5, 2018 via email

@oxij
Copy link
Author

oxij commented Oct 5, 2018 via email

@oxij
Copy link
Author

oxij commented Oct 5, 2018 via email

@Ekleog
Copy link

Ekleog commented Oct 5, 2018

Something like

SHOULD cause a critical error which MAY be turned into a warning, in case gpg guys break something, I guess

?

Agreed :)

Yes, my idea here is that the history would be linear and the tool would do the merges locally, similarly to subversion. As I said before, I want git wotr add to add a review like git add adds a file. You would then pull, git wotr commit (which just appends packets) and push. Something like git wotr sync could do all three together. This can be relaxed to a less ordered thing you want, but then you would have to verify the correctness and dates of merge commits... (see the note on "blockchain" thing below)

Hmm… This works well in a centralized system, where everyone pushes the reviews on the same branch. This works medium-well in a fully decentralized system (because the blockchain benefit is quite reduced, given that blocking a ! review only requires blocking reviews from one user -- in exchange it's likely harder to MitM people). But I wonder how that'd work for a semi-centralized system like nixpkgs.

Actually, thinking it over, there likely won't be that many issues. The biggest question will likely be thinking of how to handle multiple remotes, so long as we agree that each public repository would have its independent refs/signature branch (and I think we do)… but anyway we'd have to.

So, OK for me :)

In that case the following grammar also makes sense

commit <commit-id> (state|diff)
[with <commit-id> diff
with <commit-id> diff
with <commit-id> diff]

I.e. now you can sign the state, assuming the diffs are applied. Which might be useful.

Sounds good to me! indeed, signing the state assuming diffs are applied would likely be useful for when we find a serious CVE in the trust-initialization state commit, so as not to have to re-do the trust-initialization from scratch or live with an unsafe signed state commit :)

Okay. We can also make all of them required for simplicity, git will compress them away anyway when packing into deltas.

SGTM

I'll massage the RFC some more and publish the new version when I'll have a big enough slot of free time.

Great, thanks! Also, maybe consider giving it a git repository or whatever so that we can both easily refer to it henceforth and discuss on specific patches? :)

I guess you can try verifying all commits from HEAD down, making holes for all commits with "with ", until you hit a required depth or encounter a "state", then you would do a second pass in the opposite direction to cover the "with" holes.

Good idea, indeed! and actually the with <commit-id> could be checked by keeping in memory the list of already-met commit IDs as we'd be going down we'd have seen the <commit-id> commits before seeing the with <commit-id> commits.

@Ekleog
Copy link

Ekleog commented Oct 5, 2018

OK, so now that, I think, we agree on almost everything (except the question of whether tree-id or patch-id should be supported or not, but I guess that can be considered an extension by git-signatures so long as it's compatible with the message format), there is the major point (:D) to solve: what technical format?

Currently, three proposals have been raised:

  • JSON
  • Git commit-like objects
  • RFC822

I personally see no reason for RFC822: it's like the git commit-like objects, except with random whitespaces to make parsing harder.

This leaves us with either JSON and git-like commit objects.

To give an example of what the result would look like with the two tools:

commit 1234567890123456789012345678901234567890 diff
with 0987654321098765432109876543210987654321 diff
with 0987654321123456789009876543211234567890 diff
context-understanding high
diff-understanding high
thoroughness high
result +
result-otherwise -
priority high

My comment.

vs. JSON (which would be minified in practice)

{
    "type": "commit-diff",
    "oid": "1234567890123456789012345678901234567890",
    "with": [
        "1234567890123456789012345678901234567890",
        "0987654321123456789009876543211234567890"
    ],
    "context-understanding": "high",
    "diff-understanding": "high",
    "thoroughness": "high",
    "result": "+",
    "result-otherwise": "-",
    "priority": "high",
    "message": "My comment."
}

I… don't really care which format we use. Rust (the language I love nowadays) allows both to write a parser and to parse from JSON quite easily, so…

My “elegance” internal barometer prefers the git commit-like object. My “ease of use” internal barometer tells me most languages would find it much easier to just parse from JSON using a library than to write a full parser from scratch -- and if the language is not memory-safe, parsers are a well-known source of security issues. Also, JSON has the advantage of being more easily extensible should we want to extend the signature format later… but I don't really see why we would.

So… meh. Both ways work for me. But I'll note that we're already using nested structures with the with keyword, that is implicitly a list.

@lrvick
Copy link
Member

lrvick commented Oct 5, 2018

For signing artifacts, you need a setup that permits reproducible builds (and not things that will depend on eg. the version of the headers installed on your system). For this you need nix or a similar reproducible build tool. I will talk about nix because it's the one I know.

I have a number of repos I have deterministic building on. I type "make bin" and get the same binary from the same code every time. Then other people build those binaries and confirm they get the same hashes. We then have to certify those hashes came from that code by signing the code commit hashes with the binary hashes. This is imo a pretty good fit as extra metadata to put in the review signature for these use cases to do this as part of the review, as that is the stage that we do it.

Thanks for a review. I'll massage the RFC some more and publish the new version when I'll have a big enough slot of free time.

My comments were mostly the same as @Ekleog but overall this is a fantastic start. Thanks for that. Please feel free to make a new PR with this to maybe the "docs" folder in this repo so we can isolate spec discussions and review away from implementation concerns in this PR.

I am happy to create a "git-signature-spec" repo if we really want another repo this early. I granted you write permissions to this one for now.

@lrvick
Copy link
Member

lrvick commented Oct 5, 2018

Actually, thinking it over, there likely won't be that many issues. The biggest question will likely be thinking of how to handle multiple remotes, so long as we agree that each public repository would have its independent refs/signature branch (and I think we do)… but anyway we'd have to.

The 'patch' level anchored author/review + maintiainer certifying order with 'commit' level anchoring per my last use case under anchoring, would be a good fit for cherry picking across origins where contributors can only put changes in tree A but maintainers control tree B and certify patch ordering there and how changes interact with each other. I actually know a number of companies that do this for strong isolation of security domains and repo hosting with one public and one internal.

@lrvick
Copy link
Member

lrvick commented Oct 5, 2018

IMHO if you want to consider artifacts as part of the signature then you would have to make the tool verify said artifacts... For something like this I would just stick JSON/YAML/etc into the comment field.

Verifying the artifacts is trivial if you assume deterministic builds. (IMO any non deterministic builds are a security bug and one I spend weeks fixing for some projects).

If m-of-n reviewers (or CI servers) agree on the same hashes, then the given hashes are considered reviewed/trusted. Any downstream systems now know that is what was reviewed and if a new hash pops up saying it was for that given release/tag, it should be regarded as an imposter.

I have another solution for the organization with many in-flight diffs (described on IRC, but I don't think I've described it here). That's the setup I'm seeing for nixpkgs:

If I am following you: committer signs in a PR, reviewer signs, merge to a pending branch (that everyone somehow has access to without that being a problem?), commiter -re-signs-, then later maintainer signs as a part of a release tag etc. This seems much more complex than the workflow I presented without any security gain other than avoiding different anchoring levels for as far as I can tell no gain. In effect it seems like you are re-creating the same security levels at the branch level vs signature level which I am sure there are use cases for. That certianly can't meet mine or that of most orgs I am aware of in silicon valley.

Asking companies to totally abandon the git-flow it took them yaers to adopt is pretty rough. Asking them to add signatures at their regular steps in their usual flow I expect can be more widely and readily adopted and with my mitigations I don't think adds any attack surface over your presented flow.

I am however totally in favor of documenting all these potential workflows and security assumptions they make so people can make the right decision for their project as there are clearly multiple workflows that will end with the same security level. We do seem in strong agreement that 'commit' level signing is strongest and should be the thing that downstreams should trust be it on master branch, tags, etc.

@Ekleog
Copy link

Ekleog commented Oct 5, 2018

I have a number of repos I have deterministic building on. I type "make bin" and get the same binary from the same code every time. Then other people build those binaries and confirm they get the same hashes. We then have to certify those hashes came from that code by signing the code commit hashes with the binary hashes. This is imo a pretty good fit as extra metadata to put in the review signature for these use cases to do this as part of the review, as that is the stage that we do it.

So, your make bin downloads a fixed gcc (changes in versions of gcc will likely change optimized output code) and you vendor all your dependencies? If you don't do this, there are chances your builds have been reproducible only by chance. Then, this is only a side-track in the discussion.

Overall, the git commit signatures are designed to sign the contents of the git repository. I'm really not convinced that artifacts can be reviewed: signing the two in the same step is conflating the “a human verified this code” and “a computer generated this artifact from this code” concerns. Also, you're anyways going to provide a CONTENTS.digest{,.asc}, right? If so, you can just include the hash of the commit from which the build was made in this file, and now the signature for the artifact lives along the artifact, which IMO makes much more sense than it living along the source code.

Please feel free to make a new PR with this to maybe the "docs" folder in this repo so we can isolate spec discussions and review away from implementation concerns in this PR.

That's a good idea :) Though in order to have a vendor-neutral repository, I've created a repository at a created-for-this-purpose org: git-wotr/spec. @oxij, feel free to use this repository to push the RFC (or not), or to tell me you'd rather use the git-wotr name for your bash project, in which case I'll hand over the org to you (you came up with the name after all :p). I'm sending you two invitations to the org.

For the time being, I've initialized the repository with the GFDL 1.3 (which will require a header at the top of the RFC), and have restricted to only signed commits and only through reviewed PRs. All this is subject to change.

Actually, thinking it over, there likely won't be that many issues. The biggest question will likely be thinking of how to handle multiple remotes, so long as we agree that each public repository would have its independent refs/signature branch (and I think we do)… but anyway we'd have to.

The 'patch' level anchored author/review + maintiainer certifying order with 'commit' level anchoring per my last use case under anchoring, would be a good fit for cherry picking across origins where contributors can only put changes in tree A but maintainers control tree B and certify patch ordering there and how changes interact with each other. I actually know a number of companies that do this for strong isolation of security domains and repo hosting with one public and one internal.

I was thinking of having multiple remotes for the refs/signatures branch (ie. not have all the signatures centralized in one repo but be able to have non-committers users push reviews to their own repos, and other users to use them), so that's unrelated with patch-id signing :)

If m-of-n reviewers (or CI servers) agree on the same hashes, then the given hashes are considered reviewed/trusted.

This sentence shows clearly the conflation of concerns. Reviews are done by reviewers. Artifact signing is done by CI servers. Having reviews include artifacts requires people willing to sign an artifact to also sign a review, which is potentially false.

Also, there is a problem of trust. The fact that a review-including-artifacts hash is signed by enough people means:

  • For the review, that the diff between the previous commit and this commit is correct
  • For the artifact, that the state of the repository at this commit produces this artefact.

Overall, I can't find any similarity in the handling of the two things, so I don't understand why they should be put in the same signature: it just limits flexibility for everyone.

If I am following you: committer signs in a PR, reviewer signs, merge to a pending branch (that everyone somehow has access to without that being a problem?), commiter -re-signs-, then later maintainer signs as a part of a release tag etc

You're making it much more complex than it needs be. My setup is exclusively the three bullet points, preceded by an implicit “contributor sends PR”.

The contributor doesn't need to sign the PR, because no one would trust their key anyway, so it doesn't matter. And the pending branch can be accessed by any committer without any problem, because anyway it's not used by anyone and master will move forward only when the review signatures will be checked.

Asking companies to totally abandon the git-flow it took them yaers to adopt is pretty rough. Asking them to add signatures at their regular steps in their usual flow I expect can be more widely and readily adopted and with my mitigations I don't think adds any attack surface over your presented flow.

… if you read my flow, it's exactly the git-flow, except after the merge to pending the user adds in a signature to the commit. And sorry but the patch-id method without anchoring is flawed and I haven't seen any mitigation to prevent it yet.

@oxij
Copy link
Author

oxij commented Oct 5, 2018

@Ekleog Yay, GFDL! I pushed the original RFC to git-wotr/spec#1.

@lrvick
Copy link
Member

lrvick commented Oct 5, 2018

There are a lot of interesting subthreads going on here so I am going to file issues on git-wotr so we can discuss them in their own threads, because this is getting silly and important stuff gets lost :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants