Fix three zero days #13
base: master
Looks solid at first glance. Will look over in more detail tomorrow. Also:
Tab indentation is actually "standard" in bash because of heredocs. Indented heredocs are only valid with leading tabs, but not leading spaces. Switching between tabs and spaces mid code is just a mess, so I just resolve that bash is like Go and mandates tabs. In python I use 4 spaces though because PEP8 ;)
I would welcome a PR for that after we get this and the patch-id PR merged.
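The heredoc point is easy to check directly. A minimal sketch, assuming `bash` is on the PATH; the function names `demo_tabs` and `demo_spaces` are made up for illustration, and the scripts are generated with `printf` so the tab characters are explicit (`\t`) instead of depending on editor settings:

```shell
# <<- strips leading tabs from the heredoc body and from the closing delimiter.
demo_tabs() {
    printf 'cat <<-EOF\n\thello\n\tEOF\n' | bash
}

# With leading spaces nothing is stripped, so "    EOF" is not recognized as
# the delimiter: the heredoc runs to the end of input and bash prints a
# warning on stderr (silenced here).
demo_spaces() {
    printf 'cat <<-EOF\n    hello\n    EOF\n' | bash 2>/dev/null
}
```

Calling `demo_tabs` should print just `hello`, while `demo_spaces` should print both lines with their indentation intact, including the would-be `EOF` delimiter.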
d601e33 to a5ec96c
I got a bit tired regenerating keys by hand while continuing this branch. Added 0f816f8 in the middle of history here and changed later commits to respect it.
> rewriting this for bash -e, I frequently find errors this way,
> I would welcome a PR for that after we get this and the patch-id PR merged
Ok.
> using four spaces for indent :)
> Tab indentation is actually "standard" in bash because of heredocs. Indented heredocs are only valid with leading tabs, but not leading spaces. Switching between tabs and spaces mid code is just a mess, so I just resolve that bash is like Go and mandates tabs. In python I use 4 spaces though because PEP8 ;)
I'd prefer an unindented `usage` heredoc over tabs everywhere, tbh.
6ac3023 to ecbf2d2
Added another test that shows that my fourth zero day didn't work out.
Give this another 24h before merging just in case but I think this is now LGTM.
fa74809 to 8ab4140
Refactored tests, rebased to make the order beautiful, added two more failing tests and a fix.
Judging by GPG's docs there are two more non-tested cases: BADSIG and EXPSIG (EXPKEYSIG is tested now).
EXPSIG I'll manage later, but I tried and failed to make it produce a BADSIG status, no matter what I did.
(I tried to break a signature in a PGP packet; maybe making a fake key would be easier, but I have no idea how to generate those cheaply enough for tests.)
@oxij So a lot of work was going on in the sign-patch-id branch which just merged. Most of it should not touch your work, but can you rebase?
Also most of us involved in the project are on irc.hashbang.sh/#! 6697 if you want to keep in sync at any point.
I actually went this route initially, but realized it makes most CI systems hang until timeout as they become entropy starved. If you want to go this route that is totally cool, but you may have to update the CI jobs to set up haveged or something to keep the entropy pool full, or stub in something to spam to /dev/random to fill the pool before generating keys.
prefix=$$HOME/.local
bindir=$(prefix)/bin

all: test

test:
bats test/test.bats
make -C test
What is the reasoning for breaking out test to have its own makefile?
> What is the reasoning for breaking out test to have its own makefile?
Tradition, simplicity, I guess. I made the tests directory mostly self-sufficient and it became easier to add a Makefile there than `cd tests` everywhere else.
CI and randomness
Yeah, I'll add writes to /dev/random after I hit the issue, works for me now.
> So a lot of work was going on in the sign-patch-id branch which just merged. Most of it should not touch your work, but can you rebase?
I understand that this now got awkward, and all, but I read the changes and I disagree with that direction.
I just fixed a bunch of security issues here (the replay and "silly names" tests were the funniest) and that branch introduced yet another one by allowing out-of-order application of patches =/. As I pointed out (albeit in question form) in the NixOS rfcs thread, IMO patch-id signing should, if at all, be implemented on top of revhash signing in an opt-in manner (that is, a revhash signature should permit reapplying it to a patch with the same patch-id), not vice versa. I want revhash-based semantics 99.999% of the time and I don't see any elegant solutions for going from a patch-id-based system to a revhash-based system. Hence, I disagree.
I also don't like the "JSON" direction you outlined in the NixOS thread. I think that JSON is cancer and that the signed message format should be fixed in stone, and changes to it should be versioned on a per-notes-branch basis, not a per-revhash basis. Again, for security reasons.
I'm sorry, but I'm very opinionated on such topics and I can't make myself see any point in rebasing this on top of current master now. I'm going to continue by myself, let the community judge the results, I guess.
However, feel free to take anything you like from here, and, later, from the rest of my fork.
IRC
Believe me, you don't want me on your IRC. :)
We've seen worse. Feel free to join us: ircs://irc.hashbang.sh:6697/#!,#!social - TLS required.
@oxij I don't think forking is required, or that JSON is required. If there is another simple-to-parse key/value format you have in mind, let's hear it. I would like to see an alternative suggestion rather than just declaring the current suggested format cancer :) I also agree that we need revhash on top of patch-id. I was going to address that in my next revision, but wanted to get patch-id in first now that we've proven that out as a standalone solution. It also has very clear security issues (can merge things out of order), thus me wanting to get your changes in that don't pertain to the actual signature format; then we can address that in a follow-up PR where we have a key/value map that we sign that contains the master head ref, the patch-id, and the current ref, with room to easily add more key/value pairs for some future use cases that are out of scope atm. I really value your input here. If you can elaborate on a specific attack, then let's solve for it. Otherwise we are going to end up with 2 basically identical projects with slightly different signature formats, which is still.
To elaborate further, let's say we sign and base64-encode a key/value map that contains:
This map could be msgpack, json, some tuple thing. I don't actually care that much as long as it is easy to parse in a wide range of programming languages. With that assumption we end up with multiple possible verification modes with different tradeoffs, and I think we can easily support all of them in future
Commit Ref
Pros
Cons
Patch-id
Pros
Cons
Patch-id + master HEAD
Pros
Cons
Hybrid Verification
Pros
Cons
@oxij Just in case because I was the one who pointed out JSON, using the JSON format for serialization doesn't mean the format cannot be fixed in stone. It just means we can (/must) use a JSON parser for handling it, in exchange for a few bytes of storage (basically, the So it's mostly a “do we want to be able to use a JSON library in exchange for being more or less forced to use a JSON library?” question. I personally think that if deciding that the format must be an object with only string values without escapes and no superfluous spaces, then it's not harder to parse than git objects: if I can allow myself to mix syntaxes for hopeful clarity, it gives this parser:
So to sum up, my comparison between base64-ed git objects (hereafter b64go) vs. restricted JSON (hereafter rjson) formats yields:
All in all, I don't see a single reason to prefer b64go over rjson, apart from the fact that I agree it's very annoying that people use JSON everywhere without thinking about it. Am I missing a point? :) (sorry, I managed to restrain myself on the RFC thread, but it looks like it actually got out here… well, at least here it's somewhat on-topic :))
@lrvick Patch-id, in my opinion, brings much more complexity (and thus occasions for vulnerabilities) for almost no benefit. From your list:
I personally think that supporting a highly counter-intuitive signature scheme is not worth the minor benefit from hybrid verification systems, as anyway it's not more secure than commit-ref (actually, it's less) and the convenience advantages are limited (anyway, a committer will likely merge around the same time as they sign, and multiple-signature schemes need to be handled at the fetching end anyway so there's no TOCTOU).
Actually, an additional concern about hybrid signing mode: it can be used by a maintainer to put the blame on another, by using TOCTOU. Example:
So even that is not secure, so I cannot see a single advantage in using patch-id signing.
JSON
I'm not, like, super-biased against JSON; I am a bit biased. But my thinking here goes like this: when in Rome, do as the Romans do. git objects are a "<type> <length>\0" header followed by "[<key> <value>]*" fields. I see no reason to use something different here (except that "<type> <length>\0" is not needed, as the first is obvious and the second is provided by the PGP packet); parsing those in bash is easy enough.
@lrvick
patch-id
@Ekleog pretty much wrote my thoughts on the subject.
- Counter-intuitive for git users.
- Simple stuff gets complicated. Your scheme with two more hashes feels like you made git-signatures for darcs VCS and now trying to adapt it to git. Ugly.
- Computation of patch-ids themselves is complicated, merged PR itself changed diff-tree options several times.
- I see no good uses for them. Signed patch-id literally has the following semantics: "I sign any repository state that can be made with this applied". I.e. an infinite number of repository states. This is ridiculous for any sufficiently large code base.
- It interacts badly with lots of other stuff. E.g., I can make a commit with renames, pick a patch-id-signed commit from any other place in history, revert the renames. I just applied your change with your signature to a random similar-enough file. Fill in a bunch of out-of-context commits in between and you got a nice way to introduce backdoors, all you need to do is find a series of patches in history that can be chained to make a change you want and make maintainers sign some trivial renames. In a long enough history like nixpkgs you could then find a series of changes for anything. I'm sure there are more examples like this.
If you need that feature, it's much simpler to just make a separate signature type for it. But I don't think I would ever need it. Backports need to be checked and signed independently anyway.
@oxij we had a chat on IRC: I think @Ekleog and I can agree that, at a minimum, we sign the tree object. I think @lrvick is now on board with this. Whether the parents should be signed is what @Ekleog and I disagree on: we could make that configurable.
Need `bash -e` ASAP.
Also produce cleaner status codes.
This only adds overhead as git compresses its objects anyway.
So I guess we now only have to agree on a data format, so that all our tools end up being interoperable :) First, the non-controversial parts: what fields should we include? I can think of (also taking ideas from crev, which has the difference that it's attempting to review the whole state of the repo instead of split it by commit):
I wonder whether we should have some default values for review thoroughness and context understanding, or whether it's reasonable to force them being present in the signed blob.
And the second likely non-controversial choice: where should we put the signatures? I would say each signature on one line, sorted in a file named
Once we get that agreed, we will be able to reasonably go to the question of rfc822 vs git-object-like vs JSON, which will be very happy I'm sure :D
> So I guess we now only have to agree on a data format, so that all our tools end up being interoperable :)
Yes.
So, I've read crev source and I agree with their idea that trust level in the key and trust level in the judgment are different things. So, I guess, we also need that. The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust". I think there should be no arguments against using `--export-ownertrust` format for that, right? We would also need to agree on `git config` options for those `--trustdb-name` and related things.
So to verify a repo in nixpkgs case you would do something like
```
cd nixpkgs-keys
for key in keys/*.key; do
gpg --import "$key"
done
# trust in keys belonging to whom they claim
sed 's/$/:6:/' bootstrap | gpg --trustdb-name nixpkgs-keys --import-ownertrust
# trust in those people understanding what they are doing
sed 's/$/:6:/' bootstrap | gpg --trustdb-name nixpkgs-judgments --import-ownertrust
sed 's/$/:6:/' trusted-contributors | gpg --trustdb-name nixpkgs-keys --import-ownertrust
sed 's/$/:4:/' trusted-contributors | gpg --trustdb-name nixpkgs-judgments --import-ownertrust
# and so on ...
cd nixpkgs
# wotr for "WoT reviews", subject to change, but I like it
git config wotr.keys-trustdb nixpkgs-keys
git config wotr.judgments-trustdb nixpkgs-judgments
git config wotr.trust-model pgp # the default, can also be tofu or whatever
git config wotr.completes-needed 2
git config wotr.marginals-needed 4
```
Hydra only does the above, users may add more keys and more/less trust as they wish.
> First, the non-controversial parts: what fields should we include?
Btw, `crev-data/src/proof/review/mod.rs` has "thoroughness", "understanding", "trust", and "distrust". I'm like "WTF"? The "distrust" is clearly an artifact of their "low", "medium", "high" enum, but what does "trust" in my own review even mean? I disagree with that terminology.
I agree, however, that splitting my "level" into "thoroughness" and "understanding" might be useful, though.
So, here comes a draft of my design RFC:
---- BEGIN RFC ----
# Definitions and data formats
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.
The plain-text data described below (in a yet undecided format) is a "review". A review is signed by gpg and turned into a "packet" (a PGP packet of the review and signature together), which is then base64 encoded and appended to a "packet-list". The packet-list is stored in a separate git branch called "review-branch", indexed by commit-id (as before the patch-id changes were merged; currently managed by git-notes, but that might change). All date values are UNIX timestamps.
# Verification Process
Commits in review-branch MUST be signed, ALL commits MUST be validated before interpreting, invalid signature on review-branch commit MUST cause a critical error.
Commits in review-branch MUST monotonically increase their "author" and "committer" dates by alternating the two types. I.e. the following MUST be observed:
0 <= commit_1_author_date <= commit_1_committer_date <= commit_2_author_date <= commit_2_committer_date <= ...
Any other sequence MUST cause a critical error.
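The ordering rule above can be sketched as a small filter. This is only an illustration, not part of the draft: the function name `check_monotonic_dates` is hypothetical, and it assumes the review-branch history is linear and fed in oldest-first as `<author_date> <committer_date>` pairs (which `git log --reverse --format='%at %ct' <ref>` produces).

```shell
# Reads "<author_date> <committer_date>" pairs (UNIX timestamps, oldest commit
# first) on stdin and fails on the first violation of
#   0 <= a_1 <= c_1 <= a_2 <= c_2 <= ...
check_monotonic_dates() {
    local prev=0 author committer
    while read -r author committer; do
        if [ "$author" -lt "$prev" ] || [ "$committer" -lt "$author" ]; then
            echo "critical: non-monotonic dates: $author $committer" >&2
            return 1
        fi
        prev=$committer
    done
}
```

A caller would treat a non-zero exit status as the critical error and stop before interpreting any packet.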
Packets MUST always be appended to the end of the packet-list; insertion anywhere except at the end of the packet-list MUST cause a critical error.
Packets with signatures by the same key in packet-list MUST monotonically increase their packet's "date" field, failure to observe that MUST cause a critical error.
The "packet's-commit-date" is the earliest of "committer" or "author" dates of the commit introducing the given packet into the review-branch.
Packet's "date" field MUST be <= its "packet's-commit-date", failure to observe that MUST cause a critical error.
Note that the above still allows packet's "date" field to be interpreted as a "serial-number" as long as the packet itself was not signed with "expire date" field set, which is discussed below.
When interpreting packet-list only the packet with largest "date" field (serial-number) MUST be considered for each signing key, all other packets for the same key MUST be ignored.
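Picking the one packet that counts for each key could look like the sketch below. It assumes every packet has already been verified and reduced to one line of the form `<key-id> <date> <rest>`; that line format and the name `latest_per_key` are inventions for illustration, not part of the draft.

```shell
# Keeps, for each key, only the line with the largest date field.
# Output is sorted so the result is deterministic.
latest_per_key() {
    awk '
        !($1 in best) || ($2 + 0 > date[$1]) { date[$1] = $2 + 0; best[$1] = $0 }
        END { for (k in best) print best[k] }
    ' | LC_ALL=C sort
}
```

For the input lines `A 10 x`, `B 5 y`, `A 20 z`, only `A 20 z` and `B 5 y` survive.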
When verifying a packet "GOODSIG" gpg status means the packet SHOULD be accepted as authoritative, "EXPKEYSIG" is explained below, "EXPSIG", "REVKEYSIG" and "ERRSIG with NO_PUBKEY" mean the packet MUST be ignored, all other statuses, including "BADSIG" and "ERRSIG without NO_PUBKEY", MUST cause a critical error (can be made into a warning, in case gpg guys break something, I guess).
"REVKEYSIG" MAY cause a warning.
Note that by using "expire date" in the packet and "EXPSIG" gpg status code you can have reviews that automatically rescind themselves. I doubt this would be used much, but it can be useful in some use cases, no need to prohibit that, I think.
"EXPKEYSIG" gpg status code means the key's own signature has expired, it SHOULD be interpreted as "GOODSIG" if packet's-commit-date is <= key's expiration date, MUST be interpreted as "ERRSIG with NO_PUBKEY" otherwise.
To reiterate, the following (among other things, i.e. not "iff") MUST be observed for packet to be accepted as authoritative:
packet's "date" field <= packet's-commit-date <= author's key own "expire date"
Failure to observe the first comparison causes a critical error, failure of the second only causes the packet to be ignored. (Because expired key can be extended after the fact by re-self-signing.)
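The status rules above boil down to a three-way decision. A minimal sketch, not the project's implementation: `classify_packet` and its argument order are hypothetical, the status keywords are the ones gpg reports via `--status-fd`, and parsing gpg's output into these arguments is left out.

```shell
# classify_packet <status> [<no_pubkey 0|1>] [<packet_commit_date>] [<key_expiry>]
# Prints "accept", "ignore" or "critical".
classify_packet() {
    local status="$1" no_pubkey="${2:-0}" commit_date="${3:-0}" key_expiry="${4:-0}"
    case "$status" in
        GOODSIG)
            echo accept ;;
        EXPKEYSIG)
            # Good only if the packet was committed before the key expired.
            if [ "$commit_date" -le "$key_expiry" ]; then echo accept; else echo ignore; fi ;;
        EXPSIG|REVKEYSIG)
            # REVKEYSIG may additionally warn; that is omitted in this sketch.
            echo ignore ;;
        ERRSIG)
            if [ "$no_pubkey" -eq 1 ]; then echo ignore; else echo critical; fi ;;
        *)
            # Includes BADSIG and anything unexpected.
            echo critical ;;
    esac
}
```

The check that the packet's "date" field is <= packet's-commit-date is a separate step, because failing it is a critical error regardless of the signature status.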
Failure to interpret the contents of "GOODSIG" review MUST cause a warning and MUST cause such a review to be ignored.
Side note:
If we allow a review encoded in the packet to have its own "date" field (which I don't like for the following), it MUST be <= packet's-commit-date and >= packet's "date" field, failure to observe that MUST cause a critical error.
I.e. the following MUST be observed:
packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date"
The problem with allowing that is you would have to interpret non-authoritative packets too to keep the above semantics of "everything that looks like MITM causes a critical error so that everyone would immediately notice". Hence, I'd rather not have these dates there.
## Review format (given in git-object-like form)
Things after "#" and whitespace before are not in the grammar (comments)
```
<header>
<metadata fields>
<comment>
```
Should be read as "By signing this text I certify that I validated the following ...".
### Header
Either
```
commit <commit-id> diff
[with <commit-id> diff
with <commit-id> diff
with <commit-id> diff]
```
meaning that the reviewer certifies the "diff" of this "commit" when applied to the parent of the first "<commit-id>" together "with" commit "diff"s of the later "commit-id"s
or
```
commit <commit-id> state
[of <filepath relative to repository root>
of <filepath>
...
of <filepath>] # empty list means "the whole thing"
```
meaning that the reviewer certifies the "state" "of" the following "filepaths" in "<commit-id>" tree.
Can be similarly extended for `tree <tree-id> diff`, `tree <tree-id> state`, `patch <patch-id> diff`. As you see, the difference between "diff" and "patch" is intentional here, "diff" is "change" in the abstract sense (I can agree to replace "diff" with "change" even), while "patch" is "exactly this patch file".
### Metadata fields
```
# "while having achieved the following" when reading "... I certify ..."
[context-understanding (medium|high|author)] # low by default, indicates understanding of the context of the diff
[diff-understanding (medium|high)] # low by default, indicates understanding of the diff itself
[thoroughness (medium|high)] # low by default, indicates the effort spent checking it does what it claims to do
# "with the following" when reading "... I certify ..."
[result (!|-|+)] # 0 by default, this is the rough equivalent of "trust" in crev
[result-otherwise (!|-|+)] # "!" by default with header conditions, "0" without header conditions, this line MUST not be present without header conditions, this is the "result" when not all header conditions are met
# "and I think this should be assigned the following"
[priority (medium|high|panic)] # low by default, doesn't influence anything, should be used for grabbing attention of other reviewers
```
"context-understanding":
- "low" = "I might have seen this subsystem before.",
- "medium" = "I looked into this subsystem intentionally and maybe made little changes there before.",
- "high" = "I made big changes there before.",
- "author" = "I wrote a good chunk of this subsystem myself, I know it in and out".
"diff-understanding": self-explanatory.
"thoroughness":
- "low" = "glanced over",
- "medium" = "read, evaluated/tested",
- "high" = "spent a bunch of time playing with it".
"result":
- "!" = "has a security issue!",
- "-" = "I disapprove",
- "0" = "FYI",
- "+" = "I approve".
"priority": self-explanatory.
## Examples
"Nothing looks bad, but don't trust me" would be
```
<header>
#context-understanding low
#diff-understanding low
#thoroughness low
result +
LGTM.
```
Drive-by LGTM in a system you know well would be
```
<header>
context-understanding high
diff-understanding high
#thoroughness low
result +
LGTM.
```
Thorough LGTM in your own subsystem would be
```
<header>
context-understanding author
diff-understanding high
thoroughness high
result +
Running on top of this for a month.
```
"Nothing looks bad, but please can someone else review this" would be
```
<header>
#context-understanding low
#diff-understanding low
#thoroughness low
#result 0
priority high
Please, can somebody else review this?
```
"This looks very bad!" could be
```
<header>
#context-understanding low #or something else
#diff-understanding low #or something else
#thoroughness low
result !
priority panic
Backdoor?
```
"This looks a bit bad" could be
```
<header>
#context-understanding low #or something else
#diff-understanding low #or something else
#thoroughness low
result !
#priority low
Combined with X can lead to a low-impact security issue Y.
```
"WTF is this? This should be reconsidered!" could be
```
<header>
context-understanding high
diff-understanding medium
#thoroughness low
result -
priority panic
Please, revert! I think this will break X, which would be very high-impact!
```
CVE:
```
commit <commit-id> diff
with <commit-id with a fix>
context-understanding high
diff-understanding high
thoroughness high
result + # if "with" commit is applied
#result-otherwise !
priority panic
This commit has a very high-impact open CVE! Fixed in <commit-id with a fix>, apply that immediately!
```
"I disapprove, here is a fix in my repo":
```
commit <commit-id> diff
with <commit-id with a fix>
context-understanding high
diff-understanding high
thoroughness high
result +
result-otherwise -
priority high
I think this should be reverted and done via another approach. See <commit-id with a fix> in my repo.
```
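Reading a metadata field with its default, as used in the examples above, can be sketched as follows. This is a rough illustration under stated assumptions: `review_field` is a hypothetical helper, it treats the first line whose first word equals the field name as the field (so it does not yet tell metadata apart from the free-text comment), and it strips a trailing `# ...` comment from the value.

```shell
# review_field <name> <default>
# Prints the value of metadata field <name> from the review on stdin,
# or <default> when the field is absent (e.g. commented out with "#").
review_field() {
    local name="$1" default="$2" key value
    while read -r key value; do
        if [ "$key" = "$name" ]; then
            # Drop any trailing "# comment" and the whitespace before it.
            printf '%s\n' "${value%%#*}" | sed 's/[[:space:]]*$//'
            return 0
        fi
    done
    printf '%s\n' "$default"
}
```

For the "Drive-by LGTM" example, `review_field thoroughness low` would fall back to `low` because that line is commented out, while `review_field context-understanding low` would print `high`.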
---- END RFC ----
> I wonder whether we should have some default values for review thoroughness and context understanding, or whether it's reasonable to force them being present in the signed blob.
I opted for "low". But this can be reconsidered, indeed.
Whoops, looks like message went out too early… please disregard previous message if you received it by email. Comments about the RFC coming in later when I've finished writing them.
Hmm… Actually, I'm not really sure we need keys trust? If judgement trust is associated to a public key anyway, then there has been manual vetting of said public key (to attribute it a judgement trust), so we wouldn't need to also verify the key (actually, I may decide to trust your public key if I see you doing good reviews, without ever having checked your identity and thus not having signed your key, and even less given it ownertrust-in-the-WoT-sense).
Actually… GnuPG is not the only OpenPGP client, and sequoia-pgp is getting closer to being a usable OpenPGP client that may end up sucking less, so I'm not really sure about that. Also, I wonder: anyway we won't be able to re-use GnuPG's WoT computation code for signatures… would we? If we won't, then why not just store them in a `fingerprint:judgement-trust:` format? This way it's both trivial to parse and trivial to import into an ownertrust db if need be (we could also decide to remove the trailing `:`, potentially).
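To show how little parsing such a colon-separated file needs, here is a sketch of a lookup. The file layout (`<fingerprint>:<trust>:` per line, `#` lines as comments) follows the suggestion above; the function name `judgement_trust` is hypothetical.

```shell
# judgement_trust <fingerprint> <file>
# Prints the trust value stored for <fingerprint>; exits non-zero if the
# fingerprint is not listed.
judgement_trust() {
    awk -F: -v fpr="$1" '
        # Appending "" forces a string comparison, so hex fingerprints that
        # happen to look like numbers are never compared numerically.
        ($1 "") == (fpr "") { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$2"
}
```

The same file could be piped unchanged into `gpg --import-ownertrust` if a GnuPG trustdb is wanted after all.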
I first read w-OTR (off-the-record)… though I don't have any better idea, apart from just “wot” which would be potentially ambiguous.
So while I don't intend on implementing this for a few weeks or want to further derail things, I want to point out that I do have one strong use case for hierarchical data: projects that yield compiled artifact or artifacts. I would want to support an optional flag to include a hash map of a directory full of output artifacts. If multiple CI systems or reviewers agree on the output binary hashes we now can attest 2 things strongly:
* this code builds deterministically
* these binaries came from this code
Those signatures could be detached with the repo and exported to allow the binaries to be verified independently downstream and anchored back to the repo ref they were compiled from. I experimented with the following format today as a single JSON line incorporating ideas from @oxij with mine:
{
"body":{
"commit":"d27e15a09812241167a61c3222cb6e97c5b0d6b0",
"tree":"58215f173d6ef6892a07c9073d55145f387d1968",
"patch":"068a37e54586de8339f13ec980d31f6c30b6f6e7",
"date": "2018-10-05T03:01:46Z",
"review": {
"context-understanding": "author",
"diff-understanding": "high",
"thoroughness":"medium",
"result":"+"
},
"artifacts": {
"out.tar.gz":"2ad6f470b5398251018a4c25f5eb35686481681f"
}
},
"sig":"iQJFBAABCAAvFiEEZ1U/vaRrtxq9LgsLjkeh7DWhVR0FAlu2h3IRHGxhbmNlQGxydmljay5uZXQACgkQjkeh7DWhVR0K9Q//T+UleZFWlbiIFoUBp1hX0xzg/eEaTFnnlCdWWaa5f1E+TI80Pg/wXhpEUd98ghZThXwaWJwnPp34BpZ55GU5Qr1cXjY00Yt7I0p3a5TUsdk5FP8tiUWNQ9kA6npUOIVixa8HFkhz0SvHvXNAvWsasqw6nb+SNfA+aQYIP/wAY71ZpMLkJsMQssaFWsjL/hsSmOpi7VyLEFEuUmEM1CA6VPiMvp4rvP75Bt4DnEouyFPxk4WbVIqX4DIeG8+5W8jQCLlxV6HDH4g+POdZbD7/Yg6XjDKEfD2vM/OYrBczPZX0Tu1WhVyYOtQ/WsAr9wzZjgl50/9UkP6E4pt9ft42jdNhGcO1sEwzlC5W1efC2izjL4medEiLe00zIM0L42X10HebH22j4gKyb4EErSHo0H5eLisf1xAwAHiOE2wJou1SR5dVmKydXicVN3wuLqhzc52awmoOaEts1Q3zFyB8MfDIi39CACwAvOTrKw54raoGgzxDEFEP3sKWF1FuZQJreG1Ufg57Oxv5f1lOqvaoMN7ZNjcQkPUAqj7G7e/fwyXGCG2mvnz++D5/2JAMeeO4k8iwxGXEsF0qD3QveEZPFoIpPZX+uOGqM9iIPDJd9bGu0xI5yWeqLpbdXdm4ClsiRWOmavUvjm8RK99o7I9EvctCzrIrGnwlySZ0+Po856I="
}
I suggest of the above only Simple and high security by default with flexibility and some degree of future-proofness for real world use cases.
Now for the most controversial part of the above, is my re-introduction of (optional) patch id support. @tylerlevine made some strong points on the #! IRC today. It made me realize we actually have 3 verification levels possible which each have different use cases and tradeoffs. I have taken to calling this the "anchor" status. The signature will be valid, but the anchor status will tell you at what security level it is anchored to the current tree we are validating it in. Stick with me as I try to lay down some context...
So there are three descending levels of verification possible and only the highest level needs to be reported.
Trust Anchor levels
commit
tree
patch
Use Cases
Small team willing to allow only one "ready to merge" changeset at a time
Use "commit" level anchoring and truck on. It is the default and you need
Above small team that also wants to be able to squash and amend comments
Consider "tree" level anchoring which offers greater flexibility. Understand the tradeoff here is that a bad actor can squash all commits into
An organization with many in-flight diffs and reviews merging out of order
Consider "patch" level anchoring for authors/reviewers. Understand that this can only certify that a given changeset was authored and
You -must- pair this with a CI system or maintainer(s) that will do one of:
If you fail to do one of the latter steps to prevent history mutation you are
Re Licensing: Apache2/MIT are near universally automatically okayed so I favor that approach. Can just copy/pasta from Rust on this one imo.
RFC
I… don't really like this, because it means we can't, by design, do the cat_sort_uniq merge -- which would likely turn out to be useful for simplifying merges. Unless we forbid merge commits on “review-branch” (which seems consistent with the date requirement above?) and have the tool handle the merge itself, which would start to complexify either the use or the implementation… potentially more than just saying the list is unordered and must be ordered by rfc4880 signature creation time (or date, see below).
About dates
Suggestion: make monotony strict. I'm not 100% sure whether the English usage for “monotony” already is strict, but we lose nothing by mentioning it explicitly.
Suggestion: put this at the top. While reading I had started writing a long paragraph about dates and how your scheme was uselessly complex, until I came upon this and understood the reason of it all. Also, I think the date handling paragraph could be split off and made explicit with the 4 dates that would end up present: git commit date, git author date, RFC4880 creation date, date inside the signed blob. This is a non-trivial part of the RFC.
This part makes me think you only want 3 dates, ie. remove the date inside the data. Is that so? Sounds good to me. But TBH I'm not sure I understand all of the date setup. It sounds like a part you've really well thought out, but it's interspersed with the rest of the text and I can't easily imagine the big picture (and IMO the RFC should be easily understandable, otherwise we're bound to have interoperability or security issues).
If we're going to keep the signed-data-date-field (assuming you mean “review's date field” = signed-data-date-field and “packet's date” = RFC4880 creation date), then I think we should swap the first two members: first one would create the review, and only after sign, so the naive and simplest behaviour would be incorrect as per your specification.
Signature verification
Hmm… there may be ambiguity here for a masterkey with two signing subkeys. I think we agree we want to allow only one signature per masterkey, so maybe something like “When interpreting packet-list, only the packet with the highest packet date MUST be considered for each RFC4880 master key. All other packets by the same key MUST be ignored. In addition, packets by keys for which no judgement trust has been assigned SHOULD be ignored.”
This is GnuPG jargon. Maybe use RFC4880 terms instead? Then it's kind of obvious what these means wrt. RFC4880, so I guess that's not an important change.
-> SHOULD and remove the parenthesis?
-> SHOULD for the first MUST: we may want fully automated setups where no one would be there to check the warning… and failure to interpret the contents of GOODSIG is not a security issue by itself, just proof of a bug in either the sender or the recipient.
Format
OK, so after a few minutes of thought, I think I understand what you mean (undertone: this needs clarification :p). It's the “trust this commit iff. you're evaluating the security of a commit that already includes these other commits”, right?
I… am not sure
then I'm not sure it would make sense to consider the security of commit 2 as equivalent to “filepaths A and B state-signed”. Also, handling these in a non-counter-intuitive way different from just ignoring them sounds like it'd add quite a lot of complexity. In my mind, we're better off leaving this to crev and concentrating on reviewing commits only, and not try to merge the two concerns into a single tool… but maybe I'm missing a use case? That said, we do need state signatures that sign the whole repository at least.
I think we should allow an explicit
s/MUST not/MUST NOT/ Also, are header conditions the
Well, I know for my tool, if I get around to doing it (not really good at doing stuff nowadays, too many things flying around…), I'll just output the values all the time. IMO default values would be a UI question that would thus be left implementation-defined.
For signing artifacts, you need a setup that permits reproducible builds (and not things that will depend on eg. the version of the headers installed on your system). For this you need nix or a similar reproducible build tool. I will talk about nix because it's the one I know. Nix builds by taking the nix expression, building a .drv of it, and then building the package. The point is, you only need to verify the .drv <-> built package association to have complete commit <-> package security, because the .drv is derived from data present in the commit and is enough to ensure that the whole build setup is the same. In particular, .drv's already contain the repository commit. (when they're written to be reproducible, it's also possible to not write them this way but then it'll likely not be reproducible) tl;dr: I think signing build artifacts is orthogonal to signing the code.
I have another solution for the organization with many in-flight diffs (described on IRC, but I don't think I've described it here). That's the setup I'm seeing for nixpkgs:
This is, actually, similar to what hydra is doing with tests. Obviously, CI could also run before the merge on An alternative setup, a bit more complex, that would avoid this issue:
|
@oxij Now, an old concern coming back: performance. How do we handle verification? My understanding is we would start from the latest First question: How do we find the latest trusted state-signed commit? But the important question is the second question: once we have the latest trust-signed commit, how do we check all commits until the latest? It's basically running a state-machine from the latest trust-signed commit to the latest commit. Which isn't reasonable if my guess is correct and the only state-signed commit we see in practice is trust initialization. So we would need a way to cache results between calls (at least finding the latest state-signed commit and maybe a dump of the state of the state machine along with the commit on the refs/signatures branch at which it was computed). This would make the implementation… very complex, I fear :/ That said, apart from completely removing the |
strong use case for hierarchical data: projects that yield a compiled artifact or artifacts.
IMHO if you want to consider artifacts as part of the signature then you would have to make the tool verify said artifacts... For something like this I would just stick JSON/YAML/etc into the comment field.
I would want to support an optional flag to include a hash map of a directory full of output artifacts.
Similarly.
If multiple CI systems or reviewers agrees on the output binary hashes we now can attest 2 things strongly:
* this code builds deterministically
* these binaries came from this code
Yes, but that has almost nothing to do with code signatures. So, similarly.
we actually have 3 verification levels possible which each have different use cases and tradeoffs
Correct, but my argument is that 2 out of 3 make no real sense. I strongly prefer to start with a commit-id-only system and keep it there as long as possible (it doesn't mean we should not consider it, I'm just saying it's a very low priority thing).
## Use Cases
### Single contributor willing to merge one changeset at a time
Use "commit" level anchoring and truck on. It is the default and you need
special flags to verify any other way.
Sounds good.
### Single contributor that wants to be able to squash and amend commits
Consider "tree" level anchoring which offers greater flexibility.
Why? If you are the author, you can squash and resign.
### An organization with many in-flight diffs and reviews merging out of order
Consider "patch" level anchoring for authors/reviewers.
Why? Original patches would be signed by the original author, merge commits would be signed by maintainers (and, later, maybe by the original author).
Apache2/MIT are near universally automatically okayed so I favor that approach. Can just copy/pasta from Rust on this one imo.
But this is not a library, this is a tool.
Most companies avoid GPL like a cancer. Legal teams panic. Seen it many times. I don't think that panic is justified but GPL suit FUD makes it really hard to get GPL tools approved in companies.
Then why do they even use GCC, coreutils, bash, grep, utillinux, ..., Linux, GPG, git, etc? They are always welcome to the BSDs.
|
> Packets MUST always be appended to the end of the packet-list, insertion anywhere except at the end of the packet-list MUST cause a critical error.
Unless we forbid merge commits on “review-branch” (which seems consistent with the date requirement above?) and have the tool handle the merge itself, which would start to complexify either the use or the implementation… potentially more than just saying the list is unordered and must be ordered by rfc4880 signature creation time (or date, see below).
Yes, my idea here is that the history would be linear and the tool would do the merges locally, similarly to subversion. As I said before, I want `git wotr add` to add a review like `git add` adds a file. You would then pull, `git wotr commit` (which just appends packets) and push. Something like `git wotr sync` could do all three together.
This can be relaxed to a less ordered thing if you want, but then you would have to verify the correctness and dates of merge commits... (see the note on "blockchain" thing below)
> Packets with signatures by the same key in review-list MUST monotonically increase their packet's "date" field, failure to observe that MUST cause a critical error.
Suggestion: make monotony strict.
Right, fixed.
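A minimal sketch of the strict-monotonicity rule, assuming packets arrive in packet-list order and that each one exposes a signing-key id and an integer "date" field. The `Packet` type, its field names, and `CriticalError` are hypothetical and not part of the RFC.

```python
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    key_id: str  # hypothetical: fingerprint of the signing key
    date: int    # hypothetical: packet "date" field as a Unix timestamp


class CriticalError(Exception):
    """Raised when the packet-list violates an ordering rule."""


def check_strictly_increasing(packets: Iterable[Packet]) -> None:
    """Fail unless each key's packet dates strictly increase in list order."""
    last_seen = {}  # key_id -> date of the most recent packet by that key
    for index, packet in enumerate(packets):
        previous = last_seen.get(packet.key_id)
        # "<=" rather than "<": equal dates are rejected (strict monotony).
        if previous is not None and packet.date <= previous:
            raise CriticalError(
                f"packet {index}: date {packet.date} is not after "
                f"{previous} for key {packet.key_id}"
            )
        last_seen[packet.key_id] = packet.date
```

Dates are only compared between packets by the same key, so two different keys may reuse the same date without tripping the check.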
> "EXPKEYSIG" gpg status code means the key's own signature has expired, it SHOULD be interpreted as "GOODSIG" if packet's-commit-date is <= than key's expiration date, MUST be interpreted as "ERRSIG with NO_PUBKEY" otherwise.
Suggestion: put this at the top. While reading I had started writing a long paragraph about dates and how your scheme was uselessly complex, until I came upon this and understood the reason of it all.
Hm, I'll think about it.
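For reference, the quoted EXPKEYSIG rule could be sketched roughly like this. It assumes the status string, the packet's commit date, and the key's expiration date have already been extracted. The function name and the string used for "ERRSIG with NO_PUBKEY" are hypothetical.

```python
from typing import Optional

GOODSIG = "GOODSIG"
EXPKEYSIG = "EXPKEYSIG"
ERRSIG_NO_PUBKEY = "ERRSIG NO_PUBKEY"  # hypothetical spelling of "ERRSIG with NO_PUBKEY"


def interpret_status(status: str, commit_date: int, key_expires: Optional[int]) -> str:
    """Map an EXPKEYSIG status to the one the quoted rule says to act on.

    Dates are assumed to be Unix timestamps; key_expires is None when the
    expiration date is unknown, which this sketch treats as "expired".
    """
    if status != EXPKEYSIG:
        return status  # every other status is handled elsewhere
    if key_expires is not None and commit_date <= key_expires:
        return GOODSIG  # the packet was committed while the key was still valid
    return ERRSIG_NO_PUBKEY
```

Treating an unknown expiration date as expired is a conservative choice made here, not something the quoted text specifies.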
Also, I think the date handling paragraph could be split off and made explicit with the 4 dates that would end up present: git commit date, git author date, RFC4880 creation date, date inside the signed blob. This is a non-trivial part of the RFC.
Okay.
> If we allow a review encoded in the packet to have its own "date" field (which I don't like for the following), it MUST be <= packet's-commit-date and >= packet's "date" field, failure to observe that MUST cause a critical error.
This part makes me think you only want 3 dates, ie. remove the date inside the data. Is that so?
Yes, I see no point in it, and ...
But TBH I'm not sure I understand all of the date setup. It sounds like a part you've really well thought out, but it's interspersed with the rest of the text and I can't easily imagine the big picture (and IMO the RFC should be easily understandable, otherwise we're bound to have interoperability or security issues).
> packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date"
If we're going to keep the signed-data-date-field (assuming you mean “review's date field” = signed-data-date-field and “packet's date” = RFC4880 creation date), then I think we should swap the first two members: first one would create the review, and only after sign, so the naive and simplest behaviour would be incorrect as per your specification.
... that's exactly the problem. If you want to interpret packet dates as "serial numbers" you need
> packet's "date" field <= review's "date" field <= packet's-commit-date <= author's key own "expire date"
to meet the monotonicity requirements, but semantically
> review's "date" field <= packet's "date" field <= packet's-commit-date <= author's key own "expire date"
makes more sense because you first review, then sign. (Which, of course just means that you actually want `review's "date" field == packet's "date" field`, hence no need for two.)
So, yes, let's just drop the second date.
Sounds good to me.
Good. :)
Btw, the alternative to the described setup is to do
review's "date" field <= packet's "date" field <= author's key own "expire date"
and ignore the commits, but then we would get problems with agreeing on the set of reviews. All that linearity of the history is to make sure everyone agrees on the state of the common state, i.e. literally the "blockchain". (I should do an ICO and ask for VC funding somewhere around now, yes.)
The key point is that by committing to a common chain with the outlined rules of verification you can be sure that nobody censors your reviews, which is kinda the point for "!" reviews.
The same result can be achieved by verifying that merge commits don't add or drop any packets and preserve the partial ordering on packets in packet-lists and dates, but that's way too hard for a first version, especially when it's written in bash.
> When interpreting packet-list only the packet with largest "date" field (serial-number) MUST be considered for each signing key, all other packets for the same key MUST be ignored.
Hmm… there may be ambiguity here for a masterkey with two signing subkeys. I think we agree we want to allow only one signature per masterkey, so maybe something like “When interpreting packet-list, only the packet with the highest packet date MUST be considered for each RFC4880 master key. All other packets by the same key MUST be ignored. In addition, packets by keys for which no judgement trust has been assigned SHOULD be ignored.”
Agreed.
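A rough sketch of the agreed wording: keep only the highest-dated packet per master key and skip keys with no judgement trust. The `Packet` fields and the `trusted_keys` set are hypothetical, and resolving a signing subkey to its master key is assumed to have happened earlier.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Packet(NamedTuple):
    master_key: str  # hypothetical: fingerprint of the RFC4880 master key
    date: int        # hypothetical: packet date as a Unix timestamp
    payload: str     # hypothetical: the review text carried by the packet


def effective_packets(packets: Iterable[Packet], trusted_keys: Set[str]) -> Dict[str, Packet]:
    """Return, per trusted master key, the packet with the highest date."""
    latest = {}
    for packet in packets:
        if packet.master_key not in trusted_keys:
            continue  # no judgement trust assigned: ignore the packet
        current = latest.get(packet.master_key)
        if current is None or packet.date > current.date:
            latest[packet.master_key] = packet
    return latest
```

Because subkeys are collapsed to their master key before this step, two signing subkeys of the same master key cannot each contribute a review.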
> GOODSIG, EXPSIG, …
This is GnuPG jargon. Maybe use RFC4880 terms instead? Then it's kind of obvious what these mean wrt. RFC4880, so I guess that's not an important change.
Okay.
> all other statuses, including "BADSIG" and "ERRSIG without NO_PUBKEY", MUST cause a critical error (can be made into a warning, in case gpg guys break something, I guess).
-> SHOULD and remove the parenthesis?
Something like
SHOULD cause a critical error which MAY be turned into a warning, in case gpg guys break something, I guess
?
> Failure to interpret the contents of "GOODSIG" review MUST cause a warning and MUST cause such a review to be ignored.
-> SHOULD for the first MUST: we may want fully automated setups where no one would be there to check the warning… and failure to interpret the contents of GOODSIG is not a security issue by itself, just proof of a bug in either the sender or the recipient.
Yeah, but I'd rather see those warnings if that's the case so that I could fix the tool immediately. But okay.
> meaning that the reviewer certifies the "diff" of this "commit" when applied to the parent of the first "<commit-id>" together "with" commit "diff"s of the later "commit-id"s
OK, so after a few minutes of thought, I think I understand what you mean (undertone: this needs clarification :p). It's the “trust this commit iff. you're evaluating the security of a commit that already includes these other commits”, right?
Yes. Though I read it as "trust this commit iff (with) those later commits are applied too".
> meaning that the reviewer certifies the "state" "of" the following "filepaths" in "<commit-id>" tree.
I… am not sure `<filepath>` is a route we want to go. It's quite hard to evaluate a file all by itself, ...
Hm, agreed. But I kinda want a way to specify "how much" of the context did you check.
That said, we do need state signatures that sign the whole repository at least.
Agreed.
In that case the following grammar also makes sense
```
commit <commit-id> (state|diff)
[with <commit-id> diff
with <commit-id> diff
with <commit-id> diff]
```
I.e. now you can sign the state, assuming the diffs are applied. Which might be useful.
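To make the grammar above concrete, here is a small parser sketch. It assumes commit ids are full 40-character lowercase hex SHA-1 ids and that each condition sits on its own line. `ReviewHeader` and `parse_header` are hypothetical names.

```python
import re
from typing import List, NamedTuple

COMMIT_ID = r"[0-9a-f]{40}"  # assumption: full SHA-1 object ids
HEAD_RE = re.compile(rf"commit ({COMMIT_ID}) (state|diff)")
WITH_RE = re.compile(rf"with ({COMMIT_ID}) diff")


class ReviewHeader(NamedTuple):
    commit_id: str
    kind: str                # "state" or "diff"
    with_commits: List[str]  # the header conditions, in order


def parse_header(text: str) -> ReviewHeader:
    """Parse `commit <id> (state|diff)` followed by optional `with <id> diff` lines."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty header")
    head = HEAD_RE.fullmatch(lines[0].strip())
    if head is None:
        raise ValueError(f"bad first line: {lines[0]!r}")
    with_commits = []
    for line in lines[1:]:
        match = WITH_RE.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"bad condition line: {line!r}")
        with_commits.append(match.group(1))
    return ReviewHeader(head.group(1), head.group(2), with_commits)
```

The sketch stops at the header; the optional fields discussed below (`context-understanding`, `result-otherwise`, and so on) would need their own rules.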
> [context-understanding (medium|high|author)] # low by default, indicates understanding of the context of the diff
I think we should allow an explicit `low` (same for all other fields)
Okay. We can also make all of them required for simplicity, git will compress them away anyway when packing into deltas.
> [result-otherwise (!|-|+)] # "!" by default with header conditions, "0" without header conditions, this line MUST not be present without header conditions, this is the "result" when not all header conditions are met
s/MUST not/MUST NOT/
Fixed.
Also, are header conditions the `with <commit-id>` tags? (I don't see what `of <filepath>` would integrate here) If so, it'd likely be better to make it explicit :)
Yes, they didn't integrate together.
> [about priorities] I opted in for "low". But this can be reconsidered, indeed.
Well, I know for my tool, if I get around to doing it (not really good at doing stuff nowadays, too many things flying around…), I'll just output the values all the time. IMO default values would be a UI question that would thus be left implementation-defined.
I see, okay.
Thanks for the review. I'll massage the RFC some more and publish the new version when I have a big enough slot of free time.
|
> The simplest solution I see is just use two gpg trustdbs: one for "keys trust" and a second one for "judgment trust".
Hmm… Actually, I'm not really sure we need keys trust? If judgement trust is associated to a public key anyway, then there has been manual vetting of said public key (to attribute it a judgement trust), so we wouldn't need to also verify the key (actually, I may decide to trust your public key if I see you doing good reviews, without ever having checked your identity and thus not having signed your key, and even less given it ownertrust-in-the-WoT-sense)
Makes sense. Agreed.
How do we handle verification? My understanding is we would start from the latest `commit .* state`-signed commit that's trusted according to the judgement trust database (without `of <filename>` restrictions), and then advance commit by commit checking each commit.
Yes.
(BTW, given each discovered CVE would potentially require re-signing the state commits, I don't think we would actually have many state commits)
First question: How do we find the latest trusted state-signed commit? `git log --oneline` on nixpkgs already takes ~2s on my machine, if we need to search all of these files, removing a layer of base64 and parsing for state commits, I fear already this will take relatively long.
I guess you can try verifying all commits from HEAD down, making holes for all commits with "with <commit-id>", until you hit a required depth or encounter a "state", then you would do a second pass in the opposite direction to cover the "with" holes.
But the important question is the second question: once we have the latest trust-signed commit, how do we check all commits until the latest? It's basically running a state-machine from the latest trust-signed commit to the latest commit. Which isn't reasonable if my guess is correct and the only state-signed commit we see in practice is trust initialization. So we would need a way to cache results between calls (at least finding the latest state-signed commit and maybe a dump of the state of the state machine along with the commit on the refs/signatures branch at which it was computed). This would make the implementation… very complex, I fear :/
I have no complete ideas yet.
That said, apart from completely removing the `with <commit-id>` possibility, I don't see any way of reducing complexity, so… (and even then, having to store the state across invocations would be required to avoid having to re-check all signatures all the time)
True.
|
Agreed :)
Hmm… This works well in a centralized system, where everyone pushes the reviews on the same branch. This works medium-well in a fully decentralized system (because the blockchain benefit is quite reduced, given that blocking a Actually, thinking it over, there likely won't be that many issues. The biggest question will likely be thinking of how to handle multiple remotes, so long as we agree that each public repository would have its independent refs/signature branch (and I think we do)… but anyway we'd have to. So, OK for me :)
Sounds good to me! indeed, signing the state assuming diffs are applied would likely be useful for when we find a serious CVE in the trust-initialization state commit, so as not to have to re-do the trust-initialization from scratch or live with an unsafe signed state commit :)
SGTM
Great, thanks! Also, maybe consider giving it a git repository or whatever so that we can both easily refer to it henceforth and discuss on specific patches? :)
Good idea, indeed! and actually the |
OK, so now that, I think, we agree on almost everything (except the question of whether tree-id or patch-id should be supported or not, but I guess that can be considered an extension by git-signatures so long as it's compatible with the message format), there is the major point (:D) to solve: what technical format? Currently, three proposals have been raised:
I personally see no reason for RFC822: it's like the git commit-like objects, except with random whitespace to make parsing harder. This leaves us with either JSON or git-like commit objects. To give an example of what the result would look like with the two tools:
vs. JSON (which would be minified in practice):
{
"type": "commit-diff",
"oid": "1234567890123456789012345678901234567890",
"with": [
"1234567890123456789012345678901234567890",
"0987654321123456789009876543211234567890"
],
"context-understanding": "high",
"diff-understanding": "high",
"thoroughness": "high",
"result": "+",
"result-otherwise": "-",
"priority": "high",
"message": "My comment."
}
I… don't really care which format we use. Rust (the language I love nowadays) allows both to write a parser and to parse from JSON quite easily, so… My “elegance” internal barometer prefers the git commit-like object. My “ease of use” internal barometer tells me most languages would find it much easier to just parse from JSON using a library than to write a full parser from scratch -- and if the language is not memory-safe, parsers are a well-known source of security issues. Also, JSON has the advantage of being more easily extensible should we want to extend the signature format later… but I don't really see why we would. So… meh. Both ways work for me. But I'll note that we're already using nested structures with the |
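To illustrate the "just parse from JSON using a library" option, here is a sketch that loads a review shaped like the JSON example above and does some basic validation. The set of required fields and the 40-hex-character id check are assumptions, and `parse_review` is a hypothetical name.

```python
import json
import re
from typing import Any, Dict

OID_RE = re.compile(r"[0-9a-f]{40}")  # assumption: full SHA-1 object ids
REQUIRED_FIELDS = ("type", "oid", "result")  # assumption: minimal required set


def parse_review(raw: str) -> Dict[str, Any]:
    """Decode a JSON review and reject obviously malformed ones."""
    review = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(review, dict):
        raise ValueError("review must be a JSON object")
    for field in REQUIRED_FIELDS:
        if not isinstance(review.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    with_ids = review.get("with", [])
    if not isinstance(with_ids, list):
        raise ValueError('"with" must be a list')
    for oid in [review["oid"], *with_ids]:
        if not isinstance(oid, str) or OID_RE.fullmatch(oid) is None:
            raise ValueError(f"bad object id: {oid!r}")
    return review
```

Unknown keys are passed through untouched, which is the extensibility property mentioned above; a stricter tool could reject them instead.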
I have a number of repos I have deterministic building on. I type "make bin" and get the same binary from the same code every time. Then other people build those binaries and confirm they get the same hashes. We then have to certify those hashes came from that code by signing the code commit hashes with the binary hashes. This is imo a pretty good fit as extra metadata to put in the review signature for these use cases to do this as part of the review, as that is the stage that we do it.
My comments were mostly the same as @Ekleog but overall this is a fantastic start. Thanks for that. Please feel free to make a new PR with this to maybe the "docs" folder in this repo so we can isolate spec discussions and review away from implementation concerns in this PR. I am happy to create a "git-signature-spec" repo if we really want another repo this early. I granted you write permissions to this one for now. |
The 'patch' level anchored author/review + maintainer certifying order with 'commit' level anchoring per my last use case under anchoring, would be a good fit for cherry picking across origins where contributors can only put changes in tree A but maintainers control tree B and certify patch ordering there and how changes interact with each other. I actually know a number of companies that do this for strong isolation of security domains and repo hosting with one public and one internal. |
Verifying the artifacts is trivial if you assume deterministic builds. (IMO any non deterministic builds are a security bug and one I spend weeks fixing for some projects). If m-of-n reviewers (or CI servers) agree on the same hashes, then the given hashes are considered reviewed/trusted. Any downstream systems now know that is what was reviewed and if a new hash pops up saying it was for that given release/tag, it should be regarded as an imposter.
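The m-of-n agreement described above could be sketched as follows. It assumes each reviewer or CI server reports one hash for the artifact and that signature checking has already happened. `agreed_hash` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Dict, Optional


def agreed_hash(attestations: Dict[str, str], threshold: int) -> Optional[str]:
    """Return the artifact hash reported by at least `threshold` attesters.

    `attestations` maps an attester id (reviewer or CI server) to the hash
    it reported. Returns None when no hash reaches the threshold.
    """
    if not attestations:
        return None
    digest, votes = Counter(attestations.values()).most_common(1)[0]
    return digest if votes >= threshold else None
```

With a threshold at or below half of the attesters, two different hashes could both qualify; this sketch returns only the most common one, so a real tool would probably want to treat that situation as an error.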
If I am following you: committer signs in a PR, reviewer signs, merge to a pending branch (that everyone somehow has access to without that being a problem?), committer -re-signs-, then later maintainer signs as a part of a release tag etc. This seems much more complex than the workflow I presented, without any security gain, as far as I can tell, other than avoiding different anchoring levels. In effect it seems like you are re-creating the same security levels at the branch level vs signature level, which I am sure there are use cases for. That certainly can't meet mine or that of most orgs I am aware of in silicon valley. Asking companies to totally abandon the git-flow it took them years to adopt is pretty rough. Asking them to add signatures at their regular steps in their usual flow I expect can be more widely and readily adopted and with my mitigations I don't think adds any attack surface over your presented flow. I am however totally in favor of documenting all these potential workflows and security assumptions they make so people can make the right decision for their project as there are clearly multiple workflows that will end with the same security level. We do seem in strong agreement that 'commit' level signing is strongest and should be the thing that downstreams should trust be it on master branch, tags, etc. |
So, your Overall, the git commit signatures are designed to sign the contents of the git repository. I'm really not convinced that artifacts can be reviewed: signing the two in the same step is conflating the “a human verified this code” and “a computer generated this artifact from this code” concerns. Also, you're anyways going to provide a
That's a good idea :) Though in order to have a vendor-neutral repository, I've created a repository at a created-for-this-purpose org: git-wotr/spec. @oxij, feel free to use this repository to push the RFC (or not), or to tell me you'd rather use the For the time being, I've initialized the repository with the GFDL 1.3 (which will require a header at the top of the RFC), and have restricted to only signed commits and only through reviewed PRs. All this is subject to change.
I was thinking of having multiple remotes for the
This sentence shows clearly the conflation of concerns. Reviews are done by reviewers. Artifact signing is done by CI servers. Having reviews include artifacts requires people willing to sign an artifact to also sign a review, which is potentially false. Also, there is a problem of trust. The fact that a review-including-artifacts hash is signed by enough people means:
Overall, I can't find any similarity in the handling of the two things, so I don't understand why they should be put in the same signature: it just limits flexibility for everyone.
You're making it much more complex than it needs be. My setup is exclusively the three bullet points, preceded by an implicit “contributor sends PR”. The contributor doesn't need to sign the PR, because no one would trust their key anyway, so it doesn't matter. And the pending branch can be accessed by any committer without any problem, because anyway it's not used by anyone and master will move forward only when the review signatures will be checked.
… if you read my flow, it's exactly the git-flow, except after the merge to pending the user adds in a signature to the commit. And sorry but the patch-id method without anchoring is flawed and I haven't seen any mitigation to prevent it yet. |
@Ekleog Yay, GFDL! I pushed the original RFC to git-wotr/spec#1. |
There are a lot of interesting subthreads going on here so I am going to file issues on git-wotr so we can discuss them in their own threads, because this is getting silly and important stuff gets lost :) |
... and I think I know about the fourth one, similar to the third, but I'm too tired for now already.
First 5 commits are more or less cleanup. Then there are 3 pairs with each pair first adding a failing test that should have worked and then the next commit fixing `git-signatures` so that it would actually work.
I also strongly suggest:
* rewriting this for `bash -e`, I frequently find errors this way,