
Proposal: domain separation for Fulcio-issued certificates #1131

Closed · woodruffw opened this issue Apr 22, 2023 · 34 comments
Labels: enhancement (New feature or request)

Comments

@woodruffw (Member) commented Apr 22, 2023

This is a summary of a question/potential enhancement that I originally posted to the Sigstore Slack (ref).

Problem statement

One of Sigstore's expected use cases is package indices. Package indices like PyPI are expected to host Sigstore materials (in the form of Sigstore bundles), which can then be used to verify associated artifacts that are also hosted (or referenced) on the index.

Some packaging ecosystems (like PyPI) offer two (or more) indices: a "production" index for ordinary releases to go to, and an (optional) "beta" or "staging" index for packagers to test against. In the case of PyPI, these are pypi.org and test.pypi.org respectively.

In these ecosystems, it's common to combine publishing to both indices into a single workflow, with interior logic for determining which index the release should go to. For example, here is PyCA Cryptography's logic for selecting the index to publish to:

      - run: |
          echo "OIDC_AUDIENCE=pypi" >> $GITHUB_ENV
          echo "PYPI_DOMAIN=pypi.org" >> $GITHUB_ENV
          echo "TWINE_REPOSITORY=pypi" >> $GITHUB_ENV
          echo "TWINE_USERNAME=__token__" >> $GITHUB_ENV
        if: github.event_name == 'workflow_run' || (github.event_name == 'workflow_dispatch' && github.event.inputs.environment == 'pypi')
      - run: |
          echo "OIDC_AUDIENCE=testpypi" >> $GITHUB_ENV
          echo "PYPI_DOMAIN=test.pypi.org" >> $GITHUB_ENV
          echo "TWINE_REPOSITORY=testpypi" >> $GITHUB_ENV
          echo "TWINE_USERNAME=__token__" >> $GITHUB_ENV
        if: github.event_name == 'workflow_dispatch' && github.event.inputs.environment == 'testpypi'

(Permalink: https://github.com/pyca/cryptography/blob/30525e82c77b91963c4f2e8931d2b0257689d364/.github/workflows/pypi-publish.yml#L34-L45)

Consequently, the Sigstore signatures produced for both PyPI and TestPyPI releases look very similar: they might have slightly different repository states, but their workflow claims are identical.

This represents a potential security risk, under the following scenario:

  1. Package foo performs stable releases to index Production and nightly releases to index Staging, using the same workflow for both.
  2. A nightly release of foo contains a vulnerability. Users would ordinarily not be exposed to this vulnerability, because foo's nightly would only be pushed to Staging, which is explicitly not used in production.
  3. An attacker observes the vulnerability in the nightly foo and re-hosts it on index Production, along with its valid signature.
  4. Users who install foo via index Production now receive a correctly signed but unintended (and exploitable) version of foo
    • This can occur either due to a compromise of Production, a name takeover of foo (arguably out of scope here), or a re-hosting of foo under a different name.
    • Alternatively, for a third-party or separate index, an attacker could re-host foo entirely, with only exploitable versions from Staging.

This can be summarized as a "domain separation" problem (in the cryptographic sense of "domain," not the DNS sense): the packager's intent about which index the package ends up on is not communicated in the verification materials for that package, giving an attacker some ambiguity to play with.

Proposed solution

Sigstore could enable domain separation for signatures via changes to Fulcio, resulting in changes to Fulcio-issued certificates.

In particular, Fulcio could support an additional, optional extension of the following (rough) form:

Domains ::= SEQUENCE OF Domain

Domain ::= OCTET STRING

(wrapped, in turn, as an X.509v3 extension).

Each Domain would be an unstructured value; individual Sigstore-consuming ecosystems would be responsible for interpreting them in context-appropriate ways.
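
As a rough illustration, here is a minimal sketch of how such an extension might be DER-encoded and attached to a certificate request using pyca/cryptography. The extension OID is entirely hypothetical (nothing has been allocated), and the hand-rolled DER helpers only handle short values:

# Hypothetical sketch: DER-encode Domains ::= SEQUENCE OF OCTET STRING and
# attach it to a CSR as an unrecognized X.509v3 extension. The OID below is
# made up for illustration; Fulcio has not allocated one.
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def der_octet_string(value: bytes) -> bytes:
    assert len(value) < 128  # short-form DER length only; fine for a sketch
    return bytes([0x04, len(value)]) + value

def encode_domains(domains: list[bytes]) -> bytes:
    body = b"".join(der_octet_string(d) for d in domains)
    assert len(body) < 128
    return bytes([0x30, len(body)]) + body  # SEQUENCE

DOMAINS_OID = x509.ObjectIdentifier("1.3.6.1.4.1.55738.999")  # hypothetical

key = ec.generate_private_key(ec.SECP256R1())
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([]))
    .add_extension(
        x509.UnrecognizedExtension(DOMAINS_OID, encode_domains([b"pypi.org"])),
        critical=False,
    )
    .sign(key, hashes.SHA256())
)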

This extension would be signed over like all other extensions, but would be "unauthenticated" with respect to the identity token (since it would be derived from something in the CSR, rather than the OIDC token).

This would be surfaced to the user via individual Sigstore clients. For example, for sigstore-python, adding domains to a certificate might look like this:

# Domains = { hamilcar, hannibal, hasdrubal }
sigstore sign --domain hamilcar --domain hannibal --domain hasdrubal important.txt

For ecosystems like PyPI, these domains could correspond to DNS names. For example, PyPI could require that uploaded bundles contain certificates with pypi.org in their domains; other domains (or the lack of domains, if desired) would be rejected. Similarly, pip could reject an otherwise valid bundle retrieved from {domain} if the bundle's certificate does not list {domain} as a valid domain (or {domain} does not explicitly list the indices it is mirroring from).
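
For illustration, the verifier-side policy check could be tiny. A hypothetical sketch, assuming the Domain values have already been decoded from the certificate extension:

# Hypothetical sketch of the index/installer policy described above: accept a
# bundle only if the index it came from appears in the certificate's domains.
def domain_policy_ok(cert_domains: list[bytes], index_host: str) -> bool:
    return index_host.encode() in cert_domains

assert domain_policy_ok([b"pypi.org"], "pypi.org")
assert not domain_policy_ok([b"test.pypi.org"], "pypi.org")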

Alternatives considered

These have not been fully considered and need to be discussed more; I'm only listing them here to preserve conversation state!

Alternative: Packagers should use distinct GitHub workflows (or other identifiable claim state) for their staging and production releases. This would result in distinguishable certificates for different kinds of releases.

Problems: Distinct workflows for staging and production releases make it difficult to test the correctness of the production workflow without actually creating a production release. Similarly, the (small) distinctions between claims for different workflows may not be immediately actionable/easily consumable on a policy level (e.g., PyPI cannot reasonably enforce that all certificates contain claims for release.yml, not beta-release.yml or any other name).

Problems identified

This approach is not without problems:

  1. Having Domains be a SEQUENCE OF OCTET STRING is very flexible, and delegates a lot of interpretative power to individual Sigstore-consuming ecosystems. IMO this is necessary given the wide range of expected Sigstore deployments (it's hard to be more opinionated here), but it also means that Sigstore should offer normative guidance about how to use this extension correctly (e.g., discouraging people from adding sub-languages like wildcards to it).
  2. As mentioned above: Domains would be signed over like the rest of the certificate, but is "unauthenticated" with respect to the binding identity token. This is arguably confusing, since all other extensions present in Fulcio-issued certificates are authenticated. This could be addressed in documentation, or (ideally) by refactoring how we handle the claim representation in certificates (via something like [RFC] Should Fulcio put the critical bit on its OIDC extensions? #981 (comment)).
@woodruffw added the enhancement label on Apr 22, 2023
@woodruffw (Member Author)

CC @znewman01, @haydentherapper, and @asraa in particular for thoughts here!

@haydentherapper (Contributor)

Releasing from the same workflow is effectively reusing the same signing key. I would recommend a separation of workflows. You can use the same underlying reusable workflow but with different workflow inputs, which would make the certificates distinguishable. It's really no different than the example you gave for conditionals in a single workflow.

(e.g., PyPI cannot reasonably enforce that all certificates contain claims for release.yml, not beta-release.yml or any other name).

Won't this have to be a part of a verification policy regardless of domain separation? The package manager needs to know the mapping between signing workflow and artifact.

As mentioned above: Domains would be signed over like the rest of the certificate, but is "unauthenticated" with respect to the binding identity token

This is a very significant downside to this approach in my opinion. Too easy to misuse, especially since everything else is authenticated.

This also comes with the risk of spam, especially considering that these certificates will go into an immutable log. We could mitigate this by having a predefined set of domains ("nightly", "staging", "prod", etc), but this might not work for everyone.

@haydentherapper (Contributor)

Cc @laurentsimon, you might be interested in this

@woodruffw (Member Author)

Releasing from the same workflow is effectively reusing the same signing key. I would recommend a separation of workflows. You can use the same underlying reusable workflow but with different workflow inputs, which would make the certificates distinguishable. It's really no different than the example you gave for conditionals in a single workflow.

Yep, I fully agree that this is optimal (and would be best practice, including using a reusable workflow + containing workflow for distinguishability).

That being said, I think in practice this will be a difficult hurdle for a lot of packagers: it requires a degree of discipline + commitment to idioms/best practices that aren't widely held (at least in Python packaging).

Won't this have to be a part of a verification policy regardless of domain separation? The package manager needs to know the mapping between signing workflow and artifact.

Yes and no -- I think it's application specific. For contexts like PyPI, I'd argue that default verification policies probably shouldn't include things like the workflow_ref or job_workflow_ref -- they'll probably end up changing too frequently to be useful policy components, similarly to how PyPI itself is unlikely to care about the ref claim (since it has no independent way to confirm it, other than in a naive sense like "this ref appears in the corresponding repo, if public").

OTOH, for specialized users (or other applications of Sigstore), those will probably be important parts of the verification policy.

This is a very significant downside to this approach in my opinion. Too easy to misuse, especially since everything else is authenticated.

This also comes with the risk of spam, especially considering that these certificates will go into an immutable log. We could mitigate this by having a predefined set of domains ("nightly", "staging", "prod", etc), but this might not work for everyone.

Agreed 100% with both of these concerns, although I think that both are mitigable:

  1. I think this could be handled with the proposal in [RFC] Should Fulcio put the critical bit on its OIDC extensions? #981 (comment): under that scheme, the "authenticated" claims could appear in a single extension called AuthenticatedClaims, while the domains could live either in UnauthenticatedDomains or similar.
  2. I think it'd be reasonable to impose a relatively conservative size limitation on the domain set (which I've realized should really be a SEQUENCE OF, based on conversations with the PyCA folks about ASN.1 best practices). For example, this could be limited to 1KB or even 512B without (IMO) significantly impairing the intended functionality, since individual applications would be expected to interpret this individually (up to and including accommodating external references, if appropriate).
    Alternatively, riffing off your "predefined set of domains" idea: rather than allowing this to be free form, it could be a SEQUENCE OF OBJECT IDENTIFIER, where both Sigstore and individual applications can allocate OIDs for specific domains (e.g. 1.3.6.1.4.1.55738.123 could be for pypi.org). This would be more size efficient, and would eliminate some avenues for domain control misuse (e.g. by making it less trivial to declare sublanguages in this extension).
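
To make the OID variant concrete, a rough DER sketch follows (the pypi.org OID here is the hypothetical example from above, not an allocated value):

# Hypothetical sketch: Domains ::= SEQUENCE OF OBJECT IDENTIFIER, hand-encoded.
def der_oid(dotted: str) -> bytes:
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray([arcs[0] * 40 + arcs[1]])  # first two arcs share a byte
    for arc in arcs[2:]:
        chunks = [arc & 0x7F]  # least-significant base-128 chunk, no high bit
        arc >>= 7
        while arc:
            chunks.append((arc & 0x7F) | 0x80)  # continuation bytes set the high bit
            arc >>= 7
        body += bytes(reversed(chunks))
    return bytes([0x06, len(body)]) + bytes(body)

def encode_domain_oids(oids: list[str]) -> bytes:
    body = b"".join(der_oid(o) for o in oids)
    assert len(body) < 128  # short-form length only; a small set stays tiny
    return bytes([0x30, len(body)]) + body  # SEQUENCE

encode_domain_oids(["1.3.6.1.4.1.55738.123"])  # the hypothetical pypi.org OID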

@MatthiasValvekens

Hi, as a package maintainer, I'm in the process of setting up release workflows with Sigstore across a couple of my projects. My 2 cents:

  • Is it really that common for people to publish release artifacts with the same ID and version number to test.pypi.org? I would expect any "releases" uploaded there to be suffixed with an "obviously non-canonical" suffix like .dev1, .test2 or something of the sort. Wouldn't that serve as a differentiator in practice? Also, I obviously can't speak for anyone else, but I don't use TestPyPI as a "beta version host". To me, it's strictly a test bed for publication workflows.

  • For GitHub Actions at least, PyPI's trusted publisher system allows the upload rights to be pinned to a specific GHA environment. I have that field set to release on PyPI, and to test-release on TestPyPI. Maybe incorporating that field in the claims would already be helpful, although I'm not sure how well that generalises beyond GHA.

  • Independently of the above, using OIDs as a domain separator looks reasonable to me as well. It encourages standardisation for "the masses" (=> easier for default validation policies to tap into) but still allows power/enterprise users the freedom to do things as granularly as they want.

(Take these with a grain of salt, the code signing part of the PKI ecosystem is still fairly new to me, so if this doesn't make any sense, carry on)

@haydentherapper (Contributor) commented Apr 23, 2023

For contexts like PyPI, I'd argue that default verification policies probably shouldn't include things like the workflow_ref or job_workflow_ref -- they'll probably end up changing too frequently to be useful policy components

I’m not sure these are going to change frequently, as they’re the names, not the digests, of the builder workflows. They really can’t change frequently, otherwise how else does the package repository authenticate the identity in the certificate? FWIW, no package index has implemented this yet, so maybe we will see issues pop up as they do, but I’d assume the workflow looks something like:

  • package maintainer registers with index. Package maintainer notes the source repository and workflow.
  • package maintainer automates signing and publishing with workflow
  • Index verifies the certificate identity matches that of the workflow, and marks the package as “trusted”

You can also add TUF into the mix to make this mapping publicly auditable.

Going back to the first comment, what is the threat we are defending against? I don’t disagree that there is an opportunity for a package to be swapped out with similar-looking provenance, but again, this seems like a builder-signing problem. The more likely threat is source repo compromise: now, as an attacker, I’ll focus on switching out the unauthenticated value so that the vulnerable package’s destination points to whatever index the attacker wants it to go to. This gets into SLSA discussions too, because we now have user-provided values, rather than non-forgeable values provided only by the builder.

Edit: to add some context to my SLSA comment, you could imagine an index giving different check marks based on the SLSA attestation level. You couldn’t trust unauthenticated values because they might be tampered with (by an attacker, or an insider/malicious maintainer).

@haydentherapper (Contributor)

Maybe incorporating that field in the claims would already be helpful, although I'm not sure how well that generalises beyond GHA

It should generalize! There's a new set of CI-platform-agnostic claims; check out oid-info.md.

@alex commented Apr 23, 2023

For GitHub Actions at least, PyPI's trusted publisher system allows the upload rights to be pinned to a specific GHA environment. I have that field set to release on PyPI, and to test-release on TestPyPI. Maybe incorporating that field in the claims would already be helpful, although I'm not sure how well that generalises beyond GHA.

It's important to distinguish between trusted publishers, which are an authentication mechanism and therefore only need to be validated by PyPI itself, and Sigstore signatures, which need to be end-user verifiable.

If an end user needs to verify a property, they need to know what the correct value is. End users do not have a way of knowing what the correct workflow or environment is.

@MatthiasValvekens

it should generalize! There are a new set of CI-platform-agnostic claims, check out oid-info.md.

Thanks!

It's important to distinguish between trusted publishers, which are an authentication mechanism and therefore only need to be validated by PyPI itself, and Sigstore signatures, which need to be end-user verifiable.
If an end user needs to verify a property, they need to know what the correct value is. End users do not have a way of knowing what the correct workflow or environment is.

Point taken. I was implicitly assuming a human validator checking whether or not the name "looked reasonable" at validation time, but that indeed doesn't scale.

Perhaps this disadvantage is less pronounced with OID markers? If the OIDs used for "test builds" and "production builds" are uniform across any given packaging ecosystem, that's something one could standardise on in package managers for end users (or other user-facing validation tooling). And if advanced users need more granular domain separation, they could still inject their own OIDs regardless.

@woodruffw (Member Author)

  • Is it really that common for people to publish release artifacts with the same ID and version number to test.pypi.org? I would expect any "releases" uploaded there to be suffixed with an "obviously non-canonical" suffix like .dev1, .test2 or something of the sort. Wouldn't that serve as a differentiator in practice? Also, I obviously can't speak for anyone else, but I don't use TestPyPI as a "beta version host". To me, it's strictly a test bed for publication workflows.

It's confusing, but we have to distinguish between the authenticated and unauthenticated components here: the certificate's embedded claims might include the package version in the form of a similar-looking git tag, but lots of packages are also released from release branches or individual git refs.

In other words: an attacker Mallory could take a package foo-1.0b1 (and its Sigstore materials) from TestPyPI and re-upload it to another index as foo-1.0, and there's no strong guarantee that the claims in foo-1.0b1's certificate will enable a verifier to detect that. The end-user might eventually detect it by noticing that the installed distribution has the wrong version, but ideally we'd catch it earlier in the process 🙂

@haydentherapper (Contributor)

If an end user needs to verify a property, they need to know what the correct value is. End users do not have a way of knowing what the correct workflow or environment is.

PEP 480 (which I believe is being rewritten to include Sigstore) touches on this. You could use TUF to provide this mapping, whether it be user-managed keys, identities, or CI identities.

@woodruffw (Member Author)

I’m not sure these are going to change frequently, as they’re the names, not the digests, of the builder workflows. They really can’t change frequently, otherwise how else does the package repository authenticate the identity in the certificate? FWIW, no package index has implemented this yet, so maybe we will see issues pop up as they do, but I’d assume the workflow looks something like:

Yeah, I'm perhaps being overly conservative here, based on how complex I'm expecting this implementation to be for PyPI 🙂

I'm realizing there are two separate concerns here:

  1. How do we handle the semi-common occurrence of users renaming or otherwise substantively changing the identities of their workflows?
  2. How do we handle the semi-common occurrence of users using the same workflow identity for different domains of security?

Both of those are positive (rather than normative) concerns: neither is what users ought to do, but both are what users probably will end up doing, given the size and diversity of most packaging ecosystems.

For concern (1), I think I agree with you -- this is handled by the conjunction of trusted publishing and signing, and public auditability concerns can be resolved by TUF.

For concern (2) the threat model is the kind of index confusion I mentioned in #1131 (comment):

  1. Legitimate signer Bob publishes foo-1.0b1.tar.gz + foo-1.0b1.sigstore to test.pypi.org, using the repository identity release.yml @ bob/foo.
  2. Mallory observes foo-1.0b1's nightly release
  3. Mallory re-publishes foo-1.0b1 as foo-1.0 to pypi.org (or any other index), along with foo-1.0.sigstore (which is just a rename of foo-1.0b1.sigstore with all contents preserved)
  4. Users who install foo-1.0 from pypi.org (or any other index) install a version of foo that has a correct signature, but with a misleading implication (that a stable release has been made, when foo-1.0b1 might actually be intentionally unstable and vulnerable).

I agree, as a point of order, that Bob ought not publish to both pypi.org and test.pypi.org from the same workflow. In practice, though, the security model here is too subtle for most users: it's not easy to automatically enforce that users use different workflows for different "domains" of security, and a lot of projects already combine their publishing into a single workflow, making it difficult to "lift" the whole ecosystem into the ideal practice here.

Additionally, this has all been in the context of machine identities, when email identities are also something that PyPI will likely support. In that context, domain confusion is an even more salient concern: Bob is unlikely to have two separate email identities for production and staging signatures, so he'll need some other way to convey his signing intent.

@haydentherapper (Contributor)

That attack seems like typosquatting with TOFU. Mallory could also take foo-1.0 and publish foo-2.0 on the same index with the same contents and signature and try to convince users her copy is the “real” one. Mitigating this seems like the package index's responsibility: versions (including across indices) should be linked together under one signer.

What about using a proof of possession? That would mitigate this risk of signature reuse because you’d have to prove ownership of the private key, whether that be a key, identity (do an OIDC dance), or repo (maybe put some value in the repo ACME-style?).

@woodruffw (Member Author)

That attack seems like typosquatting with TOFU.

I think it's similar, but a distinct attack: Mallory makes herself visible by trying to convince users that her signature is the "real" one, but remains stealthy when trying to convince users that Bob's "staging" domain signature was really intended for the "production" domain.

What about using a proof of possession? That would mitigate this risk of signature reuse because you’d have to prove ownership of the private key, whether that be a key, identity (do an OIDC dance), or repo (maybe put some value in the repo ACME-style?).

Could you say some more about this? Is the idea here that each mirroring operation would require Bob (or Mallory, unsuccessfully) to "re-prove" themselves?

@haydentherapper (Contributor)

Only on initial upload, since you require the same index identity after that (I think, worst case it’s every upload).

Currently, an unsigned package can be renamed, modified and reuploaded without detection. Now let’s add a signature, generated by a key. Now the package can’t be tampered with, but it could be reuploaded under a different name, assuming that name hasn’t been claimed. If the index were to ask for a proof of possession of the signing key for an initial registration, this would prevent anyone from copying a signature.

The same procedure can be done for identity based signing, just with a different PoP, asking for proof of ownership of a repo, or of an identity.

@woodruffw (Member Author)

That's an interesting idea, although we'd have to figure out what it means to provide an "equivalent" PoP/proof of ownership for different identity types: for emails it's straightforward, but for machine identities it's a little fiddly (e.g. you can imagine this pessimistically forbidding uploads of a different package that uses the same release workflow legitimately).

I also think this wouldn't address a "remirroring" case, e.g.:

  1. Some versions of foo legitimately go to PyPI
  2. Other versions of foo legitimately go to TestPyPI
  3. Someone else runs "CorpPyPI" and only wants to mirror versions of foo that legitimately go to PyPI and not TestPyPI

In that case, the entity doing the remirroring isn't necessarily the original signer, so they can't provide a PoP/ownership. Instead, they'll probably want to be able to make policy statements like "I accept artifacts accepted by this domain/audience."

@woodruffw (Member Author)

Also, I just realized that "domain" was a bad choice of word by me, since it's very easily confusable with DNS domains 😅

What I'm proposing is effectively the same thing as the OIDC audience (aud claim), except for Sigstore ecosystem certificates instead.

@znewman01 (Contributor)

Can I ask a possibly silly question? Why does this have to happen at the Fulcio layer?

I think domain separation is a good idea, but I'd prefer to implement it at the level of what you're signing. So instead of signing the artifacts directly, you'd sign a "release attestation" which is repo-scoped. When setting up the workflow, you'd have to specify the aud for the release attestation.

In general, I think this comes from a lack of clarity in how we're thinking about the PyPI publication policy. It seems like we want to check something like:

  1. Was it built correctly? This artifact was built on a trusted GH Actions build job using a known GH Actions Python builder workflow.
  2. Was the input to the build trusted? The source of the artifact is something PyPI/testPyPI likes (this might actually be quite subtle, but right now it basically boils down to "the source of the artifact is the hardcoded GH repo").
  3. Was it intended for release? The author of the artifact actually wanted it to wind up on PyPI/testPyPI.

Confusion around (3) seems to be the problem in this issue. I'd really prefer to check these things each separately, rather than together. So we can have signed build provenance for (1), check the OIDC GitHub repo for (2), and check some release attestation for (3).
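
To sketch what checking these independently might look like (all field names and trusted values below are illustrative assumptions, not a real PyPI policy):

# Hypothetical sketch of the three independent checks above.
TRUSTED_BUILDERS = {"https://github.com/pypa/gh-action-pypi-publish"}  # assumption
REGISTERED_REPO = "bob/foo"  # assumption

def publication_policy_ok(provenance: dict, cert_repo: str, release_att: dict) -> bool:
    built_ok = provenance.get("builder_id") in TRUSTED_BUILDERS  # (1) built correctly?
    source_ok = cert_repo == REGISTERED_REPO                     # (2) trusted input?
    intended = release_att.get("aud") == "pypi.org"              # (3) intended for release?
    return built_ok and source_ok and intended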

@haydentherapper (Contributor) commented Apr 23, 2023

That's a great suggestion! That provides a clean separation between the identity layer of Fulcio and user-provided artifact metadata.


One last thing about typosquatting, because I'm still thinking about it: I do think it's in the public index's interest to validate ownership over anything provided to it. If the index were to validate repo ownership (which effectively verifies machine identity ownership, given that the workflow should be in the repo), then it prevents an attacker from grabbing an old, vulnerable (but still signed!) version and reuploading it under a new name, or to a different index (for the same index, domain separation doesn’t solve this). Provenance gives you where the source is (which must be from the registered repo) and where the build config is (which also must be in the registered repo).

@woodruffw (Member Author)

Can I ask a possibly silly question? Why does this have to happen at the Fulcio layer?

I'm not sure! My motivation for putting it at the Fulcio layer is mostly expedience 🙂 -- there's a clear roadmap for getting Sigstore signatures into PyPI and similar ecosystems, which will almost certainly look like a tuple of (package, bundle) in most situations. At first blush, the Fulcio certificate seemed like the "right" place in the Sigstore bundle for this information to go, but that could be a matter of ignorance or lack of perspective on my part.

I think domain separation is a good idea, but I'd prefer to implement it at the level of what you're signing. So instead of signing the artifacts directly, you'd sign a "release attestation" which is repo-scoped. When setting up the workflow, you'd have to specify the aud for the release attestation.

Could you say some more about what this would look like?

My first thought is that it would essentially be a digital signature with the same key that the certificate attests to, but I'm not 100% clear on where that attestation would "live":

  1. If it lives in the bundle adjacent to the other verification materials, then Mallory could engage in a "downgrade" attack by removing it from the bundle (since the bundle itself isn't enveloped) and supplying the stripped bundle.
  2. If it lives on an artifact transparency log, then ecosystems like PyPI (or more troublingly, third-party mirrors) would be required to perform online lookups of these audience attestations, which would add a relatively heavy network bound during package upload or bulk mirroring operations (the latter being primarily disk I/O bound otherwise).
  3. Something else?

@haydentherapper (Contributor)

I think the idea would be that the platform would mandate a release attestation, so removing it and using a raw signature would not be allowed. It could be ecosystem dependent, so some ecosystems may not require any attestation, some could mandate certain claims.

This also works nicely with the idea of using DSSE rather than raw signatures, something that was proposed to me recently.

@znewman01 (Contributor)

I think the idea would be that the platform would mandate a release attestation, so removing it and using a raw signature would not be allowed. It could be ecosystem dependent, so some ecosystems may not require any attestation, some could mandate certain claims.

Precisely. That would prevent downgrade attacks (or rather, turn them into DoS attacks), which are possible anyway.

@kommendorkapten makes a note in the Sigstore Slack about how this works for npm, which is similar to what I proposed:

I don’t know enough about the infrastructure of PyPI, but just throwing out an idea. For npm, the registry produces a “publish attestation” when a package is correctly authorized for publish. So a client would not only verify the build/provenance attestation but also the publish attestation. But, as you say, a compromise of the "main" index could also imply that the attacker has control over the service in the registry responsible for creating the publish attestation, so it will not offer complete protection.

What’s already been discussed (the client using different identities, i.e. workflows or git refs in GHA, for release and nightly builds) would be the preferred option IMHO.

@kommendorkapten (Member)

Here is the publish attestation predicate: https://github.com/npm/attestation/tree/main/specs/publish/v0.1

It was also discussed at the last Sigstore clients meeting. Other ecosystems are interested in a similar attestation. My belief is that the one already used by npm is generic enough, and we can see if e.g. the in-toto project is interested in taking ownership of that predicate (in-toto already defines a few predicates).

Sorry if this is off-topic, I don't wish to derail the discussion, just add more context on the publish attestation.

@woodruffw (Member Author)

Sorry for these questions, I think I've lost the thread a little 😅

I think the idea would be that the platform would mandate a release attestation, so removing it and using a raw signature would not be allowed. It could be ecosystem dependent, so some ecosystems may not require any attestation, some could mandate certain claims.

Making sure I understand: is this release attestation included in the Sigstore bundle, in lieu of an ordinary raw signature (and therefore bound to the same original signing identity), or something else?

If that understanding is correct, then this makes a lot of sense to me!

@znewman01 (Contributor)

Sorry for these questions, I think I've lost the thread a little 😅

Ha, we've been a little all-over-the-place. No worries.

Making sure I understand: is this release attestation included in the Sigstore bundle, in lieu of an ordinary raw signature (and therefore bound to the same original signing identity), or something else?

Precisely :)

(it's not important that the attestation go in the bundle, but it can!)

If that understanding is correct, then this makes a lot of sense to me!

Good to hear! Yeah, I think that's a lot simpler than adding this feature to Fulcio.

@woodruffw (Member Author)

Good to hear! Yeah, I think that's a lot simpler than adding this feature to Fulcio.

Agreed!

We're now firmly outside of the domain of Fulcio, so maybe it makes sense to relocate this conversation, but: what does the expected UX for these attestations look like, versus an ordinary signing flow?

For example, here's how I currently produce a Sigstore bundle (including raw signature) for a Python package distribution:

# produces some-dist.whl.sigstore
sigstore sign some-dist.whl

What would it look like to embed a release attestation instead? Do we expect there to be a registry of common attestations, e.g. could I do something like this?

sigstore sign --attest-for pypi some-dist.whl

@znewman01 (Contributor)

We're now firmly outside of the domain of Fulcio so maybe it makes sense to relocate this conversation,

Meh, here's fine pending a better spot.

but: what does the expected UX for these attestations look like, versus an ordinary signing flow?

pip publish 😄

I think it's a non-goal to have users explicitly type out the actual attestations. So I'm okay if there's some command (doesn't need to be baked directly into pip) that de-sugars to something like:

# this should use in-toto tooling or similar, not echo. also '/tmp' isn't secure on a multiuser system. but the idea is there
echo "{ my publish attestation for $PACKAGE @$HASH }" > /tmp/publish.att  
sigstore sign /tmp/publish.att

Do we expect there to be a registry of common attestations,

Something like that is probably a good idea. I've hinted at similar in the past: sigstore/cosign#2892

@woodruffw (Member Author)

That makes a lot of sense, thanks for explaining!

So, to tie this all together in my head: where does DSSE fit in? With { my publish attestation for $PACKAGE @$HASH }, does the "real" signing input become the concatenation/restructuring of that input with the DSSE prefix, payload type, etc?

@haydentherapper (Contributor)

Yea, that sounds correct, and the DSSE payload would be uploaded to Rekor (there's a new DSSE type being worked on, though intoto mostly works too).

@znewman01 (Contributor)

DSSE wraps the publish "statement" (unauthenticated claim like "i wanna publish package foo @ v2.5.6 hash sha256:abcde to test.pypi.org") in an envelope for signing. The signed envelope is now a full "attestation".

The raw signed bytes are specified by DSSE, so the signing tool will ideally know about DSSE. But you could also use an intermediate command like dsse-format-envelope-for-signing.
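
For reference, those raw signed bytes are DSSE's pre-authentication encoding (PAE); a minimal sketch per the DSSE v1 spec:

# DSSE v1 pre-authentication encoding: the bytes that are actually signed.
def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    t = payload_type.encode()
    return b" ".join(
        [b"DSSEv1", str(len(t)).encode(), t, str(len(payload)).encode(), payload]
    )

# e.g. signing_input = dsse_pae("application/vnd.in-toto+json", statement_json)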

@woodruffw (Member Author)

(Dropped back to Slack because the context has switched from Fulcio to the underlying problem.)

I'm going to close this out, since I think the underlying question has been resolved. Thanks a ton @znewman01, @kommendorkapten, and @haydentherapper!

@woodruffw (Member Author) commented Apr 24, 2023

To summarize:

  1. My original concern was that the Sigstore ecosystem lacked a strong way to isolate verification materials by "domain": a (reasonably) naive packager might sign both production and staging artifacts with the same identity (email or workflow), resulting in ambiguity that an attacker could leverage to deliver an incorrect (but correctly signed) artifact.
  2. My original proposal was to encode "domain" information in Fulcio-issued certificates: each certificate would gain an additional "unauthenticated" (in the context of OIDC) X.509 extension with an application-specific format for representing their "domains" of interest.
  3. @haydentherapper explained why doing this at the Fulcio layer would be a significant complication, and @znewman01 and @kommendorkapten offered an alternative: instead of "raw" signatures, ecosystems like PyPI could do something similar to what NPM has done and submit an attestation, which in turn could contain structured metadata for things like domain specificity. This would be embedded in the bundle instead of a "raw" signature, meaning that an attacker can't force a "downgrade" to a raw signature.

To make things concrete, PyPI and other ecosystems will probably want a client-side attestation with (at least) the following pieces:

  1. The digest of the artifact, making the attestation equivalent in terms of artifact binding to the original "raw" signature (which was computed over the artifact digest);
  2. The version (PEP 440 for PyPI) of the artifact, meaning that an attacker can't fool an installer by providing a legitimately signed but older version of the artifact with a fake newer version;
  3. Some kind of "domain statement" for the artifact, which for PyPI will probably be the DNS name for the intended index (e.g. pypi.org).
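
Concretely, such an attestation might look like an in-toto Statement along these lines. The predicate type and field names here are hypothetical, sketched from the three pieces above; nothing in this shape is a published spec:

import json

# Hypothetical publish/release attestation carrying the three pieces above:
# artifact digest, package version, and the intended index ("domain").
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "foo-1.0.tar.gz", "digest": {"sha256": "..."}}],
    "predicateType": "https://example.com/release-attestation/v0.1",  # hypothetical
    "predicate": {"version": "1.0", "index": "pypi.org"},
}
payload = json.dumps(statement, separators=(",", ":")).encode()  # DSSE payload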

To accomplish this, sigstore-python will need support for DSSE-style signatures. I've opened sigstore/sigstore-python#628 to track that.

@haydentherapper (Contributor)

On the PyPI side, it's also worth considering how to prevent reupload of an artifact. A few thoughts:

  • Should PyPI (or any package repository) have some notion of "freshness" and disallow the upload of identity certs/attestations that were generated a while ago? For example, if you require the identity certificate to be valid on upload, then you have a 10-minute window, and this proposed attack would not be possible (see the sketch after this list)
  • Should PyPI disallow different owners for packages across indices? As in, when you take foo, you own it across all managed indices
  • Should PyPI check ownership of a repository before configuring a mapping between a repository and a package?
  • How does TUF fit into this? In particular, can we use TUF to manage the mapping between repositories and artifacts, making this mapping auditable (and revokable! useful for revoking signatures of vulnerable artifacts)
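
For the "freshness" idea above, the upload-time check could be as simple as this sketch (assuming the validity window has been parsed out of the certificate):

from datetime import datetime, timezone

# Sketch: reject uploads whose short-lived signing certificate has already
# expired, bounding the reupload window to the cert lifetime (~10 minutes).
def cert_fresh_at_upload(not_before: datetime, not_after: datetime) -> bool:
    now = datetime.now(timezone.utc)
    return not_before <= now <= not_after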

@feelepxyz (Member)

Something else we did for npm, which also adds a layer of protection for a typosquatting-like attack, is to add the package name, version and tarball digest in the attestation subject (signed payload). We then verify this matches the published package at the registry before accepting and creating a publish attestation, and also verify in the CLI when auditing downloaded packages.

This means you can't re-use an expired identity certificate from some other package, e.g. [email protected] and bind it to a new package, e.g. [email protected], as you would need the original private key, and a still valid cert to re-sign the new payload with the correct subject matching [email protected] (and re-using the old predicate that also needs to match exactly what's in the identity cert extensions).

The attestation store where we keep these bundles and the npm registry also enforce that published package-name@version tuples are immutable on the main registry, preventing them from being re-used if the package and/or version was deleted or yanked.
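
A sketch of that subject-binding check, as a registry might perform it on publish (the statement layout and helper name are illustrative, not npm's actual implementation):

import hashlib

# Sketch of the npm-style check described above: the attestation subject must
# match the package name, version, and tarball digest actually being published.
def subject_matches(statement: dict, name: str, version: str, tarball: bytes) -> bool:
    subject = statement["subject"][0]
    digest = hashlib.sha256(tarball).hexdigest()
    return (
        subject.get("name") == f"{name}@{version}"
        and subject.get("digest", {}).get("sha256") == digest
    )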
