ITE-6: Generalized link format #15

Merged
merged 32 commits into from
Jun 8, 2021

TomHennen
Contributor

This is an initial draft of ITE-6 that defines a generalized link format. See the document for reasoning.

This has been discussed with @SantiagoTorres who will be sponsoring this ITE.

This was authored primarily by @MarkLodato (who is on leave). Most credit goes to him.

I've converted it to Markdown; any formatting issues are likely my fault.

Several sections are still empty. They need to be fleshed out (or removed) before the ITE is accepted.

ITE/6/README.md Outdated

## Authentication and serialization

[ITE-5](https://github.com/MarkLodato/ITE/blob/ite-5/ITE/5/README.md) defines a
Contributor Author

NOTE: If accepted this will be the signing spec (secure-systems-lab/dsse#2) instead of ITE-5.

Contributor

Clarified.

ITE/6/README.md Outdated
the following fields. See subsequent sections for
[Type definitions](#type-definitions) and [Reasoning](#reasoning).

`attestation_type` _(URI, required)_
Contributor Author

It might be useful to add a story about how people add new attestation types. E.g. how do they define the type and add the policy evaluation code.

Thoughts?

ITE/6/README.md Outdated
> `artifact` _(ArtifactReference, required)_
>
> > Identifies the related software artifact. The following standard relations
> > are recommended[^1]: * `top_level_source`: The primary input used to
Contributor Author

From @MarkLodato "I'm [...] wondering whether we should have different "classes" of attestations, notably Provenance, CodeReview, TestResult (including vuln scan), and PolicyDecision, each with their own set of relations."

Contributor

We decided against this for now, to keep things simple. We may want to revisit this in a future iteration.

This makes the table easier to read in both the Markdown as well as the
rendered HTML.
The snippets are supposed to be JSON, not arbitrary Javascript.
ITE/6/README.md Outdated
`attestation_type` _(URI, required)_

> Indicates the meaning of this attestation and how to interpret `details` and
> `relations`. Example:
Member

There is quite a bit of forward declaration for fields and types in this section. It would be nice to restructure it so that terms are defined before they are used as part of other definitions.

Contributor

Please take a look at the latest revision, which has changed a bit.

- Clarify goals.
- Split specifications to their own files.
- Avoid introducing new terminology or talking about "layers" because
  doing so adds complexity without benefit.
- Use a much simpler schema that is both closer to existing in-toto 0.9
  as well as easier to read and write.
- Document the new Provenance type.
- Other minor fixes.
@MarkLodato
Contributor

FYI, I just pushed a pretty significant rewrite, based on feedback from Santiago and others. It now proposes much more modest changes from current in-toto, keeping materials and products (now called subject, with slightly different semantics) and making all the other fields type-specific.

The design still needs more work, but hopefully it is improving! :-D


```jsonc
{
"attestation_type": "https://in-toto.io/Provenance/v1",
```
Contributor Author

Should 'attestation_type' say something about SBOM instead of 'Provenance' ?

Contributor

Fixed. It is now SPDX, which is the specific format. (SBoM is not a specific format.)

ITE/6/README.md Outdated
## Introduction

An **attestation** is the generalization of an in-toto link. It is a statement
about an artifact, signed by an attester. Each attestation has a type indicating
Contributor

Does it need to be about one specific artifact? What about build steps that produce multiple ones? Does each of them end up having a separate attestation?

Contributor Author

We had thought there would be one attestation per artifact, but our work with Busytown made us realize we need to support multiple artifacts per attestation. This does make writing good policy somewhat harder.

Note that the example does have multiple artifacts specified.

Contributor

Clarified: "set of artifacts".

ITE/6/README.md Outdated
Comment on lines 79 to 80
We expect to add a few more standard attestation types over time, and customers may
define their own custom attestation types if desired.
Contributor

Is there a mechanism for registering types? Given they are represented by URIs / URLs, is there a meaning behind what these resolve to (if at all)?

Contributor

No. This is now stated explicitly.

ITE/6/README.md Outdated

## Design notes

Attestations SHOULD be designed to encourage policies to be "monotonic," meaning
Contributor

+1 for monotonicity

ITE/6/README.md Outdated
Comment on lines 515 to 516
There are two main reasons for standardizing `subject` and `materials` within
Attestation schema.
Contributor

Did I miss the fact that these are in fact standardized? Should this be mentioned earlier on?

Contributor

Clarified. PTAL.

ITE/6/README.md Outdated
Comment on lines 518 to 525
First, doing so allows policy engines to make decisions without requiring
`attestation_type`-specific logic or configuration. Binary Authorization
policies today are purely about "does an attestation exist that is signed by X
with subject Y", and similarly in-toto layouts are about "does an attestation
exist that is signed by X with materials/products Z?"[1] These relatively simple
policies are quite powerful. With this proposal, such policies become more
expressive without any additional configuration: "does an attestation exist that
is signed by X having type T, with subject Y and/or materials Z?"
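
A minimal Python sketch of the generic check quoted above ("does an attestation exist that is signed by X having type T, with subject Y and/or materials Z?"). The `Attestation` record and its field names are assumptions made for illustration, not the schema from the README, and signature verification is assumed to have happened already.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attestation:
    """Hypothetical, already-verified attestation (field names are assumed)."""
    signer: str
    attestation_type: str
    subject: frozenset = field(default_factory=frozenset)
    materials: frozenset = field(default_factory=frozenset)


def exists_matching(attestations, signer, attestation_type,
                    subject=None, material=None):
    """Return True if some attestation is signed by `signer`, has the given
    type, and mentions the requested subject and/or material."""
    for att in attestations:
        if att.signer != signer or att.attestation_type != attestation_type:
            continue
        if subject is not None and subject not in att.subject:
            continue
        if material is not None and material not in att.materials:
            continue
        return True
    return False
```

Because the check is purely existential, adding more attestations to the input can only turn a `False` result into `True`, never the reverse.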
Contributor

Another way of looking at this: an attestation is a partial definition of a directed graph, in which edges connect source materials to artifacts (to use your terminology), and the edges themselves are labelled with a pointer to the attestation that introduced them. Then these attestations compose naturally, and it is also possible to traverse this graph forwards or backwards, also taking into account the trust model in the various principals that sign these attestations.

This also produces a nice way of formulating what a "root of trust" is: from a given node in the graph, follow links backwards for attestations that are trusted (e.g. "GitHub actions"), until you reach an attestation that you cannot decompose any further (because you don't trust the upstream attestations).

If we go with this semantics, effectively an attestation with a set of N materials and a set of M subjects corresponds to (N * M) edges that contribute to the graph. This assumes all the materials contribute to all the subjects, which I think is a safe assumption; if this is not desired, then perhaps the attestation should be denormalized into individual edges, in order to offer more precision, at the expense of higher verbosity and lower entropy for the more common cases.
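
The N * M reading can be sketched in a few lines of Python. The function name and the tuple layout `(material, subject, attestation_id)` are assumptions chosen for illustration; nothing in the proposal prescribes them.

```python
from itertools import product


def attestation_edges(attestation_id, materials, subjects):
    """Expand one attestation into labelled edges, one per
    (material, subject) pair, each pointing back at the attestation."""
    return [(m, s, attestation_id) for m, s in product(materials, subjects)]
```

For example, an attestation with three materials and two subjects contributes six edges, all labelled with the same attestation.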

Contributor

BTW I think this also would connect nicely with @dlorenc's ideas on how to incorporate more expressive entries in sigstore / rekor that contain links to / from other log entries.


> BTW I think this also would connect nicely with @dlorenc's ideas on how to incorporate more expressive entries in sigstore / rekor that contain links to / from other log entries.

That's no coincidence! @SantiagoTorres was behind much of that thinking :)

Contributor

Perhaps something in between:

{
  "attestation_type": "https://example.com/CustomBuilder",
  "subjects": [ "X", "Y" ],
  "links": [
    {
      "materials": [ "A" ],
      "link_type": "docker_image",
      "custom_field_0": "foo",
      "custom_field_1": "bar"
    },
    {
      "materials": [],
      "link_type": "command",
      "command": "rustc build"
    },
    {
      "materials": [ "D" ],
      "link_type": "mounted_file",
      "mount_point": "/var/src"
    }
  ]
}
  • the list of subjects represents the artifacts that were generated / endorsed
  • for each item in the links list, the materials and link_type fields are mandatory
    • materials may be empty (e.g. for code reviews, manual inspection, etc.)
    • link_type defines how to interpret the list of materials in that link, plus any additional custom fields for that link

The nice thing about this approach is (I think) that the materials are normalized (they only appear exactly once, and alongside the corresponding metadata), and they can easily be parsed without having to know the meaning of the link types, in order to traverse the graph.

This also makes it easy to incorporate this in rekor: extend the rekor format to contain a list of related entries, which are effectively extracted from the individual link materials in this attestation format; this way the rekor log does not need to store the entire attestation, which may be too large and / or contain undesirable data, but the links to other entries are explicit and those can be extracted and indexed by rekor without knowing what they mean; of course rekor would also index the hash of the original (unredacted) attestation, so that, if someone could fetch that from somewhere else, they could check that it was indeed correctly incorporated in the log, and they would have the information about the semantics of the individual link edges.

WDYT?
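
To illustrate the claim that materials can be parsed without knowing what each `link_type` means, here is a small Python sketch over the proposed shape. The helper name `referenced_materials` is hypothetical, and the `example` dictionary is a trimmed copy of the proposal above.

```python
def referenced_materials(attestation):
    """Collect every material named under `links`, in first-seen order,
    without looking at `link_type` or any custom field."""
    seen = []
    for link in attestation.get("links", []):
        for material in link.get("materials", []):
            if material not in seen:
                seen.append(material)
    return seen


example = {
    "attestation_type": "https://example.com/CustomBuilder",
    "subjects": ["X", "Y"],
    "links": [
        {"materials": ["A"], "link_type": "docker_image"},
        {"materials": [], "link_type": "command", "command": "rustc build"},
        {"materials": ["D"], "link_type": "mounted_file", "mount_point": "/var/src"},
    ],
}
```

An indexer could store only the result of `referenced_materials(example)` next to the hash of the full attestation, which is the kind of redacted entry described for rekor.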

Contributor

Please take a look at the latest design.

  1. materials is no longer in the Statement layer. Instead, it is only in the Provenance-type predicate. This means that we can more easily change it in the future. The downside is that you can't traverse the graph without understanding the Predicate type. However, the value of being able to traverse the graph without actually understanding the edge meanings is dubious. I can believe that perhaps there is a use case out there, but until we have a concrete one, our plan is to push as much down into the Predicate layer as possible. This allows each Predicate type to do what is natural in that domain. Also, things like SPDX have their own language for links, and our model now supports that.

  2. Within Provenance, materials is an array instead of a map, so (1) the reference is by index instead of name, so there is less duplication, and (2) we have richer objects so you can attach more properties to them.
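
A rough Python sketch of reference-by-index, assuming a predicate shaped like the curl example discussed in this thread. The `uri` key on each material and the helper name `recipe_material` are assumptions; only `materials` being an array and `recipe.definedInMaterial` being an optional integer index come from the discussion.

```python
def recipe_material(predicate):
    """Return the material entry that `recipe.definedInMaterial` points at,
    or None when the optional field is absent."""
    index = predicate.get("recipe", {}).get("definedInMaterial")
    if index is None:
        return None
    materials = predicate.get("materials", [])
    is_int = isinstance(index, int) and not isinstance(index, bool)
    if not is_int or not 0 <= index < len(materials):
        raise ValueError(f"definedInMaterial {index!r} is not a valid index")
    return materials[index]


predicate = {
    "recipe": {
        "type": "https://github.com/Attestations/GitHubActionsWorkflow@v1",
        "definedInMaterial": 0,
    },
    # The "uri" key is an assumed field name for illustration.
    "materials": [{"uri": "git+https://github.com/curl/curl@curl-7_72_0"}],
}
```

Because the recipe stores only an index, the material's URI and any richer properties are written once, in the `materials` array.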

I don't follow the connection to Rekor. The big thing is standardizing subject so that you know what artifact the attestation is about. The links would be processed by the consumer, and they'd do further queries to walk the graph.

Contributor

Thanks Mark!

I think there is some value in representing this as a graph that can be traversed without understanding the meaning of the edges; e.g. in Oak we may want a way to recursively download and store all the attestations starting from a given artifact, and spanning the graph (perhaps up to a certain depth). This would form a sort of collection of facts about the binary, as well as its dependencies, and so on, which may be stored offline. Then a policy engine may take this as input and decide whether or not to accept the given binary (e.g. whether to run it or not).

Note that the component that does the downloading / traversal may not know anything at all about the meaning of the predicates, and the policy engine may not be able to perform queries online.

Additionally, I think there is a nice symmetry in considering each material (or set of materials, if that helps) as a directed edge in a graph (from / to the subject), in which the predicate is effectively an attribute of that edge. This way the topology of the graph should be traversable (and even cacheable, as per my previous paragraph) even without having to know anything about the individual predicates. :)
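
The recursive download described above could look roughly like the following breadth-first walk in Python. This is a sketch under stated assumptions: `fetch` is a hypothetical callable that returns the attestations known for an artifact, and each attestation is assumed to expose its materials in a top-level `materials` list, which is exactly the property this comment argues for.

```python
from collections import deque


def collect_attestations(root, fetch, max_depth):
    """Breadth-first walk from `root`, following material edges for at most
    `max_depth` hops, without interpreting any predicate."""
    collected = []
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        artifact, depth = queue.popleft()
        for attestation in fetch(artifact):
            collected.append(attestation)
            if depth >= max_depth:
                continue  # keep the attestation, but do not expand further
            for material in attestation.get("materials", []):
                if material not in visited:
                    visited.add(material)
                    queue.append((material, depth + 1))
    return collected
```

The returned list is the offline "collection of facts" that a policy engine could later evaluate; the walker itself never needs to understand a predicate.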

Contributor

I can see the prospects of this idea, but it's not for v1. :-) We're looking to release v1 within the next two weeks, and there are just too many unknowns to make the links part of the Statement layer.

Our bar for putting something in the Statement layer is that there must be widespread, concrete use cases showing the need to have the field be common for all attestations. Right now, the use cases for links in the Statement layer are still theoretical.

Some challenges:

  • How do you terminate the graph traversal? The graph in reality is intractably large. For example, every build likely uses 10+ artifacts, each of which uses 10+, and so on, to a really large depth. Just terminating by depth is likely not the right choice, and even then, it is a huge number of attestations. You really do need to understand the predicates to do that effectively. Alternatively, you need to figure out a model that doesn't require this deep traversal (which is what I'm currently thinking for SLSA.)

  • What is the model? Right now we have Statement = Subject + Predicate. It is simple, easy to reason about, and matches natural language. If we instead put all the graph edges in the Statement layer, we'd likely need to use a different analogy. I'm not sure that a property graph (which is what you describe) is really the right model - it doesn't easily express Provenance, for example. Instead it is better represented as a hypergraph or a bipartite graph (artifacts + attestations). I do want to express the graph nature in the model, but it will take much more time than we have available for v1.

  • Will this support all predicates easily? For example, in the current model we can simply use SPDX as a predicate and it works (though not particularly cleanly). With the links in the Statement layer, now SPDX would have to simply ignore those links (making the links feature useless) or we'd need to figure out how to rework SPDX to fit the new model (complicating onboarding).

To be clear, I share your gut feeling that it's the "right" design choice, but for v1 we'll leave it to the Predicate and see what type of conventions emerge. Once we have real-world data, we can reevaluate.

Contributor

For the record, I filed a dedicated issue for this: in-toto/attestation#6. Let's move any further discussion to that issue.

ITE/6/README.md Outdated
"builder": { "id": "https://github.com/Attestations/GitHubHostedActions@v1" },
"recipe": {
"type": "https://github.com/Attestations/GitHubActionsWorkflow@v1",
"material": "git+https://github.com/curl/curl@curl-7_72_0",
Contributor

Is there an implicit or explicit assumption somewhere that these materials that are part of the custom fields also need to be listed as part of the top-level materials? Either way it seems a bit repetitive and error-prone; perhaps this can be simplified / normalized with a more structured schema.

Contributor

I think this is addressed in the latest design ("definedInMaterial: 0").

- Reference SLSA.
- Split into a distinct `predicateType` + `predicate` based on the SLSA
  attestation model.
- Convert `subject` and `materials` to lists of objects, rather than
  maps from URI to DigestSet, so that (1) we can more naturally extend
  with more fields, such as media type, and (2) to allow easier
  indexing.
- Add a proto definition of the Statement layer.
- Move the curl example to README (no need for a duplicate YAML file.)
- Switch to lowerCamelCase, which is already used by signing-spec.
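
A sketch of the map-to-list conversion from this change list, in Python. The `predicateType` and `predicate` keys come from the change list; the per-subject keys `name` and `digest`, the helper name `make_statement`, and the example URI are assumptions used only for illustration.

```python
def make_statement(subject_map, predicate_type, predicate):
    """Build a Statement-like dict from the older {uri: digest_set} map,
    emitting `subject` as a list of objects in sorted URI order."""
    subject = [
        {"name": uri, "digest": dict(digests)}  # key names are assumed
        for uri, digests in sorted(subject_map.items())
    ]
    return {
        "subject": subject,
        "predicateType": predicate_type,
        "predicate": predicate,
    }
```

With objects in a list, a later field such as a media type can be added to one subject without changing the shape of the others, and consumers can index every subject by iterating a single array.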
This simplifies the model to allow consumers to rely on its existence.
Most importantly, use "Statement" in the type URI instead of
"Attestation".
@MarkLodato
Contributor

Status update: The latest version of this PR is a good candidate for v1. Please comment on it if you have any questions or concerns.

Our intention is to keep v1 a "minimally viable product" with only the things we are confident about supporting long term. For example, v1 does not yet support artifacts that are not identified by digest (e.g. SVN revisions). That is a feature we will need to support, but since we're not yet sure about how that will work, we will defer it to a future version.

Going forward, my plan is to move this spec to the new https://github.com/in-toto/attestation repo, where we can independently version the spec and add more docs, examples, a reference implementation, and so on. Once that happens, I'll update this ITE to effectively say "in-toto will support this new attestation format". (This is how we handled ITE-5.)

Co-authored-by: Aditya Sirish <[email protected]>
ITE/6/README.md Outdated

```json
{
"attestation_type": "https://example.com/VulnerabilityScan/v1",
```
Contributor

As an aside, I worry about this sort of attestation (e.g. vulnerability scans), since they seem to break monotonicity. Is this something that we should worry about?

Contributor

Yes, that is a concern in general, but here it is monotonic. On line 415, we check for a positive assertion that there are zero vulnerabilities. Once we define that schema for real, we will be sure to make this point clear to policy authors. We also plan to provide policy "templates" that people instantiate, and we'll design those templates to be monotonic.

That said, the situation is a bit more nuanced. There are actually two use cases, only the first of which requires monotonicity:

  • Admission control: only allow "good" things. In this case, we require positive attestation of goodness, and allow even if there exists an attestation of badness.
  • Remediation: prevent "bad" things on a best effort basis. In this case, we would want to know about the attestation of badness, even if the artifact was previously allowed.

A policy that encompasses both might be "allow if there exists an attestation showing zero vulnerabilities, but deny if there exists an attestation showing a high severity vulnerability."

@nenaddedic FYI
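
The two use cases above can be contrasted in a short Python sketch. The scan records and their keys (`vulnerability_count`, `max_severity`) are hypothetical; the real vulnerability scan schema had not been defined at this point in the discussion.

```python
def admit(scans):
    """Admission control: allow if any scan positively asserts zero
    vulnerabilities. Monotonic: extra scans can never revoke the result."""
    return any(scan.get("vulnerability_count") == 0 for scan in scans)


def admit_with_remediation(scans):
    """Combined policy: deny on any high-severity finding, otherwise fall
    back to the admission rule. Not monotonic by design."""
    if any(scan.get("max_severity") == "HIGH" for scan in scans):
        return False
    return admit(scans)
```

Given a clean scan and a later scan reporting a high-severity issue, `admit` still allows the artifact, while `admit_with_remediation` denies it.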

Co-authored-by: Tiziano Santoro <[email protected]>
> `materials`.

<a id="recipe.definedInMaterial"></a>
`recipe.definedInMaterial` _string, optional_


This should be int, right? It's mentioned as an int above and in the example.

Contributor

Oops! Fixed.

MarkLodato added a commit to MarkLodato/attestation that referenced this pull request Apr 12, 2021
This imports the history of [ITE-6] into the new attestation repo.
Commit IDs have changed from the ITE repo because the history only
includes ITE-6. Original commit 410d15599f034ce30166c65ef5986e309759b706
maps to the parent commit, efdc870.

[ITE-6]: in-toto/ITE#15
@MarkLodato
Contributor

Status update: I just tagged v0.1.0 of https://github.com/in-toto/attestation. This is a tagged release for anyone wanting to build off a non-moving target. Next step is to update this ITE to reference that repo. I hope to be able to do that next week.

Most of the content is removed since the attestation repo defines
everything. Now ITE-6 is basically "support attestations".

Because this ITE does not update the layout language, it only supports
link-type predicates. A future ITE is expected to add generic support
for attestations.
@MarkLodato
Contributor

This PR is now ready for review. It is updated to only refer to https://github.com/in-toto/attestation, and furthermore to only accept link-type predicates.

There are still a few missing sections (notably testing), and it's still in Markdown, but hopefully it's good enough to be accepted as DRAFT status, or at least there are concrete things I can fix to make it ready. :-)

@SantiagoTorres SantiagoTorres merged commit 6662df8 into in-toto:master Jun 8, 2021
MarkLodato added a commit to MarkLodato/slsa that referenced this pull request Aug 3, 2021
This imports the history of [ITE-6] into the new attestation repo.
Commit IDs have changed from the ITE repo because the history only
includes ITE-6. Original commit 410d15599f034ce30166c65ef5986e309759b706
maps to the parent commit, 513fac6.

[ITE-6]: in-toto/ITE#15
MarkLodato added a commit to MarkLodato/slsa that referenced this pull request Aug 3, 2021
This imports the history of [ITE-6] into the new attestation repo.
Commit IDs have changed from the ITE repo because the history only
includes ITE-6. Original commit 410d15599f034ce30166c65ef5986e309759b706
maps to the parent commit, a40c931.

[ITE-6]: in-toto/ITE#15
8 participants