ITE-6: Generalized link format #15

TomHennen · 2020-10-30T20:55:11Z

This is an initial draft of ITE-6 that defines a generalized link format. See the document for reasoning.

This has been discussed with @SantiagoTorres who will be sponsoring this ITE.

This was authored primarily by @MarkLodato (who is on leave). Most credit goes to him.

I've converted it to Markdown; any formatting issues are likely my fault.

Several sections are still empty. They need to be fleshed out (or removed) before the ITE is accepted.

Based on Google Doc written by [email protected]. I still need to clean up the markdown.

TomHennen · 2020-10-30T20:57:25Z

ITE/6/README.md

+
+## Authentication and serialization
+
+[ITE-5](https://github.com/MarkLodato/ITE/blob/ite-5/ITE/5/README.md) defines a


NOTE: If accepted this will be the signing spec (secure-systems-lab/dsse#2) instead of ITE-5.

TomHennen · 2020-10-30T20:58:01Z

ITE/6/README.md

+the following fields. See subsequent sections for
+[Type definitions](#type-definitions) and [Reasoning](#reasoning).
+
+`attestation_type` _(URI, required)_


It might be useful to add a story about how people add new attestation types. E.g. how do they define the type and add the policy evaluation code.

Thoughts?

TomHennen · 2020-10-30T20:59:31Z

ITE/6/README.md

+> `artifact` _(ArtifactReference, required)_
+>
+> > Identifies the related software artifact. The following standard relations
+> > are recommended[^1]: * `top_level_source`: The primary input used to


From @MarkLodato "I'm [...] wondering whether we should have different "classes" of attestations, notably Provenance, CodeReview, TestResult (including vuln scan), and PolicyDecision, each with their own set of relations."

We decided against this for now, to keep things simple. We may want to revisit this in a future iteration.

lukpueh · 2021-01-28T10:46:29Z

ITE/6/README.md

+`attestation_type` _(URI, required)_
+
+> Indicates the meaning of this attestation and how to interpret `details` and
+> `relations`. Example:


There is quite a bit of forward declaration for fields and types in this section. It would be nice to restructure it so that terms are defined before they are used as part of other definitions.

Please take a look at the latest revision, which has changed a bit.

- Clarify goals.
- Split specifications to their own files.
- Avoid introducing new terminology or talking about "layers" because doing so adds complexity without benefit.
- Use a much simpler schema that is both closer to existing in-toto 0.9 as well as easier to read and write.
- Document the new Provenance type.
- Other minor fixes.

MarkLodato · 2021-02-11T16:06:22Z

FYI, I just pushed a pretty significant rewrite, based on feedback from Santiago and others. It now proposes much more modest changes from current in-toto, keeping materials and products (now called subject, with slightly different semantics) and making all the other fields type-specific.

The design still needs more work, but hopefully it is improving! :-D

TomHennen · 2021-03-02T22:03:03Z

ITE/6/spec/sbom.md

+
+```jsonc
+{
+  "attestation_type": "https://in-toto.io/Provenance/v1",


Should 'attestation_type' say something about SBOM instead of 'Provenance' ?

Fixed. It is now SPDX, which is the specific format. (SBoM is not a specific format.)

tiziano88 · 2021-03-22T22:38:36Z

ITE/6/README.md

+## Introduction
+
+An **attestation** is the generalization of an in-toto link. It is a statement
+about an artifact, signed by an attester. Each attestation has a type indicating


Does it need to be about one specific artifact? What about build steps that produce multiple ones? Does each of them end up having a separate attestation?

We had thought there would be one attestation per artifact, but our work with Busytown made us realize we need to support multiple artifacts per-attestation. This does make writing good policy somewhat harder.

Note that the example does have multiple artifacts specified.

Clarified: "set of artifacts".

tiziano88 · 2021-03-22T22:40:33Z

ITE/6/README.md

+We expect to a few more standard attestation types over time, and customers may
+define their own custom attestation types if desired.


Is there a mechanism for registering types? Given they are represented by URIs / URLs, is there a meaning behind what these resolve to (if at all)?

No. This is now stated explicitly.

tiziano88 · 2021-03-22T22:40:46Z

ITE/6/README.md

+
+## Design notes
+
+Attestations SHOULD be designed to encourage policies to be "monotonic," meaning


+1 for monotonicity

tiziano88 · 2021-03-22T22:43:28Z

ITE/6/README.md

+There are two main reasons for standardizing `subject` and `materials` within
+Attestation schema.


Did I miss the fact that these are in fact standardized? Should this be mentioned earlier on?

Clarified. PTAL.

tiziano88 · 2021-03-22T22:48:50Z

ITE/6/README.md

+First, doing so allows policy engines to make decisions without requiring
+`attestation_type`-specific logic or configuration. Binary Authorization
+policies today are purely about "does an attestation exist that is signed by X
+with subject Y", and similarly in-toto layouts are about "does an attestation
+exist that is signed by X with materials/products Z?"[1] These relatively simple
+policies are quite powerful. With this proposal, such policies become more
+expressive without any additional configuration: "does an attestation exist that
+is signed by X having type T, with subject Y and/or materials Z?"


Another way of looking at this: an attestation is a partial definition of a directed graph, in which edges connect source materials to artifacts (to use your terminology), and the edges themselves are labelled with a pointer to the attestation that introduced them. Then these attestations compose naturally, and it is also possible to traverse this graph forwards or backwards, also taking into account the trust model for the various principals that sign these attestations.

This also produces a nice way of formulating what a "root of trust" is: from a given node in the graph, follow links backwards for attestations that are trusted (e.g. "GitHub actions"), until you reach an attestation that you cannot decompose any further (because you don't trust the upstream attestations).

If we go with this semantics, effectively an attestation with a set of N materials and a set of M subjects corresponds to (N * M) edges that contribute to the graph. This assumes all the materials contribute to all the subjects, which I think is a safe assumption; if this is not desired, then perhaps the attestation should be denormalized into individual edges, in order to offer more precision, at the expense of higher verbosity and lower entropy for the more common cases.
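As a rough illustration of the graph reading described above, the following Python sketch expands one attestation into its N × M material-to-subject edges and then walks backwards from an artifact, following only attestations from trusted signers. The `Attestation` class, its field names, and the signer strings are hypothetical and are not part of the ITE-6 schema.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Attestation:
    # Hypothetical, simplified shape; not the ITE-6 schema.
    signer: str
    materials: tuple
    subjects: tuple


def edges(att):
    """Expand one attestation into N * M (material, subject, attestation) edges."""
    return [(m, s, att) for m, s in product(att.materials, att.subjects)]


def trusted_roots(artifact, attestations, trusted_signers):
    """Walk edges backwards from `artifact`, following only trusted attestations.

    Returns the artifacts at which the walk stops because no trusted
    attestation describes how they were produced.
    """
    incoming = {}
    for att in attestations:
        if att.signer not in trusted_signers:
            continue
        for material, subject, _ in edges(att):
            incoming.setdefault(subject, set()).add(material)

    roots, seen, stack = set(), set(), [artifact]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = incoming.get(node)
        if parents:
            stack.extend(parents)
        else:
            roots.add(node)
    return roots


build = Attestation("builder", ("src", "toolchain"), ("bin", "docs"))
fetch = Attestation("untrusted-mirror", ("upstream",), ("toolchain",))
print(len(edges(build)))  # 2 materials * 2 subjects
print(sorted(trusted_roots("bin", [build, fetch], {"builder"})))
```

With only `builder` trusted, the walk stops at `src` and `toolchain`; trusting the second signer as well would push the root of trust back to `upstream`.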

BTW I think this also would connect nicely with @dlorenc 's ideas on how to incorporate more expressive entries in sigstore / rekor that contain links to / from other log entries.

> BTW I think this also would connect nicely with @dlorenc 's ideas on how to incorporate more expressive entries in sigstore / rekor that contain links to / from other log entries.

That's no coincidence! @SantiagoTorres was behind much of that thinking :)

Perhaps something in between:

```json
{
  "attestation_type": "https://example.com/CustomBuilder",
  "subjects": ["X", "Y"],
  "links": [
    {
      "materials": ["A"],
      "link_type": "docker_image",
      "custom_field_0": "foo",
      "custom_field_1": "bar"
    },
    {
      "materials": [],
      "link_type": "command",
      "command": "rustc build"
    },
    {
      "materials": ["D"],
      "link_type": "mounted_file",
      "mount_point": "/var/src"
    }
  ]
}
```

- the list of `subjects` represents the artifacts that were generated / endorsed
- for each item in the `links` list, the `materials` and `link_type` fields are mandatory
- `materials` may be empty (e.g. for code reviews, manual inspection, etc.)
- `link_type` defines how to interpret the list of `materials` in that link, plus any additional custom fields for that link

The nice thing about this approach is (I think) that the materials are normalized (they only appear exactly once, and alongside the corresponding metadata), and they can easily be parsed without having to know the meaning of the link types, in order to traverse the graph.
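To make the "parse without knowing the link types" point concrete, here is a minimal Python sketch that pulls every material out of the example above while reading only the two mandatory fields. The function name `referenced_materials` is made up for illustration; the field names come from the example in this comment.

```python
import json

# The example from the comment above, written without trailing commas
# so that it parses as strict JSON.
STATEMENT = json.loads("""
{
  "attestation_type": "https://example.com/CustomBuilder",
  "subjects": ["X", "Y"],
  "links": [
    {"materials": ["A"], "link_type": "docker_image",
     "custom_field_0": "foo", "custom_field_1": "bar"},
    {"materials": [], "link_type": "command", "command": "rustc build"},
    {"materials": ["D"], "link_type": "mounted_file", "mount_point": "/var/src"}
  ]
}
""")


def referenced_materials(statement):
    """Collect every material, ignoring link_type and any custom fields."""
    found = []
    for link in statement["links"]:
        # Only the mandatory `materials` field is read; the meaning of
        # `link_type` and of the custom fields is never consulted.
        for material in link["materials"]:
            if material not in found:
                found.append(material)
    return found


print(referenced_materials(STATEMENT))
```

A crawler could use a list like this to decide which related attestations to fetch next, leaving interpretation of each `link_type` to a later policy step.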

This also makes it easy to incorporate this in rekor: extend the rekor format to contain a list of related entries, which are effectively extracted from the individual link materials in this attestation format. This way the rekor log does not need to store the entire attestation, which may be too large and / or contain undesirable data, but the links to other entries are explicit, and rekor can extract and index them without knowing what they mean. Of course, rekor would also index the hash of the original (unredacted) attestation, so that anyone who could fetch it from somewhere else could check that it was indeed correctly incorporated in the log, and would then have the information about the semantics of the individual link edges.

WDYT?

Please take a look at the latest design.

materials is no longer in the Statement layer. Instead, it is only in the Provenance-type predicate. This means that we can more easily change it in the future. The downside is that you can't traverse the graph without understanding the Predicate type. However, the value of being able to traverse the graph without actually understanding the edge meanings is dubious. I can believe that perhaps there is a use case out there, but until we have a concrete one, our plan is to push as much down into the Predicate layer as possible. This allows each Predicate type to do what is natural in that domain. Also, things like SPDX have their own language for links, and our model now supports that.

Within Provenance, materials is an array instead of a map, so (1) the reference is by index instead of name, so there is less duplication, and (2) we have richer objects so you can attach more properties to them.
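The index-based reference can be sketched in Python as follows. This is only an illustration: `recipe.definedInMaterial` and `materials` are named in this thread, but the `uri` and `digest` keys, the example values, and the helper name `recipe_material` are assumptions.

```python
def recipe_material(predicate):
    """Return the material that `recipe.definedInMaterial` points at, or None."""
    index = predicate.get("recipe", {}).get("definedInMaterial")
    if index is None:
        return None  # the field is optional
    if isinstance(index, bool) or not isinstance(index, int):
        raise TypeError("definedInMaterial must be an integer index")
    materials = predicate.get("materials", [])
    if not 0 <= index < len(materials):
        raise IndexError("definedInMaterial is out of range")
    return materials[index]


# Hypothetical predicate; the `uri` and `digest` keys and their values
# are placeholders chosen for this example.
predicate = {
    "recipe": {"definedInMaterial": 0},
    "materials": [
        {"uri": "git+https://example.com/repo", "digest": {"sha256": "0" * 64}},
    ],
}
print(recipe_material(predicate)["uri"])
```

Because the recipe stores an index rather than repeating the URI, each material and its properties appear exactly once in the predicate.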

I don't follow the connection to Rekor. The big thing is standardizing subject so that you know what artifact the attestation is about. The links would be processed by the consumer, and they'd do further queries to walk the graph.

Thanks Mark!

I think there is some value in representing this as a graph that can be traversed without understanding the meaning of the edges; e.g. in Oak we may want a way to recursively download and store all the attestations starting from a given artifact, and spanning the graph (perhaps up to a certain depth). This would form a sort of collection of facts about the binary, as well as its dependencies, and so on, which may be stored offline. Then a policy engine may take this as input and decide whether or not to accept the given binary (e.g. whether to run it or not).

Note that the component that does the downloading / traversal may not know anything at all about the meaning of the predicates, and the policy engine may not be able to perform queries online.

Additionally, IMO I think there is a nice symmetry in considering each material (or set of materials, if that helps) as a directed edge in a graph (from / to the subject), in which the predicate is effectively an attribute of that edge. This way the topology of the graph should be traversable (and even cacheable, as per my previous paragraph) even without having to know anything about the individual predicates. :)

I can see the prospects of this idea, but it's not for v1. :-) We're looking to release v1 within the next two weeks, and there are just too many unknowns to make the links part of the Statement layer.

Our bar for putting something in the Statement layer is that there must be widespread, concrete use cases showing the need to have the field be common for all attestations. Right now, the use cases for links in the Statement layer are still theoretical.

Some challenges:

- How do you terminate the graph traversal? The graph in reality is intractably large. For example, every build likely uses 10+ artifacts, each of which uses 10+, and so on, to a really large depth. Just terminating by depth is likely not the right choice, and even then, it is a huge number of attestations. You really do need to understand the predicates to do that effectively. Alternatively, you need to figure out a model that doesn't require this deep traversal (which is what I'm currently thinking for SLSA.)

- What is the model? Right now we have Statement = Subject + Predicate. Though simple, it is easy to reason about and matches natural language. If we instead put all the graph edges in the Statement layer, we'd likely need to use a different analogy. I'm not sure that a property graph (which is what you describe) is really the right model - it doesn't easily express Provenance, for example. Instead it is better represented as a hypergraph or a bipartite graph (artifacts + attestations). I do want to express the graph nature in the model, but it will take much more time than we have available for v1.

- Will this support all predicates easily? For example, in the current model we can simply use SPDX as a predicate and it works (though not particularly cleanly). With the links in the Statement layer, now SPDX would have to simply ignore those links (making the links feature useless) or we'd need to figure out how to rework SPDX to fit the new model (complicating onboarding).
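To put a number on the traversal-size concern, here is a small Python sketch of the arithmetic. It assumes a uniform fan-out and no shared dependencies, so it gives an upper bound and not a measurement; the function name is made up for illustration.

```python
def reachable_artifacts(fan_out, depth):
    """Upper bound on artifacts reached when each one has `fan_out` inputs.

    Counts the starting artifact plus every level down to `depth`,
    assuming no two artifacts share an input.
    """
    return sum(fan_out ** level for level in range(depth + 1))


for depth in (1, 3, 6):
    print(depth, reachable_artifacts(10, depth))
```

With a fan-out of 10, a depth of only 6 already bounds the walk at more than a million artifacts, which is why a fixed depth limit alone does not keep the traversal small.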

To be clear, I share your gut feeling that it's the "right" design choice, but for v1 we'll leave it to the Predicate and see what type of conventions emerge. Once we have real-world data, we can reevaluate.

For the record, I filed a dedicated issue for this: in-toto/attestation#6. Let's move any further discussion to that issue.

tiziano88 · 2021-03-23T00:09:42Z

ITE/6/README.md

+  "builder": { "id": "https://github.com/Attestations/GitHubHostedActions@v1" },
+  "recipe": {
+    "type": "https://github.com/Attestations/GitHubActionsWorkflow@v1",
+    "material": "git+https://github.com/curl/curl@curl-7_72_0",


Is there an implicit or explicit assumption somewhere that these materials that are part of the custom fields also need to be listed as part of the top-level materials? Either way it seems a bit repetitive and error prone, perhaps this can be simplified / normalized with a more structured schema.

I think this is addressed in the latest design ("definedInMaterial: 0").

- Reference SLSA.
- Split into a distinct `predicateType` + `predicate` based on the SLSA attestation model.
- Convert `subject` and `materials` to lists of objects, rather than maps from URI to DigestSet, so that (1) we can more naturally extend with more fields, such as media type, and (2) to allow easier indexing.
- Add a proto definition of the Statement layer.
- Move the curl example to README (no need for a duplicate YAML file.)
- Switch to lowerCamelCase, which is already used by signing-spec.

MarkLodato · 2021-04-10T19:01:56Z

Status update: The latest version of this PR is a good candidate for v1. Please comment on it if you have any questions or concerns.

Our intention is to keep v1 a "minimally viable product" with only the things we are confident about supporting long term. For example, v1 does not yet support artifacts that are not identified by digest (e.g. SVN revisions). That is a feature we will need to support, but since we're not yet sure about how that will work, we will defer it to a future version.

Going forward, my plan is to move this spec to the new https://github.com/in-toto/attestation repo, where we can independently version the spec and add more docs, examples, a reference implementation, and so on. Once that happens, I'll update this ITE to effectively say "in-toto will support this new attestation format". (This is how we handled ITE-5.)

tiziano88 · 2021-04-10T20:44:57Z

ITE/6/README.md

+
+```json
+{
+  "attestation_type": "https://example.com/VulnerabilityScan/v1",


As an aside, I worry about this sort of attestation (e.g. vulnerability scans), since it seems to break monotonicity. Is this something that we should worry about?

Yes, that is a concern in general, but here it is monotonic. On line 415, we check for a positive assertion that there are zero vulnerabilities. Once we define that schema for real, we will be sure to make this point clear to policy authors. We also plan to provide policy "templates" that people instantiate, and we'll design those templates to be monotonic.

That said, the situation is a bit more nuanced. There are actually two use cases, only the first of which requires monotonicity:

1. Admission control: only allow "good" things. In this case, we require positive attestation of goodness, and allow even if there exists an attestation of badness.

2. Remediation: prevent "bad" things on a best effort basis. In this case, we would want to know about the attestation of badness, even if the artifact was previously allowed.

A policy that encompasses both might be "allow if there exists an attestation showing zero vulnerabilities, but deny if there exists an attestation showing a high severity vulnerability."
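The two use cases and the combined policy can be sketched in Python as below. This is a minimal sketch under stated assumptions: the attestation shape (a `signer` plus a list of `severities`), the scanner name, and the function names are all hypothetical, since the vulnerability-scan schema is not defined yet.

```python
def admit(attestations, trusted_scanners):
    """Monotonic admission: allow if any trusted scan positively reports zero findings."""
    return any(
        a["signer"] in trusted_scanners and not a["severities"]
        for a in attestations
    )


def flag_for_remediation(attestations, trusted_scanners):
    """Best-effort remediation: report if any trusted scan found a high severity issue."""
    return any(
        a["signer"] in trusted_scanners and "high" in a["severities"]
        for a in attestations
    )


def combined(attestations, trusted_scanners):
    """Allow on a clean scan, but deny if a high severity finding also exists."""
    return admit(attestations, trusted_scanners) and not flag_for_remediation(
        attestations, trusted_scanners
    )


# Hypothetical attestations: an empty `severities` list is the positive
# claim that the scan found zero vulnerabilities.
clean = {"signer": "scanner-a", "severities": []}
later = {"signer": "scanner-a", "severities": ["high"]}
scanners = {"scanner-a"}

print(admit([clean], scanners), admit([clean, later], scanners))
print(combined([clean], scanners), combined([clean, later], scanners))
```

Adding `later` never changes the result of `admit`, which is the monotonic property; only `combined` can move from allow to deny as more attestations arrive.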

@nenaddedic FYI

Co-authored-by: Tiziano Santoro <[email protected]>

kommendorkapten · 2021-04-12T10:27:42Z

ITE/6/spec/provenance.md

+> `materials`.
+
+<a id="recipe.definedInMaterial"></a>
+`recipe.definedInMaterial` _string, optional_


This should be int, right? It's mentioned as an int above and in the example.

Oops! Fixed.

This imports the history of [ITE-6] into the new attestation repo. Commit IDs have changed from the ITE repo because the history only includes ITE-6. Original commit 410d15599f034ce30166c65ef5986e309759b706 maps to the parent commit, efdc870. [ITE-6]: in-toto/ITE#15

MarkLodato · 2021-04-27T21:40:41Z

Status update: I just tagged v0.1.0 of https://github.com/in-toto/attestation. This is a tagged release for anyone wanting to build off a non-moving target. Next step is to update this ITE to reference that repo. I hope to be able to do that next week.

Most of the content is removed since the attestation repo defines everything. Now ITE-6 is basically "support attestations". Because this ITE does not update the layout language, it only supports link-type predicates. A future ITE is expected to add generic support for attestations.

MarkLodato · 2021-05-06T21:30:20Z

This PR is now ready for review. It is updated to only refer to https://github.com/in-toto/attestation, and furthermore to only accept link-type predicates.

There are still a few missing sections (notably testing), and it's still in Markdown, but hopefully it's good enough to be accepted as DRAFT status, or at least there are concrete things I can fix to make it ready. :-)

This imports the history of [ITE-6] into the new attestation repo. Commit IDs have changed from the ITE repo because the history only includes ITE-6. Original commit 410d15599f034ce30166c65ef5986e309759b706 maps to the parent commit, 513fac6. [ITE-6]: in-toto/ITE#15

This imports the history of [ITE-6] into the new attestation repo. Commit IDs have changed from the ITE repo because the history only includes ITE-6. Original commit 410d15599f034ce30166c65ef5986e309759b706 maps to the parent commit, a40c931. [ITE-6]: in-toto/ITE#15

TomHennen added 9 commits October 30, 2020 13:15

Initial commit of ITE-6

fb6f331

Based on Google Doc written by [email protected]. I still need to clean up the markdown.

Some cleanup. Testing linking to headers.

5481273

fix headers

e2a86be

remove alerts

fba8547

cleanup some indentation, more to come

67c86f0

Formatting fixes

ae11860

Add languages to code blocks

f0736b5

add wietse as contributor

2900cdd

Removed incomplete non-functional requirement

6073890

TomHennen commented Oct 30, 2020

MarkLodato added 4 commits January 20, 2021 13:09

Add ITE-6 to README.

7b233cd

Fix Markdown formatting.

18534b7

Simplify "Link" schema table and convert to Markdown.

12cf578

This makes the table easier to read in both the Markdown as well as the rendered HTML.

Use 'json' in codeblocks.

fdee32c

The snippets are supposed to be JSON, not arbitrary Javascript.

lukpueh reviewed Jan 28, 2021

Add sbom type

68db575

TomHennen commented Mar 2, 2021

tiziano88 reviewed Mar 22, 2021

tiziano88 reviewed Mar 23, 2021

MarkLodato added 7 commits April 6, 2021 12:48

Simplify the message about artifacts being untyped

18967f8

Clarify processing model

a21d57f

Update Link v1 to use the new attestation format.

3cc1416

Rename SBOM to SPDX and use new format.

4ee1bbe

Make subject.name required.

4537363

This simplifies the model to allow consumers to rely on its existence.

Refactor Envelope and Statement wording and URI.

19e656b

Most importantly, use "Statement" in the type URI instead of "Attestation".

MarkLodato added 4 commits April 9, 2021 15:00

Fix source repo in curl example.

b7aa37c

Clarify processing model.

b5f6e23

Clarify that attestation can be about multiple artifacts

25b1844

Update reasoning for new design

f69ed96

adityasaky reviewed Apr 10, 2021

Typo fix: There -> The

9aaa2ab

Co-authored-by: Aditya Sirish <[email protected]>

tiziano88 reviewed Apr 10, 2021

Typo fix: conventions

f1ce077

Co-authored-by: Tiziano Santoro <[email protected]>

kommendorkapten reviewed Apr 12, 2021

doc fix: definedInMaterial is an integer

410d155

kommendorkapten mentioned this pull request Apr 13, 2021

Data structures for generalized link formats/ITE-6 in-toto/in-toto-golang#100

Merged

MarkLodato mentioned this pull request Apr 14, 2021

Predicate-agnostic graph representation in-toto/attestation#6

Open

MarkLodato added 2 commits April 24, 2021 09:08

Create ITE table header

253f2d9

Merge branch 'master' into ite-6

464b0d6

joshuagl mentioned this pull request May 20, 2021

Add metadata container classes for signed metadata secure-systems-lab/securesystemslib#272

Closed

SantiagoTorres merged commit 6662df8 into in-toto:master Jun 8, 2021

