Skip to content

Conversation

@jsternberg
Copy link
Collaborator

@jsternberg jsternberg commented Oct 26, 2023

This modifies how build steps are unmarshaled from JSON into the
provenance attestation. The current method doesn't correctly handle
protobuf attributes that are used with oneof.

Fixes #4380.

@jedevc
Copy link
Member

jedevc commented Oct 26, 2023

I'm not sure if anyone's consuming this provenance attestation in that much detail, but if they are, it looks like this is a breaking change. I think that's probably alright, but definitely important to highlight this in the release notes.

In principle, I think this would be good to take though 🎉

@jsternberg jsternberg force-pushed the provenance-protobuf-marshal branch from 8dee6c4 to a651ee0 Compare October 26, 2023 17:29
@jsternberg
Copy link
Collaborator Author

Yea this would definitely be a breaking change. I think it might be worth it though because attempting to unmarshal the operation into a usable structure seems to be broken. You either have to create your own JSON structures (since the current usage also causes unmarshaling file actions to be broken) or just give up on attempting to process them.

@jsternberg jsternberg force-pushed the provenance-protobuf-marshal branch from a651ee0 to 30e4ebf Compare October 26, 2023 17:34
@jedevc jedevc mentioned this pull request Nov 2, 2023
@tonistiigi
Copy link
Member

As we discussed offline, these are custom BuildKit types that don't have any definition that they need to match some proto types or proto-JSON spec. Breaking backwards compatibility would be quite a big problem here.

Otoh, it would be nice to provide some way to unmarshal these steps (and for example convert to new build definition) without the need for the parser to duplicate everything. But my understanding is that just switching to the jsonpb itself does not fix this specific problem.

What if we define custom UnmarshalJSON methods to the types that define the oneoff interfaces so that they accept the current output with dynamic processing? I guess the Op and FileAction are the problematic types. Would that solve the unmarshaling problem?

@jsternberg
Copy link
Collaborator Author

The unmarshaling problem is mostly a problem with encoding/json itself and interfaces. Sinceo interfaces don't have any concrete type, encoding/json can't decide on a specific underlying implementation during unmarshaling. There also doesn't appear to be a way to intercept the unmarshaling process and provide a custom way to allocate the target parameter. If you want to decide on a custom target for an interface, you have two easy options without any magic.

  1. Find some way to take the output of interface{} and map it to the real type.
    1. Easiest way to do that in the standard library is to remarshal interface{} back to JSON and then unmarshal to a specific interface.
    2. mapstructure exists specifically for this.
  2. Wrap the existing type in another type with the concrete type.

An example of the second would be to do this:

type BuildConfig struct {
	Definition    []BuildStep              `json:"llbDefinition,omitempty"`
	provenance.BuildConfig
}

type BuildStep struct {
	Op Op `json:"op,omitempty"`
	provenance.BuildStep
}

type Op struct {
	Op pb.Op
}

func (o *Op) UnmarshalJSON(data []byte) {
	// do jsonpb stuff here
}

The second technique can technically be used for any level of recursion but the more levels of recursion you have to duplicate the more difficult it is. If we're absolutely certain we cannot make any changes to the resulting JSON, we can always create an entirely different structure to parse the JSON. This technique is easiest though if we only have to do it once. The disadvantage of this technique is also that you have to duplicate each structure above the one that you're attempting to intercept so we're making a trivial change to BuildConfig here but the real structure also has ProvenancePredicate on top of that too that would need to be intercepted by a client.

This isn't really feasible to add to the standard library so any parser would need to do this.

It's definitely possible to use this without changing the JSON structure, but it is harder and more involved.

For the first, mapstructure likely wouldn't work because of just how the protobuf data structures work. You really have to use jsonpb to do it. So you can do this on the client after unmarshaling to retrieve the op:

data, err := json.Marshal(step.Op)
if err != nil {
	panic(err)
}

var op pb.Op
if err := jsonpb.Unmarshal(bytes.NewReader(data), &op); err != nil {
	panic(err)
}

This is quite a bit easier than the second, but it requires that we properly marshal the structure so we don't have to intercept the messages that have oneof in them.

For the second, you can technically do easier, but it's much easier to do the second if we properly marshal the structures to begin with.

I'm also going to spend a bit of time seeing if we can do some Go magic involving json.Unmarshaler, json.RawMessage, and generics and try to post a go playground example to demonstrate it.

@jsternberg
Copy link
Collaborator Author

jsternberg commented Nov 9, 2023

Another option (requires number 1) is just to use generics and that fixes the interface problem.

type ProvenancePredicate[T any] struct {
	slsa02.ProvenancePredicate
	Invocation  ProvenanceInvocation `json:"invocation,omitempty"`
	BuildConfig *BuildConfig[T]      `json:"buildConfig,omitempty"`
	Metadata    *ProvenanceMetadata  `json:"metadata,omitempty"`
}

type BuildConfig[T any] struct {
	Definition    []BuildStep[T]           `json:"llbDefinition,omitempty"`
	DigestMapping map[digest.Digest]string `json:"digestMapping,omitempty"`
}

type BuildStep[T any] struct {
	ID            string                  `json:"id,omitempty"`
	Op            T                       `json:"op,omitempty"`
	Inputs        []string                `json:"inputs,omitempty"`
	ResourceUsage *resourcestypes.Samples `json:"resourceUsage,omitempty"`
}

This isn't really as easy or feasible in Go due to not having named generics (you either specify all generics or none), but this specific structure only has one customization so it's definitely feasible and probably the easiest way.

@jsternberg jsternberg force-pushed the provenance-protobuf-marshal branch from 30e4ebf to 8b6fab8 Compare November 9, 2023 21:41
@jsternberg jsternberg changed the title llbsolver: marshal protobuf objects into the provenance attestation correctly llbsolver: unmarshal protobuf objects into the provenance attestation correctly Nov 9, 2023
@jsternberg
Copy link
Collaborator Author

After consulting with @tonistiigi, we've decided to keep the format the same and revisit the format in the future if we decide to make a different backwards incompatible change in some way.

I've modified this PR to make it so the existing structures can be used to unmarshal the JSON for clients.

@jsternberg jsternberg force-pushed the provenance-protobuf-marshal branch from 8b6fab8 to 226f3e0 Compare November 9, 2023 21:56
… correctly

This modifies how build steps are unmarshaled from JSON into the
provenance attestation. The current method doesn't correctly handle
protobuf attributes that are used with `oneof`.

Signed-off-by: Jonathan A. Sternberg <[email protected]>
@jsternberg jsternberg force-pushed the provenance-protobuf-marshal branch from 226f3e0 to 40fb5ce Compare November 13, 2023 15:57
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to not duplicate all the fields in the structs that use oneof to avoid cases where a field is added but forgotten in Unmarshal and it silently gets dropped. This is a case that is also hard to test for. Something like

func extractField(dt []byte, f string) ([]byte, *json.RawMessage, error) {
	m := make(map[string]*json.RawMessage)
	if err := json.Unmarshal(dt, &m); err != nil {
		return nil, err
	}
	v := m[f]
	delete(m, f)
	dt, err := json.Marshal(m)
	return dt, v, err
}

could solve that problem but it does get bit more complicated. I also wonder why there isn't a generic function that would do this using reflect.

This is ok for me atm., we can improve in follow-up if needed.

As we discussed offline, we should rethink if there are more changes needed if we ever do some other backwards incompatible changes in the future (eg. bumping SLSA).

@tonistiigi tonistiigi merged commit 5ae9b23 into moby:master Nov 14, 2023
@jsternberg jsternberg deleted the provenance-protobuf-marshal branch November 15, 2023 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Max provenance attestation marshals the operations incorrectly

3 participants