proposal for OCI v1.1 content source #87

284 changes: 284 additions & 0 deletions proposals/20230316-oci-content-source.md
@@ -0,0 +1,284 @@
---
title: OCI Content Source
authors:
- @devigned
reviewers:
- tbd
creation-date: 2023-03-16
last-updated: 2023-03-16
status: implementable
---


# OCI Content Source

## Table of Contents
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals / Future Work](#non-goals--future-work)
- [Proposal](#proposal)
- [User Stories](#user-stories)
- [Story 1 - Publishing a Component](#story-1---publishing-a-component)
- [Story 2 - Publishing a Component Interface](#story-2---publishing-a-component-interface)
- [Story 3 - Publishing a Bundled Component](#story-3---publishing-a-bundled-component)
- [Story 4 - Fetching a Component](#story-4---fetching-a-component)
- [Requirements](#requirements)
- [Functional](#functional)
- [Non-Functional](#non-functional)
- [Implementation Details](#implementation-details)
- [Artifact Types](#artifact-types)
- [Image Manifest for Components](#image-manifest-for-components)
- [Image Manifest for Interface Components](#image-manifest-for-interface-components)
- [Image Manifest for Bundled Components](#image-manifest-for-bundled-components)
- [Image Manifest for Signing and SBOMs](#image-manifest-for-signing-and-sboms)
- [Image Manifests for Additional Metadata](#image-manifests-for-additional-metadata)
- [Warg Registry Implementation](#warg-registry-implementation)
- [Alternative Options](#alternative-options)
- [Publish Artifacts Using an Artifact Manifest](#publish-artifacts-using-an-artifact-manifest)
- [Pros](#pros)
- [Cons](#cons)
- [Conclusions](#conclusions)
- [Additional Details](#additional-details)
- [Test Plan](#test-plan)
- [Implementation History](#implementation-history)

## Summary

This proposal introduces a new content source kind to Warg, which will enable Warg to store and retrieve packages from OCI registries. OCI registries make it simple to store, share, and manage package content. They are broadly available, from local environments to cloud service providers that run managed OCI registries, and they have a large, established set of tools built around registries and images.

## Motivation

At the time of writing, Warg supports only a single content source kind, `ContentSourceKind::HttpAnonymous`, which retrieves and persists package content via unauthenticated HTTP requests and stores it on the local file system. This works well for demo purposes, but it is not a robust solution for long-term, distributed, scalable storage of package content.

OCI registries are a great match for the addressable content that Warg needs to store. Additionally, their nearly ubiquitous usage means that Warg registries will not need additional infrastructure to store package content, related metadata, and cryptographic assurances.

### Goals
- Define metadata and layer media types for storing Warg packages in OCI registries.
- Define a strategy for attaching software bill of materials (SBOMs), attestations, and signatures for Warg packages stored in OCI registries.
- Introduce `ContentSourceKind::OCIv1_1` to persist and fetch package content from OCI v1.1 compliant registries.
- Implement persistence and retrieval logic for `ContentSourceKind::OCIv1_1`.
- Describe a pattern for attaching additional metadata to components stored in an OCI registry. Examples of additional metadata are debugging symbols, documentation, WIT, etc.

### Non-Goals
- Exhaustively specify the additional metadata that can be attached to components stored in an OCI registry.

## Glossary
- [Component](https://github.com/bytecodealliance/SIG-Registries/blob/main/glossary.md#component): A component is defined by the (emerging) [W3C WebAssembly Component Model specification](https://github.com/WebAssembly/component-model) which defines a component as a portable binary built from WebAssembly core modules with statically-analyzable, capability-safe, language-agnostic interfaces. A component package is a type of [package](https://github.com/bytecodealliance/SIG-Registries/blob/main/glossary.md#package) whose contents are a component.
- [Bundled Component and Bundling](https://github.com/bytecodealliance/SIG-Registries/blob/main/glossary.md#bundled-component-and-bundling): A "bundled component" is a [component](https://github.com/bytecodealliance/SIG-Registries/blob/main/glossary.md#component) that only has interface dependencies and can thus run directly on a wasm engine that natively implements those interfaces
without requiring any registry access. "Bundling" is an automatic transformation on a [component](https://github.com/bytecodealliance/SIG-Registries/blob/main/glossary.md#component) that replaces [imports](https://github.com/bytecodealliance/SIG-Registries/blob/main/glossary.md#imports) of other components (in the
registry) with inline copies of those components (fetched from the registry at the time of bundling) to produce a bundled component.

## Proposal
### User Stories
#### Story 1 - Publishing a Component
Alex is an engineer working in a large organization which is building applications using Wasm components. Alex would like to share the component that their team has built with others within their company. The company Alex works for has a lot of folks with experience running containers, and they have OCI registries already provisioned to store container images. Alex would like to publish the component their team has built into one of their company's OCI registries.
Member

I am unclear about what tools are supported for publishing and fetching components from the OCI registry. Can Alex and Erin use their familiar commands like oras push and oras pull to achieve this?


#### Story 2 - Publishing a Component Interface
Erin is an engineer working on an open source project that allows users of the project to extend the functionality of the project using Wasm components as plugins. Erin would like to publish the interface specification for their plugin, so that other developers can easily find the interface and develop plugins for their project. Erin would like to use the GitHub Container Registry to store the Wasm component interface.

#### Story 3 - Publishing a Bundled Component
Alex is an engineer working in a large, security-focused organization which runs a lot of Linux containers in production. The security team at Alex's company requires that Linux containers be signed and provide a software bill of materials. Alex has recently built a new application that targets Wasm instead of being packaged as a Linux container image. In fact, Alex built the application using many Wasm components. Alex and their team have finalized the feature set for their first release, tested the application, and locked the version for all the dependencies. Alex would like to publish this version of their application with all the application dependencies bundled together. Alex would also like to sign the bundled application and include a software bill of materials for the bundled components.

#### Story 4 - Fetching a Component

Is there a story for how these components are eventually run? How are they to be consumed after they are pulled down?

Is there a chance that there are platform-specific bits that are required to run the modules? (I was looking at the image for slight, which seems to need certs installed alongside the module.)

I created an issue here: deislabs/containerd-wasm-shims#89

Erin is an engineer working on an open source project that uses Wasm components. Erin needs to pad the left side of strings in their application and rather than write this functionality, Erin wants to find and use a component that implements this functionality. Erin's friend Alex told them about their awesome leftpad component. Erin adds a dependency in their project for Alex's leftpad component. When the dependency is added, Erin's computer fetches the component from GitHub Container Registry, validates the signature on the component, and provides Erin a software bill of materials for the contents of the leftpad component.

### Requirements
#### Functional
- FR1. Warg MUST support publishing components to an OCIv1 compatible registry.
- FR2. Warg MUST support propagating signatures and bills of materials to OCIv1 content stores.
- FR3. Warg SHOULD support existing container image secure supply chain tooling allowing existing investments in container secure supply chains to be leveraged for Wasm components.
#### Non-functional
- TODO

### Implementation Details
At the time of authoring this proposal, the OCI v1.1 image specification is still in release candidate state, and there is a bit of flux with regard to artifacts. This proposal takes into account the most up-to-date state of the image specification and may change as the OCI specifications evolve.

#### Artifact Types
Image manifests will be differentiated based on the `artifactType` field in the image manifest. The following media types will be used.
- Component: "application/vnd.wasm.component.v1"
- Interface: "application/vnd.wasm.component.interface.v1"
- Bundled Component: "application/vnd.wasm.component.bundled.v1"
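
As a rough illustration of how a client might branch on `artifactType`, the sketch below maps the three values listed above to a kind name. The names `ARTIFACT_TYPES` and `classify_manifest` are hypothetical and are not part of Warg; the manifest is assumed to be already parsed into a dictionary.

```python
from typing import Optional

# Artifact types exactly as listed in the proposal text above.
ARTIFACT_TYPES = {
    "application/vnd.wasm.component.v1": "component",
    "application/vnd.wasm.component.interface.v1": "interface",
    "application/vnd.wasm.component.bundled.v1": "bundled component",
}


def classify_manifest(manifest: dict) -> Optional[str]:
    """Return the kind named by `artifactType`, or None if it is absent or unknown."""
    return ARTIFACT_TYPES.get(manifest.get("artifactType"))


example = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "artifactType": "application/vnd.wasm.component.interface.v1",
}
print(classify_manifest(example))
```

Returning `None` for a missing or unrecognized value leaves the fallback policy (for example, inspecting `config.mediaType`) to the caller.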

#### Image Manifest for Components
The following is an example image manifest for a component containing a configuration structure and a layer containing the `my-component.wasm` binary.
```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.wasm.component.v1",
  "config": {
    "mediaType": "application/vnd.wasm.component.config.v1+json",
    "digest": "sha256:5587da2246a78f08c447bff2ac91ee5c2b57be2f2a15244b5e618ac0be626885",
    "size": 331
  },
  "layers": [
    {
      "mediaType": "application/vnd.wasm.content.layer.v1+wasm",
      "digest": "sha256:b36aa5d0111a488937361fdb35432510d50675a11d566c5d9e82a147fb9ff552",
      "size": 2087464,
      "annotations": {
        "org.opencontainers.image.title": "my-component"
      }
    }
  ]
}
```
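
The `digest` and `size` fields in a manifest like the one above are derived from the bytes of each blob. The following is a minimal sketch of that derivation, assuming SHA-256 digests; the helper names `descriptor` and `component_manifest` are hypothetical, and the Wasm bytes are a placeholder rather than a real component.

```python
import hashlib
import json


def descriptor(media_type: str, data: bytes) -> dict:
    """Build a content descriptor (mediaType, digest, size) for `data`."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def component_manifest(wasm: bytes, config: bytes, title: str) -> dict:
    """Assemble an image manifest shaped like the component example."""
    layer = descriptor("application/vnd.wasm.content.layer.v1+wasm", wasm)
    layer["annotations"] = {"org.opencontainers.image.title": title}
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": "application/vnd.wasm.component.v1",
        "config": descriptor("application/vnd.wasm.component.config.v1+json", config),
        "layers": [layer],
    }


# Placeholder inputs; a real caller would read the compiled component from disk.
config_bytes = json.dumps(
    {
        "mediaType": "application/vnd.wasm.component.config.v1+json",
        "architecture": "wasm32",
        "os": "wasi",
    }
).encode()
wasm_bytes = b"\0asm placeholder"
manifest = component_manifest(wasm_bytes, config_bytes, "my-component")
print(json.dumps(manifest, indent=2))
```

Because the descriptors are content-addressed, a client that later pulls the blobs can recompute the same digests to check that the content was not altered.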

The following is an example of the configuration structure referenced in the preceding image manifest.
```json
{
  "mediaType": "application/vnd.wasm.component.config.v1+json",
  "architecture": "wasm32",
  "os": "wasi"
}
```
Comment on lines +128 to +135

I wonder if this configuration file is necessary; it might be redundant:

  • While wasm32 shows up in the compiler target-triple, that is just to configure code generation. Given a .wasm binary, there is no distinction to be made, it has one fixed meaning defined by the spec and whether it contains 32- or 64-bit memories and instructions operating thereupon is just an internal impl detail.
  • I think the way we should think about "os" is: what are the interfaces imported by this .wasm. If you see wasi_snapshot_preview1, well, that's Preview 1. Starting with Preview 2, there isn't one monolithic "WASI" at all; you'll see imports of wasi:http/outgoing-handler or wasi:filesystem/types etc and so your runtime either supports or doesn't support those individual interfaces (and perhaps we want to have some sort of OCI Runtime spec enumerating the set of interface ids guaranteed to be present?). However, there should be no added information saying that the os is wasi.


#### Image Manifest for Interface Components
The following is an example image manifest for an interface component containing a configuration structure and a layer containing the `my-component-interface.wasm` binary.
Comment on lines +137 to +138

FWIW, in our most-recent discussions, we were realizing that there's not really a need to distinguish between "component packages" and "interface packages". Both are represented as components, and an "interface package" is just a component that exports types (that represent Wit interfaces and worlds). But from a registry perspective, I don't have to care: when I see an interface identifier foo:bar/baz, I find the package foo:bar which must resolve to a component, and then I look for an export named baz which must resolve to a type representing a Wit interface and, if either of those aren't the case, foo:bar/baz is not a valid interface identifier. Thus, I don't think we'll actually need a separate artifactType here; we can simply publish "components" and client tooling can do what it needs to.

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.wasm.component.interface.v1",
  "config": {
    "mediaType": "application/vnd.wasm.component.config.v1+json",
    "digest": "sha256:c71d239df91726fc519c6eb72d318ec65820627232b2f796219e87dcf35d0ab4",
    "size": 331
  },
  "layers": [
    {
      "mediaType": "application/vnd.wasm.content.layer.v1+wasm",
      "digest": "sha256:dcf07c6bb395d6e1d40505b77e70af04f5fae0d54c9573fd379c1e7355a18cf3",
      "size": 2087464,
      "annotations": {
        "org.opencontainers.image.title": "my-component-interface"
      }
    }
  ]
}
```

The following is an example of the configuration structure referenced in the preceding image manifest.
```json
{
  "mediaType": "application/vnd.wasm.component.config.v1+json",
  "architecture": "wasm32",
  "os": "wasi"
}
```

#### Image Manifest for Bundled Components

If we take "bundled component" to mean "a component with its component-dependencies replaced with inline components" (as defined above), then this artifact type is not a bundled component -- it's a component combined with environment variable configuration and static assets: that's not a component at all, it's something bigger that contains a component, so I think we should give it a different name. Because this bigger thing is no longer composable the way components are composable, I'd suggest calling it a "wasm app" (or something that indicates that it's the final product that you can deploy), and not including "component" in the name at all (that's an impl detail of the app, and the app could just as well contain a core wasm module as, I believe, it's doing today).

The following is an example image manifest for a bundled component containing a configuration structure, a layer containing the `my-bundled-component.wasm` binary, and a data layer containing a static asset.
```json
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"artifactType": "application/vnd.wasm.component.bundled.v1",
Member

Any particular reason for taking a hard dependency on OCI v1.1?
The alternative would be to only rely on the config.mediaType for now, and not mandate the top-level artifactType, which would make it possible to push to OCI v1.0-compatible registries.

Author

I was hoping someone would bring this up. I don't have a strong opinion. On one hand, I believe we should push for the most portable specification that fits our needs. On the other hand, I've received feedback that image manifests are not extremely specific and relying on the config.mediaType would be less expressive.

Member

Side note: looking at the manifest example again, it is not a valid manifest for either v1 or v1.1. There are two options:

  • either mediaType: application/vnd.oci.artifact.manifest.v1+json and artifactType: application/vnd.wasm.component.bundled.v1 , and a list of blobs — OCI v1.1, not supported everywhere yet, but the better long term solution
    OR
  • setting config.mediaType and a list of layers (effectively ORAS) — which still has a top-level media type of a container image, but is accepted by most registries.

Author

Perhaps, I'm misunderstanding the current work to remove artifact.md opencontainers/image-spec#999, and the guidance for artifacts in opencontainers/image-spec#1043, but it seems like artifact manifests will not be a thing in v1.1.

I tried to model this based on opencontainers/image-spec#1043 section on Guidelines for Artifact Usage

Member

Correct — if artifact manifests are not a thing in 1.1, then option 2 above is the only way to make this work for now (i.e. artifactType will not exist).

@AaronFriel Apr 5, 2023

n.b.: Setting artifactType is supported in OCI 1.0 and registries are out of conformance if they error on unknown fields. You can use both, and you can also set a scratch media type for config.mediaType if it's irrelevant to you.

I think Option 2 makes the most sense, using a scratch mediatype for config.mediaType - edit: I see you have defined your own media type for a config blob. That works as well!

There's not a significant value add to having an artifactType when there is a dedicated config.mediaType. You'll find some registries have compatibility issues. And a lot of tooling hasn't been updated to read this yet.

"config": {
"mediaType": "application/vnd.wasm.component.config.v1+json",
"digest": "sha256:105ab3237b4f0d885700892a0f4b3482d1146dff27c88d46f02b8bd7ef67c3de",
"size": 331
},
"layers": [
{
"mediaType": "application/vnd.wasm.content.layer.v1+wasm",
"digest": "sha256:2e94e0582fb925e89515435513496819dc8f364f2da400059a64d6d1412ca2ad",
"size": 2087464,
Member

Is the assumption here that the bundled component is a statically linked component that includes all composing components and core modules?

This means there is no way to do layer de-duplication when pushing, and the registry would have to store the bytes for the same components for every component that takes a certain component as a dependency.

I'd like to propose a transparent way for clients to split a bundled component at distribution time, and reassemble it when pulling the artifact.
This is described in this proposal — https://hackmd.io/50rfwV6BTJWN8VZBhdAN_g

And a prototype for such a tool can be found here — https://github.com/fermyon/wasm-splice

We could take this approach for nested components, core modules, and data sections.

"annotations": {
"org.opencontainers.image.title": "my-bundled-component"
}
},
{
"mediaType": "application/vnd.wasm.content.layer.v1+data",
"digest": "sha256:8c69a84ec5adec97e47d4250410a7689046762aaa8e89f82ddbb4a89acb7388e",
"size": 96
}
]
}
```

The following is an example of the configuration structure referenced in the preceding image manifest.
```json
{
"mediaType": "application/vnd.wasm.component.config.v1+json",

Expanding on what I was saying above, I don't think we should consider this file "component configuration", but rather some sort of "wasm app" configuration. Once wasi-virt is built and working, I think there shouldn't even need to be a concept of "component configuration": the contents of this configuration file should go into the component: files go into data segments in a component generated by wasi-virt that bundles the original component (so that the new outer component does not import wasi:filesystem/types at all). Similarly, environment variables can be set by having wasi-virt virtualize wasi:cli/environment. The large win from doing it this way is that the output of wasi-virt is still a component that can be further composed and manipulated by downstream component tooling (enabling virtual platform layering). In the meantime, it makes total sense to use this config file as a stopgap (or maybe in perpetuity if folks are wanting to do other app-level configuration stuff, which I've heard); I just want to name it appropriately so that "component" really does mean "component" (and nothing else).

"architecture": "wasm32",
"os": "wasi",
"wasi": {
"environment": {
"env1": "first",
"env2": "second"
},
"files": [
Member

Does this assume WASI in the runtime environment?

Author

It does. Perhaps, it shouldn't. What do you suggest?

Member

If we are able to reference static files as data sections in the component (see component below), then there is no need for WASI to be required to access files that are part of the component.

Now, short term, we might have to, but I want to make sure this is not mandating all environments that run components should always allow WASI.

{
"guest": "cat.png",
"digest": "sha256:8c69a84ec5adec97e47d4250410a7689046762aaa8e89f82ddbb4a89acb7388e"
}
]
}
}

```
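
In the bundled example, each entry under `wasi.files` carries a `digest` that matches one of the manifest's layers (the `cat.png` entry and the `+data` layer share the same digest). The sketch below shows one way a client could resolve guest paths to layer descriptors by that digest. The function name `resolve_files` is hypothetical, and the config shape is assumed to be exactly the one shown in the example.

```python
def resolve_files(config: dict, manifest: dict) -> dict:
    """Map each guest path in the config to the layer descriptor holding its bytes."""
    layers_by_digest = {layer["digest"]: layer for layer in manifest.get("layers", [])}
    resolved = {}
    for entry in config.get("wasi", {}).get("files", []):
        layer = layers_by_digest.get(entry["digest"])
        if layer is None:
            # A file that no layer provides means the artifact is incomplete.
            raise KeyError(f"no layer with digest {entry['digest']} for {entry['guest']}")
        resolved[entry["guest"]] = layer
    return resolved


# Values copied from the bundled component example.
DATA_DIGEST = "sha256:8c69a84ec5adec97e47d4250410a7689046762aaa8e89f82ddbb4a89acb7388e"
bundled_manifest = {
    "layers": [
        {
            "mediaType": "application/vnd.wasm.content.layer.v1+wasm",
            "digest": "sha256:2e94e0582fb925e89515435513496819dc8f364f2da400059a64d6d1412ca2ad",
            "size": 2087464,
        },
        {
            "mediaType": "application/vnd.wasm.content.layer.v1+data",
            "digest": DATA_DIGEST,
            "size": 96,
        },
    ]
}
bundled_config = {"wasi": {"files": [{"guest": "cat.png", "digest": DATA_DIGEST}]}}

files = resolve_files(bundled_config, bundled_manifest)
print(files["cat.png"]["mediaType"])
```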

#### Image Manifest for Signing and SBOMs
The following example illustrates signing a component using Notary V2. Use of Notary V2 could be replaced with any other signing implementation.

I don't see the value of picking a specific signing solution for this spec. Is there something that makes this content need a specific signing solution? If not, I'd cut the section and just say image signing is recommended, or say nothing at all since signing is separate from the content being signed.

Author

I was just using it as an example. It was not intended to imply any favor to one signing solution vs another.

- The signature manifest `mediaType` MUST be an image manifest.
- The subject descriptor digest MUST point to an image manifest for a component.
```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.cncf.notary.signature",
    "size": 2,
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a"
  },
  "layers": [
    {
      "mediaType": "application/jose+json",
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0",
      "size": 32654
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:e41e72e96cf23dc26baa6931e5534c7fe4b16157d485cc36bbbbd000fe37477d",
    "size": 16724
  },
  "annotations": {
    "io.cncf.notary.x509chain.thumbprint#S256":
      "[\"B7A69A70992AE4F9FF103EBE04A2C3BA6C777E439253CE36562E6E98375068C3\",\"932EB6F5598435D4EF23F97B0B5ACB515FAE2B8D8FAC046AB813DDC419DD5E89\"]"
  }
}
```

For additional information about Notary V2 image manifest and payload, see the [Notary V2 Signature Specification](https://github.com/notaryproject/notaryproject/blob/v1.0.0-rc.2/specs/signature-specification.md#backward-compatibility).

SBOMs are to be applied in a similar manner as signatures.
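
The two MUST rules for signature manifests can be checked structurally before any cryptographic verification. The sketch below is one possible check, assuming the manifest is already parsed into a dictionary; `signature_manifest_errors` is a hypothetical name. Confirming that the subject really is a component would additionally require fetching the subject manifest, which this sketch does not do.

```python
IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"


def signature_manifest_errors(manifest: dict) -> list:
    """Return violations of the structural MUST rules; an empty list means none found."""
    errors = []
    if manifest.get("mediaType") != IMAGE_MANIFEST:
        errors.append("signature manifest mediaType must be an image manifest")
    subject = manifest.get("subject")
    if not isinstance(subject, dict):
        errors.append("signature manifest must carry a subject descriptor")
    elif subject.get("mediaType") != IMAGE_MANIFEST:
        errors.append("subject must reference an image manifest")
    return errors


# Trimmed copy of the signature manifest example.
signature = {
    "schemaVersion": 2,
    "mediaType": IMAGE_MANIFEST,
    "subject": {
        "mediaType": IMAGE_MANIFEST,
        "digest": "sha256:e41e72e96cf23dc26baa6931e5534c7fe4b16157d485cc36bbbbd000fe37477d",
        "size": 16724,
    },
}
print(signature_manifest_errors(signature))
```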

#### Image Manifests for Additional Metadata
There will likely be a need to provide additional metadata for a component, for example debugging symbols, documentation, or a WIT file containing the textual description of the component interface. This proposal will not exhaustively address each of these additional pieces of metadata. However, if additional metadata is to be applied, it MUST be specified using an image manifest whose subject descriptor points to the component image manifest, and that manifest MUST have an `artifactType` specified.
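
A minimal sketch of a manifest that follows the subject and `artifactType` rules above is shown here. Several details are assumptions and are not defined by this proposal: the function name `metadata_manifest`, the example artifact type `application/vnd.example.wasm.docs.v1`, the `text/markdown` layer, and the use of the OCI empty config media type `application/vnd.oci.empty.v1+json` for the config descriptor.

```python
import hashlib

IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
EMPTY_JSON = b"{}"  # assumed placeholder config blob


def metadata_manifest(artifact_type: str, subject: dict, layers: list) -> dict:
    """Build an image manifest that attaches metadata to a component manifest."""
    if not artifact_type:
        raise ValueError("additional metadata MUST specify an artifactType")
    if subject.get("mediaType") != IMAGE_MANIFEST:
        raise ValueError("subject MUST point to the component image manifest")
    return {
        "schemaVersion": 2,
        "mediaType": IMAGE_MANIFEST,
        "artifactType": artifact_type,
        "config": {
            "mediaType": "application/vnd.oci.empty.v1+json",
            "digest": "sha256:" + hashlib.sha256(EMPTY_JSON).hexdigest(),
            "size": len(EMPTY_JSON),
        },
        "layers": layers,
        "subject": subject,
    }


# Subject descriptor reused from the signature example; the docs layer is hypothetical.
subject = {
    "mediaType": IMAGE_MANIFEST,
    "digest": "sha256:e41e72e96cf23dc26baa6931e5534c7fe4b16157d485cc36bbbbd000fe37477d",
    "size": 16724,
}
docs = b"# docs"
docs_layer = {
    "mediaType": "text/markdown",
    "digest": "sha256:" + hashlib.sha256(docs).hexdigest(),
    "size": len(docs),
}
docs_manifest = metadata_manifest("application/vnd.example.wasm.docs.v1", subject, [docs_layer])
print(docs_manifest["artifactType"], "->", docs_manifest["subject"]["digest"])
```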

#### Warg Registry Implementation
TODO

## Alternative Options
### Publish Artifacts Using an Artifact Manifest
An alternative option to using image manifests to describe components is to use an artifact manifest. An artifact manifest would have `mediaType` not equal to "application/vnd.oci.image.manifest.v1+json".
#### Pros
- Specifying a custom `mediaType` could provide more opportunity to creatively describe components.
#### Cons
- It seems likely, at the time of authoring, that artifact manifests will not be part of the OCIv1.1 specification: https://github.com/opencontainers/image-spec/pull/999.
- Using artifact manifests would likely lead to less portability across registry implementations.

## Conclusions
TODO

## Additional Details
### Test Plan
TODO

## Implementation History
- 2023-04-04: Initial draft