Merged
183 changes: 149 additions & 34 deletions layer.md
@@ -1,89 +1,200 @@
# Creating an Image Filesystem Changeset
# Image Layer Filesystem Changeset

An example of creating an Image Filesystem Changeset follows.
This document describes how to serialize a filesystem and filesystem changes like removed files into a blob called a layer.
One or more layers are ordered on top of each other to create a complete filesystem.
This document will use a concrete example to illustrate how to create and consume these filesystem layers.

An image root filesystem is first created as an empty directory.
Here is the initial empty directory structure for a changeset using the randomly-generated directory name `c3167915dc9d` ([actual layer DiffIDs are generated based on the content](#id_desc)).
## Distributable Format

Layer Changesets for the [mediatype](./media-types.md) `application/vnd.oci.image.layer.tar+gzip` MUST be packaged in [tar archive][tar-archive].
Layer Changesets for the [mediatype](./media-types.md) `application/vnd.oci.image.layer.tar+gzip` MUST NOT include duplicate entries for file paths in the resulting [tar archive][tar-archive].
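
A consumer could check the no-duplicates rule before unpacking a layer. The following is a minimal Python sketch, not part of the specification: `duplicate_paths` and `build_layer` are hypothetical helper names, and treating `./bin/x` and `bin/x` as the same path is an assumption made here for illustration.

```python
import io
import posixpath
import tarfile
from collections import Counter


def duplicate_paths(layer_bytes):
    """Return the normalized paths that occur more than once in a gzipped tar layer."""
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:gz") as archive:
        # Assumption: "./etc/x" and "etc/x" name the same file, so normalize first.
        names = [posixpath.normpath(member.name) for member in archive]
    counts = Counter(names)
    return sorted(name for name, seen in counts.items() if seen > 1)


def build_layer(paths):
    """Build a small in-memory tar+gzip blob with one empty regular file per path."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in paths:
            archive.addfile(tarfile.TarInfo(path))
    return buffer.getvalue()
```

A layer for which `duplicate_paths` returns a non-empty list would violate the MUST NOT above.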

## Change Types

Types of changes that can occur in a changeset are:

* Additions
* Modifications
* Removals

Additions and Modifications are represented the same in the changeset tar archive.

Removals are represented using "[whiteout](#whiteouts)" file entries (See [Representing Changes](#representing-changes)).

### File Types

Throughout this document, use of the word "files" includes:
Contributor:

"or entries" maybe?

Contributor:

I just submitted a PR to your branch using the edit feature... give that a try?

Member Author:

ah yeah. Now that this is merge'able, can I carry that to a new PR?


* regular files
* directories
* sockets
* symbolic links
* block devices
* character devices
* FIFOs

### File Attributes

Where supported, the following file attributes MUST be included for Additions and Modifications:

* Modification Time (`mtime`)
* User ID (`uid`)
* User Name (`uname`) *secondary to `uid`*
* Group ID (`gid`)
* Group Name (`gname`) *secondary to `gid`*
* Mode (`mode`)
Contributor:

Do we need to break mode down into permissions, symlink-ness, etc.? tar allows you to dereference symlinks (and hard links).

Member Author:

well, mode is not the same as the file type. Perhaps file type ought to be a field, because if foo changes from a regular file into a FIFO, that is a modification.

As for hardlink, perhaps but could be limited by underlying fs support. As for symlink, yes, this reference is effectively the content of the file.

* Extended Attributes (`xattrs`)
Contributor:

I don't see xattrs in tar(5). Can we unpack this into something from a tar spec? Although maybe there is no current tar spec? I'm a bit jumpy about leaning on tar without defining tar ;).

Member Author:

that's why i did not link the libarchive man page, because its support varies.
xattrs support in GNU tar was carried as patches for a long time by various linux distros, but now it is in upstream GNU tar. Reflected in http://linux.die.net/man/1/tar
There is no particular tar spec.

Contributor:

On Wed, Sep 07, 2016 at 06:58:28AM -0700, Vincent Batts wrote:

There is no particular tar spec.

Hmm, it looks like there was a run at trying to get a formal tar media
type in 2014 1, but I don't see it listed by the IANA 2. Without
such a spec, validating layer blobs is going to be tricky (Alice: “My
tar library cannot parse your layer!” Bob: “So what?”). In the
absence of a tar spec, I see two choices:

a. Punt the specifics to users 3. Lots of freedom, but
application/vnd.oci.image.layer.tar+gzip becomes unenforceable and
there are likely portability edge cases (e.g. sparse files 4).

b. Require a specified format like pax 5. Implementers have to
figure out how to get their “tar” library to write pax, or have to
use a pax-specific library. But
application/vnd.oci.image.layer.pax+gzip is enforceable and portable
between all image-spec implementations.

* Symlink reference (`linkname`)
Contributor:

Any word here on how hardlinks might be supported?

Member Author:

Not yet. It was undefined before, and getting that wording right seems like a further conversation piece.

Contributor:

Should we file an issue?


[Sparse files](https://en.wikipedia.org/wiki/Sparse_file) SHOULD NOT be used because they lack consistent support across tar implementations.
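
The attributes listed above map fairly directly onto the fields a tar library exposes. The sketch below uses Python's standard `tarfile` module to collect them from one entry. `describe_entry` is a hypothetical helper, and reading extended attributes from pax records with the `SCHILY.xattr.` prefix is an assumption based on what GNU tar writes; other tools may store them differently.

```python
import tarfile

# Assumption: xattrs are carried as pax records under this prefix (GNU tar's convention).
XATTR_PREFIX = "SCHILY.xattr."


def describe_entry(member):
    """Collect the layer-relevant attributes of one tarfile.TarInfo entry."""
    xattrs = {
        key[len(XATTR_PREFIX):]: value
        for key, value in member.pax_headers.items()
        if key.startswith(XATTR_PREFIX)
    }
    return {
        "mtime": member.mtime,
        "uid": member.uid,
        "uname": member.uname,  # secondary to uid
        "gid": member.gid,
        "gname": member.gname,  # secondary to gid
        "mode": member.mode,
        "xattrs": xattrs,
        # The symlink reference is only meaningful for symbolic links.
        "linkname": member.linkname if member.issym() else None,
    }
```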

## Creating

### Initial Root Filesystem

The initial root filesystem is the base or parent layer.

For this example, an image root filesystem has an initial state as an empty directory.
The name of the directory is not relevant to the layer itself, only for the purpose of producing comparisons.

Here is an initial empty directory structure for a changeset, with a unique directory name `rootfs-c9d-v1`.

```
c3167915dc9d/
rootfs-c9d-v1/
Contributor:

What is c9d about? Can't we just call this rootfs-v1?

Member Author:

uniqueness. We could call it fart-jar/.

Contributor:

What makes you think that's unique, Vincent?

Member Author:

fair, fart-jar is as arbitrary as including -c9d-

Contributor:

On Wed, Sep 07, 2016 at 06:55:36AM -0700, Vincent Batts wrote:

-c3167915dc9d/
+rootfs-c9d-v1/

uniqueness. We could call it fart-jar/.

I don't have any of rootfs, rootfs-v1, rootfs-c9d-v1, or fart-jar in
my home directory ;). I think rootfs (where you have rootfs-c9d-v1)
and rootfs-v1 (where you have rootfs-c9d-v1.s1) are best because:

a. They include ‘rootfs’, which is where layers will be unpacked by
default 1.
b. They distinguish between the mutable working directory (rootfs) and
a snapshot of it (rootfs-v1) to make it easy to talk about changes.

```

### Populate Initial Filesystem

Files and directories are then created:

```
c3167915dc9d/
rootfs-c9d-v1/
Contributor:

nit: why change the root name?

Member Author:

c3167915dc9d provides zero context and gives the feeling of being content addressed. Once making a departure from that, then having an example name with uniqueness and context, it was largely an arbitrary choice of string.

etc/
my-app-config
bin/
my-app-binary
my-app-tools
```

The `c3167915dc9d` directory is then committed as a plain Tar archive with entries for the following files:
The `rootfs-c9d-v1` directory is then archived as a plain [tar archive][tar-archive] with paths relative to `rootfs-c9d-v1`.
The archive has entries for the following files:

```
etc/my-app-config
bin/my-app-binary
bin/my-app-tools
./
Contributor:

why now with ./?

Member Author:

not a hard requirement, just more accurate, no?

[vbatts@bananaboat] {master *} ~/v1$ find
.
./etc
./etc/my-app-config
./bin
./bin/my-app-tools
./bin/my-app-binary
[vbatts@bananaboat] {master *} ~/v1$ tar -c . | tar tv
drwxr-xr-x vbatts/users      0 2016-09-02 10:45 ./
drwxr-xr-x vbatts/users      0 2016-09-02 10:45 ./etc/
-rw-r--r-- vbatts/users      0 2016-09-02 10:45 ./etc/my-app-config
drwxr-xr-x vbatts/users      0 2016-09-02 10:46 ./bin/
-rw-r--r-- vbatts/users      0 2016-09-02 10:46 ./bin/my-app-tools
-rw-r--r-- vbatts/users      0 2016-09-02 10:46 ./bin/my-app-binary

Contributor:

@vbatts Are layers generated with the top-level entry? I'm not sure if attributes can be applied safely in this manner.

Contributor:

On Tue, Sep 06, 2016 at 04:44:12PM -0700, Stephen Day wrote:

-etc/my-app-config
-bin/my-app-binary
-bin/my-app-tools
+./

@vbatts Are layers generated with the top-level entry? I'm not sure
if attributes can be applied safely in this manner.

That top-level entry will be the root of the container filesystem.
Why wouldn't its attributes be under the control of the image author?
I don't see how this is any more sensitive than the more deeply-nested
layer entries.

Member Author:

@stevvooe the image would have say over the perms and owner of /, wouldn't it? Regardless of user-namespace, etc., many use-cases would ensure the root or who ever own /, otherwise it is implied by the choice of base image being built on. It is up to the extractor/consumer of the image layer to ensure that the directory being extracted to is confined and not vulnerable.

Member Author:

"vulnerable" in the sense of potentially being world-writable and having policy not allowing such images, but the image layer has the facility to declare its need for such.

./etc/
./etc/my-app-config
./bin/
./bin/my-app-binary
./bin/my-app-tools
```
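
The archiving step can be sketched with Python's standard `tarfile` module. This is an illustration under stated assumptions rather than a reference implementation: `create_layer` is a hypothetical name, and the output path is assumed to live outside the root filesystem so the archive does not include itself.

```python
import tarfile


def create_layer(rootfs, output_path):
    """Archive rootfs as a plain tar whose entry names are relative to rootfs."""
    with tarfile.open(output_path, mode="w") as archive:
        # arcname="." makes each entry relative ("./etc/...") instead of
        # embedding the build host's directory name in the layer.
        archive.add(rootfs, arcname=".")
    # Re-open the result and report the entry names that were written.
    with tarfile.open(output_path, mode="r") as archive:
        return archive.getnames()
```

Note that `tarfile` reports directory names without the trailing slash shown in the listing above.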

To make changes to the filesystem of this container image, create a new directory, such as `f60c56784b83`, and initialize it with a snapshot of the parent image's root filesystem, so that the directory is identical to that of `c3167915dc9d`.
NOTE: a copy-on-write or union filesystem can make this very efficient:
### Populate a Comparison Filesystem

Create a new directory and initialize it with a copy or snapshot of the prior root filesystem.
Example commands that can preserve [file attributes](#file-attributes) to make this copy are:
* [cp(1)](http://linux.die.net/man/1/cp): `cp -a rootfs-c9d-v1/ rootfs-c9d-v1.s1/`
* [rsync(1)](http://linux.die.net/man/1/rsync): `rsync -aHAX rootfs-c9d-v1/ rootfs-c9d-v1.s1/`
* [tar(1)](http://linux.die.net/man/1/tar): `mkdir rootfs-c9d-v1.s1 && tar --acls --xattrs -C rootfs-c9d-v1/ -c . | tar -C rootfs-c9d-v1.s1/ --acls --xattrs -x` (including `--selinux` where supported)

Any [changes](#change-types) to the snapshot MUST NOT change or affect the directory it was copied from.

For example `rootfs-c9d-v1.s1` is an identical snapshot of `rootfs-c9d-v1`.
In this way `rootfs-c9d-v1.s1` is prepared for updates and alterations.
Contributor:

It seems strange to me to be updating the snapshot. Can we replace rootfs-c9d-v1.s1 with rootfs-c9d-v2? Or keep rootfs-c9d-v1.s1 as the snapshot and continue working in the rootfs-c9d-v1 directory? If, for some reason, you don't buy my argument for using rootfs and rootfs-v1 ;).

Member Author:

what do you mean updating the snapshot?
I do not buy your arguments that are based on the contents of your home directory. Sorry.

Contributor:

On Thu, Sep 08, 2016 at 08:57:34AM -0700, Vincent Batts wrote:

+For example rootfs-c9d-v1.s1 is an identical snapshot of rootfs-c9d-v1.
+In this way rootfs-c9d-v1.s1 is prepared for updates and alterations.

what do you mean updating the snapshot?

I want:

$ mkdir rootfs
$ populate.sh rootfs
$ cp -a rootfs rootfs-v1 # freeze out a snapshot for the first layer
$ tar -C rootfs-v1 -cf layer-0.tar .
$ mutate.sh rootfs
$ cp -a rootfs rootfs-v2 # freeze out the snapshot for the second layer
$ generate-layer.sh rootfs-v1 rootfs-v2 >layer-1.tar

So you're working in rootfs, but the rootfs-v* are frozen and
unchanging.

I do not buy your arguments that are based on the contents of your
home directory. Sorry.

Where do you expect folks to be working on this that they already have
a ‘rootfs’ directory around? I'm saying 1 that I don't see a need
to pick something more unique than that.

Member Author:

i'm not seeing where this example is instructing changes to happen in the original directory after the snapshot is made.
Is that even possible? surely. Then when the changeset is made, the changeset at that moment will reflect the changes in the parent.
I specifically did not say anything to the nature of "once the snapshot/copy is made then the parent is immutable", because it is not.


**Implementor's Note**: *a copy-on-write or union filesystem can efficiently make directory snapshots*

Initial layout of the snapshot:

```
f60c56784b83/
rootfs-c9d-v1.s1/
etc/
my-app-config
bin/
my-app-binary
my-app-tools
```

This example change is going to add a configuration directory at `/etc/my-app.d` which contains a default config file.
There's also a change to the `my-app-tools` binary to handle the config layout change.
The `f60c56784b83` directory then looks like this:
See [Change Types](#change-types) for more details on changes.

For example, add a directory at `/etc/my-app.d` containing a default config file, and remove the existing config file.
Also change (in attribute or file content) the `./bin/my-app-tools` binary to handle the config layout change.

Following these changes, the representation of the `rootfs-c9d-v1.s1` directory:

```
f60c56784b83/
rootfs-c9d-v1.s1/
etc/
.wh.my-app-config
my-app.d/
default.cfg
bin/
my-app-binary
my-app-tools
```

This reflects the removal of `/etc/my-app-config` and creation of a file and directory at `/etc/my-app.d/default.cfg`.
`/bin/my-app-tools` has also been replaced with an updated version.
Before committing this directory to a changeset, because it has a parent image, it is first compared with the directory tree of the parent snapshot, `f60c56784b83`, looking for files and directories that have been added, modified, or removed.
### Determining Changes

When two directories are compared, the relative root is the top-level directory.
The directories are compared, looking for files that have been [added, modified, or removed](#change-types).

For this example, `rootfs-c9d-v1/` and `rootfs-c9d-v1.s1/` are recursively compared, each as relative root path.

The following changeset is found:

```
Added: /etc/my-app.d/
Added: /etc/my-app.d/default.cfg
Modified: /bin/my-app-tools
Deleted: /etc/my-app-config
```

A Tar Archive is then created which contains *only* this changeset:
This reflects the removal of `/etc/my-app-config` and creation of a file and directory at `/etc/my-app.d/default.cfg`.
`/bin/my-app-tools` has also been replaced with an updated version.
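
The recursive comparison can be sketched in Python as follows. The helper names `snapshot` and `compare` are hypothetical. As simplifying assumptions, `mtime` and extended attributes are left out of the comparison so the example stays deterministic, and regular files are compared by a SHA-256 digest of their content; a real implementation would need to consider every attribute listed under [File Attributes](#file-attributes).

```python
import hashlib
import os
import stat


def snapshot(root):
    """Map every path below root to a signature of its type, attributes and content."""
    entries = {}
    for current, dirs, files in os.walk(root):
        for name in dirs + files:
            full = os.path.join(current, name)
            info = os.lstat(full)
            relative = "/" + os.path.relpath(full, root).replace(os.sep, "/")
            signature = [stat.S_IFMT(info.st_mode), stat.S_IMODE(info.st_mode),
                         info.st_uid, info.st_gid]
            if stat.S_ISDIR(info.st_mode):
                relative += "/"
            elif stat.S_ISLNK(info.st_mode):
                signature.append(os.readlink(full))
            elif stat.S_ISREG(info.st_mode):
                with open(full, "rb") as handle:
                    signature.append(hashlib.sha256(handle.read()).hexdigest())
            entries[relative] = tuple(signature)
    return entries


def compare(lower, upper):
    """Return (kind, path) pairs describing how upper differs from lower."""
    before, after = snapshot(lower), snapshot(upper)
    changes = []
    for path in sorted(after):
        if path not in before:
            changes.append(("Added", path))
        elif after[path] != before[path]:
            changes.append(("Modified", path))
    for path in sorted(before):
        if path not in after:
            changes.append(("Deleted", path))
    return changes
```

Because directory timestamps are ignored here, `/etc/` itself is not reported as modified when a child is added to it, which matches the changeset shown above.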

### Representing Changes

- Added and modified files and directories in their entirety
- Deleted files or directory marked with a whiteout file
A [tar archive][tar-archive] is then created which contains *only* this changeset:
Contributor:

tar archive vs Tar archive in other places

Member Author:

i down cased all of them. Perhaps it is not so proper to be capitalized.


A whiteout file is an empty file that prefixes the deleted paths basename `.wh.`.
When a whiteout is found in the upper changeset of a filesystem, any matching name in the lower changeset is ignored, and the whiteout itself is also hidden.
As files prefixed with `.wh.` are special whiteout tombstones it is not possible to create a filesystem which has a file or directory with a name beginning with `.wh.`.
- Added and modified files and directories in their entirety
- Deleted files or directories marked with a [whiteout file](#whiteouts)

The resulting Tar archive for `f60c56784b83` has the following entries:
The resulting tar archive for `rootfs-c9d-v1.s1` has the following entries:

```
/etc/my-app.d/default.cfg
/bin/my-app-tools
/etc/.wh.my-app-config
./etc/my-app.d/
Contributor:

If we're going to list this directory here, we'll want to explain how it was modified above.

Member Author:

done

./etc/my-app.d/default.cfg
./bin/my-app-tools
./etc/.wh.my-app-config
```

Whiteout files MUST only apply to resources in lower layers.
The basename of `./etc/my-app-config` is now prefixed with `.wh.`, so that path will be removed when the changeset is applied.
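
The mapping from a detected changeset to layer entry names can be sketched as below. `whiteout_path` and `changeset_entries` are hypothetical helpers, and the `(kind, path)` input format is an assumption chosen to mirror the `Added` / `Modified` / `Deleted` listing in this example.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def whiteout_path(deleted_path):
    """Prefix the basename of a deleted path with '.wh.'."""
    directory, basename = posixpath.split(deleted_path.rstrip("/"))
    return posixpath.join(directory, WHITEOUT_PREFIX + basename)


def changeset_entries(changes):
    """Turn (kind, path) pairs into the entry names stored in the layer."""
    entries = []
    for kind, path in changes:
        relative = "." + path  # "/etc/x" becomes "./etc/x"
        if kind == "Deleted":
            # Removals are recorded as whiteout entries, not as real files.
            entries.append(whiteout_path(relative))
        else:
            # Additions and modifications are stored in their entirety.
            entries.append(relative)
    return entries
```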

## Applying

Layer Changesets of [mediatype](./media-types.md) `application/vnd.oci.image.layer.tar+gzip` are applied rather than strictly extracted in normal fashion for tar archives.
Contributor:

I think the following two lines cover this idea. Can we drop this line? You mention the media type early on in the “Distributable Format” section.


Applying a layer changeset requires consideration for the [whiteout](#whiteouts) files.
In the absence of any [whiteout](#whiteouts) files in a layer changeset, the archive is extracted like a regular tar archive.


### Changeset over existing files

This section covers applying an entry in a layer changeset, if the file path already exists.

If the file path is a directory, the existing path only has its attributes set from the layer changeset entry for that file path.
If the file path is any other file type (regular file, FIFO, etc.), then:
* the file path is unlinked (see [`unlink(2)`](http://linux.die.net/man/2/unlink))
* the file is created
* if it is a regular file, its content is written
* attributes are set on the file path
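
The steps above can be sketched in Python. `apply_entry` is a hypothetical helper that handles only directories, regular files and symbolic links, and sets only `mode` and `mtime`; ownership, extended attributes, other file types, and the case where an existing directory is replaced by a non-directory are left out. It also assumes entry names have already been checked so they cannot escape `rootfs`.

```python
import os
import shutil
import tarfile


def apply_entry(archive, member, rootfs):
    """Apply one changeset entry on top of an already populated rootfs."""
    target = os.path.join(rootfs, member.name)
    existing_dir = os.path.isdir(target) and not os.path.islink(target)
    if member.isdir():
        if not existing_dir:
            if os.path.lexists(target):
                os.unlink(target)
            os.makedirs(target)
        # An existing directory keeps its children; only attributes are set below.
    else:
        if os.path.lexists(target):
            os.unlink(target)  # unlink first, then recreate from the layer
        if member.isreg():
            with archive.extractfile(member) as source, open(target, "wb") as sink:
                shutil.copyfileobj(source, sink)
        elif member.issym():
            os.symlink(member.linkname, target)
        else:
            raise NotImplementedError("file type not covered by this sketch")
    if not member.issym():
        os.chmod(target, member.mode)
        os.utime(target, (member.mtime, member.mtime))
```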

## Whiteouts

A whiteout file is an empty file with a special filename that signifies a path should be deleted.
A whiteout filename consists of the prefix `.wh.` plus the basename of the path to be deleted.
As files prefixed with `.wh.` are special whiteout markers, it is not possible to create a filesystem which has a file or directory with a name beginning with `.wh.`.

Once a whiteout is applied, the whiteout itself MUST also be hidden.
Whiteout files MUST only apply to resources in lower/parent layers.
Files that are present in the same layer as a whiteout file can only be hidden by whiteout files in subsequent layers.
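
As an illustration of these rules, the Python sketch below applies a single whiteout entry to an unpacked lower filesystem. `whiteout_target` and `apply_whiteout` are hypothetical helpers. The whiteout file itself is never written to disk, which is how this sketch keeps it hidden, and opaque whiteouts are deliberately skipped.

```python
import os
import posixpath
import shutil

WHITEOUT_PREFIX = ".wh."
OPAQUE_WHITEOUT = ".wh..wh..opq"


def whiteout_target(entry_name):
    """Return the path hidden by a whiteout entry, or None for any other entry."""
    directory, basename = posixpath.split(entry_name)
    if basename == OPAQUE_WHITEOUT or not basename.startswith(WHITEOUT_PREFIX):
        return None
    return posixpath.join(directory, basename[len(WHITEOUT_PREFIX):])


def apply_whiteout(rootfs, entry_name):
    """Delete the path named by a whiteout entry; return True if it was a whiteout."""
    target = whiteout_target(entry_name)
    if target is None:
        return False
    full = os.path.join(rootfs, target)
    if os.path.isdir(full) and not os.path.islink(full):
        shutil.rmtree(full)  # a whited-out directory hides all of its descendants
    elif os.path.lexists(full):
        os.unlink(full)
    return True
```

To honor the rule that whiteouts only affect lower layers, a caller would process a layer's whiteout entries before extracting that layer's other entries.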
The following is a base layer with several resources:

@@ -117,6 +228,8 @@ a/.wh..wh..opq

Implementations SHOULD generate layers such that the whiteout files appear before sibling directory entries.

### Opaque Whiteout

In addition to expressing that a single entry should be removed from a lower layer, layers may remove all of the children using an opaque whiteout entry.
An opaque whiteout entry is a file with the name `.wh..wh..opq` indicating that all siblings are hidden in the lower layer.
Let's take the following base layer as an example:
@@ -139,7 +252,7 @@ bin/
```

This is called _opaque whiteout_ format.
An _opaque whiteout_ file hides _all_ children of the `bin/` including sub-directories and all descendents.
An _opaque whiteout_ file hides _all_ children of the `bin/` including sub-directories and all descendants.
Using _explicit whiteout_ files, this would be equivalent to the following:

```
@@ -151,8 +264,10 @@ bin/

In this case, a unique whiteout file is generated for each entry.
If there were more children of `bin/` in the base layer, there would be an entry for each.
Note that this opaque file will apply to _all_ children, including sub-directories, other resources and all descendents.
Note that this opaque file will apply to _all_ children, including sub-directories, other resources and all descendants.

Implementations SHOULD generate layers using _explicit whiteout_ files, but MUST accept both.
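
The equivalence between the two formats can be sketched as a small Python function that expands an opaque whiteout into explicit whiteout entries, given the paths present in the lower layer. `explicit_whiteouts` is a hypothetical helper and the path names in the usage example are illustrative only.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def explicit_whiteouts(opaque_entry, lower_paths):
    """Expand an opaque whiteout into one explicit whiteout per direct child."""
    directory = posixpath.dirname(opaque_entry)
    prefix = directory + "/"
    children = set()
    for path in lower_paths:
        if path.startswith(prefix):
            # Only direct children need an entry: whiting out a directory
            # already hides everything beneath it.
            first = path[len(prefix):].split("/", 1)[0]
            if first:
                children.add(first)
    return sorted(posixpath.join(directory, WHITEOUT_PREFIX + child) for child in children)
```

For example, with a lower layer containing `bin/my-app-binary`, `bin/my-app-tools` and `bin/tools/helper`, the entry `bin/.wh..wh..opq` expands to `bin/.wh.my-app-binary`, `bin/.wh.my-app-tools` and `bin/.wh.tools`.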

Any given image is likely to be composed of several of these Image Filesystem Changeset tar archives.

[tar-archive]: https://en.wikipedia.org/wiki/Tar_(computing)