Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking for existence of blob causes creation of folders and links on filesystem #511

Closed
daviesian opened this issue Apr 23, 2022 · 2 comments
Assignees
Labels
bug Something isn't working feature New feature or request

Comments

@daviesian
Copy link

daviesian commented Apr 23, 2022

Describe the bug

When zot receives a HEAD request to check for the existence of a blob in repository foo, if the blob is found in a different repository then the directory structure for foo is created and the blob hard-linked into this new directory. This causes two problems:

  1. A directory is created in storage purely on the basis of a speculative HEAD request, which is definitely unexpected.
  2. If the original image is deleted, the hard-linked blob now exists only in the incorrectly-created directory, consuming disk space forever.

To Reproduce

Steps to reproduce the behavior:

  1. Push an image to a zot registry (e.g. podman push --format oci my-zot-registry/my-repo:latest)
  2. Note that the my-repo directory now exists in the zot storage directory.
  3. Make a HEAD request to the registry for a different repository but for a blob digest from one layer of the first image (e.g. curl --head -H 'authorization: Basic xxxx' https://my-zot-registry/v2/different-repo/blobs/sha256:aaabbbccc...)
  4. Note that the different-repo directory now also exists in the zot storage directory, and contains a hard link to the specified blob, even though this was only a HEAD request.
  5. Delete the original image from the registry (e.g. skopeo delete docker://my-zot-registry/my-repo:latest)
  6. Note that the blob still exists in the different-repo directory, leaking disk space forever.

N.B. This is not a bug in the handling of deletions - the deletion just helps to demonstrate the effects of the issue.

Expected behavior

The HEAD request to test for the existence of a blob should have no effect on storage.

Additional context

$ zot --version
{"level":"info","distribution-spec":"1.0.1","commit":"v1.4.0-ad90a4975f896fa6ea0140107b7a0c87edfcae76","binary-type":"extended","go version":"go1.17.5","time":"2022-04-23T12:22:33Z","message":"version"}

This was discovered while experimenting with podman push, which makes lots of HEAD requests for all sorts of names that images have had in the past, leaving a very messy zot storage directory. Reproducing via podman push is slightly tricky though, hence the more specific steps outlined above.

I have investigated the cause of this issue and identified the problem. It lies in the ImageStore.CheckBlob function, which unconditionally creates a copy of the blob if it is found (even when called from a HEAD request).

There is a further bug, which is that after deleting an image from the registry, deleted blobs still seem to be found in the cache, which means that subsequent HEAD requests still incorrectly create directories in storage, even though the hard-link then fails. This is probably not worth fixing as the fix for the main issue will mean this code path should never be followed.

@rchincha
Copy link
Contributor

Hi @daviesian, thanks for doing a deep-dive into zot code and behavior and filing the issue.

For some background about this behavior, pls see discussion on the dist-spec project [1]. The requirement is real in that when there are many large shared layers, without this side-effect, the only way to achieve network savings without modifying the dist-spec would be to actually upload the whole blob first. That said, HTTP RFCs will likely frown upon GET/HEAD requests causing data modifications instead of being immutable/idempotent. Furthermore, this could open the door for registry abuse (but you still have to get past authN/authZ)

Note that we do have a "dedupe" boolean knob to turn this behavior off but note that it turns off dedupe completely.

We are currently reviewing your PR [2].

References:
[1] opencontainers/distribution-spec#236
[2] #512

@rchincha rchincha added bug Something isn't working feature New feature or request labels May 1, 2022
@rchincha
Copy link
Contributor

Closing this issue.

If we want to control this particular behavior via configuration, we will revisit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants