
docker-daemon: transparently decompress layers in GetBlob #193

Merged: 2 commits into containers:master, Feb 17, 2017
Conversation

cyphar (Contributor) commented Dec 27, 2016

In particular, this ensures that layers are decompressed if the source
layer is compressed but the destination does not permit compression.

This was a particular issue when doing a conversion from oci: to any
transport based on the docker tarfile format (because the Docker tarfile
format references DiffIDs, not the blob digests).

Closes #192
Signed-off-by: Aleksa Sarai <[email protected]>

cyphar (Contributor, Author) commented Dec 27, 2016

We can't really add nice tests for this until #148 is merged, because this is only really exposed when you have a compressed source going to a docker-tarfile destination.

cyphar (Contributor, Author) commented Dec 27, 2016

However, I am adding tests for this to skopeo, so we will be able to test this change.

cyphar (Contributor, Author) commented Dec 27, 2016

Eugh, this uncovered another issue. Currently inside tarfile we use inputInfo.Digest.String() even when the digest was not given... SIGH.

cyphar (Contributor, Author) commented Dec 27, 2016

Okay, so CopySuite.TestCopyCompression tests that the old semantics work, using dir: to check that compression was actually applied. I'm not really sure how to change this test so that it still tests the same thing it tested previously (that compression is applied).

cyphar (Contributor, Author) commented Dec 28, 2016

I've added a ::compress option to dir: transports. It's not pretty, but it ensures that we can mirror the old test functionality (though the semantics of what's being tested are slightly different).

mtrmac (Collaborator) commented Jan 7, 2017

> Eugh, this uncovered another issue. Currently inside tarfile we use inputInfo.Digest.String() even when the digest was not given... SIGH.

AFAICT that is just an oversight (#203). Though actually improving the code to handle such a case, as you are doing, would be nicer.

mtrmac (Collaborator) commented Jan 7, 2017

At the highest level, ShouldCompressLayers has been intended to be exactly as it is, i.e. one-directional: either compress, or leave the blob in its original format; not a compress/uncompress toggle.

This is useful e.g. for a remote-repository→docker-daemon: copy, where the daemon can already transparently decompress layers, so we should neither compress nor decompress the remote input; just downloading it (and verifying digests/signatures) is perfectly fine. Also…

> I've added a ::compress option to dir: transports. It's not pretty, but it ensures that we can mirror the old test functionality (though the semantics of what's being tested are slightly different).

… yeah, basically we’ve been treating dir: as a debugging output which records the given inputs as closely as possible, and it would be nice to retain this. So that would suggest that an ImageDestination might perhaps want a tri-state: compress/decompress/leave unchanged.

But, most importantly, the discussion in #192 on the top-level solution (or perhaps even explaining the problem?) needs to happen first.
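For illustration, a minimal sketch of what such a tri-state could look like (hypothetical names; at this point containers/image only has the boolean ShouldCompressLayers):

```go
// Hypothetical tri-state compression preference for an ImageDestination;
// illustrative only, not the actual containers/image API.
package main

import "fmt"

type LayerCompression int

const (
	// PreserveOriginal: write the blob exactly as it was read from the source.
	PreserveOriginal LayerCompression = iota
	// Compress: compress the blob if it is not already compressed.
	Compress
	// Decompress: decompress the blob if it is compressed.
	Decompress
)

func (c LayerCompression) String() string {
	switch c {
	case PreserveOriginal:
		return "preserve"
	case Compress:
		return "compress"
	case Decompress:
		return "decompress"
	}
	return "unknown"
}

func main() {
	// e.g. dir: would report PreserveOriginal, docker-daemon: Decompress.
	fmt.Println(PreserveOriginal, Compress, Decompress)
}
```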

copy/copy.go Outdated
		if err != nil {
			return types.BlobInfo{}, errors.Wrapf(err, "Error decompressing blob %s on-the-fly", srcInfo.Digest)
		}
	}
Collaborator:

Note that we already might be decompressing the stream once, in the getOriginalLayerCopyWriter path above, and doing the same work twice is rather wasteful. OTOH integrating all of that here is close to madness.

(Eventually the imageCopier needs to grow good tests. Not today, I guess :) )

@@ -41,7 +41,7 @@ func (d *dirImageDestination) SupportsSignatures() error {

 // ShouldCompressLayers returns true iff it is desirable to compress layer blobs written to this destination.
 func (d *dirImageDestination) ShouldCompressLayers() bool {
-	return false
+	return d.ref.compressed
Collaborator:

Really not a fan of dir: modifying the blobs, because of the “debugging” / “just tell me what is in the source” purposes.

	options []string

	// Does the directory want compressed blobs?
	compressed bool
Collaborator:

(review remarks on dir: just so that they are not lost if we decide to go this way, not as an endorsement of the concept)

Why have both, and silently accept any invalid input? Merely the compressed bool would allow us to both parse and format the cases we care about.

		return nil, errors.Errorf("dir: invalid path option suffix: no options set: %s", path)
	}
	path = parts[0]
}
Collaborator:

If we did do so much parsing for dir references, this should probably move to the mode of ParseReference doing the string parsing, and NewReference(path string, compressed bool) just creating an object (so that API callers can do it cheaply).
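As a rough sketch of that split (hypothetical helper for illustration, assuming the ::compress suffix survives review; not the actual dir: code):

```go
// Sketch of separating string parsing (ParseReference) from object
// construction (NewReference); names and error text are illustrative.
package main

import (
	"fmt"
	"strings"
)

type dirReference struct {
	path       string
	compressed bool
}

// ParseReference does all of the string parsing...
func ParseReference(ref string) (*dirReference, error) {
	compressed := false
	if i := strings.Index(ref, "::"); i != -1 {
		switch opts := ref[i+2:]; opts {
		case "compress":
			compressed = true
		default:
			return nil, fmt.Errorf("dir: invalid path option suffix: %s", opts)
		}
		ref = ref[:i]
	}
	return NewReference(ref, compressed), nil
}

// ...while NewReference just creates the object, so API callers can do it cheaply.
func NewReference(path string, compressed bool) *dirReference {
	return &dirReference{path: path, compressed: compressed}
}

func main() {
	ref, err := ParseReference("/tmp/image::compress")
	fmt.Println(ref, err) // &{/tmp/image true} <nil>
}
```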

	if inputInfo.Digest == "" {
		// In some (hacky) cases a caller might not know the size, but will
		// know the digest (where "know" == "have a fake digest").
		inputInfo.Digest = digester.Digest()
Collaborator:

Actually the whole handling of Digest values in this function, especially immediately below, seems to be broken (should be a separate PR) and only happens to work through a random coincidence:

The returned digest must match the file name within the tarball (because PutManifest uses the layer digests to compute layerPaths). But, right now, we create the file with inputInfo.Digest (using, in principle, any algorithm), yet we always return a freshly computed digest.Canonical value; the two do not generally match.

Basically, if we are provided a digest, we should either use it as is, and return it, or we need to completely ignore it and compute from scratch (but the rest of the code generally expects us not to do that, for signatures etc.).

As an extension, this also works only because inputInfo.Digest is computed using digest.Canonical, just like the computation below; that is, I expect, by design, but the code does not make that obvious (and again, computing twice…)
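In code, the “use it as is, or ignore it and compute from scratch” rule might look roughly like this (a sketch using go-digest; not the actual tarfile code):

```go
// Sketch of the rule: either trust the caller-provided digest as is, or
// ignore it entirely and use the freshly computed one; never mix the two.
package main

import (
	"fmt"
	"io"
	"strings"

	digest "github.com/opencontainers/go-digest"
)

// resolveBlobDigest picks the one digest that is then used consistently for
// both the file name inside the tarball and the value returned to the caller.
func resolveBlobDigest(provided digest.Digest, digester digest.Digester) digest.Digest {
	if provided != "" {
		return provided
	}
	return digester.Digest()
}

func main() {
	d := digest.Canonical.Digester()
	_, _ = io.Copy(d.Hash(), strings.NewReader("layer bytes"))
	fmt.Println(resolveBlobDigest("", d)) // freshly computed sha256:…
}
```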

Collaborator:

(I think this comment still stands, if anything needs to happen in PutBlob.)

Contributor (Author):

I think it's this way because of the shitty DiffID lie. Not sure if we can properly fix it...


	if inputInfo.Digest == "" {
		// In some (hacky) cases a caller might not know the size, but will
		// know the digest (where "know" == "have a fake digest").
Collaborator:

When does “having a fake digest” happen? copy.Image always clears the digest if it compresses (or, with this PR, decompresses), doesn’t it?

Contributor (Author):

Because DiffIDs != Digests, this happens for us (when copying we just have the names of the layers that we're dealing with, but we pretend like we don't have the Digest even though the two should be the same).

I'm going to keep this hunk change, but we can discuss the comment later.

Collaborator:

Primarily, is any part of the “handle empty Digest case” commit still necessary? It seems quite unrelated to the PR topic “transparently decompress layers”.

Secondarily, I still can’t understand what “a fake digest” means in here.

Contributor (Author):

Sigh. Okay I'll move this to a separate PR and wait even longer for stuff to get merged.

Collaborator:

Thanks for splitting this. Smaller PRs have less to block them from getting merged :)

(Really the point is that the API as defined expects the digest to either be unknown or valid, and the rest of this PR should make that true again. If we need to change the API for other reasons, sure, let’s talk about that, but actually explicitly talk about it instead of adding a weird corner case which is unjustified by the rest of the code in the repo.)

cyphar changed the title from “copy: obey destination compression rules” to “docker-daemon: transparently decompress layers in GetBlob” on Feb 1, 2017
cyphar (Contributor, Author) commented Feb 1, 2017

@mtrmac @runcom This is ready for review now.

mtrmac (Collaborator) left a comment:

Highlights:

  • Does this work with uncompressed layers?
  • Is the PutBlob part necessary for this?

copy/fixtures Outdated
@@ -0,0 +1 @@
+../pkg/compression/fixtures
Collaborator:

Using symlinks to deduplicate is fine, but could you please keep the copy/fixtures directory and use symlinks to individual files? That would make it easier to add other files to copy/fixtures in the future.

	// caller which is trying to verify the blob will run into problems),
	// we need to decompress blobs. This is a bit ugly, but it's a
	// consequence of making everything addressable by their DiffID rather
	// than by their digest...
Collaborator:

“digests != diffIDs” is true in general; it would still be worth explaining explicitly that we (choose to) generate a v2s2 manifest with the available DiffID values, so GetBlob needs to return content matching those DiffID values even if the blob has been compressed.
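The idea, reduced to a stdlib-only sketch (gzip only; the real code goes through pkg/compression.DetectCompression, and the names here are illustrative):

```go
// Sketch: GetBlob must return content matching the DiffID recorded in the
// generated manifest, so a stored blob is decompressed on the way out.
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// uncompressedBlob hands back the layer content uncompressed, whether or
// not the stored blob is gzip-compressed (gzip only in this sketch).
func uncompressedBlob(stored io.Reader) (io.Reader, error) {
	br := bufio.NewReader(stored)
	magic, err := br.Peek(2)
	if err != nil && err != io.EOF {
		return nil, err
	}
	if len(magic) == 2 && magic[0] == 0x1f && magic[1] == 0x8b { // gzip magic
		return gzip.NewReader(br)
	}
	return br, nil // already uncompressed (e.g. docker save output)
}

func main() {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte("diffid-addressed content"))
	zw.Close()

	r, err := uncompressedBlob(&buf)
	if err != nil {
		panic(err)
	}
	out, _ := io.ReadAll(r)
	fmt.Printf("%s\n", out) // diffid-addressed content
}
```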

	decompressFunc, reader, err := compression.DetectCompression(stream)
	if err != nil {
		return nil, 0, errors.Wrapf(err, "detecting compression in blob %s", info.Digest)
	}
Collaborator:

Won’t this fail on uncompressed streams, such as those created by docker save?

Contributor (Author):

No, because "no compression" is considered a valid "compressor" if you look in pkg/compression. You only get errors if, for example, you hit an unexpected EOF. However, I need to add a check that decompressFunc != nil.
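i.e. something along these lines (a sketch against the pkg/compression API as used in this PR; the function name and placement are illustrative):

```go
// Sketch of the missing nil check: DetectCompression treats "no
// compression" as valid and returns a nil decompressor for it.
package tarfile // illustrative placement

import (
	"io"

	"github.com/containers/image/pkg/compression"
	"github.com/pkg/errors"
)

func uncompressedStream(stream io.Reader) (io.Reader, error) {
	decompressFunc, reader, err := compression.DetectCompression(stream)
	if err != nil {
		return nil, errors.Wrap(err, "Error detecting compression")
	}
	if decompressFunc != nil { // compressed input: decompress transparently
		return decompressFunc(reader)
	}
	return reader, nil // uncompressed input: hand it back unchanged
}
```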

	// than by their digest...
	decompressFunc, reader, err := compression.DetectCompression(stream)
	if err != nil {
		return nil, 0, errors.Wrapf(err, "detecting compression in blob %s", info.Digest)
Collaborator:

Please capitalize error messages, perhaps wrong but consistent with the rest of the code.

	newStream := readCloseWrapper{
		Reader: reader,
		closeFunc: func() error {
			return stream.Close()
Collaborator:

Can't this be just closeFunc: stream.Close?

Contributor (Author):

Yes.

type readCloseWrapper struct {
	io.Reader
	closeFunc func() error
}
Collaborator:

There already is a tarReadCloser in this file. Modifying that to use this readCloseWrapper would be nice.

Contributor (Author):

The two are wrapping different internal objects. I can't really see a nice way of combining the two.

Collaborator:

AFAICS the *tar.Reader in tarReadCloser can just as well be an io.Reader; nothing depends on it being a *tar.Reader.
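A sketch of the suggested unification, with tarReadCloser's *tar.Reader relaxed to a plain io.Reader (illustrative, not the merged code):

```go
// readCloseWrapper wraps any io.Reader with a custom Close; since it only
// needs an io.Reader, it can replace tarReadCloser as well.
package main

import (
	"fmt"
	"io"
	"strings"
)

type readCloseWrapper struct {
	io.Reader
	closeFunc func() error
}

func (w readCloseWrapper) Close() error {
	if w.closeFunc != nil {
		return w.closeFunc()
	}
	return nil
}

func main() {
	src := io.NopCloser(strings.NewReader("layer blob"))
	rc := readCloseWrapper{Reader: src, closeFunc: src.Close}
	data, _ := io.ReadAll(rc)
	fmt.Println(string(data), rc.Close()) // layer blob <nil>
}
```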



cyphar (Contributor, Author) commented Feb 7, 2017

@mtrmac I've moved the PutBlob code to a different branch. We can handle that two-line change separately.

mtrmac (Collaborator) left a comment:

👍 Thanks!

mtrmac (Collaborator) commented Feb 10, 2017

👍 again for PullApprove

Approved with PullApprove

cyphar (Contributor, Author) commented Feb 15, 2017

Can we please merge this so that containers/skopeo#280 can finally go in? Not having tests for the OCI and Docker translation code path is not acceptable -- at SUSE we actually plan to use this, and it's getting frustrating that it keeps BREAKING.

We now have tests, but this is necessary for containers/skopeo#280 to pass (in particular, it fixes issues that exist with docker-daemon, which are going to be tested once docker-archive: [#148] is merged).

cyphar (Contributor, Author) commented Feb 17, 2017

/ping @runcom -- can we please merge this?

runcom (Member) commented Feb 17, 2017

@cyphar could you rebase and I'll merge?

cyphar (Contributor, Author) commented Feb 17, 2017

@runcom Rebase'd.

cyphar (Contributor, Author) commented Feb 17, 2017

The build failure makes no sense; I didn't touch manifest/manifest.go...

$ make deps
go get -t -tags "btrfs_noversion libdm_no_deferred_remove" ./...
# github.com/containers/image/manifest
manifest/manifest.go:57: undefined: v1.MediaTypeImageManifestList

cyphar added a commit with the following message:

This is necessary because inside the docker-daemon (soon to be Docker
archive) transport, the provided DigestInfo's digest is *not* the digest
of the blob but is rather the DiffID of the layer. This results in
issues when converting a docker-daemon image to an OCI image and then
back again -- the compression copying code doesn't know what to do
because the digest of the blob and the blob's "digest" in DigestInfo
don't match.

Fix it by always silently decompressing blobs that come from Docker
archive transports.

Signed-off-by: Aleksa Sarai <[email protected]>
cyphar (Contributor, Author) commented Feb 17, 2017

Heh, it's broken because of opencontainers/image-spec@a125828. We aren't vendoring image-spec so go get fails.

That's just dumb. We should be vendoring things.

runcom (Member) commented Feb 17, 2017

> That's just dumb. We should be vendoring things.

No. This is a library; I won't vendor things and fall into the golang mess of having a library with vendor dirs that clash with whatever imports it. If you're talking about running the tests with some vendored code, that may work.

cyphar (Contributor, Author) commented Feb 17, 2017

I completely agree that vendoring is a mess, but if you want to run go test on this library we need to vendor things. Is there a reason that we aren't just doing test-skopeo (which should run our unit tests -- right)?

runcom (Member) commented Feb 17, 2017

> I completely agree that vendoring is a mess, but if you want to run go test on this library we need to vendor things. Is there a reason that we aren't just doing test-skopeo (which should run our unit tests -- right)?

Not sure test-skopeo would run that; if it does, I'm 100% OK with doing that instead of go get'ting random libraries from the net.

runcom (Member) commented Feb 17, 2017

@cyphar to unblock this, can you create a corresponding skopeo PR which vendors this code and runs the tests there? After it's green over there I'll merge this. (And sorry for this; we need a better way to handle it, even if that means removing test-skopeo from here, if we switch skopeo to not use the latest master every time.)

cyphar (Contributor, Author) commented Feb 17, 2017

Currently test-unit-local doesn't run vendor/ tests. While this makes sense for most vendored code, we should be testing our own code inside skopeo. I can make a PR to improve this situation if you like (I'll do it as part of the skopeo PR that updates to this patch once it's merged).

runcom (Member) commented Feb 17, 2017

> Currently test-unit-local doesn't run vendor/ tests. While this makes sense for most vendored code, we should be testing our own code inside skopeo. I can make a PR to improve this situation if you like (I'll do it as part of the skopeo PR that updates to this patch once it's merged).

👍 thanks Aleksa

Approved with PullApprove

runcom (Member) commented Feb 17, 2017

LGTM

runcom merged commit 3f493f2 into containers:master on Feb 17, 2017
cyphar deleted the copy-obey-destination-compression branch on February 17, 2017