docker-daemon: transparently decompress layers in GetBlob #193
Conversation
We can't really add nice tests for this until #148 is merged, because this is only really exposed when you have a compressed source going to a docker-tarfile destination.
However, I am adding tests for this to …
Eugh, this uncovered another issue. Currently inside …
Okay, so …
I've added a …
AFAICT that is just an oversight (#203). Though actually improving the code to handle such a case, as you are doing, would be nicer.
At the highest level, … This is useful e.g. for the remote-repository→…
… yeah, basically we've been treating … But, most importantly, the discussion in #192 on the top-level solution (or perhaps even explaining the problem?) needs to happen first.
copy/copy.go (Outdated)

```go
	if err != nil {
		return types.BlobInfo{}, errors.Wrapf(err, "Error decompressing blob %s on-the-fly", srcInfo.Digest)
	}
}
```
Note that we already might be decompressing the stream once, in the `getOriginalLayerCopyWriter` path above, and doing the same work twice is rather wasteful. OTOH integrating all of that here is close to madness.
(Eventually the `imageCopier` needs to grow good tests. Not today, I guess :) )
directory/directory_dest.go (Outdated)

```diff
@@ -41,7 +41,7 @@ func (d *dirImageDestination) SupportsSignatures() error {

 // ShouldCompressLayers returns true iff it is desirable to compress layer blobs written to this destination.
 func (d *dirImageDestination) ShouldCompressLayers() bool {
-	return false
+	return d.ref.compressed
```
Really not a fan of `dir:` modifying the blobs, because of the “debugging” / “just tell me what is in the source” purposes.
directory/directory_transport.go (Outdated)

```go
	options []string

	// Does the directory want compressed blobs?
	compressed bool
```
(review remarks on `dir:` just so that they are not lost if we decided to go this way, not as an endorsement of the concept)
Why have both, and silently accept any invalid input? Merely the `compressed` bool would allow us to both parse and format the cases we care about.
directory/directory_transport.go
Outdated
return nil, errors.Errorf("dir: invalid path option suffix: no options set: %s", path) | ||
} | ||
path = parts[0] | ||
} |
If we did do so much parsing for `dir` references, this should probably move to the mode of `ParseReference` doing the string parsing, and `NewReference(path string, compressed bool)` just creating an object (so that API callers can do it cheaply).
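A rough sketch of that split, with a simplified `dirReference` and a hypothetical `:compressed` suffix standing in for whatever syntax would actually be chosen:

```go
package directory

import (
	"strings"

	"github.com/pkg/errors"
)

// dirReference is a simplified stand-in for the real reference type.
type dirReference struct {
	path       string
	compressed bool
}

// ParseReference does all of the string parsing; the ":compressed" suffix
// here is an invented syntax, purely for illustration.
func ParseReference(ref string) (*dirReference, error) {
	path, compressed := ref, false
	if strings.HasSuffix(ref, ":compressed") {
		path, compressed = strings.TrimSuffix(ref, ":compressed"), true
	} else if strings.Contains(ref, ":") {
		return nil, errors.Errorf("dir: invalid path option suffix: %s", ref)
	}
	return NewReference(path, compressed), nil
}

// NewReference just creates an object, so API callers can do it cheaply.
func NewReference(path string, compressed bool) *dirReference {
	return &dirReference{path: path, compressed: compressed}
}
```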
docker/daemon/daemon_dest.go (Outdated)

```go
if inputInfo.Digest == "" {
	// In some (hacky) cases a caller might not know the size, but will
	// know the digest (where "know" == "have a fake digest").
	inputInfo.Digest = digester.Digest()
```
Actually the whole handling of `Digest` values in this function, especially immediately below, seems to be broken (should be a separate PR) and only happens to work through a random coincidence:

The returned digest must match the file name within the tarball (because `PutManifest` uses the layer digests to compute `layerPaths`). But, right now, we create the file with `inputInfo.Digest` (using, in principle, any algorithm), but we always return a freshly computed `digest.Canonical` value; the two do not generally match.

Basically, if we are provided a digest, we should either use it as is, and return it, or we need to completely ignore it and compute from scratch (but the rest of the code generally expects us not to do that, for signatures etc.).

As an extension, this also works only because `inputInfo.Digest` is computed using `digest.Canonical`, just like the computation below; that is, I expect, by design, but the code does not make that obvious (and again, computing twice…)
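As an illustration of the two consistent options, a sketch using the real opencontainers/go-digest API but a hypothetical helper name (this is not the actual daemon_dest.go code):

```go
package sketch

import (
	"io"

	"github.com/opencontainers/go-digest"
)

// resolveBlobDigest illustrates the two consistent choices: either trust a
// caller-provided digest as-is (whatever its algorithm) and use it both for
// the file name inside the tarball and as the returned value, or compute a
// digest.Canonical value from scratch and use that everywhere.
func resolveBlobDigest(provided digest.Digest, blob io.Reader) (digest.Digest, error) {
	if provided != "" {
		// Option 1: use the caller's digest as-is; do not recompute.
		return provided, nil
	}
	// Option 2: nothing provided; compute from scratch.
	digester := digest.Canonical.Digester()
	if _, err := io.Copy(digester.Hash(), blob); err != nil {
		return "", err
	}
	return digester.Digest(), nil
}
```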
(I think this comment still stands, if anything needs to happen in `PutBlob`.)
I think it's this way because of the shitty DiffID lie. Not sure if we can properly fix it...
docker/daemon/daemon_dest.go (Outdated)

```go
if inputInfo.Digest == "" {
	// In some (hacky) cases a caller might not know the size, but will
	// know the digest (where "know" == "have a fake digest").
```
When does “having a fake digest” happen? `copy.Image` always clears the digest if it compresses (or, with this PR, decompresses), doesn’t it?
Because DiffIDs != Digests, this happens for us (when copying we just have the names of the layers that we're dealing with, but we pretend like we don't have the Digest even though the two should be the same).
I'm going to keep this hunk change, but we can discuss the comment later.
Primarily, is any part of the “handle empty Digest case” commit still necessary? It seems quite unrelated to the PR topic “transparently decompress layers”.
Secondarily, I still can’t understand what “a fake digest” means in here.
Sigh. Okay I'll move this to a separate PR and wait even longer for stuff to get merged.
Thanks for splitting this. Smaller PRs have less to block them from getting merged :)
(Really the point is that the API as defined expects the digest to either be unknown or valid, and the rest of this PR should make that true again. If we need to change the API for other reasons, sure, let’s talk about that, but actually explicitly talk about it instead of adding a weird corner case which is unjustified by the rest of the code in the repo.)
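A small sketch of the invariant being described, assuming the standard go-digest and pkg/errors APIs (the helper is hypothetical, not code from the PR):

```go
package sketch

import (
	"github.com/opencontainers/go-digest"
	"github.com/pkg/errors"
)

// checkDigestInvariant illustrates the API contract: a blob's digest is
// either unknown (empty) or well-formed. Note that Validate only checks the
// format; matching the actual blob contents has to be enforced wherever the
// blob is consumed, which is exactly what a "fake" placeholder breaks.
func checkDigestInvariant(d digest.Digest) error {
	if d == "" {
		return nil // unknown is fine; implementations compute it themselves
	}
	if err := d.Validate(); err != nil {
		return errors.Wrapf(err, "invalid digest %q", d)
	}
	return nil
}
```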
Highlights:
- Does this work with uncompressed layers?
- Is the `PutBlob` part necessary for this?
copy/fixtures (Outdated)

```diff
@@ -0,0 +1 @@
+../pkg/compression/fixtures
```
Using symlinks to deduplicate is fine, but could you please keep the `copy/fixtures` directory and use symlinks to individual files? That would make it easier to add other files to `copy/fixtures` in the future.
```go
// caller which is trying to verify the blob will run into problems),
// we need to decompress blobs. This is a bit ugly, but it's a
// consequence of making everything addressable by their DiffID rather
// than by their digest...
```
“digests != diffIDs” is true in general; it would still be worth explaining explicitly that we (choose to) generate a v2s2 manifest with the available DiffID values, so `GetBlob` needs to return content matching those DiffID values even if the blob has been compressed.
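To make the invariant concrete, a sketch using the real pkg/compression and go-digest APIs; the helper name is invented and this is not the actual daemon_src.go code:

```go
package sketch

import (
	"fmt"
	"io"

	"github.com/containers/image/pkg/compression"
	"github.com/opencontainers/go-digest"
)

// verifyDiffID illustrates the guarantee: the bytes GetBlob returns must
// hash to the DiffID recorded in the generated v2s2 manifest, i.e. the
// digest of the *uncompressed* layer, so compressed layers have to be
// decompressed before being handed out.
func verifyDiffID(blob io.Reader, expected digest.Digest) error {
	decompressor, reader, err := compression.DetectCompression(blob)
	if err != nil {
		return err
	}
	if decompressor != nil {
		decompressed, err := decompressor(reader)
		if err != nil {
			return err
		}
		reader = decompressed
	}
	actual, err := digest.Canonical.FromReader(reader)
	if err != nil {
		return err
	}
	if actual != expected {
		return fmt.Errorf("DiffID mismatch: got %s, want %s", actual, expected)
	}
	return nil
}
```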
docker/daemon/daemon_src.go (Outdated)

```go
decompressFunc, reader, err := compression.DetectCompression(stream)
if err != nil {
	return nil, 0, errors.Wrapf(err, "detecting compression in blob %s", info.Digest)
}
```
Won’t this fail on uncompressed streams, such as those created by `docker save`?
No, because "no compression" is considered a valid "compressor" if you look in `pkg/compression`. You only get errors if, for example, you get an unexpected EOF. However, I need to add a check that `decompressorFunc != nil`.
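A minimal demonstration of that behavior, assuming the `pkg/compression` API as described (uncompressed input is not an error; it simply yields a nil `DecompressorFunc`):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/containers/image/pkg/compression"
)

func main() {
	// Uncompressed input: DetectCompression succeeds and returns a nil
	// DecompressorFunc, which the caller must check before invoking it.
	input := bytes.NewReader([]byte("plain, uncompressed bytes"))
	decompressor, _, err := compression.DetectCompression(input)
	if err != nil {
		panic(err) // errors are e.g. an unexpected EOF while sniffing magic bytes
	}
	fmt.Println("compressed:", decompressor != nil) // prints "compressed: false"
}
```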
docker/daemon/daemon_src.go (Outdated)

```go
// than by their digest...
decompressFunc, reader, err := compression.DetectCompression(stream)
if err != nil {
	return nil, 0, errors.Wrapf(err, "detecting compression in blob %s", info.Digest)
```
Please capitalize error messages, perhaps wrong but consistent with the rest of the code.
docker/daemon/daemon_src.go (Outdated)

```go
newStream := readCloseWrapper{
	Reader: reader,
	closeFunc: func() error {
		return stream.Close()
```
Can't this be just `closeFunc: stream.Close`?
Yes.
```go
type readCloseWrapper struct {
	io.Reader
	closeFunc func() error
}
```
There already is a `tarReadCloser` in this file. Modifying that to use this `readCloseWrapper` would be nice.
The two are wrapping different internal objects. I can't really see a nice way of combining the two.
AFAICS the `*tar.Reader` in `tarReadCloser` can just as well be an `io.Reader`; nothing depends on that being a `*tar.Reader`.
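A sketch of that unification; `openLayerTar` is an invented name for wherever `tarReadCloser` gets constructed, and the wiring is illustrative rather than the actual file's code:

```go
package sketch

import (
	"archive/tar"
	"io"
	"os"
)

// readCloseWrapper is the generic wrapper from this PR: any reader plus a
// custom close function.
type readCloseWrapper struct {
	io.Reader
	closeFunc func() error
}

func (w readCloseWrapper) Close() error {
	return w.closeFunc()
}

// openLayerTar shows how tarReadCloser could be expressed with the generic
// wrapper: tar.Reader satisfies io.Reader, and closing falls through to the
// backing file.
func openLayerTar(path string) (io.ReadCloser, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	return readCloseWrapper{
		Reader:    tar.NewReader(f),
		closeFunc: f.Close,
	}, nil
}
```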
@mtrmac I've moved the …
👍 Thanks!
We now have tests, but this is necessary for containers/skopeo#280 to pass (in particular, it breaks issues that exist with …)
/ping @runcom -- can we please merge this?
@cyphar could you rebase and I'll merge?
@runcom Rebase'd.
Signed-off-by: Aleksa Sarai <[email protected]>
The build failure makes no sense, I didn't touch …
This is necessary because inside the docker-daemon (soon to be Docker archive) transport, the provided DigestInfo's digest is *not* the digest of the blob but is rather the DiffID of the layer. This results in issues when converting a docker-daemon image to an OCI image and then back again -- the compression copying code doesn't know what to do because the digest of the blob and the blob's "digest" in DigestInfo don't match. Fix it by always silently decompressing blobs that come from Docker archive transports. Signed-off-by: Aleksa Sarai <[email protected]>
Heh, it's broken because of opencontainers/image-spec@a125828. We aren't vendoring … That's just dumb. We should be vendoring things.
No.
I completely agree that vendoring is a mess, but if you want to run …
not sure …
@cyphar to unblock this, can you create a corresponding skopeo PR which vendors this code and run the tests there? After it's green over there I'll merge this. (and sorry for this, we need a better way to handle this, even if it means removing test-skopeo from here if we switch skopeo to not use the latest master every time)
Currently …
LGTM
In particular, this ensures that layers are decompressed if the source
layer is compressed but the destination does not permit compression.
This was a particular issue when doing a conversion from oci: to any
transport based on the docker tarfile format (because the Docker tarfile
format references DiffIDs, not the blob digests).
Closes #192
Signed-off-by: Aleksa Sarai <[email protected]>