Conversation

@vrothberg (Member)

No description provided.

@runcom (Member) commented Dec 19, 2018

This works in openshift/builder 👍

@giuseppe (Member)

LGTM

@vrothberg (Member Author)

Very odd errors in Travis: # time="2018-12-19T11:09:53Z" level=debug msg="error parsing image name \"localhost/alpine\" as given, trying with transport \"docker://\": Invalid image name \"localhost/alpine\", expected colon-separated transport:reference"
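For context, the message appears to come from c/image's name-parsing fallback: the bare name is first tried as a transport:reference pair, which fails without an explicit transport. The following minimal Go sketch (not code from this PR; import path shown without the later /v5 suffix, matching the vendoring at the time) shows the difference between a bare name and one with an explicit docker:// transport:

```go
// Hedged sketch: alltransports.ParseImageName expects "transport:reference",
// so a bare name such as "localhost/alpine" needs a transport prefix like "docker://".
package main

import (
	"fmt"

	"github.com/containers/image/transports/alltransports"
)

func main() {
	// Fails: no colon-separated transport in the name.
	if _, err := alltransports.ParseImageName("localhost/alpine"); err != nil {
		fmt.Println("bare name:", err)
	}

	// Succeeds: the docker:// transport is explicit.
	ref, err := alltransports.ParseImageName("docker://localhost/alpine")
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	fmt.Println("parsed transport:", ref.Transport().Name())
}
```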

@rhatdan (Member) commented Dec 19, 2018

Looks like a different change is not handling localhost correctly.

@vrothberg (Member Author)

It's a regression in c/image. We're on it :)

@vrothberg (Member Author)

> Looks like a different change is not handling localhost correctly.

I think you're right. I can reproduce the issue on a vanilla master locally.

@vrothberg vrothberg force-pushed the vendor-everything branch 2 times, most recently from 6eac228 to a0bf1db on December 19, 2018 16:54
@vrothberg (Member Author)

The localhost warnings/errors are red herrings. They actually come from the progress bars writing to stdout, which is then read by the test scripts and interpreted as image IDs. Will tackle this issue tomorrow.
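One way to keep stdout clean, if the progress output comes from c/image's copy code, is to point its report writer at stderr. A minimal sketch, assuming copy.Options and its ReportWriter field (the surrounding copy.Image call signature varies between c/image versions):

```go
// Hedged sketch (not code from this PR): route c/image's copy progress to stderr
// so scripts that parse stdout for image IDs are not confused by progress bars.
package example

import (
	"os"

	"github.com/containers/image/copy"
)

// quietStdoutOptions returns copy options whose progress output goes to stderr.
func quietStdoutOptions() *copy.Options {
	return &copy.Options{
		ReportWriter: os.Stderr, // progress bars land on stderr, stdout stays clean
	}
}
```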

@rhatdan (Member) commented Dec 19, 2018

Can we just tell the output to be quiet?

@vrothberg (Member Author) commented Dec 19, 2018 via email

@rh-atomic-bot (Collaborator)

☔ The latest upstream changes (presumably 4674656) made this pull request unmergeable. Please resolve the merge conflicts.

Update vendoring of containers/storage and containers/image to improve
pull and push compression speeds.

Signed-off-by: Daniel J Walsh <[email protected]>
@rhatdan rhatdan changed the title from "Vendor everything" to "[WIP] Vendor everything" on Dec 20, 2018
@vrothberg (Member Author)

@rhatdan @giuseppe ... CI should pass now. Please take a look at c46e256. The pgzip library changed the blobcache behaviour a bit, and I would like another pair of eyes on it.

@rhatdan rhatdan changed the title from "[WIP] Vendor everything" to "Vendor everything" on Dec 20, 2018
@rhatdan (Member) commented Dec 20, 2018

LGTM, but I think we need @mtrmac and @nalind to agree to your changes on the blob cache.

@vrothberg vrothberg changed the title from "Vendor everything" to "Vendor containers/{image,storage} for faster compression and parallel pulls" on Dec 20, 2018
@vrothberg (Member Author)

> LGTM, but I think we need @mtrmac and @nalind to agree to your changes on the blob cache.

Sure! Once this PR is merged, we can update Podman as well.

@TomSweeneyRedHat (Member) left a comment

LGTM, but would like a head nod from @mtrmac or @nalind

@giuseppe (Member)

Same from me, LGTM besides the change in the tests.

@vrothberg (Member Author)

Alright ... this is to be expected, and @vbatts actually warned us about this. The compressed objects on which we compute the checksums are now different. It still seems to work in some cases, but I need to dig deeper into the code.

@rhatdan (Member) commented Jan 4, 2019

@vrothberg @mtrmac @nalind What is the verdict on this one?

@vrothberg (Member Author) commented Jan 4, 2019

Alright. I've manually run the tests both before this change and with this change. To me, everything looks fine.

In a simplified way, the tests in question first pull-and-build (with cache) into cache, then push (without cache) to dir:no-cache, and then push (with cache) to dir:with-cache. After that, the test compares the contents of cache, no-cache, and with-cache and searches for matching files.

After the change, the tests were complaining because the base layer (i.e., busybox) did not match between cache and no-cache, which makes sense as the layer in cache was compressed with pgzip while the layer in no-cache is the one we pulled, which was compressed differently. In other words, the test was only working before this change because the layers we pulled were compressed with the very same library Buildah was using. In yet other words, we would hit this issue sooner or later once the gzip compression of the stdlib changes (which is happening). Or in the words of @vbatts: it was a mistake to compute digests on compressed objects.
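To make that point concrete, here is a standalone Go sketch (not part of the test suite) that compresses the same input with the stdlib gzip and with klauspost/pgzip and compares digests of the compressed output. The digests typically differ, so any checksum computed on compressed blobs is tied to the compressor that produced them:

```go
// Hedged illustration: digests computed over compressed data depend on the gzip
// implementation. Identical input compressed with compress/gzip and with
// github.com/klauspost/pgzip usually yields different bytes, hence different
// sha256 digests, even though both decompress to the same content.
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"

	"github.com/klauspost/pgzip"
)

func compressWithStdlib(data []byte) []byte {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

func compressWithPgzip(data []byte) []byte {
	var buf bytes.Buffer
	w := pgzip.NewWriter(&buf)
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

func main() {
	layer := bytes.Repeat([]byte("some layer content\n"), 100000)

	std := compressWithStdlib(layer)
	par := compressWithPgzip(layer)

	fmt.Printf("stdlib gzip digest: %x\n", sha256.Sum256(std))
	fmt.Printf("pgzip digest:       %x\n", sha256.Sum256(par))
	fmt.Println("digests equal:", sha256.Sum256(std) == sha256.Sum256(par))
}
```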

I have a screenshot with 8 tmux tiles that I can use to explain this situation in case you want to verify my above claims.

What I suggest is to remove the no-cache comparisons from the tests, as those are prone to errors. It was really painful to reproduce 😿

Remove error-prone parts of the explicit push tests that push to a
directory without using the blob-cache.  The error-proneness relates
directly to the fact that the layers of pulled images (i.e., the busybox
base image) are compressed with a different gzip library than the blobs
in Buildah's cache, which now use pgzip.

Signed-off-by: Valentin Rothberg <[email protected]>
Parallel copying of layers is currently supported when pulling from
a registry to the storage.

Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg (Member Author)

I updated the tests (see d88807e).

@runcom (Member) commented Jan 4, 2019

> What I suggest is to remove the no-cache comparisons from the tests, as those are prone to errors. It was really painful to reproduce

@mtrmac could you ack 👼

@rhatdan (Member) commented Jan 4, 2019

@vrothberg SGTM

@vrothberg (Member Author)

Three Travis jobs hit the timeout (now restarted).

@mtrmac (Contributor) commented Jan 4, 2019

> In a simplified way, the tests in question first pull-and-build (with cache) into cache, then push (without cache) to dir:no-cache, and then push (with cache) to dir:with-cache. After that, the test compares the contents of cache, no-cache, and with-cache and searches for matching files.

Right

> After the change, the tests were complaining because the base layer (i.e., busybox) did not match between cache and no-cache, which makes sense as the layer in cache was compressed with pgzip while the layer in no-cache is the one we pulled, which was compressed differently.

The other way around AFAICS: The cache contains the original compressed version (and a decompressed version); whereas buildah push without using the cache compresses the data written to dir:no-cache with pgzip.

> In other words, the test was only working before this change because the layers we pulled were compressed with the very same library Buildah was using. In yet other words, we would hit this issue sooner or later once the gzip compression of the stdlib changes (which is happening).

Yes.

> What I suggest is to remove the no-cache comparisons from the tests, as those are prone to errors. It was really painful to reproduce 😿

Yes, for now.

So, LGTM.


Ultimately, though, the data flow in buildah would really benefit from a closer [or is that more high-level?] look, quite apart from the blobcache: AFAICS the simplest buildah {commit,bud}, by default:

  • In containerImageRef.NewImageSource:
    • Exports the top layer from c/storage as a tar file stream
    • Compresses it on the fly
    • Writes the compressed stream to a temporary file
  • In containerImageSource.GetBlob:
    • Opens the file with the compressed data
  • In storageImageDestination.PutBlob:
    • Decompresses that data on the fly
    • Writes the decompressed result into another temporary file
  • In storageImageDestination.Commit:
    • Opens the file with the decompressed data
    • Creates a c/storage layer from it
  • For good measure, a manifest that actually refers to the compressed blobs is created and recorded as well with the newly created image.

I don’t understand c/storage enough to know whether buildah commit can logically be just a “tag” operation that turns the modifiable top layer of a container into an immutable layer in a cheap operation, or whether ultimately the export of the container's writable layer as tar + import into a new immutable layer is necessary. But the compression+decompression seems clearly unnecessary, and may be very expensive (in my benchmarks, though admittedly on a laptop and not a cloud node, the CPU cost of compression was by far the slowest part of dealing with large layers).
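For illustration only (this is not buildah code), a sketch of the round trip described above, using a hypothetical stageLayer helper: the layer tar stream is compressed into one temporary file and then immediately decompressed into another before a layer can be created from it, so the gzip work buys nothing:

```go
// Hedged sketch of the redundant compress-then-decompress flow described above.
package example

import (
	"compress/gzip"
	"io"
	"os"
)

// stageLayer mimics the flow: tar stream -> gzip -> temp file -> gunzip -> temp file.
// A real fix would hand the tar stream to storage directly.
func stageLayer(layerTar io.Reader) (string, error) {
	// NewImageSource: compress the exported layer into a temporary file.
	compressed, err := os.CreateTemp("", "layer-*.tar.gz")
	if err != nil {
		return "", err
	}
	defer os.Remove(compressed.Name())
	defer compressed.Close()

	gz := gzip.NewWriter(compressed)
	if _, err := io.Copy(gz, layerTar); err != nil {
		return "", err
	}
	if err := gz.Close(); err != nil {
		return "", err
	}
	if _, err := compressed.Seek(0, io.SeekStart); err != nil {
		return "", err
	}

	// PutBlob: decompress the same data into another temporary file.
	decompressed, err := os.CreateTemp("", "layer-*.tar")
	if err != nil {
		return "", err
	}
	defer decompressed.Close()

	gr, err := gzip.NewReader(compressed)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(decompressed, gr); err != nil {
		return "", err
	}
	// Commit: a c/storage layer would now be created from decompressed.Name().
	return decompressed.Name(), nil
}
```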

@rhatdan (Member) commented Jan 4, 2019

Let's bring up the rest of the discussion with @nalind when he gets back.
@rh-atomic-bot r+

@rh-atomic-bot (Collaborator)

📌 Commit 759f40b has been approved by rhatdan

@rhatdan (Member) commented Jan 5, 2019

@rh-atomic-bot retry

@vrothberg (Member Author)

The bot seems to have a cold :^)

@rhatdan rhatdan merged commit bb710f3 into containers:master Jan 5, 2019
@mrunalp commented Jan 6, 2019

We want this vendored into openshift/builder next.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2023