-
Notifications
You must be signed in to change notification settings - Fork 33
[sha512] image: add configurable digest support #375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6412 |
ebb29e3 to
ba27cc9
Compare
ba27cc9 to
cf2c428
Compare
cf2c428 to
63e8928
Compare
63e8928 to
6621c6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A fairly brief skim of just the changes, without carefully thinking about any individual use case.
(“Request changes” for the storage DiffID matching enforcement)
| return digest.Canonical.FromReader(stream) | ||
| // Use the configured digest algorithm | ||
| algorithm := supportedDigests.Get() | ||
| return algorithm.FromReader(stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works locally, but there’s the cachedDiffID code path where we use an older value. That probably needs to be adjusted, if we want the behavior to be predictable for users.
(This is relevant for schema1 only, and that is nowadays entirely disabled in Docker and the distribution/distribution registry, at least by default. Arguably interoperable support for sha512+schema1 is never going to happen … but, for us, it might be easier to generate sha512+schema1 and let it fail, than to have an ~undocumented exception where we ignore the configuration, or to specifically hard-code an error path and make it absolutely impossible to use such a setup.)
image/copy/single_test.go
Outdated
| // Save original algorithm and set SHA512 | ||
| originalAlgorithm := supportedDigests.Get() | ||
| defer func() { | ||
| err := supportedDigests.Set(originalAlgorithm) | ||
| require.NoError(t, err) | ||
| }() | ||
| err = supportedDigests.Set(digest.SHA512) | ||
| require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applies throughout: Overriding a global inside a test like this is fairly unclean (and somewhat verbose); e.g. it ~prevents running the tests in parallel, in a not-externally-obvious way.
(Leaving aside the question whether we should have a supportedDigest global), I think a much cleaner way to deal with global state of this kind is to parametrize the tested function with an explicit “digest” parameter; maybe adding that parameter to an existing function, maybe splitting the primary function into a tested function with an extra parameter + a trivial wrapper (compare the various …WithHomeDir functions in the codebase).
image/storage/storage_transport.go
Outdated
| return err | ||
| } | ||
| // Try SHA512 (128 characters) | ||
| if len(id) == 128 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not generalize. Do we need to think about including the algorithm in the ID? (Or, alternatively, having supported-digests promise that it will only add ≤1 algorithm for each length — not sustainable long-term.)
6621c6e to
621025c
Compare
621025c to
241f03e
Compare
- Replace hardcoded SHA256 with configurable digest algorithms using storage/pkg/supported-digests - Add centralized digest validation utilities in image/pkg/digestvalidation - Implement parameterized digest computation in image/copy/single.go - Rename DigestIfCanonicalUnknown to DigestIfConfiguredUnknown for clarity Signed-off-by: Lokesh Mandvekar <[email protected]>
241f03e to
e2b0cc6
Compare
|
@mtrmac This is intended at not before podman v5.8, so not in an immediate rush. I'll be happy to fix the storage stuff before we go forward with this, if you prefer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m … generally frustrated with the various tests that do nothing or don’t at all do what they say, with several cases where the one instance of a problem was fixed and the others weren’t, or new cases of exactly the same problem were added again. Looking at the diffs, I don’t feel like making progress much at all.
On the positive side, this PR does, directly or indirectly, run into various important concerns that need addressing and figuring out; and just listing them is valuable knowledge.
What would work as a way to track all those, so that they are not hidden in a sea of comments disappearing behind “Load more”? In the past I did #205 — admittedly also rather unwieldy; there are almost certainly better tools and better options.
Looking at what we’ve ~learned, I think this PR, adding “configurable digests” for all of c/image, is just too big a step (with ID syntax in c/storage, ID syntax in image-like strings (and filter syntax in libimage, in #376), DiffIDs for schema1, DiffID matching for pulls, wanted/unwanted digest updates in destinations, … and perhaps half a dozen other concerns already) — and I can’t imagine implementing all of that in a single commit the way this PR is now.
Can we find some drastically smaller thing to do, and fully solve? “All code in the package which matches bytes against a digest can handle arbitrary digests”? “All digest comparisons for equality or uses as lookup keys are identified and categorized as “validation to be trivially fixed” / “we want this one unchanged” / “some specific tracked problem area”? “We have decided what to do with schema1, and we are doing that and/or refusing to do that, precisely correctly”? “Copying an unmodified sha512-digested image works for $pairOfTransports”?
I don’t know, perhaps even smaller. (One package at a time? Small enough, but maybe too small to allow expressing an idea.)
It’s not provably true that fixing the small easy things makes it any more possible to fix the large unwieldy things, but, in my experience so far, it does tend to be the case. With each small thing that is done done, we learn something, and the outstanding unwieldy blob is a bit smaller and a bit easier to think about — and, eventually, deal with.
| } | ||
|
|
||
| // computeDiffID reads all input from layerStream, uncompresses it using decompressor if necessary, and returns its digest. | ||
| // This is a wrapper around computeDiffIDWithAlgorithm that uses the globally configured digest algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document it the other way — this function does not need to be documented as a wrapper, callers don’t care. On the contrary, computeDiffIDWithAlgorithm should say “this is an internal implementation detail of computeDiffID, and exists only to allow testing it …” so that no-one uses it.
| defer stream2.Close() | ||
|
|
||
| // Use the parametrized function directly instead of overriding global state | ||
| digest, err := computeDiffIDWithAlgorithm(stream2, nil, digest.SHA512) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to be a test of DiffIDComputationGoroutine…
| require.NoError(t, err, c.filename) | ||
| defer stream.Close() | ||
|
|
||
| // Save original algorithm and set the desired one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… and this test, which actually does test computeDiffID, was not updated to benefit from the new parametrized variant.
| return nil, fmt.Errorf("Download config.json digest %s does not match expected %s", computedDigest, m.m.ConfigDescriptor.Digest) | ||
| expectedDigest := m.m.ConfigDescriptor.Digest | ||
|
|
||
| // Validate the blob against the expected digest using centralized validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet again, not worth commenting.
| {"v2s1.manifest.json", TestDockerV2S2ManifestDigest, false}, | ||
| // Unrecognized algorithm | ||
| {"v2s2.manifest.json", digest.Digest("md5:2872f31c5c1f62a694fbd20c1e85257c"), false}, | ||
| // SHA512 test cases (these should fail because we're using SHA256 by default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there’s no reason why MatchesDigest shouldn’t be able to validate against sha512. It doesn’t, but that can be fixed.
| _, err := digest.Parse("sha256:" + id) | ||
| return err | ||
| // ValidateImageID returns nil if id is a valid (full) image ID, or an error | ||
| func ValidateImageID(id string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still #375 (comment)
| } | ||
|
|
||
| if len(id) == expectedLength { | ||
| _, err := digest.Parse(algorithm.String() + ":" + id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not “try each supported algorithm”, and still effectively assumes lengths are unique.
| computedDigest := digest.FromBytes(blob) | ||
| if computedDigest != m.m.ConfigDescriptor.Digest { | ||
| return nil, fmt.Errorf("Download config.json digest %s does not match expected %s", computedDigest, m.m.ConfigDescriptor.Digest) | ||
| expectedDigest := m.m.ConfigDescriptor.Digest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I’m not sure that we need an extra variable here.)
| expectedDigest := m.m.Config.Digest | ||
| algorithm := expectedDigest.Algorithm() | ||
| computedDigest := algorithm.FromBytes(blob) | ||
| if computedDigest != expectedDigest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the schema2 code got the nice new ValidateBlobAgainstDigest … and this ~equivalent does not use it.
| // Based on github.com/docker/docker/distribution/pull_v2.go | ||
| func (m *manifestSchema1) convertToManifestSchema2(_ context.Context, options *types.ManifestUpdateOptions) (*manifestSchema2, error) { | ||
| // Explicitly reject SHA512+Schema1 combinations as they are not supported | ||
| // Schema1 is deprecated and Docker/registry don't support SHA512+Schema1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema1 is deprecated and we don’t expect it to work with sha512? Fine.
Then we don’t touch the code that converts to schema1, but we do modify the code that converts from schema1 to _non_schema1 to reject sha512? That’s exactly the wrong way around.
(And then the code individually rejects sha512, when it should reject all other non-sha256 algorithms instead….)
|
@mtrmac yup, I'll clean out the storage stuff first and then get to this one. A lot of the AI assistance ended up being just added pain. I'll clean out the slop here too after storage and ping you once that's done. Thanks for the reviews so far. |
Depends on #374 . This includes the commit from there, but maybe better to let that one go in first and then I can rebase this one.
(cursor assisted)