@lsm5
Member

@lsm5 lsm5 commented Oct 6, 2025

Depends on #374. This includes the commit from there, but it may be better to let that one go in first; I can then rebase this one.

(cursor assisted)

@github-actions github-actions bot added the storage (Related to "storage" package) and image (Related to "image" package) labels Oct 6, 2025
@lsm5 lsm5 changed the title from "Image agile digest" to "image: add configurable digest support" Oct 6, 2025
@lsm5 lsm5 changed the title from "image: add configurable digest support" to "[sha512] image: add configurable digest support" Oct 6, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 6, 2025
@podmanbot

✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6412

@lsm5 lsm5 force-pushed the image-agile-digest branch from ebb29e3 to ba27cc9 on October 6, 2025 18:20
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 6, 2025
@lsm5 lsm5 force-pushed the image-agile-digest branch from ba27cc9 to cf2c428 on October 6, 2025 18:25
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 6, 2025
@lsm5 lsm5 force-pushed the image-agile-digest branch from cf2c428 to 63e8928 on October 6, 2025 18:33
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 6, 2025
@lsm5 lsm5 force-pushed the image-agile-digest branch from 63e8928 to 6621c6e on October 6, 2025 18:38
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 6, 2025
Contributor

@mtrmac mtrmac left a comment

A fairly brief skim of just the changes, without carefully thinking about any individual use case.

(“Request changes” for the storage DiffID matching enforcement)

-	return digest.Canonical.FromReader(stream)
+	// Use the configured digest algorithm
+	algorithm := supportedDigests.Get()
+	return algorithm.FromReader(stream)
Contributor

This works locally, but there’s the cachedDiffID code path where we use an older value. That probably needs to be adjusted, if we want the behavior to be predictable for users.

(This is relevant for schema1 only, and that is nowadays entirely disabled in Docker and the distribution/distribution registry, at least by default. Arguably interoperable support for sha512+schema1 is never going to happen … but, for us, it might be easier to generate sha512+schema1 and let it fail, than to have an ~undocumented exception where we ignore the configuration, or to specifically hard-code an error path and make it absolutely impossible to use such a setup.)

Comment on lines 127 to 134
	// Save original algorithm and set SHA512
	originalAlgorithm := supportedDigests.Get()
	defer func() {
		err := supportedDigests.Set(originalAlgorithm)
		require.NoError(t, err)
	}()
	err = supportedDigests.Set(digest.SHA512)
	require.NoError(t, err)
Contributor

Applies throughout: Overriding a global inside a test like this is fairly unclean (and somewhat verbose); e.g. it ~prevents running the tests in parallel, in a not-externally-obvious way.

(Leaving aside the question whether we should have a supportedDigest global), I think a much cleaner way to deal with global state of this kind is to parametrize the tested function with an explicit “digest” parameter; maybe adding that parameter to an existing function, maybe splitting the primary function into a tested function with an extra parameter + a trivial wrapper (compare the various …WithHomeDir functions in the codebase).
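
A minimal sketch of the "tested function with an extra parameter + a trivial wrapper" shape described above. Everything here is a stand-in: the `algorithm` type approximates `digest.Algorithm` from go-digest using only the standard library, `configuredAlgorithm` plays the role of `supportedDigests.Get()`, and the names `blobDigest` / `blobDigestWithAlgorithm` are hypothetical, not functions from this PR.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"hash"
)

// algorithm is a hypothetical stand-in for digest.Algorithm from go-digest.
type algorithm struct {
	name    string
	newHash func() hash.Hash
}

var (
	algoSHA256 = algorithm{name: "sha256", newHash: sha256.New}
	algoSHA512 = algorithm{name: "sha512", newHash: sha512.New}
)

// configuredAlgorithm stands in for the process-wide setting
// (supportedDigests.Get() in the PR). Tests never need to modify it.
var configuredAlgorithm = algoSHA256

// blobDigest returns the digest of data using the configured algorithm.
func blobDigest(data []byte) string {
	return blobDigestWithAlgorithm(data, configuredAlgorithm)
}

// blobDigestWithAlgorithm is an internal implementation detail of blobDigest;
// it exists only so that tests can pick the algorithm explicitly.
func blobDigestWithAlgorithm(data []byte, a algorithm) string {
	h := a.newHash()
	h.Write(data) // hash.Hash.Write never returns an error
	return fmt.Sprintf("%s:%x", a.name, h.Sum(nil))
}

func main() {
	fmt.Println(blobDigest([]byte("layer bytes")))
	fmt.Println(blobDigestWithAlgorithm([]byte("layer bytes"), algoSHA512))
}
```

With this shape a test calls `blobDigestWithAlgorithm(..., algoSHA512)` directly, so no global is saved and restored and the test remains safe to run in parallel.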

return err
}
// Try SHA512 (128 characters)
if len(id) == 128 {
Contributor

This does not generalize. Do we need to think about including the algorithm in the ID? (Or, alternatively, having supported-digests promise that it will only add ≤1 algorithm for each length — not sustainable long-term.)
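
One way the "include the algorithm in the ID" option could look, as a sketch only. The `algorithm:hex` syntax, the `hexLengths` table, and `parseQualifiedID` are assumptions made for illustration; the PR does not define an ID syntax, and c/storage may need something different.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// hexLengths maps hypothetical supported algorithm names to the length of
// their hex encoding.
var hexLengths = map[string]int{"sha256": 64, "sha512": 128}

// isLowerHex reports whether s consists only of the characters 0-9 and a-f.
func isLowerHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseQualifiedID splits "algorithm:hex" and validates the hex part against
// the named algorithm. A bare hex string is rejected instead of guessing the
// algorithm from its length.
func parseQualifiedID(id string) (string, string, error) {
	algo, hexPart, ok := strings.Cut(id, ":")
	if !ok {
		return "", "", errors.New("ID has no algorithm prefix")
	}
	want, known := hexLengths[algo]
	if !known {
		return "", "", fmt.Errorf("unsupported digest algorithm %q", algo)
	}
	if len(hexPart) != want || !isLowerHex(hexPart) {
		return "", "", fmt.Errorf("%s ID must be %d lower-case hex characters", algo, want)
	}
	return algo, hexPart, nil
}

func main() {
	for _, id := range []string{
		"sha512:" + strings.Repeat("0", 128),
		strings.Repeat("0", 128), // bare hex: rejected rather than guessed
	} {
		algo, _, err := parseQualifiedID(id)
		fmt.Println(algo, err)
	}
}
```

Because the algorithm is named explicitly, adding a second algorithm with the same output length would not change how existing IDs are interpreted.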

@lsm5 lsm5 force-pushed the image-agile-digest branch from 6621c6e to 621025c on October 7, 2025 22:01
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 7, 2025
@lsm5
Member Author

lsm5 commented Oct 7, 2025

@mtrmac I think I've addressed everything so far, including the requested change. I've also rebased on the latest in #374. PTAL. Thanks a lot for all the detailed comments.

@lsm5 lsm5 marked this pull request as ready for review October 7, 2025 22:25
@lsm5 lsm5 force-pushed the image-agile-digest branch from 621025c to 241f03e on October 8, 2025 12:58
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 8, 2025
@lsm5 lsm5 mentioned this pull request Oct 8, 2025
- Replace hardcoded SHA256 with configurable digest algorithms using storage/pkg/supported-digests
- Add centralized digest validation utilities in image/pkg/digestvalidation
- Implement parameterized digest computation in image/copy/single.go
- Rename DigestIfCanonicalUnknown to DigestIfConfiguredUnknown for clarity

Signed-off-by: Lokesh Mandvekar <[email protected]>
@lsm5 lsm5 force-pushed the image-agile-digest branch from 241f03e to e2b0cc6 on October 8, 2025 13:11
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Oct 8, 2025
@lsm5
Member Author

lsm5 commented Oct 8, 2025

@mtrmac This is targeted at podman v5.8 at the earliest, so there is no immediate rush. I'll be happy to fix the storage stuff before we go forward with this, if you prefer.

Contributor

@mtrmac mtrmac left a comment

I’m … generally frustrated with the various tests that do nothing or don’t do what they say at all, with several cases where the one instance of a problem was fixed and the others weren’t, or new cases of exactly the same problem were added again. Looking at the diffs, I don’t feel like we are making much progress at all.


On the positive side, this PR does, directly or indirectly, run into various important concerns that need addressing and figuring out; and just listing them is valuable knowledge.

What would work as a way to track all those, so that they are not hidden in a sea of comments disappearing behind “Load more”? In the past I did #205 — admittedly also rather unwieldy; there are almost certainly better tools and better options.


Looking at what we’ve ~learned, I think this PR, adding “configurable digests” for all of c/image, is just too big a step (with ID syntax in c/storage, ID syntax in image-like strings (and filter syntax in libimage, in #376), DiffIDs for schema1, DiffID matching for pulls, wanted/unwanted digest updates in destinations, … and perhaps half a dozen other concerns already) — and I can’t imagine implementing all of that in a single commit the way this PR is now.

Can we find some drastically smaller thing to do, and fully solve? “All code in the package which matches bytes against a digest can handle arbitrary digests”? “All digest comparisons for equality or uses as lookup keys are identified and categorized as ‘validation to be trivially fixed’ / ‘we want this one unchanged’ / ‘some specific tracked problem area’”? “We have decided what to do with schema1, and we are doing that and/or refusing to do that, precisely correctly”? “Copying an unmodified sha512-digested image works for $pairOfTransports”?

I don’t know, perhaps even smaller. (One package at a time? Small enough, but maybe too small to allow expressing an idea.)

It’s not provably true that fixing the small easy things makes it any more possible to fix the large unwieldy things, but, in my experience so far, it does tend to be the case. With each small thing that is done done, we learn something, and the outstanding unwieldy blob is a bit smaller and a bit easier to think about — and, eventually, deal with.

}

// computeDiffID reads all input from layerStream, uncompresses it using decompressor if necessary, and returns its digest.
// This is a wrapper around computeDiffIDWithAlgorithm that uses the globally configured digest algorithm.
Contributor

@mtrmac mtrmac Oct 24, 2025

Please document it the other way — this function does not need to be documented as a wrapper, callers don’t care. On the contrary, computeDiffIDWithAlgorithm should say “this is an internal implementation detail of computeDiffID, and exists only to allow testing it …” so that no-one uses it.

defer stream2.Close()

// Use the parametrized function directly instead of overriding global state
digest, err := computeDiffIDWithAlgorithm(stream2, nil, digest.SHA512)
Contributor

This is supposed to be a test of DiffIDComputationGoroutine

require.NoError(t, err, c.filename)
defer stream.Close()

// Save original algorithm and set the desired one
Contributor

… and this test, which actually does test computeDiffID, was not updated to benefit from the new parametrized variant.

-		return nil, fmt.Errorf("Download config.json digest %s does not match expected %s", computedDigest, m.m.ConfigDescriptor.Digest)
+	expectedDigest := m.m.ConfigDescriptor.Digest
+
+	// Validate the blob against the expected digest using centralized validation
Contributor

Yet again, not worth commenting.

{"v2s1.manifest.json", TestDockerV2S2ManifestDigest, false},
// Unrecognized algorithm
{"v2s2.manifest.json", digest.Digest("md5:2872f31c5c1f62a694fbd20c1e85257c"), false},
// SHA512 test cases (these should fail because we're using SHA256 by default)
Contributor

No, there’s no reason why MatchesDigest shouldn’t be able to validate against sha512. It doesn’t, but that can be fixed.
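
A sketch of what validating against the algorithm named in the expected digest, rather than a fixed or configured one, could look like. This is not the `MatchesDigest` implementation; `computeDigest`, `matchesDigest`, and the `hashers` table are hypothetical and use only the standard library. In c/image the equivalent would presumably go through go-digest instead.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"hash"
	"strings"
)

// hashers lists the algorithms this sketch knows how to compute.
var hashers = map[string]func() hash.Hash{
	"sha256": sha256.New,
	"sha384": sha512.New384,
	"sha512": sha512.New,
}

// computeDigest returns "algo:hex" for data, or an error for an unknown algorithm.
func computeDigest(data []byte, algo string) (string, error) {
	newHash, known := hashers[algo]
	if !known {
		return "", fmt.Errorf("unsupported digest algorithm %q", algo)
	}
	h := newHash()
	h.Write(data) // hash.Hash.Write never returns an error
	return fmt.Sprintf("%s:%x", algo, h.Sum(nil)), nil
}

// matchesDigest reports whether data hashes to expected ("algo:hex"), using
// the algorithm that expected itself names.
func matchesDigest(data []byte, expected string) (bool, error) {
	algo, _, ok := strings.Cut(expected, ":")
	if !ok {
		return false, fmt.Errorf("invalid digest %q", expected)
	}
	computed, err := computeDigest(data, algo)
	if err != nil {
		return false, err
	}
	return computed == expected, nil
}

func main() {
	manifest := []byte(`{"schemaVersion": 2}`)
	expected, _ := computeDigest(manifest, "sha512")
	fmt.Println(matchesDigest(manifest, expected))
	fmt.Println(matchesDigest(manifest, "md5:2872f31c5c1f62a694fbd20c1e85257c"))
}
```

Under this shape a sha512 expectation is checked like any other, and only genuinely unknown algorithms (such as the md5 case in the test table) are refused.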

_, err := digest.Parse("sha256:" + id)
return err
// ValidateImageID returns nil if id is a valid (full) image ID, or an error
func ValidateImageID(id string) error {
Contributor

}

if len(id) == expectedLength {
_, err := digest.Parse(algorithm.String() + ":" + id)
Contributor

This does not “try each supported algorithm”, and still effectively assumes lengths are unique.
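
To illustrate the concern, here is a sketch that really does try every supported algorithm for a bare hex ID and refuses to pick one when several fit. The table is hypothetical: `sha512-256` is listed only because it shares its output length with `sha256`, not because anything in this PR supports it.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"sort"
)

// supportedHexLengths is a hypothetical table of algorithm names and the
// length of their hex encoding. Two entries deliberately share a length.
var supportedHexLengths = map[string]int{
	"sha256":     64,
	"sha512":     128,
	"sha512-256": 64,
}

// candidateAlgorithms returns, in sorted order, every supported algorithm
// whose hex length equals len(id). It returns nil if id is not valid hex.
func candidateAlgorithms(id string) []string {
	if _, err := hex.DecodeString(id); err != nil {
		return nil
	}
	var names []string
	for name, n := range supportedHexLengths {
		if len(id) == n {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// resolveAlgorithm succeeds only if exactly one algorithm fits the ID.
func resolveAlgorithm(id string) (string, error) {
	switch c := candidateAlgorithms(id); len(c) {
	case 0:
		return "", fmt.Errorf("ID of length %d matches no supported algorithm", len(id))
	case 1:
		return c[0], nil
	default:
		return "", fmt.Errorf("ID of length %d is ambiguous between %v", len(id), c)
	}
}

func main() {
	fmt.Println(resolveAlgorithm(fmt.Sprintf("%0128d", 0)))
	fmt.Println(resolveAlgorithm(fmt.Sprintf("%064d", 0)))
}
```

Length-based inference only works while each length maps to one algorithm; once that stops being true, the ambiguous case has to be an error or the ID has to carry its algorithm.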

-	computedDigest := digest.FromBytes(blob)
-	if computedDigest != m.m.ConfigDescriptor.Digest {
-		return nil, fmt.Errorf("Download config.json digest %s does not match expected %s", computedDigest, m.m.ConfigDescriptor.Digest)
+	expectedDigest := m.m.ConfigDescriptor.Digest
Contributor

(I’m not sure that we need an extra variable here.)

expectedDigest := m.m.Config.Digest
algorithm := expectedDigest.Algorithm()
computedDigest := algorithm.FromBytes(blob)
if computedDigest != expectedDigest {
Contributor

So the schema2 code got the nice new ValidateBlobAgainstDigest … and this ~equivalent does not use it.

// Based on github.com/docker/docker/distribution/pull_v2.go
func (m *manifestSchema1) convertToManifestSchema2(_ context.Context, options *types.ManifestUpdateOptions) (*manifestSchema2, error) {
// Explicitly reject SHA512+Schema1 combinations as they are not supported
// Schema1 is deprecated and Docker/registry don't support SHA512+Schema1
Contributor

Schema1 is deprecated and we don’t expect it to work with sha512? Fine.

Then we don’t touch the code that converts to schema1, but we do modify the code that converts from schema1 to non-schema1 to reject sha512? That’s exactly the wrong way around.

(And then the code individually rejects sha512, when it should reject all other non-sha256 algorithms instead….)
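
A sketch of the allow-list form of that check: accept sha256 and reject everything else, instead of naming sha512 specifically. `checkSchema1Digest` is a hypothetical helper, not code from c/image, and it assumes such a guard would sit on the path that produces schema1 output.

```go
package main

import (
	"fmt"
	"strings"
)

// schema1Algorithm is the only algorithm this sketch lets into schema1 output.
const schema1Algorithm = "sha256"

// checkSchema1Digest is a hypothetical guard for code that produces schema1
// manifests: any algorithm other than sha256 is refused, including ones that
// may be added later.
func checkSchema1Digest(d string) error {
	algo, _, ok := strings.Cut(d, ":")
	if !ok {
		return fmt.Errorf("invalid digest %q", d)
	}
	if algo != schema1Algorithm {
		return fmt.Errorf("schema1 supports only %s digests, got %q", schema1Algorithm, algo)
	}
	return nil
}

func main() {
	for _, d := range []string{"sha256:abc", "sha512:abc", "sha384:abc"} {
		fmt.Println(d, checkSchema1Digest(d))
	}
}
```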

@lsm5
Member Author

lsm5 commented Oct 24, 2025

@mtrmac yup, I'll clean out the storage stuff first and then get to this one. A lot of the AI assistance ended up being just added pain. I'll clean out the slop here too after storage and ping you once that's done. Thanks for the reviews so far.

Labels

image Related to "image" package storage Related to "storage" package

3 participants