Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADD of remote file in Dockerfile causes occasional "digest mismatch" #5020

Closed
dnephin opened this issue Jun 11, 2024 · 1 comment · Fixed by #5343
Closed

ADD of remote file in Dockerfile causes occasional "digest mismatch" #5020

dnephin opened this issue Jun 11, 2024 · 1 comment · Fixed by #5343
Assignees
Milestone

Comments

@dnephin
Copy link
Member

dnephin commented Jun 11, 2024

Buildkit version: v0.12.3

From a Dockerfile with:

ADD https://host/file.crt

Occasionally we see build failures that look like this:

#9 https://host/file.crt
#9 ERROR: digest mismatch sha256:b155ec88e83ea2a01ff5a72ff270e3bbcb500ecf09a1ef1da80b47478a563bb1: 

The failures are intermittent, but I'm not sure if that's because we round robin between 4 build nodes, or if there is some other problem.

The relevant code seems to be:

if dgst != hs.cacheKey {
ref.Release(context.TODO())
return nil, errors.Errorf("digest mismatch %s: %s", dgst, hs.cacheKey)

I notice that hs.cacheKey is an empty string, which seems strange. I guess it's expected that hs.cacheKey has a non-zero value here?

Anything else I can provide to help debug this problem?

@tonistiigi
Copy link
Member

I guess it could be hitting this case https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L424 unexpectedly. Maybe because the ref has been deleted before Snapshot gets called (or maybe it was already deleted and metadata DB has stale data). Maybe there are some GC logs that confirm that?

It should be that CacheKey can return digest directly if user set one manually (pin), or data can be pulled and digest calculated or digest from previously downloaded ref can be reused if the etag confirms that it has not changed since the last pull. In the latter two cases the data is already pulled and should be accessed by refID. If hs.cacheKey is empty that should mean that it expected early return in L424.

To fix this case, need to make sure ref exists in all possible returns for CacheKey. Eg. it does in https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L283-L291 but not sure if guaranteed in https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L273-L280 and https://github.com/moby/buildkit/blob/v0.14.0/source/http/source.go#L235-L240, and then make sure there is no way it can get released between CacheKey and Snapshot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants