Fix shared cache mounts result in overlay corruption #2637

Merged
merged 1 commit into from
Feb 17, 2022

ktock
Collaborator

@ktock ktock commented Feb 15, 2022

Fixes #2334

Currently, MutableRef.Mounts can return writable overlayfs mounts that can be mounted to multiple directories.
However, mounting a writable overlayfs to multiple places can result in an error.

This commit avoids the issue by using the approach suggested in #2334 (comment):

One potential fix is to use a trick where you can create a single overlay mount somewhere and then bind-mount the root of that overlay mount multiple places.
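
As a rough illustration of that trick, here is a minimal sketch. It assumes a reference-counted wrapper that mounts the overlay once and hands every caller a bind mount of the same directory. The names `sharedOverlay`, `Acquire`, `Release`, `mountFn` and `unmountFn` are hypothetical and are not taken from the PR. The real mount and unmount calls are injected as functions so the sketch runs without root.

```go
package main

import (
	"fmt"
	"sync"
)

// Mount is a simplified mount spec (type, source, options).
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// sharedOverlay is a hypothetical wrapper: the overlay is mounted once at dir,
// and each caller receives a bind mount whose source is dir.
type sharedOverlay struct {
	mu        sync.Mutex
	count     int                       // number of outstanding Acquire calls
	dir       string                    // where the single overlay mount lives
	mountFn   func(target string) error // stand-in for the real overlay mount
	unmountFn func(target string) error // stand-in for the real unmount
}

// Acquire mounts the overlay on first use and returns a bind mount of it.
func (s *sharedOverlay) Acquire() (Mount, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.count == 0 {
		if err := s.mountFn(s.dir); err != nil {
			return Mount{}, err
		}
	}
	s.count++
	return Mount{Type: "bind", Source: s.dir, Options: []string{"rbind", "rw"}}, nil
}

// Release drops one reference and unmounts the overlay when none remain.
func (s *sharedOverlay) Release() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.count == 0 {
		return fmt.Errorf("release without matching acquire")
	}
	s.count--
	if s.count > 0 {
		return nil
	}
	return s.unmountFn(s.dir)
}

func main() {
	mounts, unmounts := 0, 0
	s := &sharedOverlay{
		dir:       "/tmp/buildkit-shared", // hypothetical path
		mountFn:   func(string) error { mounts++; return nil },
		unmountFn: func(string) error { unmounts++; return nil },
	}
	a, _ := s.Acquire()
	b, _ := s.Acquire()
	fmt.Println(a.Source == b.Source, mounts)
	_ = s.Release()
	_ = s.Release()
	fmt.Println(unmounts)
}
```

The point of the design is that the writable overlay itself exists in exactly one place, while any number of consumers see it through bind mounts.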

@ktock ktock marked this pull request as draft February 15, 2022 13:20
@ktock ktock marked this pull request as ready for review February 15, 2022 13:58
@crazy-max
Member

Unrelated but we still have the CNI panic even after upgrading to go-cni 1.1.2 (#2632): https://github.com/moby/buildkit/runs/5201057000?check_suite_focus=true#step:6:1033

2022/02/15 13:52:21 > startCmd 2022-02-15 13:52:06.336865811 +0000 UTC m=+67.430699044 [buildkitd --oci-worker=false --containerd-worker-gc=false --containerd-worker=true --containerd-worker-addr /tmp/bktest_containerd1093028255/containerd.sock --containerd-worker-labels=org.mobyproject.buildkit.worker.sandbox=true --config=/tmp/bktest_config3118302386/buildkitd.toml --root /tmp/bktest_buildkitd572081326 --addr unix:///tmp/bktest_buildkitd572081326/buildkitd.sock --debug]
2022/02/15 13:52:21 time="2022-02-15T13:52:06Z" level=debug msg="remote introspection plugin filters" filters="[type==io.containerd.runtime.v1 type==io.containerd.runtime.v2]"
2022/02/15 13:52:21 panic: runtime error: invalid memory address or nil pointer dereference
2022/02/15 13:52:21 [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xf3a54c]
2022/02/15 13:52:21 
2022/02/15 13:52:21 goroutine 13 [running]:
2022/02/15 13:52:21 github.com/containernetworking/cni/pkg/invoke.(*DefaultExec).FindInPath(0x0, {0xc0005c70f8, 0xc0000c89c0}, {0xc000217050, 0x7f4382367f18, 0x58})
2022/02/15 13:52:21 	<autogenerated>:1 +0x2c
2022/02/15 13:52:21 github.com/containernetworking/cni/libcni.(*CNIConfig).addNetwork(0xc000422f40, {0x1532a38, 0xc000134028}, {0xc0005c6d90, 0xc}, {0xc0005c6d68, 0x5}, 0xc00007c9a0, {0x0, 0x0}, ...)
2022/02/15 13:52:21 	/src/vendor/github.com/containernetworking/cni/libcni/api.go:395 +0x13a
2022/02/15 13:52:21 github.com/containernetworking/cni/libcni.(*CNIConfig).AddNetworkList(0xc0009881b0, {0x1532a38, 0xc000134028}, 0xc00040d260, 0xc0000c92f0)
2022/02/15 13:52:21 	/src/vendor/github.com/containernetworking/cni/libcni/api.go:422 +0x10b
2022/02/15 13:52:21 github.com/containerd/go-cni.(*Network).Attach(0xc00059fda0, {0x1532a38, 0xc000134028}, 0xc0000c9410)
2022/02/15 13:52:21 	/src/vendor/github.com/containerd/go-cni/namespace.go:33 +0x6d
2022/02/15 13:52:21 github.com/containerd/go-cni.asynchAttach({0x1532a38, 0xc000134028}, 0x0, 0xc0000c96e0, 0xc0000c9710, 0xc0000c9740, 0xc0000c9770)
2022/02/15 13:52:21 	/src/vendor/github.com/containerd/go-cni/cni.go:176 +0x85
2022/02/15 13:52:21 created by github.com/containerd/go-cni.(*libcni).attachNetworks
2022/02/15 13:52:21 	/src/vendor/github.com/containerd/go-cni/cni.go:188 +0xb3
2022/02/15 13:52:21 > stopped 2022-02-15 13:52:06.412870394 +0000 UTC m=+67.506703527 exit status 2 2
=== CONT  TestIntegration/TestMountRWCache/worker=containerd/frontend=gateway

@fuweid Upgrading to go-cni 1.1.3 could fix it?

@fuweid
Contributor

fuweid commented Feb 15, 2022

Yes! Sorry for the rushed fix in v1.1.2. I forgot to update the CNI opt.

Ref:

@crazy-max
Member

Thanks @fuweid!

Collaborator

@sipsma sipsma left a comment

Looks good overall, thanks for tackling it!

// Don't need temporary mount wrapper for non-overlayfs mounts
return mounts, release, nil
}
dir, err := ioutil.TempDir("", "buildkit")
Collaborator

We should add a defer that removes this dir if a later error causes the func to return early.

Collaborator Author

Thank you for the review. Fixed this.

Comment on lines +1606 to +1626
// no mount exist. release the current mount.
sm.curMounts = nil
if err := mount.Unmount(sm.curMountPoint, 0); err != nil {
return err
}
Collaborator

This doesn't have to be fixed in this PR as it's a larger issue, but if buildkitd crashed or errors were encountered unmounting here I guess we could leak overlay mounts. Then if buildkitd started up again, you could run into the same issue of duplicating overlay mounts because the old ones are still sitting around. This is different from mounts made for containers w/ runc/containerd because those are typically mounted inside a mount namespace, meaning they will get auto cleaned up by the kernel when no processes are left in the namespace. The only remediation would be to manually unmount stuff from under /tmp or reboot the host.

Like I said, this is probably a pre-existing issue, but something to keep in mind. We might want a way of cleaning up any leftover mounts when buildkitd starts up (though even that can't be 100% robust if it's possible for users to change the configuration of where such tmp mounts are made). Maybe we could even run buildkitd inside its own mount namespace, but trying to get Go to do that is never easy and might have weird undesirable side effects.

Collaborator Author

Thank you for the comment. Let's work on it in follow-up PRs. I think we can add a directory under /var/lib/buildkit/ for storing all temporary overlay mounts. When buildkitd starts, it should clean up all existing mounts under that directory.

@tonistiigi tonistiigi merged commit c14bf69 into moby:master Feb 17, 2022
@ktock ktock deleted the sharemounts branch February 17, 2022 08:13
Successfully merging this pull request may close these issues.

Shared cache mounts created from not scratch result in overlay corruption
5 participants