
cgroupv2 checkpoint/restore is not working #2328

Closed
kolyshkin opened this issue Apr 19, 2020 · 3 comments · Fixed by #2335

Comments

@kolyshkin
Contributor

One reason it is not working is that the cgroupv2 mount (and, BTW, cgroupv1 mounts when --cgroupns is used) should not be treated as (i.e. presented to CRIU as) external bind mounts.

  • cgroupv1 + cgroupns mounts are real mounts (with fstype cgroup)
  • the cgroupv2 mount is a real mount (with fstype cgroup2)
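To make the distinction concrete, here is a minimal Go sketch of how a checkpoint path might decide which mounts to hand to CRIU as external bind mounts. The `Mount` type, the `isExternalBindMount` helper and its `unified`/`cgroupns` parameters are hypothetical illustrations, not runc's actual API. The sketch assumes cgroup mounts arrive with Device set to "cgroup" or "cgroup2", and that without a cgroup namespace the cgroup v1 controllers are bind-mounted from the host.

```go
package main

import "fmt"

// Mount is a hypothetical, trimmed-down container mount entry;
// only the fields this sketch needs are included.
type Mount struct {
	Destination string
	Device      string // e.g. "bind", "cgroup", "cgroup2", "proc"
}

// isExternalBindMount reports whether a mount should be presented to CRIU
// as an external bind mount (hypothetical helper, not runc code).
// unified: the host uses the cgroup v2 unified hierarchy.
// cgroupns: the container has its own cgroup namespace.
func isExternalBindMount(m Mount, unified, cgroupns bool) bool {
	switch m.Device {
	case "bind":
		return true
	case "cgroup", "cgroup2":
		// cgroup v2, or cgroup v1 inside a cgroup namespace: real mount(s).
		// cgroup v1 without a cgroup namespace: bind mounts of host dirs.
		return m.Device == "cgroup" && !unified && !cgroupns
	default:
		// Other filesystems (proc, tmpfs, ...) are real mounts.
		return false
	}
}

func main() {
	mounts := []Mount{
		{Destination: "/sys/fs/cgroup", Device: "cgroup2"},
		{Destination: "/sys/fs/cgroup", Device: "cgroup"},
		{Destination: "/data", Device: "bind"},
		{Destination: "/proc", Device: "proc"},
	}
	// Assume a cgroup v1 host without a cgroup namespace.
	for _, m := range mounts {
		fmt.Printf("%-8s %-15s external=%v\n",
			m.Device, m.Destination, isExternalBindMount(m, false, false))
	}
}
```

The key point is that only the cgroup v1, no-cgroupns case is a set of bind mounts; everything with fstype cgroup2, or cgroup inside a namespace, must be dumped as a real mount.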

There might be other bugs, too.

@kolyshkin
Contributor Author

I already have a patch to fix this; I just need more time to create a PR.

kolyshkin added a commit to kolyshkin/runc that referenced this issue Apr 20, 2020
Currently, CRIU 3.14 (not yet released) is required to checkpoint
an fstype=cgroup2 mount[1].

Also, some runc changes are needed (since the cgroup2 mount is a real mount,
not an external bind mount[2]).

Therefore, let's disable checkpoint tests for cgroup v2 (the ones
that are running on a Fedora 31 Vagrant box) for now.

[1] checkpoint-restore/criu@3a15076405c
[2] opencontainers#2328

Signed-off-by: Kir Kolyshkin <[email protected]>
@adrianreber
Contributor

When I implemented the CRIU cgroup2 freezer changes for runc and CRIU, it was still working. That was a few weeks ago.

The integration tests actually run checkpoint/restore on vagrant-fedora-31, with cgroup2 both inside and outside the container: https://github.com/opencontainers/runc/blob/master/libcontainer/integration/checkpoint_test.go#L86

I am sure I do not understand all the cgroup details as well as you do; I just wanted to mention that some checkpoint/restore tests using cgroup2 are working.

kolyshkin added a commit to kolyshkin/runc that referenced this issue Apr 21, 2020
@kolyshkin
Contributor Author

@adrianreber I'll take a closer look at it, but I think I found (and fixed) a genuine bug; please see #2335. The reason it works in your case is that you explicitly set Device to "cgroup2", while for runc run/checkpoint/restore it is "cgroup", since it is taken directly from config.json.

Maybe I need to redo my fix differently, changing "cgroup" to "cgroup2" when we're using the v2 unified hierarchy, but let's discuss it in #2335.
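A minimal Go sketch of the alternative floated in the comment above: detect the v2 unified hierarchy and map a Device of "cgroup" to "cgroup2". It assumes unified mode can be detected by checking that /sys/fs/cgroup has the CGROUP2_SUPER_MAGIC filesystem type (0x63677270, from linux/magic.h); runc has a similar check in its own cgroups package, and the stdlib-only `isCgroup2UnifiedMode` here is just a stand-in. `criuDevice` is a hypothetical helper name. Linux only.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is CGROUP2_SUPER_MAGIC from <linux/magic.h>.
const cgroup2SuperMagic = 0x63677270

// isCgroup2UnifiedMode reports whether /sys/fs/cgroup is a cgroup2 mount,
// i.e. whether the host runs the v2 unified hierarchy.
func isCgroup2UnifiedMode() (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false, err
	}
	return st.Type == cgroup2SuperMagic, nil
}

// criuDevice maps the Device value taken from config.json ("cgroup")
// to the fstype that is actually mounted on a unified host
// (hypothetical helper, not runc code).
func criuDevice(device string, unified bool) string {
	if device == "cgroup" && unified {
		return "cgroup2"
	}
	return device
}

func main() {
	unified, err := isCgroup2UnifiedMode()
	if err != nil {
		fmt.Println("cannot stat /sys/fs/cgroup:", err)
		return
	}
	fmt.Printf("unified=%v device=%q\n", unified, criuDevice("cgroup", unified))
}
```

Normalizing the device name this way keeps the checkpoint logic keyed on the real fstype rather than on whatever config.json happens to say.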
