
CgroupsV2 - possible migration issue #4481

Closed
returntrip opened this issue Nov 8, 2019 · 15 comments
Labels
kind/bug · locked - please file new issue/PR · stale-issue

Comments

@returntrip
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description
After my Silverblue 30 toolboxes were migrated, libpod.conf has cgroup_manager set to cgroupfs. If, for whatever reason, a user stops the toolbox container, deletes libpod.conf, and re-runs toolbox enter, a new libpod.conf is generated with cgroup_manager = "systemd", and every migrated container then fails on start with Error: unable to start container "fedora-toolbox-31": sd-bus call: Invalid argument: OCI runtime error. As far as I can see this affects any migrated container, not only toolboxes.
Replacing systemd with cgroupfs in libpod.conf restores functionality.
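
For anyone needing the workaround, a minimal shell sketch of the fix described above (container name and config path taken from this report; back up the file first):

podman stop fedora-toolbox-31
sed -i 's/cgroup_manager = "systemd"/cgroup_manager = "cgroupfs"/' ~/.config/containers/libpod.conf
podman start fedora-toolbox-31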

Steps to reproduce the issue:

  1. In Silverblue 31, run toolbox enter to migrate pre-existing toolbox containers.

  2. Stop the toolbox container with podman stop <container>.

  3. Delete .config/containers/libpod.conf.

  4. Re-enter the toolbox with toolbox enter (a combined transcript follows this list).
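
The same steps as a single shell transcript (sketch; container name taken from this report):

toolbox enter                        # migrates pre-existing toolbox containers
podman stop fedora-toolbox-31
rm ~/.config/containers/libpod.conf
toolbox enter                        # regenerates libpod.conf with cgroup_manager = "systemd"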

Describe the results you received:
The above causes: Error: unable to start container "fedora-toolbox-31": sd-bus call: Invalid argument: OCI runtime error

Describe the results you expected:
Toolbox should open up as usual.
libpod.conf should be created with the correct cgroup_manager, or perhaps the cgroup manager could be chosen dynamically (if technically possible); a probe sketch follows.
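
For reference, a minimal sketch of the standard probe for a cgroups v2 host (the unified hierarchy exposes cgroup.controllers at the mount root); whether this alone is enough to pick the right manager for already-migrated containers is exactly the open question:

if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "cgroups v2"
else
    echo "cgroups v1 (or hybrid)"
fi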

Additional information you deem important (e.g. issue happens only occasionally):
The issue happens systematically whenever libpod.conf is re-created.

Output of podman version:

Version:            1.6.2
RemoteAPI Version:  1
Go Version:         go1.13.1
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.13.1
  podman version: 1.6.2
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v2
  Conmon:
    package: conmon-2.0.2-1.fc31.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
  Distribution:
    distribution: fedora
    version: "31"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 2215649280
  MemTotal: 8249016320
  OCIRuntime:
    name: crun
    package: crun-0.10.2-1.fc31.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.10.2
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  SwapFree: 8380215296
  SwapTotal: 8380215296
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: anduin.figura.io
  kernel: 5.3.8-300.fc31.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
    Version: |-
      slirp4netns version 0.4.0-beta.3+dev
      commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
  uptime: 2h 10m 17.66s (Approximately 0.08 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/returntrip/.config/containers/storage.conf
  ContainerStore:
    number: 15
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.6.5-2.fc31.x86_64
      Version: |-
        fusermount3 version: 3.6.2
        fuse-overlayfs: version 0.6.5
        FUSE library version 3.6.2
        using FUSE kernel interface version 7.29
  GraphRoot: /var/home/returntrip/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 11
  RunRoot: /tmp/1000
  VolumePath: /var/home/returntrip/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.6.2-2.fc31.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical Silverblue 31

@openshift-ci-robot openshift-ci-robot added the kind/bug label Nov 8, 2019
@returntrip
Contributor Author

@debarshiray FYI

@mheon
Member

mheon commented Nov 8, 2019

@giuseppe PTAL - I don't think there are migration issues around systemd/cgroupfs, so this is probably something separate...

@returntrip If the cgroup manager is set to systemd, do new containers work and migrated containers fail, or do all containers fail?

@mheon
Member

mheon commented Nov 8, 2019

Oh, wait, wait, wait... This could be the container's CGroup parent. That won't be migrated by our current code... Damn.

@returntrip
Contributor Author

With the systemd manager, the newly created stef container works, but the migrated, cgroupfs-created fedora-toolbox-31 container fails:

[returntrip@anduin ~]$ grep 'cgroup_manager' .config/containers/libpod.conf 
cgroup_manager = "systemd"
[returntrip@anduin ~]$ podman create --name stef -t fedora echo test
e00f1faba4154d9116863546d6b54ca6186a576d59aab03f53d224710cebdf21
[returntrip@anduin ~]$ podman start stef
e00f1faba4154d9116863546d6b54ca6186a576d59aab03f53d224710cebdf21
[returntrip@anduin ~]$ podman stop stef
e00f1faba4154d9116863546d6b54ca6186a576d59aab03f53d224710cebdf21
[returntrip@anduin ~]$ podman start stef
e00f1faba4154d9116863546d6b54ca6186a576d59aab03f53d224710cebdf21
[returntrip@anduin ~]$ podman stop stef
e00f1faba4154d9116863546d6b54ca6186a576d59aab03f53d224710cebdf21
[returntrip@anduin ~]$ podman start fedora-toolbox-31
Error: unable to start container "fedora-toolbox-31": sd-bus call: Invalid argument: OCI runtime error

With the cgroupfs manager, the systemd-created stef container fails, while the cgroupfs-created fedora-toolbox-31 container works:

[returntrip@anduin ~]$ grep 'cgroup_manager' .config/containers/libpod.conf 
cgroup_manager = "cgroupfs"
[returntrip@anduin ~]$ podman start stef
Error: unable to start container "stef": cannot configure rootless cgroup using the cgroupfs manager
cannot set limits without cgroups: OCI runtime error
[returntrip@anduin ~]$ podman start fedora-toolbox-31
458064c1f4e4bac00ddcefb7e9a40479a489e708937fba81cb8d1f33b8a92fea
[returntrip@anduin ~]$ podman stop fedora-toolbox-31
458064c1f4e4bac00ddcefb7e9a40479a489e708937fba81cb8d1f33b8a92fea
[returntrip@anduin ~]$ podman start fedora-toolbox-31
458064c1f4e4bac00ddcefb7e9a40479a489e708937fba81cb8d1f33b8a92fea
[returntrip@anduin ~]$ podman stop fedora-toolbox-31
458064c1f4e4bac00ddcefb7e9a40479a489e708937fba81cb8d1f33b8a92fea

@mheon
Member

mheon commented Nov 8, 2019

Alright, it's probably CGroup parent, then. We may want to force cgroupfs driver for migrated systems, but it's not clear how we would identify that the system should retain the cgroupfs driver.
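
One hedged idea for the identification problem: inspect the cgroup parent recorded on existing containers. Assuming the usual defaults (an absolute path such as /libpod_parent for cgroupfs, a *.slice unit for systemd), something like this could distinguish them:

podman inspect --format '{{.HostConfig.CgroupParent}}' fedora-toolbox-31
# a .slice value suggests the container was created under the systemd
# manager; a plain path like /libpod_parent suggests cgroupfs
# (assumption about the defaults; field name as in current inspect output)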

@returntrip
Contributor Author

This is also odd: why can I not re-enter a freshly created toolbox under the systemd manager?

[returntrip@anduin ~]$ grep 'cgroup_manager' .config/containers/libpod.conf 
cgroup_manager = "systemd"
[returntrip@anduin ~]$ toolbox create --container systemd
Created container: systemd
Enter with: toolbox enter --container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
⬢[returntrip@toolbox returntrip]$ exit
logout
[returntrip@anduin ~]$ toolbox enter --container systemd
⬢[returntrip@toolbox returntrip]$ exit
logout
[returntrip@anduin ~]$ podman stop systemd
7dcda7013379ac68990f83e84d5008fdabe698f1937c83af5483b685fb19aa26
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox -v enter --container systemd
toolbox: running as real user ID 1000
toolbox: resolved absolute path for /usr/bin/toolbox to /usr/bin/toolbox
toolbox: checking if /etc/subgid and /etc/subuid have entries for user returntrip
toolbox: TOOLBOX_PATH is /usr/bin/toolbox
toolbox: running on a cgroups v2 host
toolbox: current Podman version is 1.6.2
toolbox: migration not needed: Podman version 1.6.2 is unchanged
toolbox: Fedora generational core is f31
toolbox: base image is fedora-toolbox:31
toolbox: container is systemd
toolbox: checking if container systemd exists
toolbox: calling org.freedesktop.Flatpak.SessionHelper.RequestSession
toolbox: starting container systemd
toolbox: /etc/profile.d/toolbox.sh already mounted in container systemd
Error: unable to start container "systemd": container '7dcda7013379ac68990f83e84d5008fdabe698f1937c83af5483b685fb19aa26' already exists: OCI runtime error
toolbox: failed to start container systemd
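
A hedged diagnostic for the "already exists" error, assuming crun keeps its rootless state under /run/user/$UID/crun (podman passes the runtime a state root there): if the old container ID is still listed after podman stop, the OCI runtime state was never cleaned up:

ls /run/user/1000/crun/
# if the container's ID (7dcda70133…) still appears, crun delete <id> would
# be the low-level cleanup (use with care; it bypasses podman's own state
# tracking)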

@mheon
Member

mheon commented Nov 8, 2019

Can you reproduce that consistently? That's a separate bug that we haven't found a consistent repro for yet.

@returntrip
Contributor Author

returntrip commented Nov 8, 2019

Not sure whether this demonstrates consistency well, but it never enters the toolbox:

[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ toolbox enter --container systemd
toolbox: failed to start container systemd
[returntrip@anduin ~]$ grep 'cgroup_manager' .config/containers/libpod.conf 
cgroup_manager = "systemd"

I also guess we need to reopen #4198, because my test was made using toolbox images (migrated) that were created with the cgroupfs manager. I have a headache at the moment so can't think straight :(.

Adding some logs:

Nov 08 18:33:04 anduin.figura.io conmon[32763]: conmon edbaeec67ff06907f894 <ndebug>: failed to write to /proc/self/oom_score_adj: Permission denied
Nov 08 18:33:04 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <ninfo>: attach sock path: /run/user/1000/libpod/tmp/socket/edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8/attach
Nov 08 18:33:04 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <ninfo>: addr{sun_family=AF_UNIX, sun_path=/run/user/1000/libpod/tmp/socket/edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8/attach}
Nov 08 18:33:04 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <ninfo>: ctl fifo path: /var/home/returntrip/.local/share/containers/storage/overlay-containers/edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8/userdata/ctl
Nov 08 18:33:04 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <ninfo>: terminal_ctrl_fd: 12
Nov 08 18:33:04 anduin.figura.io systemd[1441]: Started libcrun container.
Nov 08 18:33:04 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <ndebug>: container PID: 32770
Nov 08 18:33:04 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <ndebug>: couldn't find cb for pid 32769
Nov 08 18:33:04 anduin.figura.io podman[32747]: 2019-11-08 18:33:04.456267111 +0100 CET m=+0.128287861 container init edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8 (image=registry.fedoraproject.org/f31/fedora-toolbox:31, name=systemd)
Nov 08 18:33:04 anduin.figura.io podman[32747]: 2019-11-08 18:33:04.466775231 +0100 CET m=+0.138796001 container start edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8 (image=registry.fedoraproject.org/f31/fedora-toolbox:31, name=systemd)
Nov 08 18:33:15 anduin.figura.io systemd[1441]: libpod-edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8.scope: Succeeded.
Nov 08 18:33:15 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <ninfo>: container 32770 exited with status 143
Nov 08 18:33:15 anduin.figura.io conmon[32764]: conmon edbaeec67ff06907f894 <nwarn>: Failed to open cgroups file: /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/user.slice/libpod-edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8.scope/memory.events
Nov 08 18:33:15 anduin.figura.io podman[32870]: 2019-11-08 18:33:15.226072134 +0100 CET m=+0.199095097 container died edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8 (image=registry.fedoraproject.org/f31/fedora-toolbox:31, name=systemd)
Nov 08 18:33:15 anduin.figura.io podman[32870]: 2019-11-08 18:33:15.231748826 +0100 CET m=+0.204771763 container stop edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8 (image=registry.fedoraproject.org/f31/fedora-toolbox:31, name=systemd)
Nov 08 18:33:15 anduin.figura.io podman[32885]: 2019-11-08 18:33:15.260807278 +0100 CET m=+0.136858600 container cleanup edbaeec67ff06907f89466db0cf87232cd1542f1e7dd00788a9d4a0672328ec8 (image=registry.fedoraproject.org/f31/fedora-toolbox:31, name=systemd)

@cryobry

cryobry commented Nov 19, 2019

I am getting OCI errors when running a new container with podman run on F31:

[conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied

DEBU[0000] Received: -1                                 
DEBU[0000] Cleaning up container 2334603b9d2c93f41439eeda5075e1accdd19f7281a6277ea2422a2c63ac2b2d 
DEBU[0000] Network is already cleaned up, skipping...   
DEBU[0000] unmounted container "2334603b9d2c93f41439eeda5075e1accdd19f7281a6277ea2422a2c63ac2b2d" 
DEBU[0000] Cleaning up container 2334603b9d2c93f41439eeda5075e1accdd19f7281a6277ea2422a2c63ac2b2d 
DEBU[0000] Network is already cleaned up, skipping...   
DEBU[0000] Container 2334603b9d2c93f41439eeda5075e1accdd19f7281a6277ea2422a2c63ac2b2d storage is already unmounted, skipping... 
DEBU[0000] Container 2334603b9d2c93f41439eeda5075e1accdd19f7281a6277ea2422a2c63ac2b2d storage is already unmounted, skipping... 
DEBU[0000] ExitCode msg: "creating file '/home/bryan/.local/share/containers/storage/overlay/0d16ae8e2bc06993506be6c9cc3742d980a6f7ff35b0f3955ad7241a56836abb/merged/run/.containerenv': permission denied: oci runtime permission denied error" 
ERRO[0000] creating file '/home/bryan/.local/share/containers/storage/overlay/0d16ae8e2bc06993506be6c9cc3742d980a6f7ff35b0f3955ad7241a56836abb/merged/run/.containerenv': Permission denied: OCI runtime permission denied error

I opened containers/crun#187 in crun.

@rhatdan
Member

rhatdan commented Nov 20, 2019

What podman command are you executing?

@giuseppe
Member

I think it is the same issue we are tracking in crun.

@cryobry

cryobry commented Nov 20, 2019

Yes, see containers/crun#187

@returntrip
Contributor Author

@giuseppe so the issue I logged is related to containers/crun#187?

If so, should we close one of the two?

@github-actions

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

@rhatdan
Member

rhatdan commented Dec 21, 2019

Looks like this is fixed in master.

@rhatdan rhatdan closed this as completed Dec 21, 2019
@github-actions github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023