podman unable to limit memory (-m flag) on Ubuntu/Debian distros #6365

Closed
richpeaua opened this issue May 23, 2020 · 45 comments
Labels: kind/bug, locked - please file new issue/PR

@richpeaua

richpeaua commented May 23, 2020

/kind bug

Description

I was test-driving the HashiCorp Nomad podman driver on Debian/Ubuntu machines and ran into an interesting issue. Deploying containers via Nomad fails, yet running podman run <image> directly works fine.

The errors I was getting were "failed to write -1 to /sys/fs/cgroup/memory/machine.slice/libpod-2e61c7b46bc2aeed6dadecb07583e97e03ffcc694.scope/memory.memsw.limit_in_bytes".

I looked into that libpod-*.scope cgroup directory, and it is missing the memory.memsw.limit_in_bytes resource control file.

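[Screenshot: listing of the libpod-*.scope cgroup directory on Debian/Ubuntu, with no memory.memsw.limit_in_bytes file]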

I then deployed to a CentOS 8 machine and it worked just fine. I followed up by checking whether that file was present, and sure enough it was.

[Screenshot: listing of the same cgroup directory on CentOS 8, with memory.memsw.limit_in_bytes present]

Because of this difference in the cgroup control files, podman, and thus Nomad, is unable to limit memory on the Debian/Ubuntu distros.

Steps to reproduce the issue:

  1. sudo podman run -m=40m hello-world (on Ubuntu 16.04 through 20.04 or Debian 9/10)

Describe the results you received:

Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap. Error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:415: setting cgroup config for procHooks process caused \\\"failed to write \\\\\\\"-1\\\\\\\" to \\\\\\\"/sys/fs/cgroup/memory/machine.slice/libpod-8639bb6811f6bec5282ee72dac48abb6a647ce7daf84dd000b195dc4fe6f4df7.scope/memory.memsw.limit_in_bytes\\\\\\\": open /sys/fs/cgroup/memory/machine.slice/libpod-8639bb6811f6bec5282ee72dac48abb6a647ce7daf84dd000b195dc4fe6f4df7.scope/memory.memsw.limit_in_bytes: permission denied\\\"\"": OCI runtime permission denied error

Describe the results you expected:

Output of hello-world container/successful job deployment status on Nomad

Additional information you deem important (e.g. issue happens only occasionally):

I've tried multiple versions of podman. I've looked at all the Ubuntu releases from 16.04 onward and see the same issue in all of them, so it seems to be a divergence in the cgroup resource control interfaces between the RPM and Deb distros.

And I want to repeat: running podman without the -m flag WORKS, both rootless and as root.

Output of podman version:

Version:            1.9.2
RemoteAPI Version:  1
Go Version:         go1.10.1
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  gitCommit: ""
  goVersion: go1.10.1
  podmanVersion: 1.9.2
host:
  arch: amd64
  buildahVersion: 1.14.8
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.16, commit: '
  cpus: 1
  distribution:
    distribution: ubuntu
    version: "18.04"
  eventLogger: file
  hostname: nomad-server01
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 4.15.0-101-generic
  memFree: 137125888
  memTotal: 1033011200
  ociRuntime:
    name: runc
    package: 'runc: /usr/sbin/runc'
    path: /usr/sbin/runc
    version: 'runc version spec: 1.0.1-dev'
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 0.4.3
      commit: unknown
  swapFree: 0
  swapTotal: 0
  uptime: 11h 50m 16.22s (Approximately 0.46 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/vagrant/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/vagrant/.local/share/containers/storage
  graphStatus: {}
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  volumePath: /home/vagrant/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman/unknown,now 1.9.2~3 amd64 [installed]

Additional environment details (AWS, VirtualBox, physical, etc.):

VirtualBox, GCP compute instances

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 23, 2020
@geverartsdev

geverartsdev commented May 23, 2020

Did you try adding swapaccount=1 and cgroup_enable=memory as kernel boot parameters (probably by modifying the GRUB configuration)?

I think I remember having the same problem a while ago and fixing it this way.
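
To check whether a host was booted with these parameters, the kernel command line can be read from /proc/cmdline. The following is a minimal sketch, not podman code; `hasKernelParam` is a hypothetical helper introduced only for illustration. On Debian/Ubuntu the parameters are typically added to GRUB_CMDLINE_LINUX in /etc/default/grub, followed by update-grub and a reboot.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasKernelParam reports whether the kernel command line contains the exact
// parameter want (for example "swapaccount=1").
// Hypothetical helper for illustration only.
func hasKernelParam(cmdline, want string) bool {
	for _, field := range strings.Fields(cmdline) {
		if field == want {
			return true
		}
	}
	return false
}

func main() {
	// /proc/cmdline holds the parameters the running kernel was booted with.
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Println("could not read /proc/cmdline:", err)
		return
	}
	cmdline := string(data)
	for _, p := range []string{"cgroup_enable=memory", "swapaccount=1"} {
		fmt.Printf("%-22s present: %v\n", p, hasKernelParam(cmdline, p))
	}
}
```

The match is on whole whitespace-separated fields, so a value such as swapaccount=10 is not mistaken for swapaccount=1.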

@richpeaua
Author

@geverartsdev Geez, have I just been spoiled by the RPM distros, lol. I totally forgot about the kernel params that aren't switched on by default in other distros. Yes, this totally worked, thank you.

A follow-up question, though: are there changes that can be made so this problem doesn't occur out of the box? Docker doesn't have this problem, so I'm guessing they are doing something to bypass it (I say bypass because I just tested installing Docker on a Debian machine and limiting memory without issue, but then installed podman and got the same issue until I set the params), or maybe they're only limiting pure memory and not swap+mem?

@geverartsdev

geverartsdev commented May 24, 2020

@richpeaua I am pleased to know that it worked for you.

I guess it could be possible to avoid running into this issue out of the box, yes. When you look at the configuration file (config.json) that podman passes to the container runtime in use, you can see that whenever you set any memory limit, podman also adds a swap usage limit (which in my case was double the memory limit I imposed). I don't know why it does that, but it should surely be possible for podman to detect that no swap limit support is present on the machine and not add this line (as I suppose Docker does).

An example of the config.json file with memory limitation (see linux.resources.memory):
{
   "ociVersion":"1.0.1-dev",
   "process":{
      "terminal":true,
      "user":{
         "uid":0,
         "gid":0,
         "additionalGids":[
            0,
            1,
            2,
            3,
            4,
            6,
            10,
            11,
            20,
            26,
            27
         
]
      
},
      "args":[
         "sh"
      
],
      "env":[
         "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
         "TERM=xterm",
         "HOSTNAME=1d3498689c56",
         "container=podman",
         "HOME=/root"
      
],
      "cwd":"/",
      "capabilities":{
         "bounding":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "effective":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "inheritable":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "permitted":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "ambient":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
]
      
},
      "rlimits":[
         {
            "type":"RLIMIT_NOFILE",
            "hard":1048576,
            "soft":1048576
         
},
         {
            "type":"RLIMIT_NPROC",
            "hard":32768,
            "soft":32768
         
}
      
],
      "oomScoreAdj":0
   
},
   "root":{
      "path":"/var/lib/containers/storage/aufs/mnt/4b6754a223c81dcc18b836a5a0ddd3742dbe18331e9d8c1cf444deaca3101577"
   
},
   "hostname":"1d3498689c56",
   "mounts":[
      {
         "destination":"/proc",
         "type":"proc",
         "source":"proc",
         "options":[
            "nosuid",
            "noexec",
            "nodev"
         
]
      
},
      {
         "destination":"/dev",
         "type":"tmpfs",
         "source":"tmpfs",
         "options":[
            "nosuid",
            "strictatime",
            "mode=755",
            "size=65536k"
         
]
      
},
      {
         "destination":"/sys",
         "type":"sysfs",
         "source":"sysfs",
         "options":[
            "nosuid",
            "noexec",
            "nodev",
            "ro"
         
]
      
},
      {
         "destination":"/dev/pts",
         "type":"devpts",
         "source":"devpts",
         "options":[
            "nosuid",
            "noexec",
            "newinstance",
            "ptmxmode=0666",
            "mode=0620",
            "gid=5"
         
]
      
},
      {
         "destination":"/dev/mqueue",
         "type":"mqueue",
         "source":"mqueue",
         "options":[
            "nosuid",
            "noexec",
            "nodev"
         
]
      
},
      {
         "destination":"/etc/resolv.conf",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/1d3498689c569473c6e0c905abaf0e705efe11243e053a2cc71dbd605eee9d88/userdata/resolv.conf",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/etc/hosts",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/1d3498689c569473c6e0c905abaf0e705efe11243e053a2cc71dbd605eee9d88/userdata/hosts",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/dev/shm",
         "type":"bind",
         "source":"/var/lib/containers/storage/aufs-containers/1d3498689c569473c6e0c905abaf0e705efe11243e053a2cc71dbd605eee9d88/userdata/shm",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/etc/hostname",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/1d3498689c569473c6e0c905abaf0e705efe11243e053a2cc71dbd605eee9d88/userdata/hostname",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/run/.containerenv",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/1d3498689c569473c6e0c905abaf0e705efe11243e053a2cc71dbd605eee9d88/userdata/.containerenv",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/sys/fs/cgroup",
         "type":"cgroup",
         "source":"cgroup",
         "options":[
            "rprivate",
            "nosuid",
            "noexec",
            "nodev",
            "relatime",
            "ro"
         
]
      
}
   
],
   "annotations":{
      "io.container.manager":"libpod",
      "io.kubernetes.cri-o.Created":"2020-05-24T10:29:21.571932948+02:00",
      "io.kubernetes.cri-o.TTY":"true",
      "io.podman.annotations.autoremove":"TRUE",
      "io.podman.annotations.init":"FALSE",
      "io.podman.annotations.privileged":"FALSE",
      "io.podman.annotations.publish-all":"FALSE",
      "org.opencontainers.image.stopSignal":"15"
   
},
   "linux":{
      "resources":{
         "devices":[
            {
               "allow":false,
               "access":"rwm"
            
}
         
],
         "memory":{
            "limit":1073741824,
            "swap":2147483648,
            "disableOOMKiller":false
         
},
         "cpu":{
            "quota":100000,
            "period":100000
         
},
         "pids":{
            "limit":4096
         
}
      
},
      "cgroupsPath":"machine.slice:libpod:1d3498689c569473c6e0c905abaf0e705efe11243e053a2cc71dbd605eee9d88",
      "namespaces":[
         {
            "type":"pid"
         
},
         {
            "type":"network",
            "path":"/var/run/netns/cni-e21d40f8-6b16-fd55-07fe-b82b80c8a071"
         
},
         {
            "type":"ipc"
         
},
         {
            "type":"uts"
         
},
         {
            "type":"mount"
         
},
         {
            "type":"cgroup"
         
}
      
],
      "seccomp":{
         "defaultAction":"SCMP_ACT_ERRNO",
         "architectures":[
            "SCMP_ARCH_X86_64",
            "SCMP_ARCH_X86",
            "SCMP_ARCH_X32"
         
],
         "syscalls":[...]
      
},
      "maskedPaths":[
         "/proc/acpi",
         "/proc/kcore",
         "/proc/keys",
         "/proc/latency_stats",
         "/proc/timer_list",
         "/proc/timer_stats",
         "/proc/sched_debug",
         "/proc/scsi",
         "/sys/firmware",
         "/sys/fs/selinux"
      
],
      "readonlyPaths":[
         "/proc/asound",
         "/proc/bus",
         "/proc/fs",
         "/proc/irq",
         "/proc/sys",
         "/proc/sysrq-trigger"
      
]
   
}
}
An example of the config.json file without memory limitation (see linux.resources.memory):
{
   "ociVersion":"1.0.1-dev",
   "process":{
      "terminal":true,
      "user":{
         "uid":0,
         "gid":0,
         "additionalGids":[
            0,
            1,
            2,
            3,
            4,
            6,
            10,
            11,
            20,
            26,
            27
         
]
      
},
      "args":[
         "sh"
      
],
      "env":[
         "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
         "TERM=xterm",
         "HOSTNAME=618b302f1e2c",
         "container=podman",
         "HOME=/root"
      
],
      "cwd":"/",
      "capabilities":{
         "bounding":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "effective":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "inheritable":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "permitted":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
],
         "ambient":[
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
         
]
      
},
      "rlimits":[
         {
            "type":"RLIMIT_NOFILE",
            "hard":1048576,
            "soft":1048576
         
},
         {
            "type":"RLIMIT_NPROC",
            "hard":32768,
            "soft":32768
         
}
      
],
      "oomScoreAdj":0
   
},
   "root":{
      "path":"/var/lib/containers/storage/aufs/mnt/87f43a4d823ebc7bb385944aa1761d5ec6093e98e667a13f67a2f208996de12e"
   
},
   "hostname":"618b302f1e2c",
   "mounts":[
      {
         "destination":"/proc",
         "type":"proc",
         "source":"proc",
         "options":[
            "nosuid",
            "noexec",
            "nodev"
         
]
      
},
      {
         "destination":"/dev",
         "type":"tmpfs",
         "source":"tmpfs",
         "options":[
            "nosuid",
            "strictatime",
            "mode=755",
            "size=65536k"
         
]
      
},
      {
         "destination":"/sys",
         "type":"sysfs",
         "source":"sysfs",
         "options":[
            "nosuid",
            "noexec",
            "nodev",
            "ro"
         
]
      
},
      {
         "destination":"/dev/pts",
         "type":"devpts",
         "source":"devpts",
         "options":[
            "nosuid",
            "noexec",
            "newinstance",
            "ptmxmode=0666",
            "mode=0620",
            "gid=5"
         
]
      
},
      {
         "destination":"/dev/mqueue",
         "type":"mqueue",
         "source":"mqueue",
         "options":[
            "nosuid",
            "noexec",
            "nodev"
         
]
      
},
      {
         "destination":"/etc/resolv.conf",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/618b302f1e2c2b7e6c0d88f6a68b7385c73004689f4c860b50406758f2fe653f/userdata/resolv.conf",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/etc/hosts",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/618b302f1e2c2b7e6c0d88f6a68b7385c73004689f4c860b50406758f2fe653f/userdata/hosts",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/dev/shm",
         "type":"bind",
         "source":"/var/lib/containers/storage/aufs-containers/618b302f1e2c2b7e6c0d88f6a68b7385c73004689f4c860b50406758f2fe653f/userdata/shm",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/etc/hostname",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/618b302f1e2c2b7e6c0d88f6a68b7385c73004689f4c860b50406758f2fe653f/userdata/hostname",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/run/.containerenv",
         "type":"bind",
         "source":"/var/run/containers/storage/aufs-containers/618b302f1e2c2b7e6c0d88f6a68b7385c73004689f4c860b50406758f2fe653f/userdata/.containerenv",
         "options":[
            "bind",
            "private"
         
]
      
},
      {
         "destination":"/sys/fs/cgroup",
         "type":"cgroup",
         "source":"cgroup",
         "options":[
            "rprivate",
            "nosuid",
            "noexec",
            "nodev",
            "relatime",
            "ro"
         
]
      
}
   
],
   "annotations":{
      "io.container.manager":"libpod",
      "io.kubernetes.cri-o.Created":"2020-05-24T10:38:46.60949562+02:00",
      "io.kubernetes.cri-o.TTY":"true",
      "io.podman.annotations.autoremove":"TRUE",
      "io.podman.annotations.init":"FALSE",
      "io.podman.annotations.privileged":"FALSE",
      "io.podman.annotations.publish-all":"FALSE",
      "org.opencontainers.image.stopSignal":"15"
   
},
   "linux":{
      "resources":{
         "devices":[
            {
               "allow":false,
               "access":"rwm"
            
}
         
],
         "memory":{
            "disableOOMKiller":false
         
},
         "cpu":{
            "quota":100000,
            "period":100000
         
},
         "pids":{
            "limit":4096
         
}
      
},
      "cgroupsPath":"machine.slice:libpod:618b302f1e2c2b7e6c0d88f6a68b7385c73004689f4c860b50406758f2fe653f",
      "namespaces":[
         {
            "type":"pid"
         
},
         {
            "type":"network",
            "path":"/var/run/netns/cni-5c2ac273-08c9-663e-8dbc-5d333ee21a8c"
         
},
         {
            "type":"ipc"
         
},
         {
            "type":"uts"
         
},
         {
            "type":"mount"
         
},
         {
            "type":"cgroup"
         
}
      
],
      "seccomp":{
         "defaultAction":"SCMP_ACT_ERRNO",
         "architectures":[
            "SCMP_ARCH_X86_64",
            "SCMP_ARCH_X86",
            "SCMP_ARCH_X32"
         
],
         "syscalls":[ ...]
      
},
      "maskedPaths":[
         "/proc/acpi",
         "/proc/kcore",
         "/proc/keys",
         "/proc/latency_stats",
         "/proc/timer_list",
         "/proc/timer_stats",
         "/proc/sched_debug",
         "/proc/scsi",
         "/sys/firmware",
         "/sys/fs/selinux"
      
],
      "readonlyPaths":[
         "/proc/asound",
         "/proc/bus",
         "/proc/fs",
         "/proc/irq",
         "/proc/sys",
         "/proc/sysrq-trigger"
      
]
   
}
}
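
The difference between the two files can also be checked programmatically. The sketch below decodes only the linux.resources.memory keys seen in the excerpts above and reports which limits are requested. The type and function names are hypothetical and deliberately do not reproduce the full OCI runtime-spec types; in the OCI runtime spec the swap value is a combined memory+swap limit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// memoryResources mirrors only the limit and swap keys that appear under
// linux.resources.memory in the excerpts; it is not the full OCI type.
type memoryResources struct {
	Limit *int64 `json:"limit,omitempty"`
	Swap  *int64 `json:"swap,omitempty"`
}

// config models just enough of config.json to reach the memory block.
type config struct {
	Linux struct {
		Resources struct {
			Memory memoryResources `json:"memory"`
		} `json:"resources"`
	} `json:"linux"`
}

// describeMemory summarises which memory limits a config.json requests.
func describeMemory(raw []byte) (string, error) {
	var c config
	if err := json.Unmarshal(raw, &c); err != nil {
		return "", err
	}
	m := c.Linux.Resources.Memory
	switch {
	case m.Limit == nil && m.Swap == nil:
		return "no memory limit requested", nil
	case m.Swap == nil:
		return fmt.Sprintf("memory limit %d, no swap limit", *m.Limit), nil
	case m.Limit == nil:
		return fmt.Sprintf("memory+swap limit %d only", *m.Swap), nil
	default:
		return fmt.Sprintf("memory limit %d, memory+swap limit %d", *m.Limit, *m.Swap), nil
	}
}

func main() {
	// Trimmed-down versions of the two config.json excerpts.
	limited := []byte(`{"linux":{"resources":{"memory":{"limit":1073741824,"swap":2147483648,"disableOOMKiller":false}}}}`)
	unlimited := []byte(`{"linux":{"resources":{"memory":{"disableOOMKiller":false}}}}`)
	for _, raw := range [][]byte{limited, unlimited} {
		summary, err := describeMemory(raw)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(summary)
	}
}
```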

@afbjorklund
Contributor

afbjorklund commented May 24, 2020

We have this issue with Minikube on Ubuntu as well; there is a difference between Docker and Podman. As a workaround, we have to disable all memory limits (not only swap) for Podman.

That is, with Docker we get a warning, but with Podman it is an error. It's acceptable for the memory-swap limit not to be enforced, but it is unfortunate that we also have to disable the memory limit along with it.

Here is our workaround, in some Go code:

        runArgs = append(runArgs, fmt.Sprintf("--cpus=%s", p.CPUs))

        memcgSwap := true
        if runtime.GOOS == "linux" {
                if _, err := os.Stat("/sys/fs/cgroup/memory/memsw.limit_in_bytes"); os.IsNotExist(err) {
                        // requires CONFIG_MEMCG_SWAP_ENABLED or cgroup_enable=memory in grub
                        glog.Warning("Your kernel does not support swap limit capabilities or the cgroup is not mounted.")
                        memcgSwap = false
                }
        }

        if p.OCIBinary == Podman && memcgSwap { // swap is required for memory
                runArgs = append(runArgs, fmt.Sprintf("--memory=%s", p.Memory))
        }
        if p.OCIBinary == Docker { // swap is only required for --memory-swap
                runArgs = append(runArgs, fmt.Sprintf("--memory=%s", p.Memory))
        }

And here is the command line tool output:

 docker run -m 1024m busybox echo OK
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
OK
 sudo podman run -m 1024m busybox echo OK
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
Error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:415: setting cgroup config for procHooks process caused \\\"failed to write \\\\\\\"-1\\\\\\\" to \\\\\\\"/sys/fs/cgroup/memory/libpod_parent/libpod-88e5f84f450c2d7721b41b31adfc974a6209aea5e35802f5079c77bcb5967bdc/memory.memsw.limit_in_bytes\\\\\\\": open /sys/fs/cgroup/memory/libpod_parent/libpod-88e5f84f450c2d7721b41b31adfc974a6209aea5e35802f5079c77bcb5967bdc/memory.memsw.limit_in_bytes: permission denied\\\"\"": OCI runtime permission denied error

@afbjorklund
Contributor

In the original code it was unconditionally disabled, due to some confusion with rootless:

 podman run -m 1024m busybox echo OK
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
Error: invalid configuration, cannot specify resource limits without cgroups v2 and --cgroup-manager=systemd

That is, I forgot to run sudo podman when verifying whether --cpus and --memory worked...

So unfortunately this was not reported upstream (cc @medyagh)

kubernetes/minikube@398c7ef

And then we just did the workaround.

kubernetes/minikube@7647b1f

@richpeaua
Author

@geverartsdev That config.json provided great information, thank you. I took the liberty of doing something similar with Docker to see the difference, but instead of memory-limited vs. non-limited, here is the output of hostconfig.json for a memory-limited container with and without swap limiting enabled.

As you can see, Docker is able to detect the situation and conditionally change from limiting swap+mem to limiting just mem (and it outputs the swap limit warning seen in @afbjorklund's posts).

So the next step would be to adjust the code in podman to do the same detect-and-switch (also seen in @afbjorklund's post).

hostconfig.json w/ swap limiting enabled (MemorySwap):
{
  "Binds": null,
  "ContainerIDFile": "",
  "LogConfig": {
    "Type": "json-file",
    "Config": {}
  },
  "NetworkMode": "default",
  "PortBindings": {},
  "RestartPolicy": {
    "Name": "no",
    "MaximumRetryCount": 0
  },
  "AutoRemove": false,
  "VolumeDriver": "",
  "VolumesFrom": null,
  "CapAdd": null,
  "CapDrop": null,
  "Capabilities": null,
  "Dns": [],
  "DnsOptions": [],
  "DnsSearch": [],
  "ExtraHosts": null,
  "GroupAdd": null,
  "IpcMode": "private",
  "Cgroup": "",
  "Links": null,
  "OomScoreAdj": 0,
  "PidMode": "",
  "Privileged": false,
  "PublishAllPorts": false,
  "ReadonlyRootfs": false,
  "SecurityOpt": null,
  "UTSMode": "",
  "UsernsMode": "",
  "ShmSize": 67108864,
  "Runtime": "runc",
  "ConsoleSize": [
    0,
    0
  ],
  "Isolation": "",
  "CpuShares": 0,
  "Memory": 41943040,
  "NanoCpus": 0,
  "CgroupParent": "",
  "BlkioWeight": 0,
  "BlkioWeightDevice": [],
  "BlkioDeviceReadBps": null,
  "BlkioDeviceWriteBps": null,
  "BlkioDeviceReadIOps": null,
  "BlkioDeviceWriteIOps": null,
  "CpuPeriod": 0,
  "CpuQuota": 0,
  "CpuRealtimePeriod": 0,
  "CpuRealtimeRuntime": 0,
  "CpusetCpus": "",
  "CpusetMems": "",
  "Devices": [],
  "DeviceCgroupRules": null,
  "DeviceRequests": null,
  "KernelMemory": 0,
  "KernelMemoryTCP": 0,
  "MemoryReservation": 0,
  "MemorySwap": 83886080,
  "MemorySwappiness": null,
  "OomKillDisable": false,
  "PidsLimit": null,
  "Ulimits": null,
  "CpuCount": 0,
  "CpuPercent": 0,
  "IOMaximumIOps": 0,
  "IOMaximumBandwidth": 0,
  "MaskedPaths": [
    "/proc/asound",
    "/proc/acpi",
    "/proc/kcore",
    "/proc/keys",
    "/proc/latency_stats",
    "/proc/timer_list",
    "/proc/timer_stats",
    "/proc/sched_debug",
    "/proc/scsi",
    "/sys/firmware"
  ],
  "ReadonlyPaths": [
    "/proc/bus",
    "/proc/fs",
    "/proc/irq",
    "/proc/sys",
    "/proc/sysrq-trigger"
  ]
}
hostconfig.json w/out swap limiting enabled (MemorySwap):
{
  "Binds": null,
  "ContainerIDFile": "",
  "LogConfig": {
    "Type": "json-file",
    "Config": {}
  },
  "NetworkMode": "default",
  "PortBindings": {},
  "RestartPolicy": {
    "Name": "no",
    "MaximumRetryCount": 0
  },
  "AutoRemove": false,
  "VolumeDriver": "",
  "VolumesFrom": null,
  "CapAdd": null,
  "CapDrop": null,
  "Capabilities": null,
  "Dns": [],
  "DnsOptions": [],
  "DnsSearch": [],
  "ExtraHosts": null,
  "GroupAdd": null,
  "IpcMode": "private",
  "Cgroup": "",
  "Links": null,
  "OomScoreAdj": 0,
  "PidMode": "",
  "Privileged": false,
  "PublishAllPorts": false,
  "ReadonlyRootfs": false,
  "SecurityOpt": null,
  "UTSMode": "",
  "UsernsMode": "",
  "ShmSize": 67108864,
  "Runtime": "runc",
  "ConsoleSize": [
    0,
    0
  ],
  "Isolation": "",
  "CpuShares": 0,
  "Memory": 52428800,
  "NanoCpus": 0,
  "CgroupParent": "",
  "BlkioWeight": 0,
  "BlkioWeightDevice": [],
  "BlkioDeviceReadBps": null,
  "BlkioDeviceWriteBps": null,
  "BlkioDeviceReadIOps": null,
  "BlkioDeviceWriteIOps": null,
  "CpuPeriod": 0,
  "CpuQuota": 0,
  "CpuRealtimePeriod": 0,
  "CpuRealtimeRuntime": 0,
  "CpusetCpus": "",
  "CpusetMems": "",
  "Devices": [],
  "DeviceCgroupRules": null,
  "DeviceRequests": null,
  "KernelMemory": 0,
  "KernelMemoryTCP": 0,
  "MemoryReservation": 0,
  "MemorySwap": -1,
  "MemorySwappiness": null,
  "OomKillDisable": false,
  "PidsLimit": null,
  "Ulimits": null,
  "CpuCount": 0,
  "CpuPercent": 0,
  "IOMaximumIOps": 0,
  "IOMaximumBandwidth": 0,
  "MaskedPaths": [
    "/proc/asound",
    "/proc/acpi",
    "/proc/kcore",
    "/proc/keys",
    "/proc/latency_stats",
    "/proc/timer_list",
    "/proc/timer_stats",
    "/proc/sched_debug",
    "/proc/scsi",
    "/sys/firmware"
  ],
  "ReadonlyPaths": [
    "/proc/bus",
    "/proc/fs",
    "/proc/irq",
    "/proc/sys",
    "/proc/sysrq-trigger"
  ]
}

@mheon
Member

mheon commented May 26, 2020

I know where we set the swap limit in the code (I was the one who added it to Podman v2.0), so if there's an easy way to resolve if the host supports swap limiting, it should be easy enough to add.

@mheon
Member

mheon commented May 26, 2020

@giuseppe Is this something we can throw in the libpod cgroups package?

@richpeaua
Author

@mheon Awesome. When swap limiting is disabled, the cgroup resource control file to control swap+mem, memsw.limit_in_bytes, is not present. The code provided by @afbjorklund looks for the presence of that control file as a condition.

i.e.

if runtime.GOOS == "linux" {
        if _, err := os.Stat("/sys/fs/cgroup/memory/memsw.limit_in_bytes"); os.IsNotExist(err) {
                // control file missing: swap limiting is not available on this host
        }
}

For the fix, we would need to switch from using memsw.limit_in_bytes to memory.limit_in_bytes.

@geverartsdev

@richpeaua Just note that this would only work for cgroup v1, not cgroup v2. But the same principle can be applied to cgroup v2 (looking for the files /sys/fs/cgroup/system.slice/memory.swap.* or /sys/fs/cgroup/user.slice/memory.swap.*, depending on whether we are launching a rootless container or not).
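
A rough sketch of that cgroup v2 check, assuming the slice directories named above and that the presence of any memory.swap.* control file (such as memory.swap.max) is taken as the signal. The function names are hypothetical, and this is not how podman itself decides.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasSwapControls reports whether any cgroup v2 swap control file
// (memory.swap.max, memory.swap.current, ...) appears among names.
func hasSwapControls(names []string) bool {
	for _, n := range names {
		if strings.HasPrefix(n, "memory.swap.") {
			return true
		}
	}
	return false
}

// swapLimitSupportedV2 lists cgroupDir and looks for memory.swap.* entries.
func swapLimitSupportedV2(cgroupDir string) (bool, error) {
	entries, err := os.ReadDir(cgroupDir)
	if err != nil {
		return false, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return hasSwapControls(names), nil
}

func main() {
	// The two slices mentioned in the comment above.
	for _, dir := range []string{"/sys/fs/cgroup/system.slice", "/sys/fs/cgroup/user.slice"} {
		ok, err := swapLimitSupportedV2(dir)
		if err != nil {
			fmt.Printf("%s: %v\n", dir, err)
			continue
		}
		fmt.Printf("%s: swap controls present: %v\n", dir, ok)
	}
}
```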

@geverartsdev

For the fix, we would need to switch from using memsw.limit_in_bytes to memory.limit_in_bytes.

Why? This file is actually present even when no control over swap is supported

@mheon
Member

mheon commented May 26, 2020

It doesn't look like that file check works - it's likely fine for cgroups v1, but my F31 laptop on v2 does not have it (and definitely supports swap limits)

@afbjorklund
Contributor

afbjorklund commented May 26, 2020

I'm not sure if kubernetes supports v2 yet, so we just tell people to turn it off (or use a VM, if needed)

We really need some better warnings/suggestions, for people running with SELinux and Cgroups V2

@richpeaua
Author

richpeaua commented May 26, 2020

@geverartsdev Yes you're correct. Those are cgroupv1 control files. So maybe we can also use this to future proof as well by adding checks and conditions for v2?

Regarding the presence of the memsw file: in my first post on this issue I showed the cgroup directories with swap limiting enabled and disabled (albeit one from Debian and the other from Centos8), and as you can see, memsw.limit_in_bytes is not present when swap limiting is disabled. The workaround used by @afbjorklund for minikube takes that into account as well.

The potential fix would be something like this (pseudocode):

if memory.memsw.limit_in_bytes present:
   use memory.memsw.limit_in_bytes
else:
   use memory.limit_in_bytes

Or something along those lines.

@mheon I would suggest trying this out on a Debian/Ubuntu machine if you can. I haven't run into this issue on any RPM distro due to most, if not all, having swap limiting enabled by default.

@geverartsdev

geverartsdev commented May 26, 2020

@richpeaua Right, I think I now see what you meant by saying "switch from using memsw.limit_in_bytes to memory.limit_in_bytes".

The thing is, if my understanding of the situation is correct, Podman does not have to deal with those files directly. It is the underlying container runtime (runc, crun, ...) which will, by means of systemd or cgroupfs, edit those files accordingly. What I showed you in the config.json files is the input given to those container runtimes: they see that we require some resource management for the container being created, and they take care of it for us. So the only thing that podman would have to do to avoid your problem out of the box is to not ask the container runtime to set up a limit that the system does not support. The relevance of finding a good detection mechanism stays the same.
Are we on the same page? Maybe you already got all of this but I wasn't sure by reading your last comment. Sorry if it was redundant information then.

Alternatively the container runtime could also detect this error and decide to ignore the requirement of the user, but this is another debate.

@richpeaua
Author

richpeaua commented May 26, 2020

@geverartsdev You're right. That would fall to the runtime... and I believe you're correct again in stating that the fix would lie, not in changing the control file to use, but what is written to the config.json files.

This can be seen in a post I made with the docker version of config.json, hostconfig.json. There was a difference in the MemorySwap field between swap-limit-enabled and swap-limit-disabled. The MemorySwap field with limits disabled is -1.

Interestingly, the error output by podman is that it is attempting to write -1 to the memsw.limit_in_bytes file. It seems that the intent is similar but the implementation is bugged on the podman side. The docker config.json is fairly different from the podman config.json but, if I'm not mistaken, both use runc. I'm wondering if the docker config file is parsed/processed differently and thus uses that -1 to flag disabled swap limits (thus effectively changing the actual config to exclude something like the linux.resources.memory.swap field in podman config.json) as opposed to writing -1 to a file.

@richpeaua
Author

@mheon Can we check for the presence of the memsw.limit_in_bytes file and change the fields added to the config.json file accordingly? The field in question would be linux.resources.memory.swap.

For example, if the file is not present then exclude that field and have the value of the -m flag change only the linux.resources.memory.limit field?

@mheon
Member

mheon commented May 26, 2020

As I said before, we cannot gate on the presence of that file, because it does not work for both cgroups v1 and v2. This may not be relevant for Minikube, but Podman has to support both cgroup versions.

With a reliable testing function, yes, this is a trivial change.

@geverartsdev

Seems like the Kata-Containers folks had a discussion about this (not exactly the same problem but related), this might give some ideas kata-containers/packaging#104
They show where Docker prints that gentle warning (https://github.com/moby/moby/blob/5fc12449d830ae9005138fb3d3782728fa8d137a/daemon/daemon_unix.go#L368), so maybe starting from there one can find what Docker does to detect the possibly missing support for swap. Though there is a high chance that it would only work for cgroupv1, and that another trick would still need to be found for cgroupv2.
Another idea would be to check whether we can rely on the values of CONFIG_MEMCG_SWAP and CONFIG_MEMCG_SWAP_ENABLED.

@mheon How bad would it be to use two different tricks for cgroupv1 and cgroupv2? As long as in the end it works for both, of course. Btw, do you know what podman does to detect the current cgroup version?

@mheon
Member

mheon commented May 26, 2020

We can already detect whether we're on a v1 or v2 system, and we already have a viable way to detect whether swap limits are available on v1, so we just need a way to detect whether support is available on v2 and we should be good.
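
For illustration, one way to tell the two hierarchies apart is to look at the filesystem type mounted on /sys/fs/cgroup in /proc/self/mountinfo. The helper name isCgroupV2 is hypothetical and this is only a sketch of the idea, not Podman's actual detection code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isCgroupV2 scans mountinfo text for the mount at /sys/fs/cgroup and reports
// whether its filesystem type is cgroup2 (the unified hierarchy). On cgroup v1
// and hybrid setups that mount point is normally a tmpfs, so it returns false.
func isCgroupV2(mountinfo string) bool {
	for _, line := range strings.Split(mountinfo, "\n") {
		// Fields before " - " end with the mount point and options; fields
		// after it start with the filesystem type.
		halves := strings.SplitN(line, " - ", 2)
		if len(halves) != 2 {
			continue
		}
		pre := strings.Fields(halves[0])
		post := strings.Fields(halves[1])
		if len(pre) >= 5 && len(post) >= 1 && pre[4] == "/sys/fs/cgroup" {
			return post[0] == "cgroup2"
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Println("cannot read mountinfo:", err)
		return
	}
	fmt.Println("cgroup v2:", isCgroupV2(string(data)))
}
```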

@geverartsdev

we already have a viable way to detect whether swap limits are available on v1

Ow, didn't know that, what is it? Couldn't this inspire something similar for v2?

@richpeaua
Author

richpeaua commented May 26, 2020

FYI the kernel config settings CONFIG_MEMCG_SWAP_ENABLED and CONFIG_MEMCG_SWAP don't reliably indicate whether swap limiting is enabled or not.

Just tested this on Debian and Ubuntu machines. Setting both to =y did not enable swap limiting. Only explicitly adding cgroup_enable=memory swapaccount=1 to /etc/default/grub enabled it... and doing so did not update the two kernel config settings.

They do, however, indicate if swap limiting is enabled by default (which is the case with the RPM distros).

@afbjorklund
Contributor

As I said before, we cannot gate on the presence of that file, because it does not work for both cgroups v1 and v2. This may not be relevant for Minikube, but Podman has to support both cgroup versions.

I guess ultimately Kubernetes will support v2, and distributions (and even docker) will start using it. And then it would of course be nice if the user was able to run crio-in-podman with minikube on those.

But since that is not yet the case, we require our Fedora users to either disable cgroups v2 or to run a KVM VM (with cgroups v1). As described elsewhere, this is more of a chicken and egg problem...

kubernetes/minikube#6795 (comment)

https://medium.com/nttlabs/cgroup-v2-596d035be4d7

As long as nobody uses v2, nothing will support it. And when nothing supports it, nobody will use it.
So it's the same situation as described in https://bugzilla.redhat.com/show_bug.cgi?id=1746355


It would still be nice if podman was able to limit memory, when memory swap limits are unavailable.

And even better if this also worked with cgroups v2, or at least went back to failing in the old way.

We can already detect whether we're on a v1 or v2 system, and we already have a viable way to detect whether swap limits are available on v1, so we just need a way to detect whether support is available on v2 and we should be good.

Assuming that they are available when on v2 would probably work out (and only check on v1).
Or are there any Debian distros (the ones with the flag) that have started using cgroups v2 yet?

Currently it happily assumes that they are available on both, even when not needed, and errors out.
So if it looked for the flag situation and avoided swap limits, it would improve things on cgroups v1.

@mheon
Member

mheon commented May 26, 2020

I think it is perfectly reasonable for us to support limiting memory only when swap limits aren't available, but on the Podman side, I need to make sure this works in both v1 and v2 environments.

My current theory is that, if we can identify the cgroup that the current process is part of (alternatively we can use system.slice but that introduces a dependency on systemd I would prefer to avoid), we can check /sys/fs/cgroup for memsw.* cgroup controllers (in v1) and memory.swap.* controllers (in v2), on the assumption that they are missing if the controller is not enabled.

@richpeaua
Author

richpeaua commented May 26, 2020

@mheon It seems that rootless podman run container cgroups are in the user.slice and root podman containers are in machine.slice. A way to find which cgroup the container belongs to is to run find -follow (the -follow flag follows symlinks) on the full container ID output on container creation, or on the shortened container ID output by podman ps.

i.e

sudo find /sys/fs/cgroup/ -follow -name '*<CONTAINER-ID>*.scope'

Here are some outputs from some containers I've run.

podman run -d alpine sleep 120 [ROOTLESS]:

[vagrant@fedora32 ~]$ podman ps
CONTAINER ID  IMAGE                            COMMAND    CREATED        STATUS            PORTS  NAMES
d77d1ee54a66  docker.io/library/alpine:latest  sleep 120  8 seconds ago  Up 8 seconds ago         quizzical_shannon

[vagrant@fedora32 ~]$ sudo find /sys/fs/cgroup/ -follow -name '*d77d1ee54a66*.scope'
/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/libpod-conmon-d77d1ee54a6654513f0264e1024e33c27eb8d2268676e0a1b8514cd324a07b35.scope
/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/user.slice/libpod-d77d1ee54a6654513f0264e1024e33c27eb8d2268676e0a1b8514cd324a07b35.scope

sudo podman run -m=40m -d alpine sleep 120 [ROOT]:

[vagrant@fedora32 ~]$ sudo podman run -m=40m -d alpine sleep 120
74686d427638e4574a2e8d73c42eb8067c244bd0f476a6cc4f3b33372baa484c

[vagrant@fedora32 ~]$ sudo podman ps
CONTAINER ID  IMAGE                            COMMAND    CREATED        STATUS            PORTS  NAMES
74686d427638  docker.io/library/alpine:latest  sleep 120  4 seconds ago  Up 4 seconds ago         gracious_galileo

[vagrant@fedora32 ~]$ sudo find /sys/fs/cgroup/ -follow -name '*74686d427638e4574a2e8d73c42eb8067c244bd0f476a6cc4f3b33372baa484c*'
/sys/fs/cgroup/machine.slice/libpod-74686d427638e4574a2e8d73c42eb8067c244bd0f476a6cc4f3b33372baa484c.scope
/sys/fs/cgroup/machine.slice/libpod-conmon-74686d427638e4574a2e8d73c42eb8067c244bd0f476a6cc4f3b33372baa484c.scope

@giuseppe
Member

memory swap is always present on cgroup v2, so we don't have to worry.

For cgroup v1, we need something like cri-o/cri-o@7bdf938

If someone is interested in fixing it and opens a PR, I can review it

@geverartsdev

geverartsdev commented May 27, 2020

memory swap is always present on cgroup v2, so we don't have to worry.

On my machine, if I don't add swapaccount=1 in the grub configuration (where I set systemd.unified_cgroup_hierarchy=1), I will be lacking swap support... (and podman will fail with -m option)

Edit: But given the awful amount of problems I currently have with cgroupv2, this might need to be confirmed by someone else

@mheon
Member

mheon commented Jun 2, 2020

I'll try and get to this one this week

@richpeaua
Author

richpeaua commented Jun 23, 2020

@mheon Any update on this?

@mheon mheon self-assigned this Jun 23, 2020
@mheon
Member

mheon commented Jun 23, 2020

Not yet - fell through the cracks (I forgot to self-assign so I didn't see it when I reviewed open issues - oops). I'll try and get to it tomorrow.

@richpeaua
Author

@mheon No worries! Thank you!

@mheon
Member

mheon commented Jul 10, 2020

I haven't double-checked, but I'm fairly certain this was fixed by the 2.0 release, which checks sysinfo to verify swap limits are possible

@rhatdan
Member

rhatdan commented Jul 13, 2020

Please reopen if @mheon is mistaken.

@jerboaa

jerboaa commented Sep 22, 2020

I haven't double-checked, but I'm fairly certain this was fixed by the 2.0 release, which checks sysinfo to verify swap limits are possible

@rhatdan @mheon It doesn't look to me like it's possible to run containers with a memory limit, but without swap accounting (swapaccount=0), on cgroups v2. So it sounds like this issue should get reopened?

# grep cgroup /proc/self/mountinfo 
29 22 0:26 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:4 - cgroup2 cgroup2 rw,seclabel,nsdelegate
# rpm -q podman
podman-2.0.6-1.fc32.x86_64
# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.8.9-200.fc32.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet swapaccount=0
# podman run -it --rm --memory=512m --memory-swap=512m fedora:32 /bin/bash
Error: opening file `memory.swap.max` for writing: Permission denied: OCI runtime permission denied error

@jerboaa

jerboaa commented Sep 22, 2020

Using sysinfo doesn't seem sufficient. It's only working in conjunction with swapon/swapoff:

# cat test_sysinfo.c 
#include <sys/sysinfo.h>
#include <stdio.h>

int main(int argc, char** args) {
   struct sysinfo foo;
   int retval = sysinfo(&foo);
   if (retval != 0) {
     printf("sysinfo failed.\n");
     return 1; /* foo is not valid if the call failed */
   }
   printf("total swap: %lu\n", foo.totalswap);
   return 0;
}
# gcc -Wall test_sysinfo.c
# grep swapaccount= /proc/cmdline 
# ./a.out
total swap: 3221221376
# swapoff -a
# podman run -it --rm --memory=512m --memory-swap=512m fedora:32 /bin/bash
[root@a8fcd41bfcbf /]# exit
# ./a.out
total swap: 0

Would it be reasonable to check /proc/cmdline for swapaccount=0? I guess this wouldn't account for CONFIG_MEMCG_SWAP disabled in the kernel, but that's less likely in the wild.
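
A minimal Go sketch of such a check. The helper name swapAccountingDisabled is hypothetical, and letting the last swapaccount= occurrence win is an assumption of this sketch. As noted above, it would not catch a kernel built without CONFIG_MEMCG_SWAP.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// swapAccountingDisabled reports whether the kernel command line contains
// swapaccount=0. If the parameter appears several times, the last one wins
// (an assumption of this sketch).
func swapAccountingDisabled(cmdline string) bool {
	disabled := false
	for _, arg := range strings.Fields(cmdline) {
		if strings.HasPrefix(arg, "swapaccount=") {
			disabled = strings.TrimPrefix(arg, "swapaccount=") == "0"
		}
	}
	return disabled
}

func main() {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Println("cannot read /proc/cmdline:", err)
		return
	}
	fmt.Println("swap accounting disabled:", swapAccountingDisabled(string(data)))
}
```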

@jerboaa

jerboaa commented Sep 22, 2020

I don't have permissions to reopen this issue, fwiw.

@mheon mheon reopened this Sep 22, 2020
@bobvandette

Why can't you just verify that the file memory.swap.max exists before trying to write it?
There's logic in sysinfo_linux.go that is already verifying memory.memsw.limit_in_bytes to see if SwapLimit is enabled. I assume this is only used on cgroupv1 systems.

@rhatdan
Member

rhatdan commented Sep 22, 2020

Podman does not currently have logic for this, and it could get complicated supporting CGroups V1 and V2.

@jerboaa

jerboaa commented Oct 1, 2020

podman behaves differently from docker (or moby-engine) on systems with swapaccount=0:

podman (either with runc or crun):

# grep swapaccount= /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.6-300.fc32.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet systemd.unified_cgroup_hierarchy=0 swapaccount=0
# podman run --rm -it --memory-reservation=200m --memory=300m --memory-swap=300m fedora:32 /bin/bash
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
Error: writing file `memory.memsw.limit_in_bytes`: Permission denied: OCI runtime permission denied error

docker (with same runc as podman above):

# docker run --rm -it --memory-reservation=200m --memory=300m --memory-swap=300m fedora:32 /bin/bash
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[root@c6e9bd4ea745 /]#

Also note that docker info detects the swap limit issue and prints WARNING: No swap limit support at the end. podman info on the other hand has this (seems to come from sysinfo?):

# podman info | grep Swap
  SwapFree: 3221221376
  SwapTotal: 3221221376
# grep swapaccount=
# grep swapaccount= /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.6-300.fc32.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet systemd.unified_cgroup_hierarchy=0 swapaccount=0

@rhatdan
Member

rhatdan commented Oct 30, 2020

@giuseppe WDYT?

@giuseppe
Member

giuseppe commented Oct 30, 2020

Podman 2.0 seems to behave differently:

# podman --runtime /usr/bin/runc run --rm -it --memory-reservation=200m --memory=300m --memory-swap=300m fedora echo hi
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
hi
# podman --runtime /usr/bin/crun run --rm -it --memory-reservation=200m --memory=300m --memory-swap=300m fedora echo hi
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
hi

Is this the desired behaviour?

Even if compatible with what Docker does, I am not sure we should ignore a configuration that was explicitly set by the user. For cgroup v2 we don't have this check in place (even if swapaccount=0 can still be set)

@giuseppe
Member

opened a PR to add the same check on cgroup v2: #8197

@jerboaa

jerboaa commented Oct 30, 2020

Podman 2.0 seems to behave differently:

# podman --runtime /usr/bin/runc run --rm -it --memory-reservation=200m --memory=300m --memory-swap=300m fedora echo hi
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
hi
# podman --runtime /usr/bin/crun run --rm -it --memory-reservation=200m --memory=300m --memory-swap=300m fedora echo hi
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
hi

Is this the desired behaviour?

I believe so, yes.

For the record, it seems podman 1.8.2 had the issue described above (cgroup v1, swapaccount=0):

$ sudo podman --runtime /usr/bin/crun run --rm -it --memory=300m fedora:32 /bin/bash
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
Error: opening file `memory.memsw.limit_in_bytes` for writing: Permission denied: OCI runtime permission denied error

podman 2.1.1 works (cgroup v1, swapaccount=0):

$ sudo podman --runtime /usr/bin/crun run --rm -it --memory=300m fedora:32 /bin/bash
Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap.
[root@ca40ed983374 /]#

@rhatdan
Member

rhatdan commented Oct 31, 2020

This is fixed in master.

@rhatdan rhatdan closed this as completed Oct 31, 2020
@jerboaa

jerboaa commented Nov 2, 2020

Thanks all!


9 participants