-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fedora 31: "podman start" hangs #4621
Comments
Running with or without root? |
Can't repro as root, Fedora 30, v1.6.2. Looking at the code, it looks like |
Interestingly, we're not printing the args we're calling Conmon with... We might not be getting there? |
Aha, I think we're stuck somewhere in mounting... Can you try a |
(For reference, I think we may be running into a deadlock between the container lock and the volume lock - if the volume and container were created in a pre-1.6.x version, the volume was not originally assigned a lock number, so is now defaulting to 0) |
After a restart, pods and containers both run a refresh() function to prepare to run after a reboot. Until now, volumes have not had a similar function, because they had no per-boot setup to perform. Unfortunately, this was not noticed when in-memory locking was introduced to volumes. The refresh() routine is, among other things, responsible for ensuring that locks are reserved after a reboot, ensuring they cannot be taken by a freshly-created container, pod, or volume. If this reservation is not done, we can end up with two objects using the same lock, potentially needing to lock each other for some operations - classic recipe for deadlocks. Add a refresh() function to volumes to perform lock reservation and ensure it is called as part of overall refresh(). Fixes containers#4605 Fixes containers#4621 Signed-off-by: Matthew Heon <[email protected]>
Running with root. |
[root@xroads irlanet]# podman system renumber |
After a restart, pods and containers both run a refresh() function to prepare to run after a reboot. Until now, volumes have not had a similar function, because they had no per-boot setup to perform. Unfortunately, this was not noticed when in-memory locking was introduced to volumes. The refresh() routine is, among other things, responsible for ensuring that locks are reserved after a reboot, ensuring they cannot be taken by a freshly-created container, pod, or volume. If this reservation is not done, we can end up with two objects using the same lock, potentially needing to lock each other for some operations - classic recipe for deadlocks. Add a refresh() function to volumes to perform lock reservation and ensure it is called as part of overall refresh(). Fixes containers#4605 Fixes containers#4621 Signed-off-by: Matthew Heon <[email protected]>
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Steps to reproduce the issue:
podman create
--name squid-ip-pub
--ip 10.88.13.24
--dns 10.88.13.4
--expose 3128
--hostname squid-ip-pub
--mount type=volume,src=squid-ip-config,dst=/etc/config
--mount type=volume,src=squid-ip-pub-config,dst=/etc/squid
--restart always
vimagick/squid
podman --log-level=debug start --attach squid-ip-pub
Describe the results you received:
The output from the above two commands is
6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844
DEBU[0000] using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] cached value indicated that metacopy is being used
DEBU[0000] cached value indicated that native-diff is not being used
WARN[0000] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled
DEBU[0000] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=true
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/bin/runc"
DEBU[0000] using runtime "/usr/bin/crun"
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
INFO[0000] Found CNI network podnet1 (type=bridge) at /etc/cni/net.d/podnet1.conflist
DEBU[0000] Not attaching to stdin
DEBU[0000] Made network namespace at /var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0 for container 6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844
INFO[0000] Got pod network &{Name:squid-ip-pub Namespace:squid-ip-pub ID:6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844 NetNS:/var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0 Networks:[podman] RuntimeConfig:map[podman:{IP:10.88.13.24 PortMappings:[] Bandwidth: IpRanges:[]}]}
INFO[0000] About to add CNI network cni-loopback (type=loopback)
INFO[0000] Got pod network &{Name:squid-ip-pub Namespace:squid-ip-pub ID:6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844 NetNS:/var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0 Networks:[podman] RuntimeConfig:map[podman:{IP:10.88.13.24 PortMappings:[] Bandwidth: IpRanges:[]}]}
INFO[0000] About to add CNI network podman (type=bridge)
DEBU[0000] overlay: mount_data=nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/JGO6DESSVVX5NJZUVCVQVLGGEW:/var/lib/containers/storage/overlay/l/57Q4RCONHFROQV5V4GHVEONIRH,upperdir=/var/lib/containers/storage/overlay/6f9470335241b21b46e0de3704e469f9e388aea997935d3671987d80fe340aef/diff,workdir=/var/lib/containers/storage/overlay/6f9470335241b21b46e0de3704e469f9e388aea997935d3671987d80fe340aef/work
DEBU[0000] mounted container "6f7d3cd28e52b45aa4fbd7a4a8badd544ff488dc701c5e799fb08ed8a44c2844" at "/var/lib/containers/storage/overlay/6f9470335241b21b46e0de3704e469f9e388aea997935d3671987d80fe340aef/merged"
DEBU[0000] [0] CNI result: Interfaces:[{Name:cni-podman0 Mac:0a:69:bc:8e:a1:08 Sandbox:} {Name:veth3ed1af4b Mac:ae:a8:ca:a6:cb:41 Sandbox:} {Name:eth0 Mac:06:6e:a2:1c:82:3f Sandbox:/var/run/netns/cni-12272194-73cc-47c2-0dbd-3a4872fbe6d0}], IP:[{Version:4 Interface:0xc000619ac8 Address:{IP:10.88.13.24 Mask:ffff0000} Gateway:10.88.0.1}], Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:}], DNS:{Nameservers:[] Domain: Search:[] Options:[]}
Describe the results you expected:
I expected the container to start running and not hang.
Additional information you deem important (e.g. issue happens only occasionally):
This happens reliably.
Output of
podman version
Version: 1.6.2
RemoteAPI Version: 1
Go Version: go1.13.1
OS/Arch: linux/amd64
sion`:
Output of
podman info --debug
:debug:
compiler: gc
git commit: ""
go version: go1.13.1
podman version: 1.6.2
host:
BuildahVersion: 1.11.3
CgroupVersion: v1
Conmon:
package: conmon-2.0.2-1.fc31.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
Distribution:
distribution: fedora
version: "31"
MemFree: 18905640960
MemTotal: 33428758528
OCIRuntime:
name: crun
package: crun-0.10.6-1.fc31.x86_64
path: /usr/bin/crun
version: |-
crun version 0.10.6
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
SwapFree: 8589930496
SwapTotal: 8589930496
arch: amd64
cpus: 4
eventlogger: journald
hostname: xroads
kernel: 5.3.12-300.fc31.x86_64
os: linux
rootless: false
uptime: 2h 53m 51.6s (Approximately 0.08 days)
registries:
blocked: null
insecure: null
search:
store:
ConfigFile: /etc/containers/storage.conf
ContainerStore:
number: 45
GraphDriverName: overlay
GraphOptions:
overlay.mountopt: nodev,metacopy=on
GraphRoot: /var/lib/containers/storage
GraphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
ImageStore:
number: 86
RunRoot: /var/run/containers/storage
VolumePath: /var/lib/containers/storage/volumes
Package info (e.g. output of
rpm -q podman
orapt list podman
):podman-1.6.2-2.fc31.x86_64
Additional environment details (AWS, VirtualBox, physical, etc.):
This is fedora 31 machine.
The text was updated successfully, but these errors were encountered: