Watchtower Fails Recreating Containers that Share a Network Stack #1286

Closed
MeCJay12 opened this issue May 11, 2022 · 9 comments · Fixed by #1429

@MeCJay12

MeCJay12 commented May 11, 2022

Hello! I have a VPN container and a torrent container that share a network stack (the torrent container has --network container:PIA). Whenever something happens to the "parent" VPN container, the torrent container errors out. This shows up in two ways:

  1. Both are re-created

In this case, it seems that the torrent container is created before the VPN container:

time="2022-05-11T02:43:54Z" level=info msg="Found new linuxserver/qbittorrent:latest image (e3ce01e9d9f9)"
time="2022-05-11T02:44:17Z" level=info msg="Found new linuxserver/sonarr:latest image (e4571e1de8bb)"
time="2022-05-11T02:44:33Z" level=info msg="Found new linuxserver/jackett:latest image (9209598da4bc)"
time="2022-05-11T02:44:39Z" level=info msg="Found new qmcgaw/gluetun:latest image (dc68aaf68f41)"
time="2022-05-11T02:45:04Z" level=info msg="Found new linuxserver/unifi-controller:latest image (b6f250fead08)"
time="2022-05-11T02:45:08Z" level=info msg="Found new guacamole/guacamole:latest image (772b60108cca)"
time="2022-05-11T02:45:15Z" level=info msg="Found new guacamole/guacd:latest image (4969201c0757)"
time="2022-05-11T02:45:17Z" level=info msg="Found new itzg/minecraft-server:latest image (f7c85977cfb8)"
time="2022-05-11T02:45:20Z" level=info msg="Stopping /Guacd (11395aded9fc) with SIGTERM"
time="2022-05-11T02:45:30Z" level=info msg="Stopping /Guac (4401e75cc7cf) with SIGTERM"
time="2022-05-11T02:45:35Z" level=info msg="Stopping /Unifi (68e49fc241d3) with SIGTERM"
time="2022-05-11T02:45:47Z" level=info msg="Stopping /PIA (7f374bb5714f) with SIGTERM"
time="2022-05-11T02:45:49Z" level=info msg="Stopping /Jackett (dc8a5ad9dde7) with SIGTERM"
time="2022-05-11T02:45:53Z" level=info msg="Stopping /Sonarr (340cec5a4766) with SIGTERM"
time="2022-05-11T02:45:58Z" level=info msg="Stopping /Nginx (ec102dfb297d) with SIGTERM"
time="2022-05-11T02:46:08Z" level=info msg="Stopping /qBittorrent (9324abf693e5) with SIGTERM"
time="2022-05-11T02:46:17Z" level=info msg="Creating /qBittorrent"
time="2022-05-11T02:46:17Z" level=error msg="Error response from daemon: No such container: 7f374bb5714f1b081a6da007fea491d4d8eb586ee6541be406b9dbb666bcabef"
time="2022-05-11T02:46:17Z" level=info msg="Creating /Nginx"
time="2022-05-11T02:46:18Z" level=info msg="Creating /Sonarr"
time="2022-05-11T02:46:19Z" level=info msg="Creating /Jackett"
time="2022-05-11T02:46:21Z" level=info msg="Creating /PIA"
time="2022-05-11T02:46:22Z" level=info msg="Creating /Unifi"
time="2022-05-11T02:46:24Z" level=info msg="Creating /Guac"
time="2022-05-11T02:46:25Z" level=info msg="Creating /Guacd"
time="2022-05-11T02:46:27Z" level=info msg="Creating /Minecraft"
time="2022-05-11T02:46:27Z" level=info msg="Session done" Failed=1 Scanned=26 Updated=7 notify=no

  2. Parent container is updated without the child container.

In this case the containers are recreated, but the child container loses its connection to the parent container (presumably because the parent's container ID changes).

I understand that a similar issue used to happen with the --link option, but that was resolved back in March. This seems to be the same issue, just with the newer option (--network container:<> replaced --link).
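
To make the ordering problem in case 1 concrete, here is a minimal sketch (not watchtower's actual code) of how a creation order could be derived from each container's network mode so that a network "parent" is always created before the containers that join its stack. The function name `creation_order` and the input mapping are assumptions made up for illustration; only the `container:` prefix comes from the report above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

NETWORK_PREFIX = "container:"


def creation_order(network_modes):
    """Return container names ordered so that network parents come first.

    network_modes is a hypothetical mapping of container name to its
    network mode string, e.g. {"qBittorrent": "container:PIA"}.
    """
    sorter = TopologicalSorter()
    for name, mode in network_modes.items():
        if mode.startswith(NETWORK_PREFIX):
            parent = mode[len(NETWORK_PREFIX):]
            # name depends on parent, so parent must be created first
            sorter.add(name, parent)
        else:
            sorter.add(name)
    return list(sorter.static_order())


order = creation_order({
    "qBittorrent": "container:PIA",
    "PIA": "bridge",
    "Sonarr": "bridge",
})
print(order)
```

Stopping would use the reverse of this order, so the child is stopped before the parent it depends on.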

@ljo123

ljo123 commented Jun 12, 2022

I'm having the same issue and cannot get watchtower to stop and start linked containers in the correct order. I've tried:

  1. Not putting any watchtower-specific labels in the containers' config and relying only on compose syntax like this:
     depends_on:
       gluetun:
         condition: service_healthy
  2. Using the watchtower label com.centurylinklabs.watchtower.depends-on: parent in each child container
  3. Using the watchtower label com.centurylinklabs.watchtower.depends-on: "child1,child2" in the parent container

In each case, watchtower correctly identifies that there are linked containers but still shuts the parent down first, then the children. It then starts the children (which error out because the parent is missing), then the parent, and finally deletes all the dangling images, which now include the images of the non-running children.

Edit: actually, I did see some improvement with setup no. 2 above (depends on parent in each child). Using a leading slash in the container name (so the label is com.centurylinklabs.watchtower.depends-on: "/gluetun") results in the correct shutdown and startup order.

However, the child containers failed to start with the image not found error. Strangely this occurs before watchtower removes all the dangling images so I'm not entirely sure what the problem is. I'm re-trying with WATCHTOWER_CLEANUP set to false and if that doesn't work, I'll just turn off auto-updates for Gluetun.
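
The leading-slash observation fits with how Docker reports container names (the inspect output later in this thread shows "Name": "/jaeger"). As a rough illustration only, and not watchtower's implementation, a label value could be split and normalized so that "gluetun" and "/gluetun" compare equal. The helper names `normalize_name` and `parse_depends_on` are hypothetical; the label key is the one quoted above.

```python
DEPENDS_ON_LABEL = "com.centurylinklabs.watchtower.depends-on"


def normalize_name(name):
    """Return the name with surrounding whitespace removed and one leading slash."""
    name = name.strip()
    return name if name.startswith("/") else "/" + name


def parse_depends_on(labels):
    """Split the comma-separated label value into normalized container names."""
    raw = labels.get(DEPENDS_ON_LABEL, "")
    return [normalize_name(part) for part in raw.split(",") if part.strip()]


print(parse_depends_on({DEPENDS_ON_LABEL: "gluetun, /vpn"}))
```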

@juanra

juanra commented Jun 13, 2022

Same thing here, and also with gluetun. I disabled watchtower on those containers for now.

@marcosvrs

Same here.
It looks like it's still looking for the old container hash after the new container is created,
even if we link the containers via the com.centurylinklabs.watchtower.depends-on label.

watchtower         | time="2022-07-06T22:34:30+02:00" level=debug msg="container is linked to restarting" linked=/traefik restarting=/cloudflared
watchtower         | time="2022-07-06T22:34:30+02:00" level=debug msg="container is linked to restarting" linked=/jaeger restarting=/cloudflared
watchtower         | time="2022-07-06T22:34:30+02:00" level=debug msg="This is the watchtower container /watchtower"
watchtower         | time="2022-07-06T22:34:41+02:00" level=info msg="Stopping /jaeger (8e6c7b35d73b) with SIGTERM"
watchtower         | time="2022-07-06T22:34:42+02:00" level=debug msg="Removing container 8e6c7b35d73b"
watchtower         | time="2022-07-06T22:34:48+02:00" level=info msg="Stopping /traefik (b2fbecebe9e0) with SIGTERM"
watchtower         | time="2022-07-06T22:34:50+02:00" level=debug msg="Removing container b2fbecebe9e0"
watchtower         | time="2022-07-06T22:34:50+02:00" level=info msg="Stopping /cloudflared (56e4448aee54) with SIGTERM"
watchtower         | time="2022-07-06T22:34:51+02:00" level=debug msg="Removing container 56e4448aee54"
watchtower         | time="2022-07-06T22:34:58+02:00" level=info msg="Creating /cloudflared"
watchtower         | time="2022-07-06T22:34:58+02:00" level=debug msg="Starting container /cloudflared (a2a50d353447)"
watchtower         | time="2022-07-06T22:34:58+02:00" level=info msg="Creating /traefik"
watchtower         | time="2022-07-06T22:34:58+02:00" level=debug msg="Starting container /traefik (357ab2814dcb)"
watchtower         | time="2022-07-06T22:34:58+02:00" level=error msg="Error response from daemon: No such container: 56e4448aee54525c60a2167dec01bb0ee371abb46859c17d8f89d99fbba84574"
watchtower         | time="2022-07-06T22:34:59+02:00" level=info msg="Creating /jaeger"
watchtower         | time="2022-07-06T22:34:59+02:00" level=debug msg="Starting container /jaeger (7f3a418b9346)"
watchtower         | time="2022-07-06T22:34:59+02:00" level=error msg="Error response from daemon: No such container: 56e4448aee54525c60a2167dec01bb0ee371abb46859c17d8f89d99fbba84574"

@piksel
Member

piksel commented Jul 8, 2022

Maybe there is a reference to the old container in the config somewhere. Could you post a docker inspect of the jaeger container?
The depends-on label just recreates the containers when one of their dependencies is recreated. If there is some kind of explicit reference to the container ID in their config, it has to be updated as well. Perhaps it's added in the network config? Are you using docker compose?

@marcosvrs

marcosvrs commented Jul 8, 2022

> Maybe there is a reference to the old container in the config somewhere. Could you post a docker inspect of the jaeger container? The depends-on label just recreates the containers when one of their dependencies is recreated. If there is some kind of explicit reference to the container ID in their config, it has to be updated as well. Perhaps it's added in the network config? Are you using docker compose?

I'm facing exactly what @ljo123 described.
Since I posted my logs, I have already recreated the jaeger container, so its hash won't match the log I sent earlier.

docker inspect:

[
    {
        "Id": "7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b",
        "Created": "2022-07-08T09:08:08.548100679Z",
        "Path": "/go/bin/all-in-one-linux",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 2243640,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2022-07-08T09:08:08.878087076Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:5011eb6cadf176aa8ca70812a17499e132b985bc203b4e5d566976943cd1eca0",
        "ResolvConfPath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/hostname",
        "HostsPath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/hosts",
        "LogPath": "/var/lib/docker/containers/7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b/7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b-json.log",
        "Name": "/jaeger",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "container:02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "unless-stopped",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [],
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "host",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b-init/diff:/var/lib/docker/overlay2/70a39c423ebe007a3cc2e1a2c3fb1c9d60cfc9c3117ac9e3fd50cedc434d1da0/diff:/var/lib/docker/overlay2/35f1782638eaff2f56e39677c3d11f69841decee009d501d6463f6be8873605f/diff:/var/lib/docker/overlay2/28f4b48752d7a3f77cd689c3390463ffb8cad62e8e24904bbde0faf370b4aa28/diff:/var/lib/docker/overlay2/5eedcaf13c27e39123b0277274d15e0d920d152810ae1a959299a0a874e42e1b/diff:/var/lib/docker/overlay2/f98556203bf805f7592608d96e10d84862bf42840852459af978abcdbcd80cfc/diff",
                "MergedDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/merged",
                "UpperDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/diff",
                "WorkDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "5a99f2e145e0e426dae0b6a3d56bce224459cd7c809d24408eb6244c5f75e134",
                "Source": "/var/lib/docker/volumes/5a99f2e145e0e426dae0b6a3d56bce224459cd7c809d24408eb6244c5f75e134/_data",
                "Destination": "/tmp",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "02cea43ed47c",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "14250/tcp": {},
                "14268/tcp": {},
                "16686/tcp": {},
                "5775/udp": {},
                "5778/tcp": {},
                "6831/udp": {},
                "6832/udp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "SAMPLING_STRATEGIES_FILE=/etc/jaeger/sampling_strategies.json"
            ],
            "Cmd": null,
            "Image": "jaegertracing/all-in-one",
            "Volumes": {
                "/tmp": {}
            },
            "WorkingDir": "",
            "Entrypoint": [
                "/go/bin/all-in-one-linux"
            ],
            "OnBuild": null,
            "Labels": {
                "com.centurylinklabs.watchtower.depends-on": "/cloudflared",
                "com.docker.compose.config-hash": "c2b318c7497a806ecdf583ead8cc28d591ad6de32393fb9786f20e7aff6bf188",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "user",
                "com.docker.compose.project.config_files": "docker-compose.yml",
                "com.docker.compose.project.working_dir": "/home/user",
                "com.docker.compose.service": "jaeger",
                "com.docker.compose.version": "1.29.2",
                "traefik.enable": "True",
                "traefik.http.middlewares.jaegerauth.basicauth.users": "secret:secret",
                "traefik.http.routers.jaeger.middlewares": "jaegerauth@docker",
                "traefik.http.routers.jaeger.rule": "Host(`jaeger.secret.com`)",
                "traefik.http.services.jaeger.loadbalancer.server.port": "16686"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {}
        }
    }
]

docker-compose:

version: "3.8"

services:

  cloudflared:
    image: cloudflare/cloudflared
    container_name: cloudflared
    command:
      - tunnel
      - --url=http://localhost:80
      - run
      - --token=secret
    extra_hosts:
      - host.docker.internal:172.177.0.1
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one
    container_name: jaeger
    # ports:
    #   - 16686:16686
    network_mode: service:cloudflared
    labels:
      com.centurylinklabs.watchtower.depends-on: /cloudflared
      traefik.enable: true
      traefik.http.middlewares.jaegerauth.basicauth.users: secret:secret
      traefik.http.routers.jaeger.middlewares: jaegerauth@docker
      traefik.http.routers.jaeger.rule: Host(`jaeger.secret.com`)
      traefik.http.services.jaeger.loadbalancer.server.port: 16686
    restart: unless-stopped

  traefik:
    image: traefik
    container_name: traefik
    command:
      - --api.dashboard
      - --entrypoints.web.address=:80
      - --entryPoints.web.forwardedHeaders.trustedIPs=127.0.0.1/32
      - --experimental.hub=true
      - --global.checkNewVersion=true
      - --hub.tls.insecure=true
      # - --log.level=DEBUG
      - --metrics.prometheus.addrouterslabels=true
      - --providers.docker
      - --providers.docker.exposedbydefault=false
      - --tracing.jaeger=true
    # ports:
    #   - 8080:8080
    volumes:
      - /run/docker.sock:/var/run/docker.sock:ro
    network_mode: service:cloudflared
    depends_on:
      - jaeger
    labels:
      com.centurylinklabs.watchtower.depends-on: /cloudflared
      traefik.enable: true
      traefik.http.middlewares.traefikauth.basicauth.users: secret:secret
      traefik.http.routers.traefik.middlewares: traefikauth@docker
      traefik.http.routers.traefik.rule: Host(`traefik.secret.com`)
      traefik.http.routers.traefik.service: api@internal
      traefik.http.services.traefik.loadbalancer.server.port: 8080
    restart: unless-stopped

  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    command:
      - --cleanup
      - --debug
      - --include-restarting
      - --include-stopped
      - --remove-volumes
      - --trace
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /run/docker.sock:/var/run/docker.sock
    restart: unless-stopped

networks:
  default:
    ipam:
      config:
        - subnet: 172.177.0.0/16

@piksel
Member

piksel commented Jul 8, 2022

Yeah, that's exactly what I suspected:

"NetworkMode": "container:02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2",

That means that network_mode: service:* is not supported right now. It should be possible to both support it and infer the depends-on from the property though.

Note: NetworkMode: container:CONTAINER_NAME would still work, but docker-compose puts the explicit container ID in the field instead :/
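
As a sketch of what "support it and infer the depends-on" might involve (an illustration under stated assumptions, not the fix that was eventually merged), the two helpers below read the NetworkMode field from a HostConfig dict, resolve the referenced ID to a container name, and rewrite the field once the parent has a new ID. The function names, the ID-to-name mapping, and the value "new-parent-id" are hypothetical; the old ID is the one from the inspect output above, assumed here to belong to /cloudflared.

```python
PREFIX = "container:"


def resolve_network_parent(host_config, names_by_id):
    """Return the name of the container whose network stack is shared, or None."""
    mode = host_config.get("NetworkMode", "")
    if not mode.startswith(PREFIX):
        return None
    ref = mode[len(PREFIX):]
    # ref may be a full ID (as docker-compose writes it) or already a name
    return names_by_id.get(ref, ref)


def rewrite_network_mode(host_config, old_to_new_id):
    """Return a copy of host_config pointing at the recreated parent."""
    mode = host_config.get("NetworkMode", "")
    if not mode.startswith(PREFIX):
        return dict(host_config)
    ref = mode[len(PREFIX):]
    return {**host_config, "NetworkMode": PREFIX + old_to_new_id.get(ref, ref)}


OLD_ID = "02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2"
child = {"NetworkMode": PREFIX + OLD_ID}

parent = resolve_network_parent(child, {OLD_ID: "/cloudflared"})
updated = rewrite_network_mode(child, {OLD_ID: "new-parent-id"})
print(parent, updated["NetworkMode"])
```

The resolved name doubles as an inferred dependency, so the child would be stopped before and started after its parent without needing the depends-on label.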

@stunrelay

Any workaround for this short of adding monitor-only to the parent container and manually updating the stack periodically?
It would be nice if watchtower could redeploy a whole stack if the parent container needed an update...

@piksel
Member

piksel commented Aug 14, 2022

The only workaround is to use another networking mode afaik.

It would be nice; a PR is welcome!
