
Fedora 31 vm-driver=podman fails to start, trying to start the docker service #6795

Closed
thobianchi opened this issue Feb 25, 2020 · 31 comments
Labels
co/podman-driver: podman driver issues
co/runtime/docker: Issues specific to a docker runtime
kind/support: Categorizes issue or PR as a support question.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
long-term-support: Long-term support issues that can't be fixed in code
os/linux

Comments

@thobianchi

thobianchi commented Feb 25, 2020

The exact command to reproduce the issue:
minikube --vm-driver=podman start

The full output of the command that failed:


[root@thomas-work]~# minikube --vm-driver=podman start
😄 minikube v1.7.3 on Fedora 31
✨ Using the podman (experimental) driver based on user configuration
🔥 Creating Kubernetes in podman container with (CPUs=2), Memory=2000MB (15719MB available) ...

💣 Unable to start VM. Please investigate and run 'minikube delete' if possible: creating host: create: provisioning: ssh command error:
command : sudo systemctl -f restart docker
err : Process exited with status 1
output : Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.

😿 minikube is exiting due to an error. If the above message is not useful, open an issue:
👉 https://github.com/kubernetes/minikube/issues/new/choose

The operating system version:

Fedora 31 : 5.5.5-200.fc31.x86_64
minikube version: v1.7.3
commit: 436667c
podman version 1.8.0
Selinux on permissive

It seems that even with the podman driver, minikube is trying to restart the docker service.

@afbjorklund
Collaborator

It seems that even with the podman driver, minikube is trying to restart the docker service.

The driver uses podman to create a fake VM; inside this node there is another container runtime...
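
For example, something like this would show it (assuming the node container did come up and is named "minikube"):

# the "fake VM" is just a podman container
sudo podman ps -a
# the docker runtime lives inside that container
sudo podman exec -it minikube systemctl status docker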

@afbjorklund afbjorklund added the co/docker-driver Issues related to kubernetes in container label Feb 25, 2020
@priyawadhwa priyawadhwa added the kind/support Categorizes issue or PR as a support question. label Feb 25, 2020
@medyagh
Member

medyagh commented Feb 26, 2020

Even though the driver is podman, the runtime is docker.

We recently added support for the cri-o runtime (it will be in the next release).

I think it would be good default behavior to set cri-o as the default runtime for the podman driver, since they are built together.
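
Once that release is out, explicitly selecting cri-o should be something like:

minikube start --vm-driver=podman --container-runtime=cri-o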

@tstromberg tstromberg added co/podman-driver podman driver issues and removed co/docker-driver Issues related to kubernetes in container labels Mar 18, 2020
@tstromberg
Contributor

Could you share the output of minikube logs?

@thobianchi
Author

In the meantime I updated to 1.9.2.

Output of: sudo minikube --vm-driver=podman start

😄  minikube v1.9.2 on Fedora 31
✨  Using the podman (experimental) driver based on existing profile
👍  Starting control plane node m01 in cluster minikube
🚜  Pulling base image ...
E0416 09:44:49.897829    8747 cache.go:114] Error downloading kic artifacts:  error loading image: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
🤦  StartHost failed, but will try again: boot lock: unable to open /tmp/juju-mka00e65579c2b557a802898fd1cf03ec4ab30a1: permission denied

❌  [JUJU_LOCK_DENIED] Failed to start podman container. "minikube start" may fix it. boot lock: unable to open /tmp/juju-mka00e65579c2b557a802898fd1cf03ec4ab30a1: permission denied
💡  Suggestion: Run 'sudo sysctl fs.protected_regular=1', or try a driver which does not require root, such as '--driver=docker'
⁉️   Related issue: https://github.com/kubernetes/minikube/issues/6391

sudo sysctl fs.protected_regular=1 seems to have no effect; the error is the same.

Minikube logs:

💣  Unable to get machine status: state: "podman inspect -f {{.State.Status}} minikube" failed: exit status 125: Error: error getting image "minikube": unable to find a name and tag match for minikube in repotags: no such image


😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose

@afbjorklund
Collaborator

@thobianchi : this has been fixed in PR #7631 and other ongoing work on master

But it (the podman driver) is not working yet with the current minikube 1.9.x releases...

I don't think containerd or crio works yet, though.

So initially it will be running docker-in-podman ☺️

@afbjorklund afbjorklund added os/linux co/runtime/docker Issues specific to a docker runtime labels Apr 17, 2020
@afbjorklund
Collaborator

The juju stuff is a known bug when running sudo under newer systemd, by the way

See #7053 (see #6391 (comment))

@thobianchi
Author

thobianchi commented Apr 18, 2020

With the current master version:

❯ sudo cat /proc/sys/fs/protected_regular
1

❯ sudo ./minikube  --vm-driver=podman start
😄  minikube v1.9.2 on Fedora 31
✨  Using the podman (experimental) driver based on user configuration
👍  Starting control plane node minikube in cluster minikube

❌  [JUJU_LOCK_DENIED] error provisioning host Failed to save config: failed to acquire lock for /root/.minikube/profiles/minikube/config.json: {Name:mk270d1b5db5965f2dc9e9e25770a63417031943 Clock:{} Delay:500ms Timeout:1m0s Cancel:<nil>}: unable to open /tmp/juju-mk270d1b5db5965f2dc9e9e25770a63417031943: permission denied
💡  Suggestion: Run 'sudo sysctl fs.protected_regular=1', or try a driver which does not require root, such as '--driver=docker'
⁉️   Related issue: https://github.com/kubernetes/minikube/issues/6391

❯ sudo ./minikube logs
🤷  There is no local cluster named "minikube"
👉  To fix this, run: "minikube start"

@afbjorklund
Collaborator

Fix is not merged yet (WIP). Hopefully it will be included in minikube v1.10, though.

The suggested "fix" looks bogus, since you want to use 0 to avoid the systemd "feature".
https://www.phoronix.com/scan.php?page=news_item&px=Systemd-241-Linux-419-Sysctl

Once merged, we will require that sudo is set up to run podman without a password.
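
To make both of those stick, something along these lines should work (the file names and user are just examples):

# keep fs.protected_regular disabled across reboots
echo "fs.protected_regular = 0" | sudo tee /etc/sysctl.d/99-minikube-podman.conf
sudo sysctl --system

# allow minikube to run podman via sudo without a password prompt
echo "$USER ALL=(ALL) NOPASSWD: /usr/bin/podman" | sudo tee /etc/sudoers.d/minikube-podman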

@thobianchi
Author

Oh, I'm sorry, I was sure that PR had been merged. I will look forward to the merge.

Yes, sudo sysctl fs.protected_regular=0 fixed the juju error.

@afbjorklund
Collaborator

There is a new version of the podman-sudo branch now, updated to v1.10

@thobianchi
Author

thobianchi commented Apr 25, 2020

So I'm trying on commit 947dc21 but there are still errors:

❯ sudo ./minikube  --vm-driver=podman start
😄  minikube v1.10.0-beta.1 on Fedora 31
✨  Using the podman (experimental) driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
🔥  Creating podman container (CPUs=2, Memory=2200MB) ...
🤦  StartHost failed, but will try again: creating host: create: creating: prepare kic ssh: apply authorized_keys file ownership, output 
** stderr ** 
Error: can only create exec sessions on running containers: container state improper

** /stderr **: chown docker:docker /home/docker/.ssh/authorized_keys: exit status 255
stdout:

stderr:
Error: can only create exec sessions on running containers: container state improper

✋  Stopping "minikube" in podman ...
🔥  Deleting "minikube" in podman ...
🔥  Creating podman container (CPUs=2, Memory=2200MB) ...
😿  Failed to start podman container. "minikube start" may fix it: creating host: create: creating: create kic node: check container "minikube" running: temporary error created container "minikube" is not running yet

💣  error provisioning host: Failed to start host: creating host: create: creating: create kic node: check container "minikube" running: temporary error created container "minikube" is not running yet

😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose

@afbjorklund
Collaborator

So I'm trying on commit 947dc21 but there are still errors:

That looks like the wrong commit; it was supposed to be 6644c5c

Otherwise it should have complained about your use of sudo...

I updated it again, so the latest available is currently 28106fa

Make sure to use refs/pull/7631/head, and you will get the latest


Those errors look temporary; can you delete it explicitly (and make sure it does not exist)?

Error: can only create exec sessions on running containers: container state improper

check container "minikube" running: temporary error created container "minikube" is not running yet

You can check with sudo podman ps -a and sudo podman volume ls to make sure that it is cleaned up

I think they were both fixed in 22aa1af

Apparently podman status is a bit broken...
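
A full manual cleanup would be roughly this (assuming both the container and the volume are named "minikube"):

sudo podman ps -a
sudo podman rm -f minikube
sudo podman volume ls
sudo podman volume rm minikube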

@thobianchi
Author

I'm sorry.
I checked out your fork, on 28106fa:

❯ out/minikube  --vm-driver=podman start
😄  minikube v1.10.0-beta.1 on Fedora 31
✨  Using the podman (experimental) driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
💾  Downloading Kubernetes v1.18.0 preload ...
    > preloaded-images-k8s-v3-v1.18.0-docker-overlay2-amd64.tar.lz4: 525.45 MiB
🔥  Creating podman container (CPUs=2, Memory=3900MB) ...
🤦  StartHost failed, but will try again: creating host: create: provisioning: ssh command error:
command : sudo diff -u /lib/systemd/system/docker.service /lib/systemd/system/docker.service.new || { sudo mv /lib/systemd/system/docker.service.new /lib/systemd/system/docker.service; sudo systemctl -f daemon-reload && sudo systemctl -f enable docker && sudo systemctl -f restart docker; }
err     : Process exited with status 1
output  : --- /lib/systemd/system/docker.service        2019-08-29 04:42:14.000000000 +0000
+++ /lib/systemd/system/docker.service.new      2020-04-26 14:50:21.515585356 +0000
@@ -8,24 +8,22 @@
 
 [Service]
 Type=notify
-# the default is not to use systemd for cgroups because the delegate issues still
-# exists and systemd currently does not support the cgroup feature set required
-# for containers run by docker
-ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
-ExecReload=/bin/kill -s HUP $MAINPID
-TimeoutSec=0
-RestartSec=2
-Restart=always
-
-# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
-# Both the old, and new location are accepted by systemd 229 and up, so using the old location
-# to make them work for either version of systemd.
-StartLimitBurst=3
-
-# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
-# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
-# this option work for either version of systemd.
-StartLimitInterval=60s
+
+
+
+# This file is a systemd drop-in unit that inherits from the base dockerd configuration.
+# The base configuration already specifies an 'ExecStart=...' command. The first directive
+# here is to clear out that command inherited from the base configuration. Without this,
+# the command from the base configuration and the command specified here are treated as
+# a sequence of commands, which is not the desired behavior, nor is it valid -- systemd
+# will catch this invalid input and refuse to start the service with an error like:
+#  Service has more than one ExecStart= setting, which is only allowed for Type=oneshot services.
+
+# NOTE: default-ulimit=nofile is set to an arbitrary number for consistency with other
+# container runtimes. If left unlimited, it may result in OOM issues with MySQL.
+ExecStart=
+ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --default-ulimit=nofile=1048576:1048576 --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=podman --insecure-registry 10.96.0.0/12 
+ExecReload=/bin/kill -s HUP 
 
 # Having non-zero Limit*s causes performance problems due to accounting overhead
 # in the kernel. We recommend using cgroups to do container-local accounting.
@@ -33,9 +31,10 @@
 LimitNPROC=infinity
 LimitCORE=infinity
 
-# Comment TasksMax if your systemd version does not support it.
-# Only systemd 226 and above support this option.
+# Uncomment TasksMax if your systemd version supports it.
+# Only systemd 226 and above support this version.
 TasksMax=infinity
+TimeoutStartSec=0
 
 # set delegate yes so that systemd does not reset the cgroups of docker containers
 Delegate=yes
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.

✋  Stopping "minikube" in podman ...
🛑  Powering off "minikube" via SSH ...
🔥  Deleting "minikube" in podman ...
🔥  Creating podman container (CPUs=2, Memory=3900MB) ...
😿  Failed to start podman container. "minikube start" may fix it: creating host: create: provisioning: ssh command error:
command : sudo diff -u /lib/systemd/system/docker.service /lib/systemd/system/docker.service.new || { sudo mv /lib/systemd/system/docker.service.new /lib/systemd/system/docker.service; sudo systemctl -f daemon-reload && sudo systemctl -f enable docker && sudo systemctl -f restart docker; }
err     : Process exited with status 1
output  : --- /lib/systemd/system/docker.service        2019-08-29 04:42:14.000000000 +0000
+++ /lib/systemd/system/docker.service.new      2020-04-26 14:50:36.188837518 +0000
@@ -8,24 +8,22 @@
 
 [Service]
 Type=notify
-# the default is not to use systemd for cgroups because the delegate issues still
-# exists and systemd currently does not support the cgroup feature set required
-# for containers run by docker
-ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
-ExecReload=/bin/kill -s HUP $MAINPID
-TimeoutSec=0
-RestartSec=2
-Restart=always
-
-# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
-# Both the old, and new location are accepted by systemd 229 and up, so using the old location
-# to make them work for either version of systemd.
-StartLimitBurst=3
-
-# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
-# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
-# this option work for either version of systemd.
-StartLimitInterval=60s
+
+
+
+# This file is a systemd drop-in unit that inherits from the base dockerd configuration.
+# The base configuration already specifies an 'ExecStart=...' command. The first directive
+# here is to clear out that command inherited from the base configuration. Without this,
+# the command from the base configuration and the command specified here are treated as
+# a sequence of commands, which is not the desired behavior, nor is it valid -- systemd
+# will catch this invalid input and refuse to start the service with an error like:
+#  Service has more than one ExecStart= setting, which is only allowed for Type=oneshot services.
+
+# NOTE: default-ulimit=nofile is set to an arbitrary number for consistency with other
+# container runtimes. If left unlimited, it may result in OOM issues with MySQL.
+ExecStart=
+ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --default-ulimit=nofile=1048576:1048576 --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=podman --insecure-registry 10.96.0.0/12 
+ExecReload=/bin/kill -s HUP 
 
 # Having non-zero Limit*s causes performance problems due to accounting overhead
 # in the kernel. We recommend using cgroups to do container-local accounting.
@@ -33,9 +31,10 @@
 LimitNPROC=infinity
 LimitCORE=infinity
 
-# Comment TasksMax if your systemd version does not support it.
-# Only systemd 226 and above support this option.
+# Uncomment TasksMax if your systemd version supports it.
+# Only systemd 226 and above support this version.
 TasksMax=infinity
+TimeoutStartSec=0
 
 # set delegate yes so that systemd does not reset the cgroups of docker containers
 Delegate=yes
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.


❌  [DOCKER_RESTART_FAILED] error provisioning host Failed to start host: creating host: create: provisioning: ssh command error:
command : sudo diff -u /lib/systemd/system/docker.service /lib/systemd/system/docker.service.new || { sudo mv /lib/systemd/system/docker.service.new /lib/systemd/system/docker.service; sudo systemctl -f daemon-reload && sudo systemctl -f enable docker && sudo systemctl -f restart docker; }
err     : Process exited with status 1
output  : --- /lib/systemd/system/docker.service        2019-08-29 04:42:14.000000000 +0000
+++ /lib/systemd/system/docker.service.new      2020-04-26 14:50:36.188837518 +0000
@@ -8,24 +8,22 @@
 
 [Service]
 Type=notify
-# the default is not to use systemd for cgroups because the delegate issues still
-# exists and systemd currently does not support the cgroup feature set required
-# for containers run by docker
-ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
-ExecReload=/bin/kill -s HUP $MAINPID
-TimeoutSec=0
-RestartSec=2
-Restart=always
-
-# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
-# Both the old, and new location are accepted by systemd 229 and up, so using the old location
-# to make them work for either version of systemd.
-StartLimitBurst=3
-
-# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
-# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
-# this option work for either version of systemd.
-StartLimitInterval=60s
+
+
+
+# This file is a systemd drop-in unit that inherits from the base dockerd configuration.
+# The base configuration already specifies an 'ExecStart=...' command. The first directive
+# here is to clear out that command inherited from the base configuration. Without this,
+# the command from the base configuration and the command specified here are treated as
+# a sequence of commands, which is not the desired behavior, nor is it valid -- systemd
+# will catch this invalid input and refuse to start the service with an error like:
+#  Service has more than one ExecStart= setting, which is only allowed for Type=oneshot services.
+
+# NOTE: default-ulimit=nofile is set to an arbitrary number for consistency with other
+# container runtimes. If left unlimited, it may result in OOM issues with MySQL.
+ExecStart=
+ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --default-ulimit=nofile=1048576:1048576 --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=podman --insecure-registry 10.96.0.0/12 
+ExecReload=/bin/kill -s HUP 
 
 # Having non-zero Limit*s causes performance problems due to accounting overhead
 # in the kernel. We recommend using cgroups to do container-local accounting.
@@ -33,9 +31,10 @@
 LimitNPROC=infinity
 LimitCORE=infinity
 
-# Comment TasksMax if your systemd version does not support it.
-# Only systemd 226 and above support this option.
+# Uncomment TasksMax if your systemd version supports it.
+# Only systemd 226 and above support this version.
 TasksMax=infinity
+TimeoutStartSec=0
 
 # set delegate yes so that systemd does not reset the cgroups of docker containers
 Delegate=yes
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.

💡  Suggestion: Remove the incompatible --docker-opt flag if one was provided
⁉️   Related issue: https://github.com/kubernetes/minikube/issues/7070

I verified that there was no existing container or volume.

@afbjorklund
Collaborator

afbjorklund commented Apr 26, 2020

That's weird, were you able to look at the logs? (to see what the real docker startup error was)

Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.

Do minikube ssh first. (You can also look at the boot log, with sudo podman logs minikube)
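
Roughly this, if the node stays up long enough:

minikube ssh
# inside the node:
sudo systemctl status docker
sudo journalctl -u docker --no-pager | tail -n 50
exit
# from the host, the boot log of the node container:
sudo podman logs minikube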

@thobianchi
Author

minikube ssh does not work; there's no container created, and the start process ends very quickly.
It's the same story for the logs:

❯ sudo podman logs minikube
Error: no container with name or ID minikube found: no such container

@medyagh
Member

medyagh commented May 13, 2020

@thobianchi minikube v1.10.1 includes a lot of fixes for podman driver, do you mind giving it another try?

@thobianchi
Author

I think it is the same error:

❯ minikube  --vm-driver=podman start
😄  minikube v1.10.1 on Fedora 32
✨  Using the podman (experimental) driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
💾  Downloading Kubernetes v1.18.2 preload ...
    > preloaded-images-k8s-v3-v1.18.2-docker-overlay2-amd64.tar.lz4: 525.43 MiB
🔥  Creating podman container (CPUs=2, Memory=3900MB) ...
✋  Stopping "minikube" in podman ...
🛑  Powering off "minikube" via SSH ...
🔥  Deleting "minikube" in podman ...
🤦  StartHost failed, but will try again: creating host: create: provisioning: ssh command error:
command : sudo diff -u /lib/systemd/system/docker.service /lib/systemd/system/docker.service.new || { sudo mv /lib/systemd/system/docker.service.new /lib/systemd/system/docker.service; sudo systemctl -f daemon-reload && sudo systemctl -f enable docker && sudo systemctl -f restart docker; }
err     : Process exited with status 1
output  : --- /lib/systemd/system/docker.service        2019-08-29 04:42:14.000000000 +0000
+++ /lib/systemd/system/docker.service.new      2020-05-14 21:22:55.123375319 +0000
@@ -8,24 +8,22 @@
 
 [Service]
 Type=notify
-# the default is not to use systemd for cgroups because the delegate issues still
-# exists and systemd currently does not support the cgroup feature set required
-# for containers run by docker
-ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
-ExecReload=/bin/kill -s HUP $MAINPID
-TimeoutSec=0
-RestartSec=2
-Restart=always
-
-# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
-# Both the old, and new location are accepted by systemd 229 and up, so using the old location
-# to make them work for either version of systemd.
-StartLimitBurst=3
-
-# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
-# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
-# this option work for either version of systemd.
-StartLimitInterval=60s
+
+
+
+# This file is a systemd drop-in unit that inherits from the base dockerd configuration.
+# The base configuration already specifies an 'ExecStart=...' command. The first directive
+# here is to clear out that command inherited from the base configuration. Without this,
+# the command from the base configuration and the command specified here are treated as
+# a sequence of commands, which is not the desired behavior, nor is it valid -- systemd
+# will catch this invalid input and refuse to start the service with an error like:
+#  Service has more than one ExecStart= setting, which is only allowed for Type=oneshot services.
+
+# NOTE: default-ulimit=nofile is set to an arbitrary number for consistency with other
+# container runtimes. If left unlimited, it may result in OOM issues with MySQL.
+ExecStart=
+ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --default-ulimit=nofile=1048576:1048576 --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=podman --insecure-registry 10.96.0.0/12 
+ExecReload=/bin/kill -s HUP 
 
 # Having non-zero Limit*s causes performance problems due to accounting overhead
 # in the kernel. We recommend using cgroups to do container-local accounting.
@@ -33,9 +31,10 @@
 LimitNPROC=infinity
 LimitCORE=infinity
 
-# Comment TasksMax if your systemd version does not support it.
-# Only systemd 226 and above support this option.
+# Uncomment TasksMax if your systemd version supports it.
+# Only systemd 226 and above support this version.
 TasksMax=infinity
+TimeoutStartSec=0
 
 # set delegate yes so that systemd does not reset the cgroups of docker containers
 Delegate=yes
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.

🔥  Creating podman container (CPUs=2, Memory=3900MB) ...
😿  Failed to start podman container. "minikube start" may fix it: creating host: create: creating: setting up container node: creating volume for minikube container: sudo -n podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true: exit status 125
stdout:

stderr:
Error: volume with name minikube already exists: volume already exists


💣  error provisioning host: Failed to start host: creating host: create: creating: setting up container node: creating volume for minikube container: sudo -n podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true: exit status 125
stdout:

stderr:
Error: volume with name minikube already exists: volume already exists


😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose

@afbjorklund
Collaborator

Looks like the same issue. We need to capture the logs, before the container is torn down.

Possibly you could start the container again (podman start), and try to start the docker service?

But it seems that something goes wrong the first time, the logs for which aren't shown here.

🔥  Creating podman container (CPUs=2, Memory=3900MB) ...
❔ <<<something goes very wrong here>>>
✋  Stopping "minikube" in podman ...
🛑  Powering off "minikube" via SSH ...
🔥  Deleting "minikube" in podman ...
🔥  Creating podman container (CPUs=2, Memory=3900MB) ...

And when it deletes the first container and tries again, there is nothing deleting the volume?

Error: volume with name minikube already exists: volume already exists

It could probably keep the volume from the first time, and just avoid trying to create it again.

I'm not 100% convinced about the auto-kill feature, it might just as well have stayed down...

@thobianchi
Author

🔥  Deleting "minikube" in podman ...

The container is deleted, so I can't do podman start. The failure is too quick for me to exec into the container.
Is there a way to launch the container manually?

@afbjorklund
Collaborator

afbjorklund commented May 16, 2020

You can see all the logs with minikube -v9 --alsologtostderr start minikube --driver podman
There are other levels in between, from -v1. The logs are also available in files under /tmp.

The actual start is something like: sudo -n podman start --cgroup-manager cgroupfs minikube
But for troubleshooting, the code probably needs to be modified in order to not just delete it.

@berenddeschouwer

(I'm on F32, minikube 1.10.1, same issue)

The volume is created, the container isn't.
sudo -n podman volume inspect minikube # works
sudo -n podman inspect minikube # error getting image minikube

The logs seem to indicate that there's nothing between 'volume create' and 'inspect' (sans volume)

sudo logs confirm "podman volume create" then "podman inspect". Is there a missing command?

logs attached, I hope they help.
minikube.log
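
For anyone wanting to reproduce the check, one way to pull the podman calls out of the sudo log is something like:

sudo journalctl _COMM=sudo --since "1 hour ago" | grep podman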

@afbjorklund
Collaborator

There is supposed to be a matching "sudo -n podman run" in there somewhere.

sudo -n podman run --cgroup-manager cgroupfs -d -t --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname podman --name podman --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=podman --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=podman --volume podman:/var:exec --cpus=2 -e container=podman --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 gcr.io/k8s-minikube/kicbase:v0.0.10

@thobianchi
Author

After executing that command, I exec'ed into the container:

root@podman:/# systemctl start docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
root@podman:/# journalctl -u docker
-- Logs begin at Mon 2020-05-18 16:13:22 UTC, end at Mon 2020-05-18 16:14:09 UTC. --
May 18 16:13:22 podman systemd[1]: Starting Docker Application Container Engine...
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.530216013Z" level=info msg="Starting up"
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.531646252Z" level=info msg="parsed scheme: \"unix\"" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.531668594Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.531697042Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] }" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.531729967Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.532044061Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0001508f0, CONNECTING" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.532051233Z" level=info msg="blockingPicker: the picked transport is not ready, loop back to repick" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.533018890Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc0001508f0, READY" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.533818139Z" level=info msg="parsed scheme: \"unix\"" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.533834422Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.533846738Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] }" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.533855779Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.533901801Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000662a30, CONNECTING" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.534138403Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc000662a30, READY" module=grpc
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.610753605Z" level=warning msg="Your kernel does not support cgroup memory limit"
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.610794989Z" level=warning msg="Unable to find cpu cgroup in mounts"
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.610808601Z" level=warning msg="Unable to find blkio cgroup in mounts"
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.610819665Z" level=warning msg="Unable to find cpuset cgroup in mounts"
May 18 16:13:22 podman dockerd[78]: time="2020-05-18T16:13:22.610829962Z" level=warning msg="mountpoint for pids not found"
May 18 16:13:22 podman dockerd[78]: failed to start daemon: Devices cgroup isn't mounted
May 18 16:13:22 podman systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
May 18 16:13:22 podman systemd[1]: docker.service: Failed with result 'exit-code'.
May 18 16:13:22 podman systemd[1]: Failed to start Docker Application Container Engine.
May 18 16:13:22 podman systemd[1]: docker.service: Consumed 121ms CPU time.
May 18 16:13:24 podman systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
May 18 16:13:24 podman systemd[1]: docker.service: Scheduled restart job, restart counter is at 1.
May 18 16:13:24 podman systemd[1]: Stopped Docker Application Container Engine.

I think the error here is failed to start daemon: Devices cgroup isn't mounted

@afbjorklund
Collaborator

afbjorklund commented May 18, 2020

Are you trying to run it with cgroups v2? Because you need to revert Fedora 31+ to cgroups v1

systemd.unified_cgroup_hierarchy=0 (see https://bugzilla.redhat.com/show_bug.cgi?id=1746355)
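
On Fedora, the usual recipe is something like this (check first, then add the kernel argument and reboot):

# "cgroup2fs" here means cgroups v2 is active, "tmpfs" means v1
stat -fc %T /sys/fs/cgroup/
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo reboot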

@thobianchi
Author

Oh... I'm using podman because on Fedora I can't use the docker driver with cgroups v2. If the podman driver has the same dependency as docker, I'll have to continue using kvm... :(
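
For reference, the kvm fallback I mean is just:

minikube start --vm-driver=kvm2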

@afbjorklund
Collaborator

You can use the cri-o container runtime instead of the docker runtime, but I don't think that Kubernetes works with cgroups v2 just yet. So I'm not sure it helps in this case.

@afbjorklund
Collaborator

I think that we need to extend some of the warnings for the "none" driver #7905, to also cover the "docker" and "podman" drivers. Especially things like these, with kernel and cgroups limitations...

@tstromberg tstromberg added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. long-term-support Long-term support issues that can't be fixed in code and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jul 22, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 19, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
