Change cgroup driver from cgroupfs to systemd #6651

Merged (1 commit) on Feb 23, 2020


afbjorklund
Collaborator

The minikube iso is using systemd, so change the container runtime
to use the same cgroup manager instead of the default (cgroupfs).

Avoids kubeadm init message:

    [WARNING IsDockerSystemdCheck]:
        detected "cgroupfs" as the Docker cgroup driver.
        The recommended driver is "systemd".
        Please follow the guide at https://kubernetes.io/docs/setup/cri/

Also change the configuration for the containerd and cri-o runtimes.

Closes #4770

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 15, 2020
@afbjorklund afbjorklund requested a review from medyagh February 15, 2020 11:55
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 15, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afbjorklund

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2020
@medyagh
Member

medyagh commented Feb 16, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 16, 2020
@minikube-pr-bot

Error: running mkcmp: exit status 1

@afbjorklund
Collaborator Author

afbjorklund commented Feb 16, 2020

Seems like the crio restart is taking a really long time to complete.
Which is weird, since it wasn't running and nothing much changed...

Two minutes, just for the restart?

I0216 09:21:31.255945   11439 ssh_runner.go:101] Run: sudo sysctl net.netfilter.nf_conntrack_count
I0216 09:21:31.265389   11439 ssh_runner.go:265] ! sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_count: No such file or directory
I0216 09:21:31.265679   11439 cruntime.go:172] couldn't verify netfilter by "sudo sysctl net.netfilter.nf_conntrack_count" which might be okay. error: sudo sysctl net.netfilter.nf_conntrack_count: Process exited with status 255
stdout:

stderr:
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_count: No such file or directory
I0216 09:21:31.265824   11439 ssh_runner.go:101] Run: sudo modprobe br_netfilter
I0216 09:21:31.319895   11439 ssh_runner.go:101] Run: sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
I0216 09:21:31.331856   11439 ssh_runner.go:101] Run: sudo systemctl restart crio
I0216 09:23:31.699791   11439 ssh_runner.go:141] Completed: sudo systemctl restart crio: (2m0.36787738s)
I0216 09:23:31.700109   11439 ssh_runner.go:101] Run: crio --version
I0216 09:23:31.751381   11439 ssh_runner.go:265] > crio version 1.17.0

@afbjorklund
Collaborator Author

afbjorklund commented Feb 16, 2020

Apparently systemd thinks that the network is offline, and the crio service depends on it.

Feb 16 08:21:31 minikube systemd[1]: Starting Wait for Network to be Configured.

Feb 16 08:21:31 minikube systemd-networkd-wait-online[2345]: ignoring: lo

Feb 16 08:21:31 minikube systemd[1]: Starting CRI-O Auto Update Script...
Feb 16 08:21:32 minikube systemd[1]: Started CRI-O Auto Update Script.

Feb 16 08:23:32 minikube systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 08:23:32 minikube systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Feb 16 08:23:32 minikube systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Feb 16 08:23:32 minikube systemd[1]: Failed to start Wait for Network to be Configured.

Feb 16 08:23:32 minikube systemd[1]: Reached target Network is Online.

Feb 16 08:23:32 minikube systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
Feb 16 08:23:32 minikube systemd[1]: Started Container Runtime Interface for OCI (CRI-O).

So we are hitting a 120-second systemd timeout before it gives up waiting for the network and lets the service start:

/* systemd-networkd-wait-online defaults; the timeout can be overridden with --timeout= */
static bool arg_quiet = false;
static usec_t arg_timeout = 120 * USEC_PER_SEC;

@afbjorklund
Collaborator Author

Apparently systemd is either buggy or needs to be told more about eth0 and eth1:

$ sudo networkctl
IDX LINK             TYPE               OPERATIONAL SETUP     
  1 lo               loopback           carrier     unmanaged 
  2 eth0             ether              routable    configuring
  3 eth1             ether              routable    configuring
  4 sit0             sit                off         unmanaged
  5 mybridge         bridge             routable    unmanaged
  6 veth11d292ea     ether              carrier     unmanaged 
  7 vethf775b924     ether              carrier     unmanaged 

Possibly related to it not liking the DHCP server much:

Feb 16 08:21:23 minikube systemd-networkd[2033]: eth0: Gained carrier
Feb 16 08:21:23 minikube systemd-networkd[2033]: eth1: DHCPv4 address 192.168.99.100/24
Feb 16 08:21:23 minikube systemd-networkd[2033]: eth0: DHCPv4 address 10.0.2.15/24 via 10.0.2.2
Feb 16 08:21:23 minikube systemd-networkd[2033]: eth1: DHCP: No gateway received from DHCP server: No data available

Possible workarounds: systemd/systemd#5154

@afbjorklund
Collaborator Author

Opened #6655 about the slow startup; I think that was the reason for the test failures?

@afbjorklund afbjorklund self-assigned this Feb 16, 2020
@medyagh
Member

medyagh commented Feb 16, 2020

@afbjorklund the kic docker tests are taking 70 min; they usually run in 20 min.

In podman we also explicitly specify the cgroup manager to be cgroupfs.

Does that mean we need to keep separate logic for VMs and containers?

Could we make everything use same type of cgroups?

@afbjorklund
Copy link
Collaborator Author

> Could we make everything use same type of cgroups?

As far as I know, kubernetes recommends using the same cgroup manager as the host OS.
But I heard there were some issues with it when running docker-in-docker, so I'm not sure...

Knee-deep in systemd bugs already, even before trying to run it inside a container.
But there should be some resources online on how it can be achieved, maybe from Red Hat?

@medyagh
Member

medyagh commented Feb 17, 2020

> > Could we make everything use same type of cgroups?
>
> As far as I know, kubernetes recommends using the same cgroup manager as the host OS.
> But I heard there were some issues with it when running docker-in-docker, so I'm not sure...
>
> Knee-deep in systemd bugs already, even before trying to run it inside a container.
> But there should be some resources online on how it can be achieved, maybe from Red Hat?

I wonder if there is a correlation between this PR and the docker tests taking 70 minutes (more than three times the normal 20 minutes).

@afbjorklund
Collaborator Author

@medyagh: note that this PR only changes deploy/iso/minikube-iso

@afbjorklund
Collaborator Author

@medyagh: did you find the cause of the slowdown? Probably not anything in the ISO, right?

For me it seems like the "CI / docker_*" and "CI / podman_*" tests are always failing (timeout)

Successfully merging this pull request may close these issues.

Switch cgroup driver, from cgroupfs to systemd
4 participants