kic base image: install stable containerd and clean up packages #9135

Closed
wants to merge 2 commits into from

@afbjorklund afbjorklund commented Aug 31, 2020

Remove the locally installed binaries inherited from kind and replace them with packages.

Use the "clean-install" script for a nicer Dockerfile.

Fixes #8767


Here is the diff, from container-diff:

-----File-----

These entries have been added to gcr.io/k8s-minikube/kicbase:v0.0.12-snapshot3:
FILE                                        SIZE
/usr/bin/crictl                             20.8M
/var/lib/dpkg/info/cri-tools.list           263B
/var/lib/dpkg/info/cri-tools.md5sums        325B

These entries have been deleted from gcr.io/k8s-minikube/kicbase:v0.0.12-snapshot3:
FILE                                          SIZE
/usr/local/bin/containerd                     52.8M
/usr/local/bin/containerd-shim                7.1M
/usr/local/bin/containerd-shim-runc-v2        8.7M
/usr/local/bin/crictl                         27.1M
/usr/local/bin/ctr                            26.3M
/usr/local/sbin/runc                          9.4M

These entries have been changed between gcr.io/k8s-minikube/kicbase:v0.0.12-snapshot3 and local/kicbase:v0.0.12-snapshot3-snapshot:
FILE                                                          SIZE1        SIZE2
/usr/lib/cri-o-runc/sbin/runc                                 9.4M         7.1M
/opt/cni/bin/dhcp                                             8.6M         8.9M
/opt/cni/bin/firewall                                         4.2M         4.2M
/opt/cni/bin/bridge                                           3.2M         3.2M
/opt/cni/bin/ptp                                              3.2M         3.2M
/opt/cni/bin/macvlan                                          3M           3.1M
/opt/cni/bin/ipvlan                                           3M           3M
/opt/cni/bin/vlan                                             3M           3M
/opt/cni/bin/bandwidth                                        2.9M         2.9M
/opt/cni/bin/host-device                                      2.9M         2.9M
/opt/cni/bin/portmap                                          2.7M         2.7M
/opt/cni/bin/host-local                                       2.5M         2.5M
/opt/cni/bin/sbr                                              2.3M         2.5M
/opt/cni/bin/tuning                                           2.3M         2.5M
/opt/cni/bin/loopback                                         2.2M         2.4M
/opt/cni/bin/flannel                                          2.1M         2.4M
/opt/cni/bin/static                                           2M           2.2M
/var/lib/dpkg/status                                          187K         187.3K
/var/lib/dpkg/status-old                                      187K         187.3K
/usr/share/containers/containers.conf                         12.7K        12.7K
/var/cache/ldconfig/aux-cache                                 9.8K         9.8K
/var/lib/apt/extended_states                                  4.8K         4.8K
/etc/ssh/ssh_host_rsa_key                                     2.5K         2.5K
/var/lib/dpkg/info/containernetworking-plugins.md5sums        1K           1K
/etc/shadow-                                                  772B         772B
/etc/shadow                                                   772B         772B
/etc/ssh/ssh_host_rsa_key.pub                                 571B         571B
/etc/ssh/ssh_host_ecdsa_key                                   513B         513B
/etc/ssh/ssh_host_ed25519_key                                 411B         411B
/var/lib/dpkg/info/containers-common.md5sums                  300B         300B
/etc/ssh/ssh_host_ecdsa_key.pub                               179B         179B
/etc/ssh/ssh_host_ed25519_key.pub                             99B          99B

@k8s-ci-robot added the cncf-cla: yes and size/M labels Aug 31, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afbjorklund

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Aug 31, 2020
Contributor

@tstromberg tstromberg left a comment

Looks good to me. @medyagh - what do you think?

@afbjorklund changed the title from "Clean up the docker installation" to "Clean up the kicbase installation" Aug 31, 2020
@afbjorklund
Collaborator Author

afbjorklund commented Aug 31, 2020

A future "optimization" would be to merge the kindbase and the kicbase, to avoid having to first install containerd only to remove it again. We talked about this earlier, but at the time we didn't want to fork KIND; then again, they didn't want to split their "base" (into one base and one containerd) either.

There are many more details in #7788.

@medyagh
Member

medyagh commented Aug 31, 2020

I am re-running the kic base test to see why the failure rate is high.

@medyagh changed the title from "Clean up the kicbase installation" to "kic base image: clean up pre-installed packages" Aug 31, 2020
@medyagh changed the title from "kic base image: clean up pre-installed packages" to "kic base image: install stable containerd and clean up packages" Aug 31, 2020
@afbjorklund
Collaborator Author

docker@minikube:~$ /usr/local/bin/containerd --version
containerd github.com/containerd/containerd v1.3.3-14-g449e9269 449e926990f8539fd00844b26c07e2f1e306c760
docker@minikube:~$ /usr/bin/containerd --version
containerd github.com/containerd/containerd 1.3.3-0ubuntu2 
docker@minikube:~$ /usr/local/bin/crictl --version
crictl version v1.18.0
docker@minikube:~$ /usr/bin/crictl --version
crictl version unknown
docker@minikube:~$ /usr/local/sbin/runc --version
runc version 1.0.0-rc10
spec: 1.0.1-dev
docker@minikube:~$ /usr/sbin/runc --version
runc version spec: 1.0.1-dev

Package versions:

containerd/focal,now 1.3.3-0ubuntu2 amd64 [installed,automatic]
cri-tools/unknown,now 1.17.0~3 amd64 [installed]
runc/focal,now 1.0.0~rc10-0ubuntu1 amd64 [installed]
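
To compare the versions listed above without reading them by eye, a small helper can order two version strings. This is only a sketch: `version_lt` is a hypothetical helper name, it relies on `sort -V` being available (GNU coreutils and BusyBox provide it), and it assumes Debian suffixes such as `~3` have been stripped first.

```shell
# Hypothetical helper: succeed when version $1 sorts strictly before version $2.
# Assumes plain dotted versions (e.g. 1.17.0) and a sort that supports -V.
version_lt() {
  [ "$1" != "$2" ] || return 1
  first=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$first" = "$1" ]
}

# Versions taken from the output above: packaged cri-tools vs. the binary from kind.
if version_lt 1.17.0 1.18.0; then
  echo "packaged cri-tools is older than the kind binary"
fi
```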

@priyawadhwa

Hey @afbjorklund a lot of the kic image tests are still failing -- could it be because of the change in versions?

@afbjorklund
Collaborator Author

> Hey @afbjorklund a lot of the kic image tests are still failing -- could it be because of the change in versions?

Possibly, although the cri-tools version was the only one that actually changed (from 1.18 to 1.17).

I don't think any of the patches that kind applied on top of containerd 1.3.3 would have that impact?

  • e71c7d0d bugfix: cleanup dangling shim by brand new context
  • a2d1cbf6 Update .mailmap with changes from master
  • de5b1b83 script: use github.com/kubernetes-sigs/cri-tools directly
  • 6a341644 Update Golang 1.12.17
  • 9a428a3c Fix incorrect comment from copy/paste of starting script
  • 09b3b4fc Set octet-stream content-type on put request
  • 37b9a347 Improve host fallback behaviour in docker remote

https://github.com/containerd/containerd/compare/v1.3.3..v1.3.3-14-g449e9269

There have also been many 1.3.x releases since then, even though docker 19.03.12 still uses containerd 1.2.13.

@tstromberg
Contributor

As best as I can tell, the failure is emitted by sudo systemctl -f restart docker after the docker config is updated. stderr shows:

Job for docker.service canceled.

Looking at https://serverfault.com/questions/936220/what-could-cause-a-systemd-service-stop-to-end-with-job-being-canceled, this seems to occur when a service depends on another service that is unavailable, in particular through BindsTo.

It sort of sounds like containerd may not be healthy?
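
One way to test that hypothesis is to list what docker.service is bound to and then check whether each of those units is active. The following is a minimal sketch, not something taken from the PR: `binds_to` is a hypothetical helper that only parses unit-file text from stdin, and the usage lines in the comments assume a node where `systemctl` is available.

```shell
# Hypothetical helper: print the units named on BindsTo= lines of a unit file
# read from stdin, one unit per line.
binds_to() {
  sed -n 's/^BindsTo=//p' | tr ' ' '\n' | sed '/^$/d'
}

# Possible usage inside the minikube node (assumes systemd is running there):
#   systemctl cat docker.service | binds_to
#   systemctl is-active containerd.service
```

If a unit printed by `binds_to` is not active, a start or restart job for docker.service would be expected to fail or be canceled.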

@tstromberg
Contributor

This is easily repeatable locally:

docker build -t kicbase:experiment deploy/kicbase; minikube delete; minikube start --base-image=kicbase:experiment --driver=docker

Results in:

Job for docker.service canceled.

sudo journalctl -xe -u docker shows:

-- Logs begin at Fri 2020-09-25 18:24:55 UTC, end at Fri 2020-09-25 18:32:30 UTC. --
Sep 25 18:24:56 minikube systemd[1]: docker.service: Bound to unit containerd.service, but unit isn't active.
Sep 25 18:24:56 minikube systemd[1]: Dependency failed for Docker Application Container Engine.
-- Subject: A start job for unit docker.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit docker.service has finished with a failure.
-- 
-- The job identifier is 51 and the job result is dependency.
Sep 25 18:24:56 minikube systemd[1]: docker.service: Job docker.service/start failed with result 'dependency'.
Sep 25 18:25:03 minikube systemd[1]: Starting Docker Application Container Engine...
-- Subject: A start job for unit docker.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit docker.service has begun execution.
-- 
-- The job identifier is 260.
Sep 25 18:25:03 minikube systemd[1]: docker.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit docker.service has successfully entered the 'dead' state.
Sep 25 18:25:03 minikube systemd[1]: Stopped Docker Application Container Engine.
-- Subject: A stop job for unit docker.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A stop job for unit docker.service has finished.
-- 
-- The job identifier is 344 and the job result is done.

sudo journalctl -xe -u containerd shows:

-- Logs begin at Fri 2020-09-25 18:24:55 UTC, end at Fri 2020-09-25 18:33:03 UTC. --
Sep 25 18:30:28 minikube systemd[1]: containerd.service: Main process exited, code=exited, status=203/EXEC
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- An ExecStart= process belonging to unit containerd.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 203.
Sep 25 18:30:28 minikube systemd[1]: containerd.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- The unit containerd.service has entered the 'failed' state with result 'exit-code'.
Sep 25 18:30:29 minikube systemd[1]: containerd.service: Scheduled restart job, restart counter is at 271.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Automatic restarting of the unit containerd.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Sep 25 18:30:29 minikube systemd[1]: Stopped containerd container runtime.
-- Subject: A stop job for unit containerd.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A stop job for unit containerd.service has finished.
-- 
-- The job identifier is 10910 and the job result is done.
Sep 25 18:30:29 minikube systemd[1]: Starting containerd container runtime...
-- Subject: A start job for unit containerd.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit containerd.service has begun execution.
-- 
-- The job identifier is 10910.
Sep 25 18:30:29 minikube systemd[1]: Started containerd container runtime.
-- Subject: A start job for unit containerd.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- A start job for unit containerd.service has finished successfully.
-- 
-- The job identifier is 10910.
Sep 25 18:30:29 minikube systemd[2315]: containerd.service: Failed to execute command: No such file or directory
Sep 25 18:30:29 minikube systemd[2315]: containerd.service: Failed at step EXEC spawning /usr/local/bin/containerd: No such file or directory
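
Exit status 203/EXEC means systemd could not execute the configured binary, and the last journal line names the path it tried. As a rough sketch, a filter like the one below pulls that path out of journal text. `failed_exec_path` is a hypothetical helper name, and the pattern assumes the message wording shown in the log above.

```shell
# Hypothetical helper: print each distinct path that systemd failed to spawn,
# based on "Failed at step EXEC spawning <path>: ..." lines read from stdin.
failed_exec_path() {
  sed -n 's/.*Failed at step EXEC spawning \([^:]*\):.*/\1/p' | sort -u
}

# Possible usage inside the node:
#   sudo journalctl -u containerd --no-pager | failed_exec_path
```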

@tstromberg
Contributor

containerd state is stuck on:

containerd.service loaded activating start-pre start containerd container runtime

I think the location of containerd changed from /usr/local/bin/containerd to /usr/bin/containerd with this PR:

$ sudo dpkg-query -L containerd
/.
/lib
/lib/systemd
/lib/systemd/system
/lib/systemd/system/containerd.service
/usr
/usr/bin
/usr/bin/containerd
/usr/bin/containerd-shim
/usr/bin/containerd-shim-runc-v1
/usr/bin/containerd-shim-runc-v2
/usr/bin/containerd-stress
/usr/bin/ctr
/usr/share
/usr/share/doc
/usr/share/doc/containerd
/usr/share/doc/containerd/README.md.gz
/usr/share/doc/containerd/SECURITY_AUDIT.pdf.gz
/usr/share/doc/containerd/changelog.Debian.gz
/usr/share/doc/containerd/client-opts.md
/usr/share/doc/containerd/copyright
/usr/share/doc/containerd/garbage-collection.md.gz
/usr/share/doc/containerd/getting-started.md.gz
/usr/share/doc/containerd/managed-opt.md.gz
/usr/share/doc/containerd/namespaces.md
/usr/share/doc/containerd/ops.md.gz
/usr/share/doc/containerd/rootless.md
/usr/share/doc/containerd/stream_processors.md
/usr/share/man
/usr/share/man/man1
/usr/share/man/man1/containerd-config.1.gz
/usr/share/man/man1/containerd.1.gz
/usr/share/man/man1/ctr.1.gz
/usr/share/man/man5

@tstromberg
Contributor

sudo systemctl cat containerd.service shows that the unit still points to /usr/local/bin/containerd:

# /etc/systemd/system/containerd.service
# derived containerd systemd service file from the official:
# https://github.com/containerd/containerd/blob/master/containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target
# disable rate limiting
StartLimitIntervalSec=0

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Restart=always
RestartSec=1

Delegate=yes
KillMode=process
Restart=always
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=1048576
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity

[Install]
WantedBy=multi-user.target
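
A quick sanity check for this kind of mismatch is to read the ExecStart= path out of the active unit and confirm that the binary exists. The sketch below assumes the unit text arrives on stdin. `exec_start_binary` and `check_unit_binary` are hypothetical helper names, and only the first ExecStart= line is considered.

```shell
# Hypothetical helper: print the binary from the first ExecStart= line of a
# unit file read from stdin, dropping systemd prefix characters such as "-".
exec_start_binary() {
  sed -n 's/^ExecStart=[-@+!:]*\([^ ]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: report whether that binary exists and is executable.
check_unit_binary() {
  bin=$(exec_start_binary)
  if [ -x "$bin" ]; then
    echo "ok: $bin"
  else
    echo "missing: $bin"
    return 1
  fi
}

# Possible usage inside the node:
#   systemctl cat containerd.service | check_unit_binary
```

With the unit shown above and the packaged containerd installed under /usr/bin, this check would be expected to report /usr/local/bin/containerd as missing.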

@tstromberg
Contributor

tstromberg commented Sep 25, 2020

/etc/systemd/system/containerd.service - the active file - points to /usr/local/bin/containerd

/usr/lib/systemd/system/containerd.service - installed by the containerd package, seems to point to the correct location.

I hope this helps!

@afbjorklund
Collaborator Author

afbjorklund commented Sep 27, 2020

> /etc/systemd/system/containerd.service - the active file - points to /usr/local/bin/containerd
>
> /usr/lib/systemd/system/containerd.service - installed by the containerd package, seems to point to the correct location.
>
> I hope this helps!

It does: this PR only removed the symlink and not the actual configuration file.

Normal installations don't clobber /etc; they install the unit under /usr/local/lib.
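
Because a unit file under /etc/systemd/system takes precedence over one shipped in /lib/systemd/system or /usr/lib/systemd/system, one possible cleanup is to delete the leftover file so that the packaged unit becomes active. The following is a sketch under stated assumptions, not the change that was eventually merged: `remove_shadowing_unit` and its optional root-prefix argument are hypothetical, and the file is only removed when a packaged copy actually exists.

```shell
# Hypothetical helper: remove <root>/etc/systemd/system/<unit> when a packaged
# copy of the same unit exists, so the packaged unit is no longer shadowed.
# $1 = unit name, $2 = optional root prefix (empty means the real filesystem).
remove_shadowing_unit() {
  unit=$1
  root=${2:-}
  override="$root/etc/systemd/system/$unit"
  for packaged in "$root/lib/systemd/system/$unit" "$root/usr/lib/systemd/system/$unit"; do
    if [ -e "$packaged" ] && [ -e "$override" ]; then
      rm -f "$override"
      echo "removed $override (packaged unit: $packaged)"
      return 0
    fi
  done
  echo "nothing removed for $unit"
}

# Possible usage in an image build step (run as root):
#   remove_shadowing_unit containerd.service
```

On a running system, `systemctl daemon-reload` would be needed afterwards for systemd to pick up the change.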

@k8s-ci-robot added the needs-rebase label Sep 30, 2020
@k8s-ci-robot
Contributor

@afbjorklund: PR needs rebase.

@afbjorklund
Collaborator Author

Included in PR #9330

@afbjorklund afbjorklund closed this Oct 1, 2020