coredns pods sometimes fail to start due to trying to bind privileged ports as non-root user #11366

Closed
spantaleev opened this issue Jul 8, 2024 · 5 comments
Labels: kind/bug, lifecycle/rotten

@spantaleev
Contributor

What happened?

On some of my nodes, coredns Pods (currently using the v1.11.1 container image) fail to start with an error:

Listen: listen tcp :53: bind: permission denied

On others, it runs fine.

As far as I could tell, all my nodes are identical (same OS, same kernel version, same containerd version, same net.ipv4.ip_unprivileged_port_start = 1024 sysctl value).

I am not sure why binding to privileged ports works as a non-root user on some nodes and not on others.
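
One detail that may be relevant (my own note, not something I have verified on the affected nodes): net.ipv4.ip_unprivileged_port_start is a per-network-namespace sysctl, and a newly created network namespace starts from the kernel default of 1024 regardless of the host value. A quick check that compares the two, sketched below, could show whether the host setting even reaches the pod's namespace:

# Value in the host network namespace
sysctl net.ipv4.ip_unprivileged_port_start

# Value a brand-new network namespace starts with (what a pod sees
# unless the container runtime sets it explicitly in the sandbox)
sudo unshare --net sysctl net.ipv4.ip_unprivileged_port_start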

What did you expect to happen?

I would expect that coredns would reliably run on all my cluster's nodes.

How can we reproduce it (as minimally and precisely as possible)?

Since my Kubespray config yields both working and non-working nodes, I tried to reproduce the issue in another way.

I've used the following Corefile (inspired by the coredns config map but with the kubernetes plugin disabled):

.:53 {
    errors {
    }
    health {
        lameduck 5s
    }
    ready

    # Disable Kubernetes plugin, as we'll run in a non-Kubernetes context for testing purposes.
    #kubernetes cluster.local in-addr.arpa ip6.arpa {
    #  pods insecure
    #  fallthrough in-addr.arpa ip6.arpa
    #}

    prometheus :9153
    forward . 8.8.8.8 8.8.4.4 {
        prefer_udp
        max_concurrent 1000
    }
    cache 30

    loop
    reload
    loadbalance
}

and I try to run this with:

nerdctl run \
  -it \
  --rm \
  --network=none \
  --mount type=bind,src=$(pwd)/Corefile,dst=/etc/coredns/Corefile,ro \
  --cap-add=NET_BIND_SERVICE \
  registry.k8s.io/coredns/coredns:v1.11.1 \
  -conf /etc/coredns/Corefile

On some nodes it works, on others I get the aforementioned error.

It appears that --cap-add=NET_BIND_SERVICE has no effect on the failing nodes.
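
A possible explanation, as a sketch I have not verified on the failing nodes: when the container process runs as a non-root user, the kernel clears the effective and permitted capability sets on execve unless the binary carries file capabilities or the runtime grants ambient capabilities, so --cap-add=NET_BIND_SERVICE may only land in the bounding set. Any image with a shell can show this (busybox here is just an arbitrary test image):

# Run as a non-root user with the capability "added", then inspect the
# capability sets of the process itself:
nerdctl run --rm --user=65534:65534 --cap-add=NET_BIND_SERVICE busybox \
  grep Cap /proc/self/status

# The masks can be decoded on the host, e.g.:
#   capsh --decode=00000000a80425fb
# If CapEff is all zeroes while CapBnd includes cap_net_bind_service,
# the added capability never became effective for the process.

Comparing this output between a working node and a failing node might show where they diverge.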

Workarounds:

  • adding --sysctl net.ipv4.ip_unprivileged_port_start=0 to the nerdctl run command

    • I cannot apply a similar workaround to the Deployment, because Kubespray does not let me override the coredns Deployment to add this under securityContext.sysctls (see the sketch after this list)

  • adding --user=0:0 to the nerdctl run command

    • I cannot apply a similar workaround to the Deployment, because Kubespray does not let me override the coredns Deployment to set this under securityContext

  • adjusting the Corefile configuration to use a port higher than 1023

  • using an older version of coredns (older than v1.11.0), like v1.10.1
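
For reference, here is what the first workaround would look like on the Deployment's pod template, if Kubespray exposed a way to set it. This is a hypothetical sketch, not something Kubespray currently renders; net.ipv4.ip_unprivileged_port_start is in Kubernetes' safe-sysctls set since v1.22, so it should not require a kubelet allowlist change:

# Hypothetical override for the coredns Deployment's pod template
spec:
  template:
    spec:
      securityContext:
        sysctls:
          - name: net.ipv4.ip_unprivileged_port_start
            value: "0"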

As this comment states, coredns has run as a non-root user since v1.11.0.

It appears that Kubespray sets up the coredns Deployment to run as the default user and does not explicitly adjust the net.ipv4.ip_unprivileged_port_start sysctl. It also doesn't provide much control over the securityContext, so applying any of these workarounds is difficult.

It would probably be good if one of these workarounds were applied by default.
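
In the meantime, the sysctl can be patched onto the live Deployment by hand. This is my own stopgap sketch (it assumes the usual coredns Deployment name in kube-system, and a later Kubespray run will likely revert it):

kubectl -n kube-system patch deployment coredns --patch '
spec:
  template:
    spec:
      securityContext:
        sysctls:
          - name: net.ipv4.ip_unprivileged_port_start
            value: "0"
'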

OS

Linux 5.15.0-113-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

Irrelevant

Version of Python

Irrelevant

Version of Kubespray (commit)

v2.25.0

Network plugin used

cilium

Full inventory with variables

My configuration is not customized much; it uses the containerd runtime, etc.

Command used to invoke ansible

Irrelevant

Output of ansible run

Ansible run is all good

Anything else we need to know

No response

@spantaleev added the kind/bug label on Jul 8, 2024
@spantaleev
Contributor Author

For now, I work around this issue by pinning coredns to an older version (older than v1.11.0, which landed support for running as non-root in coredns/coredns#5969).

These older coredns versions still run as root by default, so binding to privileged ports works reliably on all my nodes.

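# Set in the inventory, e.g. in the sample layout's
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# (the exact file is an assumption; any group_vars file applied
# to the cluster works):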
coredns_version: v1.10.1

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 6, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Nov 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot closed this as not planned on Dec 5, 2024