coredns pods sometimes fail to start due to trying to bind privileged ports as non-root user #11366

Closed
spantaleev opened this issue Jul 8, 2024 · 5 comments
Labels: kind/bug, lifecycle/rotten

@spantaleev
Contributor

What happened?

On some of my nodes, coredns Pods (currently using the v1.11.1 container image) fail to start with an error:

Listen: listen tcp :53: bind: permission denied

On others, it runs fine.

As far as I could tell, all my nodes are identical (same OS, same kernel version, same containerd version, same net.ipv4.ip_unprivileged_port_start = 1024 sysctl value).

I am not sure why binding to privileged ports works as a non-root user on some nodes and not on others.
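
One detail that may be relevant (my own note, not something I have verified on the affected nodes): net.ipv4.ip_unprivileged_port_start is a per-network-namespace sysctl, and a newly created network namespace starts from the kernel default of 1024 regardless of the host value. A quick check that compares the two, sketched below, could show whether the host setting even reaches the pod's namespace:

# Value in the host network namespace
sysctl net.ipv4.ip_unprivileged_port_start

# Value a brand-new network namespace starts with (what a pod sees
# unless the container runtime sets it explicitly in the sandbox)
sudo unshare --net sysctl net.ipv4.ip_unprivileged_port_start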

What did you expect to happen?

I would expect that coredns would reliably run on all my cluster's nodes.

How can we reproduce it (as minimally and precisely as possible)?

Since my Kubespray config yields both working and non-working nodes, I tried to reproduce the issue in another way.

I've used the following Corefile (inspired by the coredns config map but with the kubernetes plugin disabled):

.:53 {
    errors {
    }
    health {
        lameduck 5s
    }
    ready

    # Disable Kubernetes plugin, as we'll run in a non-Kubernetes context for testing purposes.
    #kubernetes cluster.local in-addr.arpa ip6.arpa {
    #  pods insecure
    #  fallthrough in-addr.arpa ip6.arpa
    #}

    prometheus :9153
    forward . 8.8.8.8 8.8.4.4 {
        prefer_udp
        max_concurrent 1000
    }
    cache 30

    loop
    reload
    loadbalance
}

and I try to run this with:

nerdctl run \
  -it \
  --rm \
  --network=none \
  --mount type=bind,src=$(pwd)/Corefile,dst=/etc/coredns/Corefile,ro \
  --cap-add=NET_BIND_SERVICE \
  registry.k8s.io/coredns/coredns:v1.11.1 \
  -conf /etc/coredns/Corefile

On some nodes it works, on others I get the aforementioned error.

It appears that --cap-add=NET_BIND_SERVICE has no effect on the failing nodes.
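
A possible explanation, as a sketch I have not verified on the failing nodes: when the container process runs as a non-root user, the kernel clears the effective and permitted capability sets on execve unless the binary carries file capabilities or the runtime grants ambient capabilities, so --cap-add=NET_BIND_SERVICE may only land in the bounding set. Any image with a shell can show this (busybox here is just an arbitrary test image):

# Run as a non-root user with the capability "added", then inspect the
# capability sets of the process itself:
nerdctl run --rm --user=65534:65534 --cap-add=NET_BIND_SERVICE busybox \
  grep Cap /proc/self/status

# The masks can be decoded on the host, e.g.:
#   capsh --decode=00000000a80425fb
# If CapEff is all zeroes while CapBnd includes cap_net_bind_service,
# the added capability never became effective for the process.

Comparing this output between a working node and a failing node might show where they diverge.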

Workarounds:

  • adding --sysctl net.ipv4.ip_unprivileged_port_start=0 to the nerdctl run command

    • I cannot apply a similar workaround to the Deployment, because Kubespray does not let me override the coredns Deployment to add this under securityContext.sysctls (see the sketch after this list)

  • adding --user=0:0 to the nerdctl run command

    • I cannot apply a similar workaround to the Deployment, because Kubespray does not let me override the coredns Deployment to set this under securityContext

  • adjusting the Corefile configuration to use a port higher than 1023

  • using an older version of coredns (older than v1.11.0), like v1.10.1
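
For reference, here is what the first workaround would look like on the Deployment's pod template, if Kubespray exposed a way to set it. This is a hypothetical sketch, not something Kubespray currently renders; net.ipv4.ip_unprivileged_port_start is in Kubernetes' safe-sysctls set since v1.22, so it should not require a kubelet allowlist change:

# Hypothetical override for the coredns Deployment's pod template
spec:
  template:
    spec:
      securityContext:
        sysctls:
          - name: net.ipv4.ip_unprivileged_port_start
            value: "0"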

As this comment states, coredns has run as a non-root user since v1.11.0.

It appears that Kubespray sets up the coredns Deployment to run as the default user and does not explicitly adjust the net.ipv4.ip_unprivileged_port_start sysctl. It also doesn't provide much control over the securityContext, so applying any of these workarounds is difficult.

It would probably be good if one of these workarounds were applied by default.
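
In the meantime, the sysctl can be patched onto the live Deployment by hand. This is my own stopgap sketch (it assumes the usual coredns Deployment name in kube-system, and a later Kubespray run will likely revert it):

kubectl -n kube-system patch deployment coredns --patch '
spec:
  template:
    spec:
      securityContext:
        sysctls:
          - name: net.ipv4.ip_unprivileged_port_start
            value: "0"
'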

OS

Linux 5.15.0-113-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

Irrelevant

Version of Python

Irrelevant

Version of Kubespray (commit)

v2.25.0

Network plugin used

cilium

Full inventory with variables

My configuration is not customized much; it uses the containerd runtime, etc.

Command used to invoke ansible

Irrelevant

Output of ansible run

Ansible run is all good

Anything else we need to know

No response

@spantaleev added the kind/bug label on Jul 8, 2024
@spantaleev
Contributor Author

For now, I work around this issue by pinning coredns to an older version (older than v1.11.0, which landed support for running as non-root in coredns/coredns#5969).

These older coredns versions still run as root by default, so binding to privileged ports works reliably on all my nodes.

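# Set in the inventory, e.g. in the sample layout's
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# (the exact file is an assumption; any group_vars file applied
# to the cluster works):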
coredns_version: v1.10.1

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 6, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Nov 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot closed this as not planned on Dec 5, 2024