
CI Build Test: Ran into FATA[0005] subnet 10.4.0.0/24 overlaps with other one on this address space #2935

Open
gunamata opened this issue Sep 19, 2022 · 14 comments

Comments

@gunamata
Contributor

Actual Behavior

Ran into the error below after running a container.

FATA[0005] subnet 10.4.0.0/24 overlaps with other one on this address space

I observed this behavior with the CI build:
https://github.com/rancher-sandbox/rancher-desktop/actions/runs/3071306322

Steps to Reproduce

  • Run a container
    nerdctl run -d -p 85:80 --restart=always nginx
  • Downgrade Kubernetes version
  • Reset Kubernetes with Images
  • Run container
    nerdctl run -d -p 85:80 --restart=always nginx

Result

Ran into the error below after running a container.

FATA[0005] subnet 10.4.0.0/24 overlaps with other one on this address space

Expected Behavior

The container should run without errors.

Additional Information

No response

Rancher Desktop Version

https://github.com/rancher-sandbox/rancher-desktop/actions/runs/3071306322

Rancher Desktop K8s Version

1.21.4

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

Windows

Operating System / Build Version

Windows 10 Enterprise

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

@gunamata gunamata added kind/bug Something isn't working runtime/containerd labels Sep 19, 2022
@gaktive gaktive added this to the Next milestone Sep 19, 2022
@Nino-K
Member

Nino-K commented Sep 19, 2022

The problem here is that the underlying container engine (CNI) checks any newly created network route against the existing routes on the system. If a route rule with an IP address from a conflicting subnet already exists in iptables, it will yield this error. The conflicting routes could come either from the host network (bridge mode) or, in this case, from the Kube network.
As a long-term fix, we could detect conflicting addresses and adjust the network pools made available to the container engine accordingly. As a short-term solution, we could document how to manually change the network pool address.
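As a quick check (a sketch, assuming the default rancher-desktop distro name and the default nerdctl0 bridge), the existing addresses and routes inside the distro can be listed from a Windows prompt to see what already claims the subnet:

# list interfaces and routes inside the rancher-desktop WSL distro
wsl -d rancher-desktop ip addr show
wsl -d rancher-desktop ip route show
# a 10.4.0.0/24 entry lingering from before the reset would explain the overlap error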

@jandubois
Member

I have not been able to repro this because #2934 is blocking me from getting to a working system.

@jandubois
Member

Doing a Factory Reset allowed me to go past #2934, but I still cannot repro this.

I got an error once:

e:\home\jan>nerdctl run -d -p 85:80 --restart=always nginx
FATA[0002] OCI runtime start failed: cannot start a container that has stopped: unknown

But that may have been while containerd was still starting up.

Afterwards I could run the command repeatedly without getting any error. I'm somewhat surprised though that nerdctl didn't tell me that the port was already in use:

e:\home\jan>nerdctl ps -a
CONTAINER ID    IMAGE                             COMMAND                   CREATED               STATUS    PORTS                 NAMES
0bb0a5a0de03    docker.io/library/nginx:latest    "/docker-entrypoint.…"    8 minutes ago         Up        0.0.0.0:85->80/tcp    nginx-0bb0a
0fb32c58e483    docker.io/library/nginx:latest    "/docker-entrypoint.…"    13 seconds ago        Up        0.0.0.0:85->80/tcp    nginx-0fb32
3d66a24f31ce    docker.io/library/nginx:latest    "/docker-entrypoint.…"    10 seconds ago        Up        0.0.0.0:85->80/tcp    nginx-3d66a
5449641543de    docker.io/library/nginx:latest    "/docker-entrypoint.…"    About a minute ago    Up        0.0.0.0:85->80/tcp    nginx-54496
bc5eb13c4ab8    docker.io/library/nginx:latest    "/docker-entrypoint.…"    3 minutes ago         Up        0.0.0.0:85->80/tcp    nginx-bc5eb
d33650d61a92    docker.io/library/nginx:latest    "/docker-entrypoint.…"    7 seconds ago         Up        0.0.0.0:85->80/tcp    nginx-d3365

FWIW, 10.4.0.0/24 is the network created by nerdctl itself:

16: nerdctl0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether b2:76:c0:57:f0:ea brd ff:ff:ff:ff:ff:ff
    inet 10.4.0.1/24 brd 10.4.0.255 scope global nerdctl0

So any reported conflict would be with the network already set up for the previous container while a new one is being set up at the same time.
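A quick way to confirm which subnet nerdctl's default network is using (a sketch; "bridge" is the default network name in nerdctl and may differ in other setups):

nerdctl network ls
nerdctl network inspect bridge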

@jandubois
Member

I forgot the "Reset Kubernetes with Images" step. After I've done this, I get the error too:

e:\home\jan>nerdctl run -d -p 85:80 --restart=always nginx
docker.io/library/nginx:latest:                                                   resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:0b970013351304af46f322da1263516b188318682b2ab1091862497591189ff1:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:79c77eb7ca32f9a117ef91bc6ac486014e0d0e75f2f06683ba24dc298f9f4dd4: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:2d389e545974d4a93ebdef09b650753a55f72d1ab4518d17a30c0e1b3e297444:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:600c24b8ba3900f029e02f62ad9d14a04880ffdf7b8c57bfc74d569477002d67:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:31b3f1ad4ce1f369084d0f959813c51df0ca17d9877d5ee88c2db6ff88341430:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:fd42b079d0f818ce0687ee4290715b1b4843a1d5e6ebe7e3144c55ed11a215ca:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:30585fbbebc6bc3f81cb80830fe83b04613cda93ea449bb3465a08bdec8e2e43:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:18f4ffdd25f46fa28f496efb7949b137549b35cb441fb671c1f7fa4e081fd925:    done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:9dc932c8fba266219fd16728c9e3f632296d043407e77d6af626c5119f021b42:    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 15.7s                                                                    total:  30.0 M (1.9 MiB/s)
FATA[0017] subnet 10.4.0.0/24 overlaps with other one on this address space

@gunamata
Contributor Author

I could repro this on 1.5.1 (latest release at this time) too. Here are the steps (same as in the initial issue description, except that I captured some additional info about the Kubernetes versions I used):

  1. Reset Kubernetes to start fresh; I am on Kubernetes version v1.25.0
  2. Run a container
    nerdctl run -d -p 85:80 --restart=always nginx
  3. Downgrade to v1.20.15
  4. Reset Kubernetes with Images
  5. Run a container
    nerdctl run -d -p 85:80 --restart=always nginx

@jandubois
Member

Out of curiosity I tried this on macOS as well, and I couldn't repro it there.

I didn't really expect it anyway; earlier discussion with @mook-as produced the theory that the problem arises because "deleting" the VM on WSL does not really restart WSL, and since networking is shared between distros, it is possible that the old network definitions were not cleaned up properly.

@jandubois
Member

This has nothing to do with k8s and downgrading; I can repro it with these simplified steps:

  1. Fresh install of 1.5.1 (with k8s disabled)
  2. Run nerdctl run -d -p 85:80 --restart=always nginx
  3. Reset Kubernetes with Images 1
  4. Run nerdctl run -d -p 85:80 --restart=always nginx

So it seems indeed like the nerdctl0 network is lingering even though the rancher-desktop distro got deleted and recreated.

Since it is not a regression, I think this could be moved to the "Later" milestone.

Footnotes

  1. It doesn't really make sense to call this "Reset Kubernetes" while Kubernetes is disabled.

@gunamata
Contributor Author

Doing a Factory Reset or restarting the machine resolved this issue for me on Windows 10 Enterprise. Just sharing in case it helps with the investigation of the problem.

@jandubois
Member

Doing a Factory Reset or restarting the machine resolved this issue for me on Windows 10 Enterprise.

I would think that anything that shuts down the WSL VM (and not just the individual distro) would fix it because I don't see how a network definition would survive the restart.

So I think wsl --shutdown would fix the problem, but it is rather heavy-handed, as it will stop all other distros as well. At the very least we would need an extra warning/confirmation from the user.
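For anyone who needs to recover manually in the meantime, the heavy-handed version of that workaround would look roughly like this, run from a Windows prompt (note: it stops every running WSL distro, not just rancher-desktop):

wsl --shutdown

and then start Rancher Desktop again so the distro and its networks are recreated from scratch.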

@gaktive
Contributor

gaktive commented Sep 21, 2022

@gunamata to provide material to update the FAQ around this. We should test a WSL shutdown around this too.

@Nino-K
Member

Nino-K commented Sep 21, 2022

So I think wsl --shutdown would fix the problem, but it is rather heavy-handed, as it will stop all other distros as well. At the very least we would need an extra warning/confirmation from the user.

If we are taking the short-term approach, we should be able to change the default network pool available to containerd (/etc/cni/net.d if it exists; if not, create one) instead of shutting down WSL. This would be very similar to Docker's default-address-pools.
e.g.

"default-address-pools": [
       {
           "base": "10.17.0.1/16",
            "size": 16
        }
]

This example might be useful: https://github.com/containerd/containerd/blob/main/script/setup/install-cni
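A minimal sketch of what such an override could look like, assuming a bridge/host-local conflist placed in /etc/cni/net.d inside the rancher-desktop distro (the file contents, network name, and the 10.17.0.0/24 subnet are illustrative and untested with our nerdctl/CNI versions):

{
    "cniVersion": "0.4.0",
    "name": "nerdctl-default",
    "plugins": [
        {
            "type": "bridge",
            "bridge": "nerdctl0",
            "isGateway": true,
            "ipMasq": true,
            "ipam": {
                "type": "host-local",
                "ranges": [[{ "subnet": "10.17.0.0/24" }]]
            }
        },
        {
            "type": "portmap",
            "capabilities": { "portMappings": true }
        }
    ]
}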

@Nino-K
Member

Nino-K commented Sep 22, 2022

@gunamata this (custom networks) should be sufficient for documentation purposes. Although I have not tested it with our version of nerdctl.
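For reference, the custom-network route would look roughly like this with nerdctl (the network name and subnet are illustrative, and untested with our build):

nerdctl network create --subnet 10.10.0.0/24 issue2935
nerdctl run -d -p 85:80 --restart=always --net issue2935 nginx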

@jandubois
Member

Although I have not tested it with our version of nerdctl.

Please test before adding to docs! We should be sure it actually works. 😺

@gaktive
Contributor

gaktive commented Nov 15, 2022

Based on a comment in #3365, it looks like nerdctl introduced this in containerd/nerdctl#1245.
