Add integration test for a VM changing IP and cert not working #10593

Open
mconner opened this issue Feb 24, 2021 · 10 comments
Labels
area/testing · co/hyperv (HyperV related issues) · kind/process (Process oriented issues, like setting up CI) · lifecycle/frozen (Indicates that an issue or PR should not be auto-closed due to staleness.) · priority/backlog (Higher priority than priority/awaiting-more-evidence.)

Comments


mconner commented Feb 24, 2021

Steps to reproduce the issue:

  1. minikube stop
  2. minikube start (got the errors shown in the minikube start output below)
  3. minikube stop
  4. minikube start. Everything (pods, ingress, volumes) is gone.

I had been running minikube for a few days, starting and stopping it, and sometimes restarting Windows, without any problems. I've had similar issues in the past when shutting down Windows without first stopping minikube, but that was not the case here. (And I had not saved the error, so I'm not sure it was the same thing.)

minikube was created with the following config settings:

PS C:\WINDOWS\system32> minikube config view
- cpus: 4
- driver: hyperv
- memory: 4096

The current IP address is:

Ethernet adapter vEthernet (Default Switch):
   IPv4 Address. . . . . . . . . . . : 172.17.194.49
   Subnet Mask . . . . . . . . . . . : 255.255.255.240
   Default Gateway . . . . . . . . . :

It looks very much like #8936, which was supposedly fixed by PR #9294, but as far as I can tell that fix was released in 1.15 or earlier, while I'm running 1.16.0.
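A minimal sketch for confirming this kind of IP/certificate mismatch (assuming the apiserver certificate lives at /var/lib/minikube/certs/apiserver.crt inside the guest and that openssl and grep are available in the guest image; these are assumptions, not something verified in this report):

# address the VM is reachable at right now
minikube ip

# addresses the apiserver certificate was issued for when the cluster was first created
# (assumed path inside the guest; adjust if your minikube version stores certs elsewhere)
minikube ssh "openssl x509 -noout -text -in /var/lib/minikube/certs/apiserver.crt | grep -A1 'Subject Alternative Name'"

If the two disagree, a restart fails exactly as in the output below, where the certificate is valid for 172.17.194.60 but the VM came back up on 172.17.194.55.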

Full output of failed command:

PS C:\WINDOWS\system32> minikube start
* minikube v1.16.0 on Microsoft Windows 10 Enterprise 10.0.17763 Build 17763
* Using the hyperv driver based on existing profile
* Starting control plane node minikube in cluster minikube
* Restarting existing hyperv VM for "minikube" ...
* Found network options:
  - NO_PROXY=192.168.99.100
  - no_proxy=192.168.99.100
* Preparing Kubernetes v1.20.0 on Docker 20.10.0 ...
  - env NO_PROXY=192.168.99.100
! Unable to restart cluster, will reset it: apiserver health: controlPlane never updated to v1.20.0
  - Generating certificates and keys ...
  - Booting up control plane ...
  - Configuring RBAC rules ...
* Verifying Kubernetes components...
! Enabling 'default-storageclass' returned an error: running callbacks: [Error making standard the default storage class: Error listing StorageClasses: Get "https://172.17.194.55:8443/apis/storage.k8s.io/v1/storageclasses": x509: certificate is valid for 172.17.194.60, 10.96.0.1, 127.0.0.1, 10.0.0.1, not 172.17.194.55]
* Verifying ingress addon...
! Enabling 'storage-provisioner' returned an error: running callbacks: [sudo KUBECONFIG=/var/lib/minikube/kubeconfig /var/lib/minikube/binaries/v1.20.0/kubectl apply -f /etc/kubernetes/addons/storage-provisioner.yaml: Process exited with status 1
stdout:
serviceaccount/storage-provisioner unchanged
clusterrolebinding.rbac.authorization.k8s.io/storage-provisioner unchanged
role.rbac.authorization.k8s.io/system:persistent-volume-provisioner unchanged
rolebinding.rbac.authorization.k8s.io/system:persistent-volume-provisioner unchanged
endpoints/k8s.io-minikube-hostpath unchanged

stderr:
Error from server (ServerTimeout): error when creating "/etc/kubernetes/addons/storage-provisioner.yaml": No API token found for service account "storage-provisioner", retry after the token is automatically created and added to the service account
]

X Exiting due to GUEST_START: wait 6m0s for node: wait for healthy API server: controlPlane never updated to v1.20.0
*
* If the above advice does not help, please let us know:
  - https://github.com/kubernetes/minikube/issues/new/choose




PS C:\WINDOWS\system32> minikube stop
* Stopping node "minikube"  ...
* Powering off "minikube" via SSH ...
* 1 nodes stopped.
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32> minikube start
* minikube v1.16.0 on Microsoft Windows 10 Enterprise 10.0.17763 Build 17763
* Using the hyperv driver based on existing profile
* Starting control plane node minikube in cluster minikube
* Restarting existing hyperv VM for "minikube" ...
* Found network options:
  - NO_PROXY=192.168.99.100
  - no_proxy=192.168.99.100
* Preparing Kubernetes v1.20.0 on Docker 20.10.0 ...
  - env NO_PROXY=192.168.99.100
* Verifying Kubernetes components...
* Verifying ingress addon...
* Enabled addons: storage-provisioner, default-storageclass, ingress
* Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

Full output of minikube start command used, if not already included:

Optional: Full output of minikube logs command (note: this was taken after the second minikube stop / minikube start):

PS C:\WINDOWS\system32> minikube logs
E0224 13:03:06.670377 9296 logs.go:203] Failed to list containers for "kube-apiserver": docker: docker ps -a --filter=name=k8s_kube-apiserver --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:06.808368 9296 logs.go:203] Failed to list containers for "etcd": docker: docker ps -a --filter=name=k8s_etcd --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:06.872371 9296 logs.go:203] Failed to list containers for "coredns": docker: docker ps -a --filter=name=k8s_coredns --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:06.951372 9296 logs.go:203] Failed to list containers for "kube-scheduler": docker: docker ps -a --filter=name=k8s_kube-scheduler --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.050372 9296 logs.go:203] Failed to list containers for "kube-proxy": docker: docker ps -a --filter=name=k8s_kube-proxy --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.109380 9296 logs.go:203] Failed to list containers for "kubernetes-dashboard": docker: docker ps -a --filter=name=k8s_kubernetes-dashboard --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.195377 9296 logs.go:203] Failed to list containers for "storage-provisioner": docker: docker ps -a --filter=name=k8s_storage-provisioner --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.287378 9296 logs.go:203] Failed to list containers for "kube-controller-manager": docker: docker ps -a --filter=name=k8s_kube-controller-manager --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

* ==> Docker <==
* -- Logs begin at Wed 2021-02-24 17:48:53 UTC, end at Wed 2021-02-24 19:03:07 UTC. --
* -- No entries --
* ==> container status <==
* time="2021-02-24T19:03:07Z" level=warning msg="runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead."
* time="2021-02-24T19:03:09Z" level=error msg="connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* time="2021-02-24T19:03:11Z" level=error msg="connect endpoint 'unix:///run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* time="2021-02-24T19:03:11Z" level=warning msg="image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead."
* time="2021-02-24T19:03:13Z" level=error msg="connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* time="2021-02-24T19:03:15Z" level=error msg="connect endpoint 'unix:///run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
* ==> describe nodes <==
E0224 13:03:15.844489 9296 logs.go:181] command /bin/bash -c "sudo /var/lib/minikube/binaries/v1.20.0/kubectl describe nodes --kubeconfig=/var/lib/minikube/kubeconfig" failed with error: /bin/bash -c "sudo /var/lib/minikube/binaries/v1.20.0/kubectl describe nodes --kubeconfig=/var/lib/minikube/kubeconfig": Process exited with status 1
stdout:

stderr:
The connection to the server localhost:8443 was refused - did you specify the right host or port?
output: "\n** stderr ** \nThe connection to the server localhost:8443 was refused - did you specify the right host or port?\n\n** /stderr **"
*
* ==> dmesg <==
* [Feb24 17:48] smpboot: 128 Processors exceeds NR_CPUS limit of 64
* [ +0.192488] You have booted with nomodeset. This means your GPU drivers are DISABLED
* [ +0.000001] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
* [ +0.000001] Unless you actually understand what nomodeset does, you should reboot without enabling it
* [ +0.124182] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
* [ +0.000096] #2 #3
* [ +0.064624] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
* [ +0.019355] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
*              * this clock source is slow. Consider trying other clock sources
* [ +3.846489] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
* [ +0.000048] Unstable clock detected, switching default tracing clock to "global"
*              If you want to keep using the local clock, then add:
*                "trace_clock=local"
*              on the kernel command line
* [ +1.055338] psmouse serio1: trackpoint: failed to get extended button data, assuming 3 buttons
* [ +0.973247] systemd-fstab-generator[1318]: Ignoring "noauto" for root device
* [ +0.049344] systemd[1]: system-getty.slice: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
* [ +0.000003] systemd[1]: (This warning is only shown for the first unit using IP firewalling.)
* [ +1.030336] SELinux: unrecognized netlink message: protocol=0 nlmsg_type=106 sclass=netlink_route_socket pid=2059 comm=systemd-network
* [ +2.477572] NFSD: the nfsdcld client tracking upcall will be removed in 3.10. Please transition to using nfsdcltrack.
* [ +1.281517] vboxguest: loading out-of-tree module taints kernel.
* [ +0.009848] vboxguest: PCI device not found, probably running on physical hardware.
* [Feb24 17:51] NFSD: Unable to end grace period: -110
* ==> kernel <==
* 19:03:16 up 1:14, 0 users, load average: 0.00, 0.00, 0.00
* Linux minikube 4.19.157 #1 SMP Mon Dec 14 11:34:36 PST 2020 x86_64 GNU/Linux
* PRETTY_NAME="Buildroot 2020.02.8"
* ==> kubelet <==
* -- Logs begin at Wed 2021-02-24 17:48:53 UTC, end at Wed 2021-02-24 19:03:16 UTC. --
* -- No entries --

! unable to fetch logs for: describe nodes
PS C:\WINDOWS\system32>

@priyawadhwa added the kind/support label Feb 24, 2021
priyawadhwa commented Feb 24, 2021

Hey @mconner thanks for opening this issue. It looks similar to #8765 -- unfortunately I'm not sure why this is happening, but it looks like running minikube delete && minikube start resolved it for most people.

Would that work for you?

@priyawadhwa added the triage/needs-information label Feb 24, 2021

mconner commented Feb 24, 2021

@priyawadhwa,

re:

    It looks similar to #8765

I don't think so.

re:

    running minikube delete && minikube start ... Would that work for you?

Would having to rebuild everything I've set up work for me? No. At least not yet. We are still working through how to set up our applications, and we're quite a way from automating it. I need to shut down minikube whenever I use our VPN, because something gets messed up otherwise. Also, we are loading data into Elasticsearch as part of our system, which takes some time, so a rebuild and reload would take a while. Having this randomly fail to start because the IP changes for some reason is a real pain in the neck.


priyawadhwa commented Feb 25, 2021

@mconner gotcha. You pointed out #9294 in the issue description -- that won't fix your bug because it only applies to the docker driver, and you're running on hyperv.

Would the docker driver be an option for you? (I'm not very familiar with hyperv or Windows so unfortunately this is the best suggestion I currently have).


mconner commented Mar 1, 2021

@priyawadhwa,
According to the documentation:

    The ingress, and ingress-dns addons are currently only supported on Linux. See #7332

We are using ingress on Windows, so I assume the docker driver is therefore not an option.

@medyagh changed the title from "minikube start fails on stop/start" to "windows: hyperv minikube start fails on stop/start" Mar 16, 2021

medyagh commented Mar 16, 2021

@mconner do you have this issue with the latest version of minikube? 1.18.1?

sharifelgamal commented:

@mconner Apologies, it seems our documentation for ingress is somewhat out of date. Ingress should work fine with the docker driver on Windows now; if it doesn't, that's an issue we need to look at.


medyagh commented May 19, 2021

@mconner I suspect deleting minikube and updating to the latest version should fix this issue; the problem seems to be that the IP of minikube changed.

@medyagh changed the title from "windows: hyperv minikube start fails on stop/start" to "Add integertion test for a VM changing IP and cert not working" May 19, 2021
@medyagh added the kind/process and priority/important-longterm labels and removed the kind/support and triage/needs-information labels May 19, 2021

medyagh commented May 19, 2021

I would love to add a way to reproduce this problem manually, by creating a minikube VM and changing its IP.
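A minimal manual reproduction sketch on Windows/Hyper-V, assuming the behavior described in this issue (the Hyper-V "Default Switch" typically hands out a different subnet after the host reboots, so the guest comes back on a new address while the apiserver certificate still carries the old one; this sequence is a sketch, not a verified recipe):

# 1. create a cluster and note the address the certificates were generated for
minikube start --driver=hyperv
minikube ip                      # e.g. 172.17.194.60

# 2. stop the cluster and force a new guest address; rebooting the Windows host
#    is assumed to be enough, since the Default Switch picks a new subnet afterwards
minikube stop
Restart-Computer

# 3. after the reboot, start again and look for
#    "x509: certificate is valid for <old IP> ..., not <new IP>"
minikube start
minikube ip                      # should now differ from step 1

An automated integration test would presumably need to change the address without rebooting the host, for example by moving the VM to a different virtual switch between the stop and the second start; that part is speculation rather than something verified in this issue.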

k8s-triage-robot commented Aug 17, 2021

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Aug 17, 2021
k8s-triage-robot commented Sep 16, 2021

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 16, 2021
@sharifelgamal changed the title from "Add integertion test for a VM changing IP and cert not working" to "Add integration test for a VM changing IP and cert not working" Sep 22, 2021
@sharifelgamal added the area/testing, co/hyperv, and lifecycle/frozen labels and removed the lifecycle/rotten label Sep 22, 2021
@spowelljr added the priority/backlog label and removed the priority/important-longterm label Jan 19, 2022