Add integration test for a VM changing IP and cert not working #10593

Open
mconner opened this issue Feb 24, 2021 · 10 comments
Labels
area/testing · co/hyperv (HyperV related issues) · kind/process (Process oriented issues, like setting up CI) · lifecycle/frozen (Indicates that an issue or PR should not be auto-closed due to staleness.) · priority/backlog (Higher priority than priority/awaiting-more-evidence.)

Comments


mconner commented Feb 24, 2021

Steps to reproduce the issue:

  1. minikube stop
  2. minikube start (got the errors shown in the minikube start output below)
  3. minikube stop
  4. minikube start. Everything (pods, ingress, volumes) is gone.

I had been running minikube for a few days, starting and stopping it, and sometimes restarting Windows, without any problems. I've had similar issues in the past when shutting down Windows without first stopping minikube, but that was not the case here. (And I had not saved the error, so I'm not sure it was the same thing.)

minikube was created with the following config settings:

PS C:\WINDOWS\system32> minikube config view
- cpus: 4
- driver: hyperv
- memory: 4096

The current IP address is:

Ethernet adapter vEthernet (Default Switch):
   IPv4 Address. . . . . . . . . . . : 172.17.194.49
   Subnet Mask . . . . . . . . . . . : 255.255.255.240
   Default Gateway . . . . . . . . . :

It looks very much like #8936, which was supposedly fixed by PR #9294, but as far as I can tell that fix was released in 1.15 or earlier, while I'm running 1.16.0.
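A minimal sketch for confirming this kind of IP/certificate mismatch (assuming the apiserver certificate lives at /var/lib/minikube/certs/apiserver.crt inside the guest and that openssl and grep are available in the guest image; these are assumptions, not something verified in this report):

# address the VM is reachable at right now
minikube ip

# addresses the apiserver certificate was issued for when the cluster was first created
# (assumed path inside the guest; adjust if your minikube version stores certs elsewhere)
minikube ssh "openssl x509 -noout -text -in /var/lib/minikube/certs/apiserver.crt | grep -A1 'Subject Alternative Name'"

If the two disagree, a restart fails exactly as in the output below, where the certificate is valid for 172.17.194.60 but the VM came back up on 172.17.194.55.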

Full output of failed command:

PS C:\WINDOWS\system32> minikube start
* minikube v1.16.0 on Microsoft Windows 10 Enterprise 10.0.17763 Build 17763
* Using the hyperv driver based on existing profile
* Starting control plane node minikube in cluster minikube
* Restarting existing hyperv VM for "minikube" ...
* Found network options:
  - NO_PROXY=192.168.99.100
  - no_proxy=192.168.99.100
* Preparing Kubernetes v1.20.0 on Docker 20.10.0 ...
  - env NO_PROXY=192.168.99.100
! Unable to restart cluster, will reset it: apiserver health: controlPlane never updated to v1.20.0
  - Generating certificates and keys ...
  - Booting up control plane ...
  - Configuring RBAC rules ...
* Verifying Kubernetes components...
! Enabling 'default-storageclass' returned an error: running callbacks: [Error making standard the default storage class: Error listing StorageClasses: Get "https://172.17.194.55:8443/apis/storage.k8s.io/v1/storageclasses": x509: certificate is valid for 172.17.194.60, 10.96.0.1, 127.0.0.1, 10.0.0.1, not 172.17.194.55]
* Verifying ingress addon...
! Enabling 'storage-provisioner' returned an error: running callbacks: [sudo KUBECONFIG=/var/lib/minikube/kubeconfig /var/lib/minikube/binaries/v1.20.0/kubectl apply -f /etc/kubernetes/addons/storage-provisioner.yaml: Process exited with status 1
stdout:
serviceaccount/storage-provisioner unchanged
clusterrolebinding.rbac.authorization.k8s.io/storage-provisioner unchanged
role.rbac.authorization.k8s.io/system:persistent-volume-provisioner unchanged
rolebinding.rbac.authorization.k8s.io/system:persistent-volume-provisioner unchanged
endpoints/k8s.io-minikube-hostpath unchanged

stderr:
Error from server (ServerTimeout): error when creating "/etc/kubernetes/addons/storage-provisioner.yaml": No API token found for service account "storage-provisioner", retry after the token is automatically created and added to the service account
]

X Exiting due to GUEST_START: wait 6m0s for node: wait for healthy API server: controlPlane never updated to v1.20.0
*
* If the above advice does not help, please let us know:
  - https://github.com/kubernetes/minikube/issues/new/choose




PS C:\WINDOWS\system32> minikube stop
* Stopping node "minikube"  ...
* Powering off "minikube" via SSH ...
* 1 nodes stopped.
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32> minikube start
* minikube v1.16.0 on Microsoft Windows 10 Enterprise 10.0.17763 Build 17763
* Using the hyperv driver based on existing profile
* Starting control plane node minikube in cluster minikube
* Restarting existing hyperv VM for "minikube" ...
* Found network options:
  - NO_PROXY=192.168.99.100
  - no_proxy=192.168.99.100
* Preparing Kubernetes v1.20.0 on Docker 20.10.0 ...
  - env NO_PROXY=192.168.99.100
* Verifying Kubernetes components...
* Verifying ingress addon...
* Enabled addons: storage-provisioner, default-storageclass, ingress
* Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

Full output of minikube start command used, if not already included:

Optional: Full output of minikube logs command (note: this was taken after the second minikube stop / minikube start):

PS C:\WINDOWS\system32> minikube logs
E0224 13:03:06.670377 9296 logs.go:203] Failed to list containers for "kube-apiserver": docker: docker ps -a --filter=name=k8s_kube-apiserver --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:06.808368 9296 logs.go:203] Failed to list containers for "etcd": docker: docker ps -a --filter=name=k8s_etcd --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:06.872371 9296 logs.go:203] Failed to list containers for "coredns": docker: docker ps -a --filter=name=k8s_coredns --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:06.951372 9296 logs.go:203] Failed to list containers for "kube-scheduler": docker: docker ps -a --filter=name=k8s_kube-scheduler --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.050372 9296 logs.go:203] Failed to list containers for "kube-proxy": docker: docker ps -a --filter=name=k8s_kube-proxy --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.109380 9296 logs.go:203] Failed to list containers for "kubernetes-dashboard": docker: docker ps -a --filter=name=k8s_kubernetes-dashboard --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.195377 9296 logs.go:203] Failed to list containers for "storage-provisioner": docker: docker ps -a --filter=name=k8s_storage-provisioner --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
E0224 13:03:07.287378 9296 logs.go:203] Failed to list containers for "kube-controller-manager": docker: docker ps -a --filter=name=k8s_kube-controller-manager --format={{.ID}}: Process exited with status 1
stdout:

stderr:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

* ==> Docker <==
* -- Logs begin at Wed 2021-02-24 17:48:53 UTC, end at Wed 2021-02-24 19:03:07 UTC. --
* -- No entries --
* ==> container status <==
* time="2021-02-24T19:03:07Z" level=warning msg="runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead."
* time="2021-02-24T19:03:09Z" level=error msg="connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* time="2021-02-24T19:03:11Z" level=error msg="connect endpoint 'unix:///run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* time="2021-02-24T19:03:11Z" level=warning msg="image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead."
* time="2021-02-24T19:03:13Z" level=error msg="connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* time="2021-02-24T19:03:15Z" level=error msg="connect endpoint 'unix:///run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded"
* CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
* ==> describe nodes <==
E0224 13:03:15.844489 9296 logs.go:181] command /bin/bash -c "sudo /var/lib/minikube/binaries/v1.20.0/kubectl describe nodes --kubeconfig=/var/lib/minikube/kubeconfig" failed with error: /bin/bash -c "sudo /var/lib/minikube/binaries/v1.20.0/kubectl describe nodes --kubeconfig=/var/lib/minikube/kubeconfig": Process exited with status 1
stdout:

stderr:
The connection to the server localhost:8443 was refused - did you specify the right host or port?
output: "\n** stderr ** \nThe connection to the server localhost:8443 was refused - did you specify the right host or port?\n\n** /stderr **"
*
* ==> dmesg <==
* [Feb24 17:48] smpboot: 128 Processors exceeds NR_CPUS limit of 64
* [ +0.192488] You have booted with nomodeset. This means your GPU drivers are DISABLED
* [ +0.000001] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
* [ +0.000001] Unless you actually understand what nomodeset does, you should reboot without enabling it
* [ +0.124182] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
* [ +0.000096] #2 #3
* [ +0.064624] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
* [ +0.019355] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
*              * this clock source is slow. Consider trying other clock sources
* [ +3.846489] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
* [ +0.000048] Unstable clock detected, switching default tracing clock to "global"
*              If you want to keep using the local clock, then add:
*                "trace_clock=local"
*              on the kernel command line
* [ +1.055338] psmouse serio1: trackpoint: failed to get extended button data, assuming 3 buttons
* [ +0.973247] systemd-fstab-generator[1318]: Ignoring "noauto" for root device
* [ +0.049344] systemd[1]: system-getty.slice: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
* [ +0.000003] systemd[1]: (This warning is only shown for the first unit using IP firewalling.)
* [ +1.030336] SELinux: unrecognized netlink message: protocol=0 nlmsg_type=106 sclass=netlink_route_socket pid=2059 comm=systemd-network
* [ +2.477572] NFSD: the nfsdcld client tracking upcall will be removed in 3.10. Please transition to using nfsdcltrack.
* [ +1.281517] vboxguest: loading out-of-tree module taints kernel.
* [ +0.009848] vboxguest: PCI device not found, probably running on physical hardware.
* [Feb24 17:51] NFSD: Unable to end grace period: -110
* ==> kernel <==
* 19:03:16 up 1:14, 0 users, load average: 0.00, 0.00, 0.00
* Linux minikube 4.19.157 #1 SMP Mon Dec 14 11:34:36 PST 2020 x86_64 GNU/Linux
* PRETTY_NAME="Buildroot 2020.02.8"
* ==> kubelet <==
* -- Logs begin at Wed 2021-02-24 17:48:53 UTC, end at Wed 2021-02-24 19:03:16 UTC. --
* -- No entries --

! unable to fetch logs for: describe nodes
PS C:\WINDOWS\system32>

@priyawadhwa added the kind/support label Feb 24, 2021
priyawadhwa commented Feb 24, 2021

Hey @mconner thanks for opening this issue. It looks similar to #8765 -- unfortunately I'm not sure why this is happening, but it looks like running minikube delete && minikube start resolved it for most people.

Would that work for you?

@priyawadhwa added the triage/needs-information label Feb 24, 2021

mconner commented Feb 24, 2021

@priyawadhwa,

re:

    It looks similar to #8765

I don't think so.

re:

    running minikube delete && minikube start ... Would that work for you?

Would having to rebuild everything I've set up work for me? No. At least not yet. We are still working through how to set up our applications, and we're quite a way from automating it. I need to shut down minikube whenever I use our VPN, because something gets messed up otherwise. Also, we are loading data into Elasticsearch as part of our system, which takes some time, so a rebuild and reload would take a while. Having this randomly fail to start because the IP changes for some reason is a real pain in the neck.


priyawadhwa commented Feb 25, 2021

@mconner gotcha. You pointed out #9294 in the issue description -- that won't fix your bug because it only applies to the docker driver, and you're running on hyperv.

Would the docker driver be an option for you? (I'm not very familiar with hyperv or Windows so unfortunately this is the best suggestion I currently have).


mconner commented Mar 1, 2021

@priyawadhwa,
According to the documentation:

    The ingress, and ingress-dns addons are currently only supported on Linux. See #7332

We are using ingress on Windows, so I assume the docker driver is therefore not an option.

@medyagh changed the title from "minikube start fails on stop/start" to "windows: hyperv minikube start fails on stop/start" Mar 16, 2021

medyagh commented Mar 16, 2021

@mconner do you have this issue with the latest version of minikube? 1.18.1?

sharifelgamal commented:

@mconner Apologies, it seems our documentation for ingress is somewhat out of date. Ingress should work fine with the docker driver on Windows now; if it doesn't, that's an issue we need to look at.


medyagh commented May 19, 2021

@mconner I suspect deleting minikube and updating to the latest version should fix this issue; the problem seems to be that the IP of minikube changed.

@medyagh changed the title from "windows: hyperv minikube start fails on stop/start" to "Add integertion test for a VM changing IP and cert not working" May 19, 2021
@medyagh added the kind/process and priority/important-longterm labels and removed the kind/support and triage/needs-information labels May 19, 2021

medyagh commented May 19, 2021

I would love to add a way to reproduce this problem manually, by creating a minikube VM and changing its IP.
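A minimal manual reproduction sketch on Windows/Hyper-V, assuming the behavior described in this issue (the Hyper-V "Default Switch" typically hands out a different subnet after the host reboots, so the guest comes back on a new address while the apiserver certificate still carries the old one; this sequence is a sketch, not a verified recipe):

# 1. create a cluster and note the address the certificates were generated for
minikube start --driver=hyperv
minikube ip                      # e.g. 172.17.194.60

# 2. stop the cluster and force a new guest address; rebooting the Windows host
#    is assumed to be enough, since the Default Switch picks a new subnet afterwards
minikube stop
Restart-Computer

# 3. after the reboot, start again and look for
#    "x509: certificate is valid for <old IP> ..., not <new IP>"
minikube start
minikube ip                      # should now differ from step 1

An automated integration test would presumably need to change the address without rebooting the host, for example by moving the VM to a different virtual switch between the stop and the second start; that part is speculation rather than something verified in this issue.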

k8s-triage-robot commented Aug 17, 2021

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Aug 17, 2021
k8s-triage-robot commented Sep 16, 2021

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 16, 2021
@sharifelgamal changed the title from "Add integertion test for a VM changing IP and cert not working" to "Add integration test for a VM changing IP and cert not working" Sep 22, 2021
@sharifelgamal added the area/testing, co/hyperv, and lifecycle/frozen labels and removed the lifecycle/rotten label Sep 22, 2021
@spowelljr added the priority/backlog label and removed the priority/important-longterm label Jan 19, 2022