hyperv: bad retry behavior: config.json: The system cannot find the file specified #5941

Closed
brainfull opened this issue Nov 18, 2019 · 4 comments · Fixed by #6129
Labels: co/hyperv, help wanted, kind/bug, kind/support, os/windows

brainfull commented Nov 18, 2019

The exact command to reproduce the issue:

  • Create a new VM (this assumes 8 GB of memory is available; otherwise decrease --memory):
    minikube start --vm-driver hyperv --memory 8192
  • Stop the VM without deleting it:
    minikube stop
  • Open enough memory-hungry applications (for example, many YouTube tabs in Chrome) that the memory is no longer available.
  • Restart the same VM:
    minikube start --vm-driver hyperv --memory 8192
  • BUG number 1: minikube deletes the VM and retries several times, which is poor error handling.
  • BUG number 2: in the specific log below there is an additional bug, because %userprofile%\machines\minikube\config.json can no longer be found. The only fix for that one is to manually delete the %userprofile%/.minikube directory to reset minikube completely. By then the VM and its test data are already lost, so you may as well start from scratch.

The full output of the command that failed:

  • minikube v1.5.2 on Microsoft Windows 10 Pro 10.0.17763 Build 17763
  • Tip: Use 'minikube start -p ' to create a new cluster, or 'minikube delete' to delete this one.
  • Starting existing hyperv VM for "minikube" ...
  • Retriable failure: start: exit status 1
    E1118 09:53:44.086570 23544 main.go:106] libmachine: [stderr =====>] : Hyper-V\Start-VM : 'minikube' failed to change state.
    The operation failed with error code '32788'.
    At line:1 char:1
    + Hyper-V\Start-VM minikube
        + CategoryInfo          : NotSpecified: (:) [Start-VM], VirtualizationException
        + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands.StartVM
  • Successfully powered off Hyper-V. minikube driver -- hyperv
  • Deleting "minikube" in hyperv ...
  • Creating hyperv VM (CPUs=4, Memory=8192MB, Disk=50000MB) ...
    E1118 09:54:09.160314 23544 main.go:106] libmachine: [stderr =====>] : Hyper-V\Start-VM : 'minikube' failed to change state.
    The operation failed with error code '32788'.
    At line:1 char:1
    + Hyper-V\Start-VM minikube
        + CategoryInfo          : NotSpecified: (:) [Start-VM], VirtualizationException
        + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands.StartVM
  • Retriable failure: create: creating: exit status 1
  • Successfully powered off Hyper-V. minikube driver -- hyperv
  • Deleting "minikube" in hyperv ...
  • Retriable failure: Error loading existing host. Please try running [minikube delete], then run [minikube start] again.: filestore "minikube": open C:\Users\luc.minikube\machines\minikube\config.json: The system cannot find the file specified.
  • Retriable failure: Error loading existing host. Please try running [minikube delete], then run [minikube start] again.: filestore "minikube": open C:\Users\luc.minikube\machines\minikube\config.json: The system cannot find the file specified.

X Unable to start VM: Error loading existing host. Please try running [minikube delete], then run [minikube start] again.: filestore "minikube": open C:\Users\luc.minikube\machines\minikube\config.json: The system cannot find the file specified.
*

The code responsible for that poor error handling mechanism:

I don't think the code below makes any sense. 'minikube start' should never delete the VM. If we ever decide that deleting the minikube VM is the solution, we should run 'minikube delete' explicitly.

https://github.com/kubernetes/minikube/blob/master/cmd/minikube/cmd/start.go

The culprit line is cluster.DeleteHost(api):

start := func() (err error) {
	host, err = cluster.StartHost(api, mc)
	if err != nil {
		out.T(out.Resetting, "Retriable failure: {{.error}}", out.V{"error": err})
		if derr := cluster.DeleteHost(api); derr != nil {
			glog.Warningf("DeleteHost: %v", derr)
		}
	}
	return err
}

if err = retry.Expo(start, 5*time.Second, 3*time.Minute, 3); err != nil {
	exit.WithError("Unable to start VM", err)
}

The operating system version:

This would happen on any OS, but I'm using Windows 10 Pro.

Also see issues #5072 and #5884.

@tstromberg changed the title on Nov 20, 2019: Minikube VM is deleted because of "retriable failure" on startup → hyperv: 'minikube' failed to change state. The operation failed with error code '32788'
@tstromberg
Contributor

tstromberg commented Nov 20, 2019

Here's some background on the Hyper-V error you are seeing:

https://www.sysprobs.com/fixed-failed-to-change-state-error-code-32788-on-hyper-v-while-attempting-to-start

We'll need to take a closer look at the retry issue. The intent was to automatically help users with corrupt VM state, but I can agree that it requires more thought. I'm mostly confused about the config.json issue mentioned here.

@tstromberg changed the title on Nov 20, 2019: hyperv: 'minikube' failed to change state. The operation failed with error code '32788' → hyperv: bad retry behavior: config.json: The system cannot find the file specified
@tstromberg added the help wanted and kind/bug labels on Nov 20, 2019
@brainfull
Author

Yeah, got it. Error 32788 covers any Hyper-V state-change failure that requires the user to fix something on their computer before the Hyper-V VM can start. It is not a retriable failure, since it explicitly requires the user to do something.

Possible solutions are:

  1. Classify error 32788 as a non-retriable failure, so that it doesn't end up deleting the VM and retrying 5 times.
  2. Remove the mechanism that deletes the VM when calling minikube start.

As for the config.json that disappears, it seems to be a consequence of the minikube VM being restarted 5 times in a row. So I really believe Solution 2 is the best.

Please, let's get rid of that mechanism that retries 5 times and often ends up with config.json being deleted.

@priyawadhwa added the kind/support label on Nov 20, 2019
@blueelvis added the co/hyperv and os/windows labels on Nov 25, 2019
@blueelvis
Contributor

@brainfull - Thank you very much for reporting this, and for the detailed explanation of your findings on one of the issues you posted regarding major user-gap issues. Fully agreed that this should be fixed and that the cluster should not be deleted.

Let me see if I can find a list of error codes for Hyper-V and do some more experimenting. We can look at removing that code as well but would have to discuss with others.

The config.json is a separate issue which we are tracking as well and is not related to this.

@brainfull
Author

@blueelvis If you need help with Minikube on Windows, I would be glad to help. Our company's official dev environment is now based on Minikube, and we are automating all our processes around it. Our goal is to ensure that whatever we have tested in Minikube on Windows also works in the staging and prod environments, which are Kubernetes clusters on cloud providers. Thx

4 participants