[wmco] Improve error handling #42
8ee7ea7 to 3251715
VaishnaviHire
left a comment
Thanks for working on this, @sebsoto. Please address my comments.
pkg/controller/windowsmachineconfig/windowsmachineconfig_controller.go
Outdated
aravindhp
left a comment
Thanks for working on this, @sebsoto. Please address my comments.
    windowsVMs := make(map[types.WindowsVM]bool)
    vmTracker, err := tracker.NewTracker(clientset, windowsVMs)
    if err != nil {
        return nil, errors.Wrap(err, "tracker instantiation failed")
failed to instantiate tracker
pkg/controller/windowsmachineconfig/windowsmachineconfig_controller.go
Outdated
mansikulkarni96
left a comment
Thanks for the PR @sebsoto. PTAL at my comments.
pkg/controller/windowsmachineconfig/windowsmachineconfig_controller.go
Outdated
    }

    // Terminate the instance via its instance id
    id := vm.GetCredentials().GetInstanceId()
nit: rename id to instanceID for more specific naming
3251715 to 83e7f53
This commit changes the creation of the WMC controller to be more diligent about handling errors. This commit changes two things:
* If the kubernetes clientset cannot be created, that error will be handled instead of just being logged.
* Error messages around the previous addition have been made more verbose.
This commit moves the creation of our node tracker out of the reconcile function to where we create the reconciler. This is being done to create a distinction between the setup of the controller and the functionality of the controller, allowing for cleaner error handling.
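The separation this commit message describes can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `Tracker`, `Reconciler`, `newTracker`, and `newReconciler` are hypothetical simplified names, and the client is reduced to an empty interface.

```go
package main

import (
	"errors"
	"fmt"
)

// Tracker is a hypothetical stand-in for the operator's node tracker.
type Tracker struct{ vms map[string]bool }

// Reconciler holds dependencies that are built once, at setup time.
type Reconciler struct{ tracker *Tracker }

// newTracker fails when it has no client to talk to the cluster with.
func newTracker(client interface{}, vms map[string]bool) (*Tracker, error) {
	if client == nil {
		return nil, errors.New("no client")
	}
	return &Tracker{vms: vms}, nil
}

// newReconciler does all fallible setup up front and returns the error
// to the caller instead of only logging it.
func newReconciler(client interface{}) (*Reconciler, error) {
	t, err := newTracker(client, make(map[string]bool))
	if err != nil {
		return nil, fmt.Errorf("failed to instantiate tracker: %w", err)
	}
	return &Reconciler{tracker: t}, nil
}

// Reconcile can rely on the tracker existing; no setup happens here.
func (r *Reconciler) Reconcile() int { return len(r.tracker.vms) }

func main() {
	if _, err := newReconciler(nil); err != nil {
		fmt.Println("setup failed:", err)
	}
	r, _ := newReconciler("fake-client")
	fmt.Println("tracked VMs:", r.Reconcile())
}
```

With setup errors surfaced by the constructor, the reconcile path only has to deal with errors from the work it actually performs.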
83e7f53 to 3b67eb1
/retest Testing openshift/release#8323
@sebsoto when you get a chance, please introduce a failure in one of the e2e tests. I want to see if we will get to see the test output in that case. It looks like with the step registry, e2e test output is not shown on success.
e3569ae to 7f0f8ba
8671abb to 260d1e8
LGTM

-   // deleteWindowsVMs deletes the required number of Windows VMs from the cluster and returns a bool indicating the
-   // status of deletion. This method will return false if any of the VMs fail to get deleted.
+   // remove removes the given vm from the list of VMs and terminates the underlying VM
To be pedantic, I would change the order, i.e. terminate first and then remove from the list.
How about renaming the function to removeWorkerNode() or removeWorker to stay in sync with the rest of the functions (addWorker, addWorkers, etc.)?
I also think we don't need to call out that it is Windows, as it is obvious given our operator is WMCO :-) But I am open to suggestions.
    // TODO: This method should return a slice of errors that we collected.
    // Jira story: https://issues.redhat.com/browse/WINC-266
-   func (r *ReconcileWindowsMachineConfig) deleteWindowsVMs(count int) bool {
+   func (r *ReconcileWindowsMachineConfig) removeWindowsWorkerNodes(count int) bool {
As per the previous comment this would become removeWorkerNodes
    }

-   // createWindowsVMs creates the required number of windows Windows VM and configure them to make
+   // addWindowsWorker creates a new Windows VM and configures it, adding it as a node object to the cluster
addWorkerNode()
    if err != nil {
        return nil, errors.Wrap(err, "error creating windows VM")
    }
    log.V(1).Info("created a new Windows VM", "ID", vm.GetCredentials().GetInstanceId())
I would remove this as the next debug message implies the VM was created.
    // TODO: This method should return a slice of errors that we collected.
    // Jira story: https://issues.redhat.com/browse/WINC-266
-   func (r *ReconcileWindowsMachineConfig) createWindowsWorkerNodes(count int) bool {
+   func (r *ReconcileWindowsMachineConfig) addWindowsWorkerNodes(count int) bool {
addWorkerNode
go.mod
Outdated
    github.com/openshift/api v3.9.1-0.20190924102528-32369d4db2ad+incompatible
    github.com/openshift/client-go v0.0.0-20190923180330-3b6373338c9b
We cannot be pinned to 3.9. That version is too old.
    // tracker is used to track all the Windows nodes created via WMCO
    tracker *tracker.Tracker
    // nodeConfigurer is what is used to configure the created Windows VMs
    nodeConfigurer nodeconfig.Interface
Please call this nodeConfig
    )

    type Interface interface {
        Configure(types.WindowsVM) error
Why do you need to pass in types.WindowsVM? Can't you create a mocked version of nodeConfig?
I do create a mocked version, but in order to use the mocked version within the reconciler we need to pass in the nodeconfig entity in some way. Previously, the regular nodeconfig was created within the reconciliation process, which doesn't give us any way to switch to using the mocked one.
By making the nodeconfig part of the reconciler we can switch between using the regular and mocked version. Passing in the types.WindowsVM becomes necessary as we will now be using the same nodeconfig for multiple VMs, instead of a new one for every VM. We need to pass in the VM to select which VM will be configured.
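A rough sketch of the arrangement described in this reply. The `Interface` shape with `Configure` taking the VM follows the hunk quoted earlier; the `WindowsVM` struct, `mockConfig`, `Reconciler`, and `addWorkerNodes` below are simplified, hypothetical stand-ins rather than the operator's real types.

```go
package main

import "fmt"

// WindowsVM is a simplified, hypothetical stand-in for types.WindowsVM.
type WindowsVM struct{ ID string }

// Interface mirrors the shape discussed in the thread: one configurer
// is shared, and the VM to configure is passed in on each call.
type Interface interface {
	Configure(vm WindowsVM) error
}

// mockConfig is a test double that records which VMs it was asked to configure.
type mockConfig struct{ configured []string }

func (m *mockConfig) Configure(vm WindowsVM) error {
	m.configured = append(m.configured, vm.ID)
	return nil
}

// Reconciler receives its configurer at construction time, so tests can
// inject the mock and production code can inject the real implementation.
type Reconciler struct{ nodeConfig Interface }

// addWorkerNodes configures each VM through the shared configurer.
func (r *Reconciler) addWorkerNodes(vms []WindowsVM) error {
	for _, vm := range vms {
		if err := r.nodeConfig.Configure(vm); err != nil {
			return fmt.Errorf("failed to configure %s: %w", vm.ID, err)
		}
	}
	return nil
}

func main() {
	mock := &mockConfig{}
	r := &Reconciler{nodeConfig: mock}
	if err := r.addWorkerNodes([]WindowsVM{{ID: "vm-1"}, {ID: "vm-2"}}); err != nil {
		fmt.Println(err)
	}
	fmt.Println(mock.configured)
}
```

Because the configurer outlives any single VM, the VM has to travel as an argument to `Configure`, which is the point the reply makes.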
    k8sclientset: clientset,
    tracker: vmTracker,
    windowsVMs: windowsVMs,
    nodeConfigurer: nodeconfig.NewNodeConfig(clientset),
I am not able to understand why you need to do this. I don't see the utility of having generic node configuration entities.
I think I explained this in the above message, let me know if this is a separate issue.
38d9a9e to bd40393
aravindhp
left a comment
Looks good, @sebsoto. Just a few changes in the error messages.
pkg/controller/windowsmachineconfig/windowsmachineconfig_controller.go
Outdated
Currently VMs that fail to become a node are left orphaned. This commit changes that, so that if a VM fails to become a node via bootstrapping it is deleted properly. This is being done so that we can properly track the status of the VMs we create.
bd40393 to dbb8a21
aravindhp
left a comment
/approve
Thanks for adding this, @sebsoto
/hold
/lgtm cancel
ravisantoshgudimetla
left a comment
/lgtm
Thanks for working on this PR @sebsoto
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aravindhp, ravisantoshgudimetla, sebsoto
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/hold cancel
[logging] Install Fluentd on Windows instance
This PR includes various commits to clean up how the operator handles errors, both in the creation of the controller and in the creation of nodes.