Add Preparing state to do manual clean#763
Conversation
|
Hi @Hellcatlk. Thanks for your PR. I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/ok-to-test |
andfasano
left a comment
There was a problem hiding this comment.
Thanks for this new smaller PR, that is surely more manageable for reviewing.
In general changes looks fine to me, there's only one point requiring a little bit of management in the default case.
In addition, it would be really useful to have a dedicate test for Prepare to verify all the various cases (and it will be also a good addition for the review) - hopefully the new mock used to cover other Provisioner parts could be used for that
zaneb
left a comment
There was a problem hiding this comment.
Thanks for opening this! This really does make it 1000x easier to review.
b818e01 to
798c104
Compare
|
@zaneb PTAL. |
|
@Hellcatlk It appears the PR has merge conflicts with master. And thank you for creating this smaller PR, it is definitely easier to handle for the reviewers. |
|
/test-integration @dtantsur @andfasano @zaneb Thank you! |
andfasano
left a comment
There was a problem hiding this comment.
Apart a couple of minor issues, I think the main point it's about determining the completion of the manual cleanup and avoiding to expose it as a technical/internal detail. I'd agree with @zaneb comment that flag adoption could be problematic - and very likely not be exposed directly to the end user, as it seems more fragile if not carefully managed in all the possible conditions. The suggested approach (working on a Status diff) seems more robust (as probably it could also cover the clean fail) and worth to be explored, especially in the case where it could be reused of other scenarios, like the early cleaning. Given that, I'll leave the final comment to @zaneb
zaneb
left a comment
There was a problem hiding this comment.
Thanks, I think this is getting really close.
12cbb90 to
3126fca
Compare
|
/test-integration |
zaneb
left a comment
There was a problem hiding this comment.
Thanks for making these changes. Eliminating the flag in the API makes this much easier to land, because bugs can always be fixed later but API changes are forever :)
|
/test-integration |
|
|
||
| case nodes.Manageable: | ||
| if unprepared { | ||
| started, result, err = p.startManualCleaning(ironicNode) |
There was a problem hiding this comment.
nit: if you changed the order of the return values, you could just return p.startManualCleaning(ironicNode).
9bac13c to
b86adbc
Compare
zaneb
left a comment
There was a problem hiding this comment.
This looks basically good to me - just a cosmetic comment inline because one of my earlier suggestions was ambiguous.
I'll approve now and somebody else can review once you've switched that argument order and squashed.
@andfasano this would probably be a good time to take another look over it.
/approve
/label tide/merge-method-squash
/test-integration
Thanks for all your work on this! Not only does this mean that we can hopefully get RAID support in without any more of the constant rebasing, but it's actually a big improvement to the bmo design in general, that opens up a path to make even more improvements in future.
| // Prepare remove existing configuration and set new configuration. | ||
| // If `started` is true, it means that we successfully executed `tryChangeNodeProvisionState`. | ||
| func (p *ironicProvisioner) Prepare(unprepared bool) (result provisioner.Result, started bool, err error) { | ||
| func (p *ironicProvisioner) Prepare(unprepared bool) (started bool, result provisioner.Result, err error) { |
There was a problem hiding this comment.
Sorry, I was unclear in my previous comment. I meant change the order of the return values of startManualCleaning(), not of Prepare(). It should be result, started, error for consistency with other functions in the Provisioner interface that return 3 values (ValidateManagementAccess and InspectHardware).
|
|
||
| // Save provisioning settings. | ||
| provisioningSettings := info.host.Status.Provisioning.DeepCopy() | ||
| dirty, err := saveHostProvisioningSettings(info.host) |
There was a problem hiding this comment.
@Hellcatlk, just to clarify, IIUC @zaneb 's comment then saveHostProvisioningSettings will contains the necessary logic to trigger the manual cleaning in the Prepare. Right now in this method just the Provisioning.RootDeviceHints are checked. If it is expected to be enriched in future for the related RAID section, then probably I'd find more clear to add comment for that within such method
There was a problem hiding this comment.
That's correct. But note that this function already existed, and this is the right thing to do regardless of whether RAID is added.
There was a problem hiding this comment.
Right, I was thinking to something more like the // TODO: in the buildManualCleaningSteps
| if provResult.Dirty { | ||
| result := actionContinue{provResult.RequeueAfter} | ||
| if clearError(info.host) || (dirty && started) { | ||
| // If clearError return true, but started is false, restore provisioningSettings. |
There was a problem hiding this comment.
This comment is not really clear to me, can you please elaborate it a little bit?
There was a problem hiding this comment.
This is just describing how the if statement below can come to be true.
|
|
||
| // Prepare remove existing configuration and set new configuration. | ||
| // If `started` is true, it means that we successfully executed `tryChangeNodeProvisionState`. | ||
| func (p *ironicProvisioner) Prepare(unprepared bool) (result provisioner.Result, started bool, err error) { |
There was a problem hiding this comment.
What about covering this part using the new mocks? I think that for such relevant feature could be really useful
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andfasano, Hellcatlk, zaneb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
9ced42a to
ec6f98a
Compare
|
/lgtm |
|
Tests are failing on: This is due to #725, which just merged this morning, changing the test code :( |
Hopefully it could be fixed easily by changing the match-profile -> preparing, and at this point probably it could be a good idea to add a new case for preparing -> ready in the same test.
|
Done. |
Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com> Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com> Co-Authored-By: Kim Bao Long <longkb@vn.fujitsu.com> Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
| result, err = operationContinuing(provisionRequeueDelay) | ||
|
|
||
| default: | ||
| result, err = transientError(fmt.Errorf("Have unexpected ironic node state %s", ironicNode.ProvisionState)) |
There was a problem hiding this comment.
Note: during the review I noticed that sometimes some nodes ended up here having the Provisioning state still set to inspectWait. This is due the fact that in the Inspection state BMO decides to move on by looking just at the Inspector result, without waiting for the node to go back to the manageable state. We could fix this problem in another PR, since the code here is handling the described scenario without any side effect
|
/lgtm Just a final note (@dtantsur @zaneb): this PR modifies the BMO's state machine by adding an extra step for the manual cleaning during the initial phase for provisioning a node. I'm not sure how it will be impacting, but sounds like that this new state could be considered when evaluating the provisioning limit introduced in #725 |
|
/test-integration |
See: #292 (review)
Co-Authored-By: Dao Cong Tien tiendc@vn.fujitsu.com
Co-Authored-By: Nguyen Phuong An annp@vn.fujitsu.com
Co-Authored-By: Kim Bao Long longkb@vn.fujitsu.com
Signed-off-by: zouyu zouy.fnst@cn.fujitsu.com