Add Preparing state to do manual clean by zouy414 · Pull Request #763 · metal3-io/baremetal-operator

zouy414 · 2021-01-14T01:33:22Z

Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com>
Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com>
Co-Authored-By: Kim Bao Long <longkb@vn.fujitsu.com>
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>

metal3-io-bot · 2021-01-14T01:33:39Z

Hi @Hellcatlk. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

zouy414 · 2021-01-14T01:35:55Z

@andfasano @dtantsur @zaneb

andfasano · 2021-01-14T15:23:38Z

/ok-to-test

andfasano

Thanks for this new, smaller PR; it is certainly more manageable to review.

In general the changes look fine to me; there's only one point requiring a little bit of management in the default case.

In addition, it would be really useful to have a dedicated test for Prepare to verify all the various cases (it would also be a good addition for the review). Hopefully the new mock used to cover other Provisioner parts could be used for that.

zaneb

Thanks for opening this! This really does make it 1000x easier to review.

zouy414 · 2021-01-20T01:24:36Z

@zaneb PTAL.

demonCoder95 · 2021-01-21T06:46:48Z

@Hellcatlk It appears the PR has merge conflicts with master. And thank you for creating this smaller PR; it is definitely easier for the reviewers to handle.

demonCoder95 · 2021-01-21T07:33:14Z

/test-integration

@dtantsur @andfasano @zaneb
Can you please take a last look at this one? We need to get this merged quickly to move on to the RAID and firmware configuration functionality we've been meaning to implement for a very long time now. Unfortunately this work has dragged on too long, and it's been very inconvenient, especially for the reviewers.

Thank you!

andfasano

Apart from a couple of minor issues, I think the main point is about determining the completion of the manual cleanup and avoiding exposing it as a technical/internal detail. I agree with @zaneb's comment that adopting a flag could be problematic, and it very likely should not be exposed directly to the end user, as it seems more fragile if not carefully managed in all the possible conditions. The suggested approach (working on a Status diff) seems more robust (as it could probably also cover the clean fail) and worth exploring, especially if it could be reused for other scenarios, like the early cleaning. Given that, I'll leave the final comment to @zaneb.

zaneb

Thanks, I think this is getting really close.

demonCoder95 · 2021-01-25T14:24:00Z

/test-integration

zaneb

Thanks for making these changes. Eliminating the flag in the API makes this much easier to land, because bugs can always be fixed later but API changes are forever :)

demonCoder95 · 2021-01-26T14:57:38Z

/test-integration

zaneb · 2021-01-26T18:20:17Z

+
+	case nodes.Manageable:
+		if unprepared {
+			started, result, err = p.startManualCleaning(ironicNode)


nit: if you changed the order of the return values, you could just return p.startManualCleaning(ironicNode).

zaneb

This looks basically good to me - just a cosmetic comment inline because one of my earlier suggestions was ambiguous.
I'll approve now and somebody else can review once you've switched that argument order and squashed.
@andfasano this would probably be a good time to take another look over it.
/approve
/label tide/merge-method-squash
/test-integration

Thanks for all your work on this! Not only does this mean that we can hopefully get RAID support in without any more of the constant rebasing, but it's actually a big improvement to the bmo design in general, that opens up a path to make even more improvements in future.

zaneb · 2021-01-27T16:43:40Z

 // Prepare remove existing configuration and set new configuration.
 // If `started` is true,  it means that we successfully executed `tryChangeNodeProvisionState`.
-func (p *ironicProvisioner) Prepare(unprepared bool) (result provisioner.Result, started bool, err error) {
+func (p *ironicProvisioner) Prepare(unprepared bool) (started bool, result provisioner.Result, err error) {


Sorry, I was unclear in my previous comment. I meant change the order of the return values of startManualCleaning(), not of Prepare(). It should be result, started, error for consistency with other functions in the Provisioner interface that return 3 values (ValidateManagementAccess and InspectHardware).

Thanks, done.
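
For readers following this thread, here is a minimal sketch of the point about return-value order. When a helper returns its values in the same order as its caller, Go lets the caller forward all of them with a single `return`. The `Result` type and both functions below are simplified, hypothetical stand-ins, not the actual baremetal-operator code.

```go
package main

import "fmt"

// Result is a hypothetical stand-in for provisioner.Result.
type Result struct {
	Dirty bool
}

// startManualCleaning is a hypothetical helper. It returns its values in the
// (result, started, err) order used by its caller.
func startManualCleaning(node string) (result Result, started bool, err error) {
	if node == "" {
		return Result{}, false, fmt.Errorf("no node given")
	}
	return Result{Dirty: true}, true, nil
}

// prepare forwards all three values directly because the orders match.
func prepare(node string, unprepared bool) (Result, bool, error) {
	if unprepared {
		return startManualCleaning(node)
	}
	return Result{}, false, nil
}

func main() {
	result, started, err := prepare("node-0", true)
	fmt.Println(result.Dirty, started, err)
}
```

If the helper returned `(started, result, err)` instead, the caller would have to assign the values to variables and reorder them before returning.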

andfasano · 2021-01-28T12:46:37Z

+
+	// Save provisioning settings.
+	provisioningSettings := info.host.Status.Provisioning.DeepCopy()
+	dirty, err := saveHostProvisioningSettings(info.host)


@Hellcatlk, just to clarify: IIUC @zaneb's comment, saveHostProvisioningSettings will contain the necessary logic to trigger the manual cleaning in Prepare. Right now this method checks only Provisioning.RootDeviceHints. If it is expected to be enriched in future for the related RAID section, then I'd find it clearer to add a comment about that within the method.

That's correct. But note that this function already existed, and this is the right thing to do regardless of whether RAID is added.

Right, I was thinking of something more like the // TODO: in buildManualCleaningSteps.

andfasano · 2021-01-28T12:48:41Z

+	if provResult.Dirty {
+		result := actionContinue{provResult.RequeueAfter}
+		if clearError(info.host) || (dirty && started) {
+			// If clearError return true, but started is false, restore provisioningSettings.


This comment is not really clear to me; can you please elaborate on it a little?

This is just describing how the if statement below can come to be true.
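
As a rough illustration of the "Status diff" idea discussed in this thread, the sketch below snapshots the saved settings, copies the desired settings into the status, and rolls back if cleaning did not actually start, so that the next reconcile pass still sees the difference. All types and function names here are hypothetical simplifications; the real controller also handles error clearing and requeue delays, which are omitted.

```go
package main

import "fmt"

// Settings is a hypothetical stand-in for the provisioning settings kept in
// the host status.
type Settings struct {
	RootDevice string
}

// Host is a hypothetical, heavily reduced host object.
type Host struct {
	Spec   Settings
	Status Settings
}

// saveSettings copies the desired settings into the status and reports
// whether anything changed.
func saveSettings(h *Host) (dirty bool) {
	if h.Status != h.Spec {
		h.Status = h.Spec
		return true
	}
	return false
}

// reconcilePreparing runs one pass. The prepare callback is a hypothetical
// stand-in for the provisioner's Prepare call and reports whether cleaning
// started. The return value says whether the host needs to be written back.
func reconcilePreparing(h *Host, prepare func(unprepared bool) (started bool)) bool {
	snapshot := h.Status
	dirty := saveSettings(h)
	started := prepare(dirty)
	if dirty && !started {
		// Cleaning did not start: restore the old settings so the next
		// pass detects the same difference again.
		h.Status = snapshot
		return false
	}
	return dirty && started
}

func main() {
	h := &Host{Spec: Settings{RootDevice: "/dev/sda"}}
	notStarted := func(bool) bool { return false }
	startIfAsked := func(unprepared bool) bool { return unprepared }

	fmt.Println(reconcilePreparing(h, notStarted), h.Status.RootDevice)
	fmt.Println(reconcilePreparing(h, startIfAsked), h.Status.RootDevice)
}
```

The design choice this illustrates is that no extra flag is stored in the API: whether the host is "unprepared" is derived from comparing desired and saved settings on each pass.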

andfasano · 2021-01-28T12:55:12Z

+
+// Prepare remove existing configuration and set new configuration.
+// If `started` is true,  it means that we successfully executed `tryChangeNodeProvisionState`.
+func (p *ironicProvisioner) Prepare(unprepared bool) (result provisioner.Result, started bool, err error) {


What about covering this part using the new mocks? I think that would be really useful for such a relevant feature.

metal3-io-bot · 2021-01-28T17:00:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, Hellcatlk, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [zaneb]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zaneb · 2021-01-29T14:18:26Z

/lgtm
/test-integration

zaneb · 2021-01-29T16:26:24Z

Tests are failing on:

errors in package: [/home/prow/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine_test.go:621:11: m.nextResult undefined (type *mockProvisioner has no field or method nextResult) /home/prow/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine_test.go:621:25: m.nextResult undefined (type *mockProvisioner has no field or method nextResult)]"

This is due to #725, which just merged this morning, changing the test code :(

andfasano · 2021-01-29T17:04:42Z

Hopefully it can be fixed easily by changing the mockProvisioner.Prepare() method. There will be another failure to fix anyway:

baremetal-operator/controllers/metal3.io/host_state_machine_test.go

Line 382 in 3f79aae

Scenario: "matchprofile-to-ready",

since it changed to match-profile -> preparing. At this point it would probably be a good idea to add a new case for preparing -> ready in the same test.
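
A table-driven check of the two transitions mentioned above could look roughly like the following sketch. The `State` type, its constants, and the `next` function are hypothetical simplifications for illustration; the actual test in host_state_machine_test.go drives the real state machine with a mock provisioner.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the host provisioning state.
type State string

const (
	MatchProfile State = "match profile"
	Preparing    State = "preparing"
	Ready        State = "ready"
)

// next returns the assumed successor of a state once its handler reports
// completion. States without a listed successor stay where they are.
func next(s State) State {
	switch s {
	case MatchProfile:
		return Preparing
	case Preparing:
		return Ready
	default:
		return s
	}
}

func main() {
	scenarios := []struct {
		name string
		from State
		want State
	}{
		{"matchprofile-to-preparing", MatchProfile, Preparing},
		{"preparing-to-ready", Preparing, Ready},
	}
	for _, sc := range scenarios {
		got := next(sc.from)
		fmt.Printf("%s: got %q, want %q, ok=%t\n", sc.name, got, sc.want, got == sc.want)
	}
}
```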

zouy414 · 2021-02-02T01:32:36Z

Done.

andfasano · 2021-02-02T10:48:49Z

+		result, err = operationContinuing(provisionRequeueDelay)
+
+	default:
+		result, err = transientError(fmt.Errorf("Have unexpected ironic node state %s", ironicNode.ProvisionState))


Note: during the review I noticed that some nodes sometimes ended up here with the provisioning state still set to inspectWait. This is due to the fact that in the Inspection state BMO decides to move on by looking only at the Inspector result, without waiting for the node to go back to the manageable state. We could fix this problem in another PR, since the code here handles the described scenario without any side effects.
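
To make the scenario above concrete, here is a simplified sketch of a switch over Ironic provision states in which an unexpected state (such as "inspect wait") falls through to the default case and yields an error that the caller can retry. The types, the `prepareStep` function, the requeue delay, and the assumption that the caller retries on error are all hypothetical; only the state names are Ironic's.

```go
package main

import (
	"fmt"
	"time"
)

// ProvisionState is a hypothetical local copy of a few Ironic state names.
type ProvisionState string

const (
	Manageable  ProvisionState = "manageable"
	Cleaning    ProvisionState = "cleaning"
	CleanWait   ProvisionState = "clean wait"
	InspectWait ProvisionState = "inspect wait"
)

// Result is a hypothetical stand-in for provisioner.Result.
type Result struct {
	Dirty        bool
	RequeueAfter time.Duration
}

// requeueDelay is an assumed value chosen for illustration only.
const requeueDelay = 10 * time.Second

// prepareStep decides what to do for the current node state.
func prepareStep(state ProvisionState, unprepared bool) (result Result, started bool, err error) {
	switch state {
	case Manageable:
		if unprepared {
			// Cleaning would be requested here.
			return Result{Dirty: true, RequeueAfter: requeueDelay}, true, nil
		}
		return Result{}, false, nil
	case Cleaning, CleanWait:
		// Cleaning is in progress: check again later.
		return Result{Dirty: true, RequeueAfter: requeueDelay}, false, nil
	default:
		// Any other state is reported as an error; the caller is assumed
		// to retry, by which time the node may have become manageable.
		return Result{}, false, fmt.Errorf("unexpected ironic node state %s", state)
	}
}

func main() {
	for _, s := range []ProvisionState{InspectWait, Manageable, CleanWait} {
		result, started, err := prepareStep(s, true)
		fmt.Println(s, result.Dirty, started, err)
	}
}
```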

andfasano · 2021-02-02T10:55:30Z

/lgtm

Just a final note (@dtantsur @zaneb): this PR modifies BMO's state machine by adding an extra step for manual cleaning during the initial phase of provisioning a node. I'm not sure what the impact will be, but it sounds like this new state could be taken into account when evaluating the provisioning limit introduced in #725.

zaneb · 2021-02-03T13:52:10Z

/test-integration

metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 14, 2021

metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 14, 2021

metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2021

andfasano reviewed Jan 14, 2021

Comment thread pkg/provisioner/ironic/ironic.go Outdated

Comment thread pkg/provisioner/ironic/ironic.go

Comment thread pkg/provisioner/ironic/ironic.go Outdated

zouy414 force-pushed the preparing branch from c1aca71 to 048a5e4 Compare January 15, 2021 05:03

zaneb reviewed Jan 15, 2021

zouy414 force-pushed the preparing branch 3 times, most recently from b818e01 to 798c104 Compare January 18, 2021 02:39

zouy414 force-pushed the preparing branch from 798c104 to b0d44af Compare January 21, 2021 07:10

andfasano reviewed Jan 21, 2021

Comment thread controllers/metal3.io/host_state_machine.go Outdated

Comment thread controllers/metal3.io/baremetalhost_controller.go Outdated

zouy414 force-pushed the preparing branch from b0d44af to 6b0961a Compare January 22, 2021 01:38

zaneb reviewed Jan 22, 2021

zouy414 force-pushed the preparing branch 2 times, most recently from 12cbb90 to 3126fca Compare January 25, 2021 03:35

zaneb reviewed Jan 25, 2021

zaneb reviewed Jan 26, 2021

Comment thread controllers/metal3.io/baremetalhost_controller.go

Comment thread pkg/provisioner/ironic/ironic.go

Comment thread pkg/provisioner/ironic/ironic.go Outdated

zouy414 force-pushed the preparing branch 2 times, most recently from 9bac13c to b86adbc Compare January 27, 2021 02:43

zaneb reviewed Jan 27, 2021

metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2021

zaneb added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 27, 2021

zouy414 force-pushed the preparing branch from b86adbc to 8a5dde1 Compare January 28, 2021 01:11

andfasano approved these changes Jan 28, 2021

andfasano suggested changes Jan 28, 2021

Comment thread controllers/metal3.io/host_state_machine.go

zaneb removed the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 28, 2021

andfasano approved these changes Jan 28, 2021

zouy414 force-pushed the preparing branch 2 times, most recently from 9ced42a to ec6f98a Compare January 29, 2021 06:46

metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2021

zouy414 force-pushed the preparing branch from ec6f98a to 4f7ac53 Compare February 2, 2021 01:32

metal3-io-bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 2, 2021

Add Preparing state to do manual clean

4f7ac53

Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com> Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com> Co-Authored-By: Kim Bao Long <longkb@vn.fujitsu.com> Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>

andfasano reviewed Feb 2, 2021

metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 2, 2021

andfasano mentioned this pull request Feb 2, 2021

Wait for inspection completion #785

Merged

metal3-io-bot merged commit e2f5e41 into metal3-io:master Feb 3, 2021

zaneb mentioned this pull request Apr 1, 2021

Add proposal to do bulk set of BIOS configuration metal3-io/metal3-docs#173

Merged

zouy414 deleted the preparing branch April 19, 2021 09:23
