
Add Preparing state to do manual clean #763

Merged
metal3-io-bot merged 1 commit into metal3-io:master from zouy414:preparing
Feb 3, 2021

@zouy414
Member

@zouy414 zouy414 commented Jan 14, 2021

See: #292 (review)

Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com>
Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com>
Co-Authored-By: Kim Bao Long <longkb@vn.fujitsu.com>
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>

@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 14, 2021
@metal3-io-bot
Contributor

Hi @Hellcatlk. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 14, 2021
@zouy414
Member Author

zouy414 commented Jan 14, 2021

@andfasano @dtantsur @zaneb

@andfasano
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 14, 2021
Member

@andfasano andfasano left a comment

Thanks for this new, smaller PR; it is definitely more manageable to review.

In general the changes look fine to me; there's only one point that needs a little more handling in the default case.

In addition, it would be really useful to have a dedicated test for Prepare that verifies all the various cases (it would also be a good aid for the review). Hopefully the new mock used to cover other Provisioner parts could be used for that.

Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Member

@zaneb zaneb left a comment

Thanks for opening this! This really does make it 1000x easier to review.

Comment thread apis/metal3.io/v1alpha1/baremetalhost_types.go Outdated
Comment thread apis/metal3.io/v1alpha1/baremetalhost_types.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
@zouy414 zouy414 force-pushed the preparing branch 3 times, most recently from b818e01 to 798c104, January 18, 2021 02:39
@zouy414
Member Author

zouy414 commented Jan 20, 2021

@zaneb PTAL.

@demonCoder95
Member

@Hellcatlk It appears the PR has merge conflicts with master. Also, thank you for creating this smaller PR; it is definitely easier for the reviewers to handle.

@demonCoder95
Member

/test-integration

@dtantsur @andfasano @zaneb
Can you please take a last look at this one? We need to get this merged quickly so we can move on to the RAID and firmware configuration functionality we've been meaning to implement for a very long time now. Unfortunately this work has dragged on too long, and it's been very inconvenient, especially for the reviewers.

Thank you!

Member

@andfasano andfasano left a comment

Apart from a couple of minor issues, I think the main point is how to determine that the manual cleaning has completed without exposing it as a technical/internal detail. I agree with @zaneb's comment that adopting a flag could be problematic, and it should very likely not be exposed directly to the end user, since it seems fragile unless it is carefully managed in all possible conditions. The suggested approach (working on a Status diff) seems more robust (it could probably also cover the clean failure case) and is worth exploring, especially since it could be reused in other scenarios, like early cleaning. Given that, I'll leave the final word to @zaneb.

Comment thread controllers/metal3.io/host_state_machine.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go Outdated
Member

@zaneb zaneb left a comment

Thanks, I think this is getting really close.

Comment thread controllers/metal3.io/baremetalhost_controller.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go
Comment thread docs/BaremetalHost_ProvisioningState.dot
Comment thread docs/BaremetalHost_ProvisioningState.dot Outdated
Comment thread controllers/metal3.io/host_state_machine.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread apis/metal3.io/v1alpha1/baremetalhost_types.go Outdated
Comment thread apis/metal3.io/v1alpha1/baremetalhost_types.go Outdated
@zouy414 zouy414 force-pushed the preparing branch 2 times, most recently from 12cbb90 to 3126fca, January 25, 2021 03:35
@demonCoder95
Member

/test-integration

Member

@zaneb zaneb left a comment

Thanks for making these changes. Eliminating the flag in the API makes this much easier to land, because bugs can always be fixed later but API changes are forever :)

Comment thread apis/metal3.io/v1alpha1/baremetalhost_types.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go
Comment thread controllers/metal3.io/baremetalhost_controller.go
Comment thread controllers/metal3.io/baremetalhost_controller.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated
@demonCoder95
Member

/test-integration

Comment thread apis/metal3.io/v1alpha1/baremetalhost_types.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go

case nodes.Manageable:
if unprepared {
started, result, err = p.startManualCleaning(ironicNode)
Member

nit: if you changed the order of the return values, you could just return p.startManualCleaning(ironicNode).

Comment thread controllers/metal3.io/host_state_machine_test.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
@zouy414 zouy414 force-pushed the preparing branch 2 times, most recently from 9bac13c to b86adbc, January 27, 2021 02:43
Member

@zaneb zaneb left a comment

This looks basically good to me - just a cosmetic comment inline because one of my earlier suggestions was ambiguous.
I'll approve now and somebody else can review once you've switched that argument order and squashed.
@andfasano this would probably be a good time to take another look over it.
/approve
/label tide/merge-method-squash
/test-integration

Thanks for all your work on this! Not only does this mean that we can hopefully get RAID support in without any more of the constant rebasing, but it's actually a big improvement to the BMO design in general, one that opens up a path to even more improvements in the future.

Comment thread pkg/provisioner/ironic/ironic.go Outdated
// Prepare remove existing configuration and set new configuration.
// If `started` is true, it means that we successfully executed `tryChangeNodeProvisionState`.
-func (p *ironicProvisioner) Prepare(unprepared bool) (result provisioner.Result, started bool, err error) {
+func (p *ironicProvisioner) Prepare(unprepared bool) (started bool, result provisioner.Result, err error) {
Member

Sorry, I was unclear in my previous comment. I meant change the order of the return values of startManualCleaning(), not of Prepare(). It should be result, started, error for consistency with other functions in the Provisioner interface that return 3 values (ValidateManagementAccess and InspectHardware).

Member Author

Thanks, done.

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2021
@zaneb zaneb added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 27, 2021
andfasano
andfasano approved these changes Jan 28, 2021

// Save provisioning settings.
provisioningSettings := info.host.Status.Provisioning.DeepCopy()
dirty, err := saveHostProvisioningSettings(info.host)
Member

@Hellcatlk, just to clarify: IIUC @zaneb's comment, saveHostProvisioningSettings will contain the logic needed to trigger the manual cleaning in Prepare. Right now this method only checks Provisioning.RootDeviceHints. If it is expected to be extended in the future for the related RAID settings, I think it would be clearer to add a comment saying so within that method.

Member

That's correct. But note that this function already existed, and this is the right thing to do regardless of whether RAID is added.

Member

Right, I was thinking of something more like the // TODO: in buildManualCleaningSteps.

Comment thread controllers/metal3.io/baremetalhost_controller.go
if provResult.Dirty {
result := actionContinue{provResult.RequeueAfter}
if clearError(info.host) || (dirty && started) {
// If clearError return true, but started is false, restore provisioningSettings.
Member

This comment is not really clear to me, can you please elaborate it a little bit?

Member

This is just describing how the if statement below can come to be true.


// Prepare remove existing configuration and set new configuration.
// If `started` is true, it means that we successfully executed `tryChangeNodeProvisionState`.
func (p *ironicProvisioner) Prepare(unprepared bool) (result provisioner.Result, started bool, err error) {
Member

What about covering this part using the new mocks? For such an important feature, I think it could be really useful.

Comment thread controllers/metal3.io/host_state_machine.go
@zaneb zaneb removed the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 28, 2021
@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, Hellcatlk, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zouy414 zouy414 force-pushed the preparing branch 2 times, most recently from 9ced42a to ec6f98a, January 29, 2021 06:46
@zaneb
Member

zaneb commented Jan 29, 2021

/lgtm
/test-integration

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2021
@zaneb
Member

zaneb commented Jan 29, 2021

Tests are failing on:

errors in package: [/home/prow/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine_test.go:621:11: m.nextResult undefined (type *mockProvisioner has no field or method nextResult) /home/prow/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine_test.go:621:25: m.nextResult undefined (type *mockProvisioner has no field or method nextResult)]"

This is due to #725, which just merged this morning, changing the test code :(

@andfasano
Member

Hopefully it can be fixed easily by changing the mockProvisioner.Prepare() method. There will be another failure to fix anyway:

Scenario: "matchprofile-to-ready",
since it now goes match-profile -> preparing; at this point it would probably be a good idea to add a new case for preparing -> ready in the same test.

@metal3-io-bot metal3-io-bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 2, 2021
@zouy414
Member Author

zouy414 commented Feb 2, 2021

Done.

Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com>
Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com>
Co-Authored-By: Kim Bao Long <longkb@vn.fujitsu.com>
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
result, err = operationContinuing(provisionRequeueDelay)

default:
result, err = transientError(fmt.Errorf("Have unexpected ironic node state %s", ironicNode.ProvisionState))
Member

Note: during the review I noticed that some nodes sometimes ended up here with the provisioning state still set to inspectWait. This is because in the Inspection state BMO decides to move on by looking only at the Inspector result, without waiting for the node to return to the manageable state. We could fix this in another PR, since the code here handles the described scenario without any side effects.

@andfasano
Member

/lgtm

Just a final note (@dtantsur @zaneb): this PR modifies BMO's state machine by adding an extra step for manual cleaning during the initial phase of provisioning a node. I'm not sure what the impact will be, but it sounds like this new state could be considered when evaluating the provisioning limit introduced in #725.

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 2, 2021
@zaneb
Member

zaneb commented Feb 3, 2021

/test-integration

@metal3-io-bot metal3-io-bot merged commit e2f5e41 into metal3-io:master Feb 3, 2021
@zouy414 zouy414 deleted the preparing branch April 19, 2021 09:23

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants