Limit the number of hosts simultaneously provisioned #725

Merged

metal3-io-bot merged 2 commits into metal3-io:master from andfasano:limit-hosts on Jan 29, 2021

Conversation


@andfasano (Member) commented Nov 19, 2020

This PR introduces a soft limit on the number of hosts being simultaneously provisioned.

Design doc: https://github.com/metal3-io/metal3-docs/blob/master/design/baremetal-operator/limit-hosts-provisioning.md.

A new configuration variable, PROVISIONING_LIMIT (default 20), specifies the desired batch size that the operator will try to honor while provisioning new hosts. If there are no available slots, the current action for a host will be rescheduled using a fixed backoff with jitter, to spread the delayed requests evenly.

Due to the asynchronous behaviour of the Ironic subsystem and the configurable concurrency level of the operator, it is not possible to guarantee that the limit will always be respected, and occasional overflows can happen, especially during the initial burst. Nevertheless, the operator will try to enforce the limit as much as possible, and for this reason it is recommended to keep PROVISIONING_LIMIT greater than BMO_CONCURRENCY.
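As a rough illustration of the mechanism described above, here is a minimal Go sketch of the limit check and the jittered backoff. The helper names are hypothetical; only the PROVISIONING_LIMIT variable, its default of 20, and the jitter idea come from this PR (the [1m, 2m] window is confirmed later in this thread), so this is a sketch, not the merged code.

```go
package provisioning

import (
	"math/rand"
	"os"
	"strconv"
	"time"
)

// provisioningLimit reads PROVISIONING_LIMIT from the environment,
// falling back to the default of 20 described above.
func provisioningLimit() int {
	if v, err := strconv.Atoi(os.Getenv("PROVISIONING_LIMIT")); err == nil && v > 0 {
		return v
	}
	return 20
}

// delayIfBusy returns a jittered backoff when all slots are taken, or
// zero when the host may proceed. Helper names are hypothetical.
func delayIfBusy(hostsInProvisioning int) time.Duration {
	if hostsInProvisioning < provisioningLimit() {
		return 0
	}
	// A fixed backoff with jitter in [1m, 2m) spreads the delayed
	// requests evenly instead of retrying them all at once.
	return time.Minute + time.Duration(rand.Int63n(int64(time.Minute)))
}
```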


Demo

Here's a quick demo of the current implementation, using a limit set to 8 and a BMO concurrency level of 3 (demo sped up). A batch of 15 additional nodes is added to the cluster. Note the overflow during the initial burst; after that, the limit is maintained until provisioning completes.

asciicast

@metal3-io-bot metal3-io-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 19, 2020
Comment thread pkg/provisioner/ironic/ironic.go Outdated
continue
}

switch node.ProvisionState {
Member

Would looking at the target provisioning state instead/as well provide a faster response to changes?

@andfasano (Member Author)

Unfortunately it didn't :/ In my tests, TargetState also usually points just to the manageable or active states.

Member

At the start of provisioning we are usually in manageable state (not always after #716, but always for the first deployment). So if we're seeing e.g. current state is manageable but target state is available, that machine is probably about to be provisioned. Ditto for current state is available and target state is active.
I assume the target state must be updated synchronously, so it really should be available whenever we look.

I suspect the real problem here is that there's a large delay between the point where you're making a go/no-go decision (on entry to the Provisioning state) and when we actually ask Ironic to go to active (not until after cleaning). And now it's even more fun because a node in the available state may or may not represent a Host in the provisioning state.

I think the solution to this is the Preparing state I proposed on the mailing list. That way all nodes will be in available before starting provisioning, so we'd only have to look for those with target state active.

@andfasano (Member Author) commented Dec 14, 2020

@dtantsur can you please confirm this behavior? From my local tests the state doesn't seem to be updated synchronously, nor does the target_provision_state (just after a provision change request).

Member

> current state is manageable but target state is available

To nitpick a bit: this is not possible. So-called stable states always have a target state of None. Stable states are states where a node simply stays without external input (BMO is external input in this context). This includes manageable, available, and active. A non-empty target state appears in so-called transient states: wait call-back and all states ending with ...ing or ... wait.

Now let me check the code to provide a more targeted answer.
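To illustrate the distinction drawn above, here is a small hedged Go sketch. The TargetProvisionState field matches gophercloud's baremetal nodes.Node struct; the helper itself is hypothetical.

```go
package provisioning

import "github.com/gophercloud/gophercloud/openstack/baremetal/v1/nodes"

// isTransient reports whether a node is in a transient provision
// state. Per the comment above, stable states (manageable, available,
// active) always have an empty target state, while transient states
// (wait call-back, ...ing, ... wait) carry a non-empty
// target_provision_state.
func isTransient(node nodes.Node) bool {
	return node.TargetProvisionState != ""
}
```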

@andfasano andfasano force-pushed the limit-hosts branch 3 times, most recently from 63bf033 to cac0197 on December 4, 2020 13:40
Comment thread controllers/metal3.io/host_state_machine.go
Comment thread pkg/provisioner/ironic/ironic.go Outdated
@andfasano andfasano force-pushed the limit-hosts branch 3 times, most recently from 3c4bcc7 to e2cdf65 on December 7, 2020 11:34
Comment thread controllers/metal3.io/baremetalhost_controller.go Outdated
Comment thread controllers/metal3.io/baremetalhost_controller.go Outdated
Comment thread controllers/metal3.io/action_result.go Outdated
Comment thread controllers/metal3.io/action_result.go Outdated
Comment thread controllers/metal3.io/action_result.go Outdated
Comment thread pkg/provisioner/provisioner.go Outdated
Comment thread controllers/metal3.io/host_state_machine.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated

@zaneb (Member) left a comment

OK, so if I'm following correctly here, the main thing you've changed is to check for capacity and block reconciling continuously, rather than just preventing the Host from entering the Provisioning state. (In fact, if we do it this way I think we should stop blocking the host from changing state - better to consistently block only in Provisioning than sometimes in Provisioning and sometimes in Ready.)

This is suboptimal, because it may mean that we e.g. let a node start cleaning when another node that has already finished cleaning is waiting to start provisioning, but it is possibly necessary because of the very large delay between entering the Provisioning state and doing anything that looks like provisioning in Ironic. However, if we find that only a subset of states during provisioning actually place strain on Ironic (sitting there waiting for IPA to clean a node might be easy, for example), then this could still come out as a win.

The following combinations of current and target node provisioning states correspond to Hosts in the Provisioning state with what I'd expect (with one exception) to be a fairly high degree of consistency:

  • `Manageable` -> `TargetProvide`
  • `Cleaning` -> `TargetProvide`
  • `CleanWait` -> `TargetProvide`
  • `CleanFail` (could actually be deprovisioning, but it's brief enough that we can live with it)
  • `Available` -> `TargetActive`
  • `Deploying`
  • `DeployWait`

However, there is a large hole there - once the state becomes Available but before we request TargetActive, we cannot tell from the ironic states alone whether the node is Provisioning or Ready after deprovisioning. (Unless, of course, we were to store this information somewhere in ironic - which is totally possible.) While setting up the data and requesting a move to active is relatively quick, it may take some time for the operator to notice that the node has reached Available and in that window it may start allowing other Hosts to start provisioning.

If we implement my proposal to add a Preparing state before Ready, so that the request for TargetActive always happens right at the start of Provisioning, then I think that gating on state changes only would become viable again. Since we want that state anyway, it's worth at least considering whether our path would look easier if we did that first or merged this and then reworked it later, assuming we prefer to gate only on state changes.

Use of the error message and error type fields is effective (operational status would be another option, but this is more visible). However, after having written #749 I now think it's a mistake to increment the error count even once because it mixes these routine delays with other possible causes for error. I think all we really need here is a random backoff from a fixed interval. It's fine to add a new actionResult type for this - we actually need one for retry-after-delay anyway.

Comment thread pkg/provisioner/ironic/ironic.go Outdated
for _, node := range allNodes {

switch node.ProvisionState {
case "cleaning", "clean wait", "inspecting", "inspect wait",
Member

It may be that clean wait and inspect wait are not states that are particularly taxing on ironic and could be left off the list. It would be worth checking with the Ironic team.

Comment thread pkg/provisioner/ironic/ironic.go
@dtantsur (Member) left a comment

LGTM

@andfasano (Member Author)

> The following combinations of current and target node provisioning states correspond to Hosts in the Provisioning state with what I'd expect (with one exception) to be a fairly high degree of consistency:
>
> * `Manageable` -> `TargetProvide`
> * `Cleaning` -> `TargetProvide`
> * `CleanWait` -> `TargetProvide`
> * `CleanFail` (could actually be deprovisioning, but it's brief enough that we can live with it)
> * `Available` -> `TargetActive`
> * `Deploying`
> * `DeployWait`

Apart from the manageable and available states, which cannot be used as per @dtantsur's comment, the others are already in use - the overall goal was to roughly measure how busy Ironic currently is (in the initial proposal the subset was slightly larger, including the rescuing/unrescuing/deleting states, which it could probably make sense to reintroduce now).
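A hedged sketch of the "how busy is Ironic" measure described here: count the nodes in the provision states treated as busy. The first four states come from the truncated diff hunk quoted above; the deploying states are an assumption based on the review discussion, and the names are hypothetical.

```go
package provisioning

import "github.com/gophercloud/gophercloud/openstack/baremetal/v1/nodes"

// busyStates approximates the set of provision states counted against
// the limit; "cleaning", "clean wait", "inspecting" and "inspect wait"
// come from the diff hunk above, the deploying states from the review.
var busyStates = map[string]bool{
	"cleaning":     true,
	"clean wait":   true,
	"inspecting":   true,
	"inspect wait": true,
	"deploying":    true,
	"deploy wait":  true,
}

// usedProvisioningSlots counts the nodes currently occupying a slot.
func usedProvisioningSlots(allNodes []nodes.Node) int {
	used := 0
	for _, node := range allNodes {
		if busyStates[node.ProvisionState] {
			used++
		}
	}
	return used
}
```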

> However, there is a large hole there - once the state becomes Available but before we request TargetActive, we cannot tell from the ironic states alone whether the node is Provisioning or Ready after deprovisioning. (Unless, of course, we were to store this information somewhere in ironic - which is totally possible.) While setting up the data and requesting a move to active is relatively quick, it may take some time for the operator to notice that the node has reached Available and in that window it may start allowing other Hosts to start provisioning.

> If we implement my proposal to add a Preparing state before Ready, so that the request for TargetActive always happens right at the start of Provisioning, then I think that gating on state changes only would become viable again. Since we want that state anyway, it's worth at least considering whether our path would look easier if we did that first or merged this and then reworked it later, assuming we prefer to gate only on state changes.

Currently, gating on state changes seems to work reasonably well, without affecting the convergence of the overall provisioning, so I'd prefer to proceed and merge with this approach and revisit it later, once the new Preparing state becomes available.

> Use of the error message and error type fields is effective (operational status would be another option, but this is more visible). However, after having written #749 I now think it's a mistake to increment the error count even once because it mixes these routine delays with other possible causes for error. I think all we really need here is a random backoff from a fixed interval. It's fine to add a new actionResult type for this - we actually need one for retry-after-delay anyway.

The ProvisionerBusy error is now used to detect that a host has been previously delayed, so an initial check can be performed even before applying the state machine. If we want to move to a plain action result for the backoff - and I also see the usefulness of that - then I suppose operational status could be used to track this situation.

@andfasano (Member Author)

> Use of the error message and error type fields is effective (operational status would be another option, but this is more visible). However, after having written #749 I now think it's a mistake to increment the error count even once because it mixes these routine delays with other possible causes for error. I think all we really need here is a random backoff from a fixed interval. It's fine to add a new actionResult type for this - we actually need one for retry-after-delay anyway.

> The ProvisionerBusy error is now used to detect that a host has been previously delayed, so an initial check can be performed even before applying the state machine. If we want to move to a plain action result for the backoff - and I also see the usefulness of that - then I suppose operational status could be used to track this situation.

Update: I've added a new actionDelayed that randomly backs off in [1m, 2m], and switched to using OperationalStatus to track delayed hosts instead of the error type.
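A minimal sketch of what such an actionDelayed result might look like, assuming controller-runtime's reconcile result type; the Result/Dirty shape is an assumption modeled on the actionResult discussion in this thread, not the exact merged code.

```go
package provisioning

import (
	"math/rand"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// actionDelayed requeues the host after a random delay in [1m, 2m),
// as in the update above, without incrementing any error count.
type actionDelayed struct{}

func (actionDelayed) Result() (ctrl.Result, error) {
	delay := time.Minute + time.Duration(rand.Int63n(int64(time.Minute)))
	return ctrl.Result{RequeueAfter: delay}, nil
}

// Dirty reports whether the host status needs to be saved; a plain
// delay leaves it untouched.
func (actionDelayed) Dirty() bool {
	return false
}
```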

@andfasano (Member Author)

/test-integration

2 similar comments
@andfasano (Member Author)

/test-integration

@andfasano (Member Author)

/test-integration

@andfasano andfasano force-pushed the limit-hosts branch 2 times, most recently from 6196aa5 to 4bd080b on December 21, 2020 16:57
@andfasano (Member Author)

/test-integration

@andfasano andfasano changed the title from "[WIP] Limit the number of hosts simultaneously provisioned" to "Limit the number of hosts simultaneously provisioned" on Jan 5, 2021
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2021
@zaneb (Member) left a comment

/approve

Comment thread controllers/metal3.io/action_result.go
Comment thread controllers/metal3.io/host_state_machine.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go Outdated
Comment thread pkg/provisioner/ironic/ironic.go
Comment thread controllers/metal3.io/action_result.go Outdated
// has any sort of error.
OperationalStatusError OperationalStatus = "error"

OperationalStatusDelayed = "delayed"
Member

I'm wondering if there is a better name for this, but I can't think of one.

@metal3-io-bot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, dtantsur, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2021
@andfasano (Member Author)

Commits squashed and demo cast updated

@zaneb (Member) left a comment

Rebase required, but otherwise looks good to me. Love the demo!

Comment thread controllers/metal3.io/action_result.go Outdated
@metal3-io-bot metal3-io-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 18, 2021
@andfasano (Member Author)

/test-integration

@andfasano (Member Author)

/test-integration

@andfasano (Member Author)

/test-integration

@andfasano andfasano requested a review from zaneb January 25, 2021 10:09

zaneb commented Jan 26, 2021

I don't see any issues with the rebase. Who wants to LGTM?


honza commented Jan 29, 2021

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2021
@metal3-io-bot metal3-io-bot merged commit 3f79aae into metal3-io:master Jan 29, 2021