Limit the number of hosts simultaneously provisioned #725

metal3-io-bot merged 2 commits into metal3-io:master
Conversation
```go
	continue
}

switch node.ProvisionState {
```
Would looking at the target provisioning state instead/as well provide a faster response to changes?
Unfortunately it didn't :/ In my tests, TargetState also usually points just to the manageable or active states.
At the start of provisioning we are usually in manageable state (not always after #716, but always for the first deployment). So if we're seeing e.g. current state is manageable but target state is available, that machine is probably about to be provisioned. Ditto for current state is available and target state is active.
I assume the target state must be updated synchronously, so it really should be available whenever we look.
I suspect the real problem here is that there's a large delay between the point where you're making a go/no-go decision (on entry to the Provisioning state) and when we actually ask Ironic to go to active (not until after cleaning). And now it's even more fun because a node in the available state may or may not represent a Host in the provisioning state.
I think the solution to this is the Preparing state I proposed on the mailing list. That way all nodes will be in available before starting provisioning, so we'd only have to look for those with target state active.
@dtantsur can you please confirm this behavior? From my local tests the state doesn't seem to be updated synchronously, nor does the target_provision_state (just after a provision change request).
> current state is manageable but target state is available
To nitpick a bit: this is not possible. So-called stable states always have a target state of None. Stable states are states where a node simply stays without external input (BMO is external input in this context). This includes manageable, available and active. A non-empty target state appears in so-called transient states: wait call-back, and all states ending with ...ing or ... wait.
Now let me check the code to provide a more targeted answer.
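For illustration, here is a minimal Go sketch of the stable/transient distinction described above. It assumes gophercloud's baremetal `nodes.Node` fields; the helper itself is hypothetical, not part of the PR:

```go
package main

import (
	"github.com/gophercloud/gophercloud/openstack/baremetal/v1/nodes"
)

// isTransient reports whether a node is in a transient state.
// Stable states (manageable, available, active) have no target state,
// which gophercloud decodes as an empty string; transient states
// ("...ing", "... wait", "wait call-back") carry a non-empty target.
func isTransient(node nodes.Node) bool {
	return node.TargetProvisionState != ""
}
```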
Force-push: 63bf033 to cac0197

Force-push: 3c4bcc7 to e2cdf65
zaneb left a comment:
OK, so if I'm following correctly here, the main thing you've changed is to check for capacity and block reconciling continuously, rather than just preventing the Host from entering the Provisioning state. (In fact, if we do it this way I think we should stop blocking the host from changing state - better to consistently block only in Provisioning than sometimes in Provisioning and sometimes in Ready.) This is suboptimal, because it may mean that we e.g. let a node start cleaning when another node that has already finished cleaning is waiting to start provisioning, but possibly necessary because of the very large delay between entering the Provisioning state and doing anything that looks like provisioning in Ironic. However, if we find that only a subset of states during provisioning are the ones that are actually placing strain on ironic (sitting there waiting for IPA to clean a node might be easy, for example) then this could still come out as a win.
The following combinations of current and target node provisioning states correspond to Hosts in the Provisioning state with what I'd expect (with one exception) to be a fairly high degree of consistency:

- Manageable -> TargetProvide
- Cleaning -> TargetProvide
- CleanWait -> TargetProvide
- CleanFail -> TargetProvide (could actually be deprovisioning, but it's brief enough that we can live with it)
- Available -> TargetActive
- Deploying
- DeployWait
However, there is a large hole there - once the state becomes Available but before we request TargetActive, we cannot tell from the ironic states alone whether the node is Provisioning or Ready after deprovisioning. (Unless, of course, we were to store this information somewhere in ironic - which is totally possible.) While setting up the data and requesting a move to active is relatively quick, it may take some time for the operator to notice that the node has reached Available and in that window it may start allowing other Hosts to start provisioning.
If we implement my proposal to add a Preparing state before Ready, so that the request for TargetActive always happens right at the start of Provisioning, then I think that gating on state changes only would become viable again. Since we want that state anyway, it's worth at least considering whether our path would look easier if we did that first or merged this and then reworked it later, assuming we prefer to gate only on state changes.
Use of the error message and error type fields is effective (operational status would be another option, but this is more visible). However, after having written #749 I now think it's a mistake to increment the error count even once because it mixes these routine delays with other possible causes for error. I think all we really need here is a random backoff from a fixed interval. It's fine to add a new actionResult type for this - we actually need one for retry-after-delay anyway.
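For concreteness, a rough Go sketch of classifying the state combinations listed above. The helper name is hypothetical (this is not the PR's code), and the state strings are Ironic's API values as surfaced through gophercloud:

```go
package main

import (
	"github.com/gophercloud/gophercloud/openstack/baremetal/v1/nodes"
)

// looksLikeProvisioning approximates the combinations above: nodes
// being cleaned on their way to "available" after a provide request,
// nodes in "available" already asked to go "active", and nodes
// actively deploying ("wait call-back" is Ironic's deploy-wait state).
func looksLikeProvisioning(node nodes.Node) bool {
	switch node.ProvisionState {
	case "manageable", "cleaning", "clean wait", "clean failed":
		return node.TargetProvisionState == "available"
	case "available":
		return node.TargetProvisionState == "active"
	case "deploying", "wait call-back":
		return true
	}
	return false
}
```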
```go
for _, node := range allNodes {
	switch node.ProvisionState {
	case "cleaning", "clean wait", "inspecting", "inspect wait",
```
It may be that clean wait and inspect wait are not states that are particularly taxing on ironic and could be left off the list. It would be worth checking with the Ironic team.
Apart from that, gating on state changes currently seems to work reasonably well, without affecting the convergence of the overall provisioning, so I'd prefer to proceed and merge with this approach and revisit it later once the new Preparing state is available.

The ProvisionerBusy error is now used to detect that a host has been previously delayed, so an initial check can be performed even before applying the state machine. If we want to move to a plain action result for the backoff - and also see the usefulness of that - then I suppose the operational status could be used to track this situation.
Force-push: fee645e to fe2bdf9
Update: I've added a new actionDelayed that randomly backs off in [1m, 2m] and switched to using OperationalStatus to track delayed hosts instead of the error type.
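A minimal sketch of what such an actionDelayed could look like. The type name comes from the comment above, but the result plumbing is an assumption; BMO's real actionResult interface may differ:

```go
package main

import (
	"math/rand"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// actionDelayed requeues the host with a jittered fixed backoff
// instead of recording an error.
type actionDelayed struct{}

func (actionDelayed) Dirty() bool { return false }

// Result picks a random delay in [1m, 2m) so that delayed hosts do
// not all wake up and retry at the same instant.
func (actionDelayed) Result() (reconcile.Result, error) {
	delay := time.Minute + time.Duration(rand.Int63n(int64(time.Minute)))
	return reconcile.Result{Requeue: true, RequeueAfter: delay}, nil
}
```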
/test-integration

/test-integration

/test-integration

Force-push: 6196aa5 to 4bd080b

/test-integration
```go
// has any sort of error.
OperationalStatusError OperationalStatus = "error"

// OperationalStatusDelayed is the status value for when the host
// provisioning is delayed due to the provisioning limit.
OperationalStatusDelayed OperationalStatus = "delayed"
```
I'm wondering if there is a better name for this, but I can't think of one.
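As a usage illustration, the new value plugs in like any other operational status. The helper name and import path below are assumptions, not the PR's exact code:

```go
package main

import (
	metal3v1alpha1 "github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1"
)

// markDelayed flags a host that is waiting for a provisioning slot,
// using the new status value instead of the error fields.
func markDelayed(host *metal3v1alpha1.BareMetalHost) {
	host.Status.OperationalStatus = metal3v1alpha1.OperationalStatusDelayed
}
```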
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, dtantsur, zaneb

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing `/approve` in a comment.
Force-push: 000c4ac to 920e688
Commits squashed and demo cast updated.
zaneb left a comment:
Rebase required, but otherwise looks good to me. Love the demo!
Force-push: 920e688 to cd407b6

/test-integration

Force-push: cd407b6 to 613a4fc

/test-integration

Force-push: 613a4fc to 2f6b99b

/test-integration
I don't see any issues with the rebase. Who wants to LGTM?
/lgtm
This PR introduces a soft limit for the number of hosts simultaneously being provisioned.

Design doc: https://github.com/metal3-io/metal3-docs/blob/master/design/baremetal-operator/limit-hosts-provisioning.md

A new configuration variable, PROVISIONING_LIMIT (defaulting to 20), specifies the desired batch size that the operator will try to honor while provisioning new hosts. If there are no available slots, the current action for a host is rescheduled using a fixed backoff with jitter to spread the delayed requests evenly.

Due to the asynchronous behaviour of the Ironic subsystem, and the configurable level of concurrency of the operator, it's not possible to guarantee that the limit will always be respected: occasional overflows can happen, especially during the initial burst. Nevertheless, the operator will enforce the limit as far as possible, so it's recommended to keep a configuration where PROVISIONING_LIMIT is greater than BMO_CONCURRENCY.
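For illustration, reading the variable might look like the sketch below. Only the variable name and its default come from the description; the helper itself is hypothetical:

```go
package main

import (
	"os"
	"strconv"
)

// provisioningLimit returns the configured soft limit, falling back
// to the documented default of 20 when PROVISIONING_LIMIT is unset
// or not a positive integer.
func provisioningLimit() int {
	if v := os.Getenv("PROVISIONING_LIMIT"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return 20
}
```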
Demo
Here's a quick demo of the current implementation, using a limit of 8 and a BMO concurrency level of 3 (the demo is sped up). A batch of 15 additional nodes is added to the cluster. Note the overflow during the initial burst; after that, the limit is maintained until provisioning completes.