
Conversation

2uasimojo (Member) commented Jun 28, 2021

Add enhancement doc proposing ways to keep Active clusters in a ClusterPool for faster claiming.

HIVE-1576

openshift-ci bot added the `approved` label Jun 28, 2021
petr-muller (Member) commented

/cc

openshift-ci bot requested a review from petr-muller June 28, 2021 21:19
gurnben (Member) commented Jul 30, 2021

Everything looks good to me @2uasimojo - you've done a particularly good job of enumerating the edge cases, reasoning, and alternatives! We're approaching a point where we might even keep a hot spare or two to speed up CI!

2uasimojo force-pushed the HIVE-1576 branch 2 times, most recently from 68b0c01 to 30e9fb2 on August 20, 2021 20:50
abhinavdahiya (Contributor) commented
LGTM

dgoodwin (Contributor) left a comment

This looks pretty good to me

With this feature, we will create with `powerState` unset.
Then in a separate code path:
- Sort unassigned (installing + ready) CDs by `creationTimestamp`
- Update the oldest `runningCount` CDs with `powerState=Running`
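
A minimal sketch of that selection pass, assuming plain illustrative types rather than Hive's actual API (the real controller reads and patches ClusterDeployment objects through the Kubernetes client, which this elides):

```go
// Illustrative sketch only: plain structs stand in for Hive's API types.
package main

import (
	"fmt"
	"sort"
	"time"
)

type clusterDeployment struct {
	name              string
	creationTimestamp time.Time
	powerState        string // "" (unset), "Running", "Hibernating"
}

// setRunningOnOldest sorts the unassigned (installing + ready) CDs by
// creationTimestamp and flips the oldest runningCount of them to
// powerState=Running, leaving the rest untouched.
func setRunningOnOldest(cds []clusterDeployment, runningCount int) {
	sort.Slice(cds, func(i, j int) bool {
		return cds[i].creationTimestamp.Before(cds[j].creationTimestamp)
	})
	for i := 0; i < runningCount && i < len(cds); i++ {
		cds[i].powerState = "Running"
	}
}

func main() {
	now := time.Now()
	cds := []clusterDeployment{
		{name: "cd-new", creationTimestamp: now},
		{name: "cd-old", creationTimestamp: now.Add(-2 * time.Hour)},
		{name: "cd-mid", creationTimestamp: now.Add(-time.Hour)},
	}
	setRunningOnOldest(cds, 2)
	for _, cd := range cds {
		fmt.Printf("%s powerState=%q\n", cd.name, cd.powerState)
	}
	// cd-old and cd-mid end up Running; cd-new stays unset.
}
```
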
Contributor:

Will this conflict with the stale cluster deletion, which is also based on oldest?

2uasimojo (Member, Author):

Yes (which wasn't a thing when I wrote this :) )

Today we don't take staleness into account when assigning: we'll assign the oldest installed cluster even if it's stale; assignment then takes that cluster out of consideration for deletion due to staleness.

If we don't treat stale clusters specially:

  1. Kick the oldest cluster, which happens to be stale, to running.
  2. Delete the oldest stale cluster, which is that same cluster.
  3. Replace that cluster. While it's installing...
  4. Kick the (new) oldest cluster, which again happens to be stale, to running.

We won't delete another stale cluster until the replacement is done installing. As long as a claim comes in before then, we'll assign the now-running stale cluster that was the second-oldest when we started. If not, we'll repeat the above sequence. It's not really thrashy since it's only happening at worst on a ~40m interval.

Is this something you feel we need to solve?

2uasimojo (Member, Author):

I think the important thing to note here is that we're preserving all of these rules separately (because trying to coordinate them would be insanely complicated):

  • Claiming gets the oldest ready cluster -- FIFO
  • runningCount hits the oldest ready clusters -- again for FIFO purposes, so a given claim (per the rule above) is most likely to get a running cluster
  • Delete the oldest stale cluster first -- again for FIFO purposes.
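
To illustrate the shared shape of these three rules (types and names here are illustrative, not Hive's actual API): each rule independently picks oldest-first from its own candidate set, with no coordination between them.

```go
// Sketch: each rule filters its own candidate set, then orders oldest-first.
package main

import (
	"sort"
	"time"
)

type cd struct {
	name    string
	created time.Time
	ready   bool
	stale   bool
}

// oldestFirst filters cds with keep and returns the matches oldest-first.
func oldestFirst(cds []cd, keep func(cd) bool) []cd {
	var out []cd
	for _, c := range cds {
		if keep(c) {
			out = append(out, c)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].created.Before(out[j].created) })
	return out
}

func main() {
	var pool []cd
	// Rule 1: a claim is assigned the oldest ready cluster.
	claimCandidates := oldestFirst(pool, func(c cd) bool { return c.ready })
	// Rule 2: runningCount powers on the oldest ready clusters.
	runCandidates := oldestFirst(pool, func(c cd) bool { return c.ready })
	// Rule 3: stale deletion removes the oldest stale cluster first.
	deleteCandidates := oldestFirst(pool, func(c cd) bool { return c.stale })
	_, _, _ = claimCandidates, runCandidates, deleteCandidates
}
```
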

Add enhancement doc proposing a new field to keep Running clusters in a
ClusterPool for faster claiming.

HIVE-1576
2uasimojo (Member, Author) commented

/assign @joelddiaz

joelddiaz (Contributor) commented

/lgtm

openshift-ci bot added the `lgtm` label Sep 14, 2021
openshift-ci bot commented Sep 14, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, joelddiaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [2uasimojo,joelddiaz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-robot merged commit 820ee2d into openshift:master Sep 14, 2021
2uasimojo deleted the HIVE-1576 branch September 14, 2021 19:03
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Oct 5, 2021
In openshift#1434, we added the paradigm of keeping some number of ClusterPool
clusters in Running state, so they don't have to suffer Resuming time
when claimed. However, that work was narrowly focused on conforming to
the ClusterPool.Spec.RunningCount.

This commit expands the concept to try to make sure we're always running
enough clusters to satisfy claims. In particular, when we are servicing
a number of claims in excess of the pool's capacity, we create that
excess number of clusters... but we previously didn't account for that
number when calculating how many should be Running.

Now we do.

To explain via a corner case: If your pool has a zero Size and you
create a claim, we will create a cluster and immediately assign it. But
because RunningCount is zero, prior to this commit, we would create the
cluster with `PowerState=Hibernating`, and then kick it over to
`Running` when we assigned the claim. Now we'll create it with
`PowerState=Running` in the first place.

HIVE-1651
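
A minimal sketch of the adjusted accounting; `desiredRunning` and its parameters are hypothetical names, not taken from the actual commit:

```go
// Sketch of the adjusted calculation; names are hypothetical.
package main

import "fmt"

// desiredRunning is the number of pool clusters that should have
// powerState=Running: the configured runningCount plus one per pending
// claim beyond the pool's size, since those excess clusters will be
// assigned as soon as they exist and should come up Running rather
// than Hibernating.
func desiredRunning(runningCount, poolSize, pendingClaims int) int {
	excess := pendingClaims - poolSize
	if excess < 0 {
		excess = 0
	}
	return runningCount + excess
}

func main() {
	// The corner case above: Size=0, RunningCount=0, one claim.
	// Before the fix, the new cluster was created Hibernating and only
	// flipped to Running on assignment; now desiredRunning is 1, so it
	// is created with powerState=Running in the first place.
	fmt.Println(desiredRunning(0, 0, 1)) // prints 1
}
```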