Conversation

@2uasimojo (Member) commented Aug 24, 2021

3dc3c3c / #1484 added setting and discovery of a version marker for ClusterPool and ClusterDeployment so we could tell whether they match. This commit goes a step further and uses that information to replace stale CDs such that the pool will eventually be consistent.

The algorithm we use is as follows:
If the pool is in steady state -- i.e. we're at capacity and all unclaimed CDs are finished installing -- delete one stale CD. This triggers another reconcile wherein the deleted CD is replaced. We're no longer in steady state until (at least) that CD finishes installing, whereupon we're eligible to repeat.

Reasoning:

  • We don't want to immediately trash all stale CDs, as this could result in significant downtime following a pool edit.
  • We don't want to exceed capacity (size or maxSize) to do this rotation, so we go "down" instead of "up".
  • It would have been nice to just wait for the replacement CDs to finish installing; but that would have entailed tracking those somehow as being distinct from CDs added through other code paths. That could get tricky, especially across multiple pool edits.

TODO:

  • As written, we'll even wait for stale CDs to finish installing before we start deleting. We should figure out a way to discount those when checking whether we're eligible for a deletion.
  • Consider exposing a knob allowing the consumer to tune how aggressively stale CDs are replaced.

HIVE-1058
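
A minimal sketch of that branch ordering, with drift, Installing, and Stale reduced to plain counts (the real controller works over CD collections; these helper names are illustrative only, not Hive's actual code):

```go
package main

import "fmt"

// decide models the steady-state rotation described above. drift is the
// number of unclaimed CDs minus the pool's size, so zero means "at
// capacity"; installing and stale stand in for len(cds.Installing()) and
// len(cds.Stale()).
func decide(drift, installing, stale int) string {
	switch {
	case drift < 0:
		// Below capacity: add CDs. This is also the path that replaces
		// a stale CD deleted on the previous reconcile.
		return "add clusters"
	case drift > 0:
		// Over capacity (e.g. size was lowered): delete extras.
		return "delete excess clusters"
	case installing == 0 && stale > 0:
		// Steady state: at capacity and everything installed. Rotate
		// exactly *one* stale CD; its replacement takes us out of
		// steady state until it finishes installing.
		return "delete one stale cluster"
	default:
		return "wait"
	}
}

func main() {
	// At capacity, all installed, two stale CDs left to rotate.
	fmt.Println(decide(0, 0, 2)) // delete one stale cluster
}
```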

@2uasimojo (Member Author)

/hold

Needs UT & live test

/cc @stevekuznetsov @gurnben

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 24, 2021
@openshift-ci openshift-ci bot requested review from abutcher and akhil-rane August 24, 2021 20:22
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 24, 2021
codecov bot commented Aug 24, 2021

Codecov Report

Merging #1512 (f3e437e) into master (8b4f8db) will increase coverage by 0.03%.
The diff coverage is 88.67%.


@@            Coverage Diff             @@
##           master    #1512      +/-   ##
==========================================
+ Coverage   41.47%   41.51%   +0.03%     
==========================================
  Files         335      336       +1     
  Lines       30538    30569      +31     
==========================================
+ Hits        12666    12691      +25     
- Misses      16790    16794       +4     
- Partials     1082     1084       +2     
Impacted Files | Coverage Δ
pkg/controller/metrics/metrics.go | 40.71% <ø> (ø)
...g/controller/clusterpool/clusterpool_controller.go | 54.59% <72.72%> (+0.11%) ⬆️
pkg/controller/clusterpool/collections.go | 75.91% <92.50%> (+0.67%) ⬆️
pkg/controller/clusterpool/metrics.go | 100.00% <100.00%> (ø)

@2uasimojo 2uasimojo force-pushed the HIVE-1058/cycle-stale-clusters branch from 58cc192 to 2f12f8b Compare August 24, 2021 21:45
@dgoodwin (Contributor)

LGTM so far.

```go
	return reconcile.Result{}, err
}
// Special case for stale CDs: allow deleting one if all CDs are installed.
case drift == 0 && len(cds.Installing()) == 0 && len(cds.Stale()) > 0:
```
@dgoodwin (Contributor) commented Aug 25, 2021

Actually, does this need a check to make sure none are deleting? Otherwise it kinda looks like it might immediately wipe them all out (depending on how fast Installing clusters start showing up).

@2uasimojo (Member Author)

It shouldn't need that:

  • On this reconcile we Delete one.
  • The next reconcile will miss this case because we'll be one short of pool.size, so drift will be < 0. It'll add a CD via L382, which will go to Installing.
  • The next reconcile (and subsequent ones for the next 40m) will see nonzero Installing clusters, so will skip this case.

Did I logic that right?
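
Tracing those three bullets with the hypothetical decide() sketch from the PR description (a pool of size 3 with one stale CD):

```go
decide(0, 0, 1)  // reconcile 1: steady state, so delete the stale CD
decide(-1, 0, 0) // reconcile 2: one short of pool.size, so add a CD (the L382 path)
decide(0, 1, 0)  // reconcile 3+: nonzero Installing, so the branch is skipped
```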

Contributor

You are probably right. There are cases where the cache isn't updated for new objects and may not see the new Installing cluster, which is where we have to use expectations. If I'm correct that might happen here, it's possible it could wipe out a couple at once before the installing clusters start showing up. I'm not super confident in any of this, so we can see how it goes.

Contributor

Could we do a counter metric here, so we can see the rate at which clusters are getting replaced due to pool changes?

@2uasimojo (Member Author)

> use expectations

Then someone will have to explain to me what "expectations" are and how they work :P

> a counter metric

Just a monotonic counter we increment every time we hit this branch? Or a gauge we reset once we're fully synced?

And the fancy rate calculation stuff happens in prometheus, right?

Contributor

> use expectations
>
> Then someone will have to explain to me what "expectations" are and how they work :P

Best ref I've used for expectations was when Matthew first introduced them to Hive, and why: #518. I don't necessarily think we need to do this here, but it may be an edge case we'll hit where a couple get deleted when we only expected one. I do think an unclaimed-deleting check would likely rectify it, though: the deletion should be reflected in the cache immediately, and by the time it's done deleting, the new cluster provisions should definitely be in the cache.
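
For concreteness, a sketch of that extra guard, assuming a hypothetical Deleting count alongside Installing and Stale (Hive's actual collections helpers may differ):

```go
// eligibleToRotateStale adds the suggested unclaimed-deleting check: a
// CD's deletionTimestamp lands in the cache immediately, so requiring
// deleting == 0 closes the window where the previous rotation's
// replacement isn't yet visible as Installing.
func eligibleToRotateStale(drift, installing, deleting, stale int) bool {
	return drift == 0 && installing == 0 && deleting == 0 && stale > 0
}
```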

> a counter metric
>
> Just a monotonic counter we increment every time we hit this branch? Or a gauge we reset once we're fully synced?

Just a counter we increment, yes; it'll tell us how often this is happening.

> And the fancy rate calculation stuff happens in prometheus, right?

Correct.

@2uasimojo (Member Author)

Summary from today's discussion re expectations: we don't need them here because the deletion hits the cache immediately, which would result in the next reconcile having a nonzero drift and would thus miss the new branch.

Digging into the metric a bit...

You suggested a Counter that would keep a running total of the number of stale CDs we've ever* deleted.

The other option is a Gauge that denotes how many stale clusters exist at any given time. Would that have value? Would it allow us to calculate the above by some prom magic? Or perhaps we could do both.

*Since the last time this controller started? Or like ever ever? I never really understood when data are in the controller and when they're in prometheus...

@2uasimojo (Member Author) commented Aug 26, 2021

Added the Counter metric in a separate commit. LMK if the Gauge seems like a good idea.

Contributor

Counters are since the process started; prometheus keeps the past data and stitches the two together. In this case it's not so much the value of the counter we're after as the rate at which it's occurring.

A gauge does sound useful for the total number of stale CDs at any given time, as distinct from the rate at which we're doing replacements.
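
A minimal sketch of what that gauge could look like; it's not part of this PR, and the metric and label names here are invented for illustration:

```go
package clusterpool

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical GaugeVec tracking how many stale unclaimed CDs each pool
// currently has; the controller would Set() it on every reconcile.
var metricStaleCDsCurrent = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "hive_clusterpool_stale_clusterdeployments",
		Help: "Current number of stale unclaimed ClusterDeployments per pool.",
	},
	[]string{"cluster_pool"},
)

// In the reconcile loop, after building the collections:
//   metricStaleCDsCurrent.WithLabelValues(pool.Name).Set(float64(len(cds.Stale())))
```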

@2uasimojo 2uasimojo force-pushed the HIVE-1058/cycle-stale-clusters branch from 2f12f8b to 5d89d37 Compare August 26, 2021 21:35
@2uasimojo (Member Author)

(Still holding for test)

@2uasimojo 2uasimojo force-pushed the HIVE-1058/cycle-stale-clusters branch from 5d89d37 to 37fb6af Compare August 30, 2021 20:24
@2uasimojo 2uasimojo force-pushed the HIVE-1058/cycle-stale-clusters branch from 37fb6af to c48acb3 Compare August 30, 2021 21:12
@2uasimojo (Member Author)

UT is in. Still need to test live.

Add a CounterVec, labeled by pool name, incremented every time we delete
a stale ClusterDeployment.

Part of HIVE-1058
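
A rough sketch of what that commit describes: a pool-labeled CounterVec bumped on each stale deletion. The metric name, label, and registration shown here are assumptions, not necessarily the PR's exact code:

```go
package clusterpool

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Assumed name and label; the metric registered by the commit may differ.
var metricStaleCDsDeleted = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "hive_clusterpool_stale_clusterdeployments_deleted_total",
		Help: "Stale ClusterDeployments deleted so the pool converges, per pool.",
	},
	[]string{"cluster_pool"},
)

func init() {
	// controller-runtime serves this registry via the manager's /metrics endpoint.
	metrics.Registry.MustRegister(metricStaleCDsDeleted)
}

// From the branch that deletes a stale CD:
//   metricStaleCDsDeleted.WithLabelValues(pool.Name).Inc()
```

The rate piece then lives entirely in prometheus, e.g. rate(hive_clusterpool_stale_clusterdeployments_deleted_total[1h]) over the assumed name above.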
@2uasimojo 2uasimojo force-pushed the HIVE-1058/cycle-stale-clusters branch from c48acb3 to f3e437e Compare August 30, 2021 22:16
@2uasimojo (Member Author)

/test e2e

@2uasimojo (Member Author)

Tested live, works.

/hold cancel
/assign @dgoodwin

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 31, 2021
@dgoodwin (Contributor)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 31, 2021
@openshift-merge-robot openshift-merge-robot merged commit 694e5a8 into openshift:master Aug 31, 2021
openshift-ci bot commented Aug 31, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, dgoodwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@2uasimojo 2uasimojo deleted the HIVE-1058/cycle-stale-clusters branch August 31, 2021 16:32