Installer: check operators for stability #1189
Conversation
PoC implementation in openshift/installer#6124
Cluster admin will begin installing cluster as usual. The Installation workflow will be:

1. Cluster initializes as usual
2. As usual, installer checks that cluster version is Available=true Progressing=False Degraded=False
clusterversion doesn't assert a degraded condition, should this be failing?
Probably; even though that's not yet wired up, it was agreed that it would be.
Need to chase https://bugzilla.redhat.com/show_bug.cgi?id=1951835 and openshift/cluster-version-operator#662
@patrickdillon @jottofar let's make sure we sync up on this.
clusterversion doesn't assert a degraded condition, should this be failing?
Yes, you are correct: the installer code checks against the Failing condition.
A fundamental design principle of 4.x and the operator framework has been
delegating responsibility away from the Installer to allow clean separation
of concerns and better maintainbility. Without this enhancement, it is the
Suggested change:
- of concerns and better maintainbility. Without this enhancement, it is the
+ of concerns and better maintainability. Without this enhancement, it is the
### Open Questions [optional]

1. What is the correct definition of a stable operator? More importantly, how can we
refine this definition?
For purposes of this EP and the installer's perspective, it sounds like you're defining it as "progressing=false for 5 minutes"?
Is the installer going to look at any other conditions on the operator, e.g. available=false, or it will continue to rely on the CVO to proxy that?
I think it's Progressing=False for the last 30 seconds, but only waiting up to 5 minutes for all COs to achieve that. We only wait up to five minutes because these aren't failed installs: in all cases I'm aware of, the cluster is a viable cluster that just differs from the exact spec defined in install-config.
yeah sorry i described it badly. but i'm still interested in what the full set of conditions that determine "install is complete (and healthy or unhealthy)" are, beyond this "30s window of progressing=false"
Is the installer going to look at any other conditions on the operator, e.g. available=false, or it will continue to rely on the CVO to proxy that?
No, it would continue to rely on the CVO as a proxy, which is a good way of putting it.
<edited to copy/paste/fix here: https://github.com/openshift/enhancements/pull/1189#discussion_r921698993>
I'm happy with the definition as put forward by this enhancement plus the existing logic that requires A=t and D=f.
responsibility of the Cluster-Version Operator to determine whether given
Cluster Operators constitute a successful version. The idea of keeping the
cluster-version operator as the single responsible party is discussed in the
alternatives section.
Another aspect to this is that it opens the door for differing behavior between which clusteroperators the CVO cares about vs. the installer.
Today my understanding is the CVO only watches clusteroperator objects that it created (came out of manifests in the payload). If another component creates its own CO for some reason, that will have no bearing on CVO's observation of the cluster and reporting of available/progressing/failing conditions. I imagine the installer is not going to make that distinction (though perhaps it could, via looking at annotations on the COs), so we could potentially end up with a situation where the installer has one view of "what matters" and the CVO has a different one.
@bparees Which of those, in the context of installation, makes the most sense given where we're going in the future?
Given that the installer has nothing to do w/ upgrades, and i'd expect similar semantics to function around upgrades as they do around install (in terms of when we think it is "done"), i'd be inclined towards the CVO owning this aspect over the installer.
As for future, i assume you mean w/ respect to platform operators? So far the PO plan is to have the platform operator manager proxy the individual PO status, via a single CO. So PO doesn't really care whether it's CVO or installer that is watching the COs. But part of why the POM has to proxy the status to a single CO is because the CVO doesn't watch non-payload COs. I could see us changing that in the future, which is another reason i'd hate to see us have two different components watching potentially different sets of COs to make similar decisions.
1. Cluster initializes as usual
2. As usual, installer checks that cluster version is Available=true Progressing=False Degraded=False
3. Installer checks status of each cluster operator for stability
4. If a cluster operator is not stable the installer logs a message and throws an error
Can we get a summary of all the things the installer looks at today to determine that the installation is done (or failed), so we can better understand the delta+implications of that delta being proposed here?
Serially, as I posted above, I believe it is:
1. Checks for bootstrap-complete configmap. If that's good:
2. Checks for API. If that's good:
3. Checks cluster version. If that's good:
4. Enhancement: sets a 5-minute context deadline, then watches each CO to ensure it achieves Progressing=False with lastTransitionTime more than 30 seconds ago. If that's good:
5. Checks console availability
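For readers less familiar with the installer internals, here is a hypothetical Go outline of that serial order. The helper names are illustrative stand-ins rather than the installer's actual functions, and only the stability step is new with this enhancement.

```go
// Hypothetical outline of the installer's serial post-provisioning waits.
// The helper names are stand-ins and their bodies are elided; only
// waitForStableOperators (step 4) is added by this enhancement.
package installwait

import "context"

var (
	waitForBootstrapConfigMap func(context.Context) error // bootstrap-complete configmap
	waitForAPI                func(context.Context) error // API server reachable
	waitForClusterVersion     func(context.Context) error // CVO Available=True, Progressing=False, Failing=False
	waitForStableOperators    func(context.Context) error // NEW: each CO Progressing=False for >= 30s, up to 5 min
	logConsoleAvailability    func(context.Context) error // warn-only when the console is disabled
)

// waitForInstallComplete runs the checks in the order listed above; the first
// failing step aborts the wait.
func waitForInstallComplete(ctx context.Context) error {
	for _, step := range []func(context.Context) error{
		waitForBootstrapConfigMap,
		waitForAPI,
		waitForClusterVersion,
		waitForStableOperators,
		logConsoleAvailability,
	} {
		if err := step(ctx); err != nil {
			return err
		}
	}
	return nil
}
```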
Can we fold the console check into the proposed CO checks?
What do we do in a cluster where the console is disabled?
Checks console availability
is this checked by looking at the console CO conditions, or by explicitly trying to access the console url?
What do we do in a cluster where the console is disabled?
Since openshift/installer#5336, the installer will warning-log Cluster does not have a console available... and move on, without failing. For example, here is output from a no-caps run:
level=info msg=Checking to see if there is a route at openshift-console/console...
level=warning msg=Cluster does not have a console available: could not get openshift-console URL
level=info msg=Install complete!
level=info msg=To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/tmp/installer/auth/kubeconfig'
level=info msg=Time elapsed: 29m45s
If a cluster operator does not maintain Progressing=False for at least 30 seconds,
during a five minute period it is unstable.
can we get some more detail on how this will work in practice?
At what point will the installer start the "5 minute window" for example?
And presumably it will be looking for a 30 second period during which all operators are progressing=false, meaning you could have a situation where:
- installer starts watching
- all operators are progressing=false
- 25 seconds go by
- operatorA goes progressing=true
- some time passes (less than 5 mins total have expired)
- operator A goes progressing=false
- 25 seconds go by
- operator B goes progressing=true
- some time passes (less than 5 mins total have expired)
- operator B goes progressing=false
- 25 seconds go by
- operatorA (or C) goes progressing=true
etc.
right?
Not saying that's a problem, just trying to understand how the detection/determination will work in practice.
Do we trust lastTransitionTime?
- Wait for current install-complete signal
- Start a five minute timer
- Every 5s, check each ClusterOperator for Progressing=False and lastTransitionTime < now - 30s?
- Exit non-zero at end of five minute timer
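A minimal Go sketch of the polling variant described in this list, assuming the openshift/client-go config clientset. Per the follow-up below, the actual PoC uses library-go watch events rather than a poll loop, so treat this purely as an illustration of the stability level (Progressing=False held for 30 seconds, checked for up to five minutes).

```go
// Minimal polling sketch of the stability check described above; not the
// installer's actual implementation, which relies on library-go watch events.
package stability

import (
	"context"
	"fmt"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// WaitForStableOperators polls every 5s for up to 5 minutes and returns an
// error if any ClusterOperator has not held Progressing=False for at least 30s.
func WaitForStableOperators(ctx context.Context, client configclient.Interface) error {
	return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		cos, err := client.ConfigV1().ClusterOperators().List(ctx, metav1.ListOptions{})
		if err != nil {
			return false, nil // treat API hiccups as "not yet"; keep polling
		}
		for _, co := range cos.Items {
			if !heldFor(co, configv1.OperatorProgressing, configv1.ConditionFalse, 30*time.Second) {
				fmt.Printf("cluster operator %s is not yet stable\n", co.Name)
				return false, nil
			}
		}
		return true, nil // all COs stable; this can succeed on the very first check
	})
}

// heldFor reports whether the given condition has had the wanted status for at
// least d, i.e. lastTransitionTime < now - d.
func heldFor(co configv1.ClusterOperator, t configv1.ClusterStatusConditionType, want configv1.ConditionStatus, d time.Duration) bool {
	for _, c := range co.Status.Conditions {
		if c.Type == t {
			return c.Status == want && time.Since(c.LastTransitionTime.Time) >= d
		}
	}
	return false // condition not reported yet: not stable
}
```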
The implementation is here:
https://github.com/openshift/installer/pull/6124/files#diff-56276d5381d618d46ec8d35d93210662c8fdd4c9bcd90fd36afbc6a59227eb0bR713-R738
The intent is pretty much what Scott said, except we don't necessarily wait the whole 5 minutes. lastTransitionTime < now - 30s is the level for stability; once that is cleared, the CO is cleared. If all COs are "stable" on the first check, we exit immediately and the whole check takes seconds at most.
The 5s detail in point 3 is abstracted away into library-go & watch events.
I will update these crucial details in the enhancement.
Scott's explanation in https://github.com/openshift/enhancements/pull/1189/files#r921528565 makes sense to me. I think the logic in the PR may have a bug. I commented.
This enhancement allows the Installer to identify and catch a class of errors
which are currently going unchecked. Without this enhancement, the Installer can
(and does) declare clusters successfully installed when the cluster has failing components.
Suggested change:
- (and does) declare clusters successfully installed when the cluster has failing components.
+ (and does) declare clusters successfully installed when the cluster still has progressing components.
Just because failing here is not always the case. MAPI doesn't consider it a failure that you only have 4 of 5 worker instances because AWS is out of your instance type at the moment.
I didn’t realize this enhancement would cause installs to fail if a machine is not properly provisioned, but that makes sense. Is that correct?
regardless of MAPI behavior i agree w/ the sentiment behind the suggested edit. All we know about the components is they are still progressing, we don't actually know that they are failing (and if they were truly failing, they should probably be setting available=false or at least degraded=true).
In my mind this is more about ensuring that when we tell the user their cluster is ready, it's actually ready and not still tidying up some things, than it is about reporting a failure that we currently ignore (not that that isn't also useful).
Do we actually have any data points about how often clusters never "settle", such that these changes to the installer would result in reported failures that today are ignored?
I think the more interesting aspect is that w/ this EP we'll no longer report install complete "prematurely"
Yeah, that's a good summary of why I suggested this change. I don't agree that still progressing is the same as failing.
WRT timing, the only thing I know is that service delivery waits up to 20 minutes before handing things off to the customer, no matter the result of their supplemental readiness checks. I'll ask if they've collected data on how long they generally wait and how often they hit the 20-minute cap.
(and if they were truly failing, they should probably be setting available=false or at least degraded=false).
@bparees I guess you meant degraded=true
@bparees I guess you meant degraded=true
whoops, yes, of course. will edit original comment.
### Goals

* Installer correctly identifies a failed cluster install where cluster operators are not stable
Suggested change:
- * Installer correctly identifies a failed cluster install where cluster operators are not stable
+ * Installer correctly identifies a progressing cluster install where cluster operators are not stable
### Non-Goals

* Installer handling other classes of errors, such as failure to provision machines by MAO
Suggested change:
- * Installer handling other classes of errors, such as failure to provision machines by MAO
+ * Installer handling of specific errors, such as failure to provision machines by MAO. The Installer only reports what ClusterOperators convey.
Should operators themselves be setting Degraded=True when they don't meet this stability criterion?

As we have seen with other timeouts in the Installer, developers and users will want to change these.
We should define a process for refining our stability criteria.
Via another five minutes of openshift-install wait-for stable-operators? Given all of our wait-for conditions, I'm not too worried about knobs for the timeout.
### Test Plan

This code would go directly into the installer and be exercised by e2e tests.
By a synthetic CO that immediately becomes Available=True, fulfilling the historic requirement for install-complete, but maintains Progressing=True for 30 seconds after? Do we have tests that watch the cluster during the install, or is this new ground? I think @deads2k has called these observers?
I think @deads2k has called these observers?
We don't have e2e observers yet, but if they existed it could work. I built a backstop into our pre-test CI step that we can change from "wait five minutes to settle" to "fail if anything is not settled". If it trips, this feature has a bug.
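For illustration only, here is a rough Go sketch of the synthetic ClusterOperator idea floated above: Available=True from the start (satisfying the historic install-complete gate) while holding Progressing=True, so the new stability check has to wait it out. The object name and test wiring are hypothetical.

```go
// Sketch of a synthetic ClusterOperator for exercising the stability check.
// The name and surrounding test harness are hypothetical.
package stability

import (
	"context"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CreateSyntheticOperator creates the test ClusterOperator. A companion test
// controller (not shown) would flip Progressing to False after ~30 seconds and
// the test would assert the installer only then reports install-complete.
func CreateSyntheticOperator(ctx context.Context, client configclient.Interface) error {
	co, err := client.ConfigV1().ClusterOperators().Create(ctx,
		&configv1.ClusterOperator{ObjectMeta: metav1.ObjectMeta{Name: "synthetic-stability-test"}},
		metav1.CreateOptions{})
	if err != nil {
		return err
	}
	now := metav1.Now()
	co.Status.Conditions = []configv1.ClusterOperatorStatusCondition{
		{Type: configv1.OperatorAvailable, Status: configv1.ConditionTrue, LastTransitionTime: now},
		{Type: configv1.OperatorDegraded, Status: configv1.ConditionFalse, LastTransitionTime: now},
		// Held True on purpose; satisfies the old install-complete gate but not the new stability check.
		{Type: configv1.OperatorProgressing, Status: configv1.ConditionTrue, LastTransitionTime: now},
	}
	_, err = client.ConfigV1().ClusterOperators().UpdateStatus(ctx, co, metav1.UpdateOptions{})
	return err
}
```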
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
- none
see-also:
- "/enhancements/this-other-neat-thing.md"
You could remove this
And all the comments above
I now have a desire to place something awesome at this location :)
will also help the Technical Release Team identify this class of problem in CI and triage
issues to the appropriate operator development team.

### User Stories
can we think about providing a library with such functionality - so it can be reused in AI as well?
What functionality specifically? Scanning operator conditions and ensuring they've all settled for at least 30s?
Thanks to everyone for feedback on this enhancement. There are definitely suggestions that have been made that I could incorporate to improve this enhancement; on the other hand, I think it is best to avoid sinking further effort into this particular version of the enhancement until we resolve the general decision of whether this is better suited for the installer or the CVO. @LalatenduMohanty or @wking do you have thoughts in this regard?
cblecker left a comment
Extremely happy to see this discussion from an SRE point of view. Having a clearer picture for the cluster administrator of "does the thing the installer produces match my intent when invoking it" is extremely beneficial.
1. Cluster initializes as usual
2. As usual, installer checks that cluster version is Available=true Progressing=False Degraded=False
3. Installer checks status of each cluster operator for stability
For reference, here's the things we are currently looking at today with OSD/ROSA to try and determine when a cluster install has actually finished:
https://github.com/openshift/osde2e/blob/71f4013df9420fc961104a9388e42bc7c6af9b2f/pkg/common/cluster/clusterutil.go#L430-L470
- The worst case failure mode is that the Installer throws an error when there is not an actual
problem with a cluster. In this case, an admin would need to investigate an error or automation would
need to rerun an install. We would hope to eliminate these failures through monitoring CI.
Would this failed state cause a full TF destroy/retry in its interactions with hive?
/test markdownlint

The current linter failure is real
One risk would be a false positive: the Installer identifies that a cluster
operator is unstable, but it turns out the operator is perfectly healthy;
the install was declared a failure but was actually successful. This risk
seems low and a risk that could be managed by monitoring these specific failures
Agreed. It would also be an operator bug, not an installer bug.
and introduce the potential for false positives or other failures.

Does implementing this enhancement address symptoms of issues with operator status definitions?
Should operators themselves be setting Degraded=True when they don't meet this stability criterion?
I don't think so. Progressing is about moving from one steady state to another. Degraded is about "something is broken". They serve different purposes and some progressing states easily last longer than 5 minutes. For instance, rolling out a new kube-apiserver level takes 15-20 minutes.
1. What is the correct definition of a stable operator? More importantly, how can we
refine this definition?
2. Should this logic belong in the Installer, CVO, or another component?
I don't have a strong opinion about this one. The logic described here seems fine. I'd also be fine seeing the CVO do it.
determining whether an install is a success (exit 0) or failure (exit != 0).
Specifically, the Installer should check whether cluster operators have stopped
progressing for a certain amount of time. If the Installer sees that an operator
is Available=true but fails to enter a stable Progressing=false state, the Installer
Wondering if you have considered when an operator is degraded for a certain amount of time.
progressing so that I can check whether the operator has an issue.

As a member of the technical release team, I want the Installer to exit non-zero when
an operator never stops progressing so that I can triage operator bugs.
You can add another use case which SRE has around the installer. They cannot deploy workloads immediately after the installation because the cluster is not ready, so they had to develop extra code which checks the cluster to see if the operators are no longer progressing, which is inconvenient (same for our customers). cc @cblecker
@patrickdillon I think there are just a few outstanding change recommendations which need to get applied before we move forward.

I'll be working that PR again during this sprint. Currently in that PR, if CVO sees an operator Degraded=True during Initializing mode, as opposed to Reconciling or Upgrading modes, it reports it but does not fail.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle stale. If this proposal is safe to close now please do so with /close. /lifecycle stale
/remove-lifecycle stale
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle stale. If this proposal is safe to close now please do so with /close. /lifecycle stale
creation-date: 2021-07-14
last-updated: yyyy-mm-dd
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
- none
Could possibly link openshift/installer#6124 if there are no Jira trackers? That got reverted in openshift/installer#6503, but presumably whichever pull request restores it with an adjusted threshold will also link 6124.
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle rotten. If this proposal is safe to close now please do so with /close. /lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /reopen. Mark the proposal as fresh by commenting /remove-lifecycle rotten. /close
@openshift-bot: Closed this PR.
/reopen
@sdodson: Reopened this PR.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: sdodson
/override ci/prow/markdownlint
@sdodson: Overrode contexts on behalf of sdodson: ci/prow/markdownlint
@patrickdillon: all tests passed!
The linter job for openshift#1189 was overridden, so invalid markdown landed. This fixes the markup to make the job work for other PRs. Signed-off-by: Doug Hellmann <[email protected]>
openshift/installer#7289 is in flight with the implementation.
Following up on Cluster Lifecycle Arch call from late February, this is my attempt to reframe the discussion in an enhancement.
I have written this enhancement from the point of view of implementing it in the installer, but I do think it is still an open question whether this should go in the installer or the CVO.
/assign @deads2k
/assign @sdodson
/assign @bparees
/assign @LalatenduMohanty
/assign @wking