
Conversation

@timflannagan
Contributor

Add a platform operators phase 0 proposal.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 6, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot requested review from tjungblu and tremes July 6, 2022 15:42
@timflannagan timflannagan force-pushed the add-platform-operators-phase-0-proposal branch from 466a5a7 to 7d6c027 Compare July 6, 2022 15:42
@tjungblu tjungblu removed their request for review July 6, 2022 15:45
@timflannagan
Contributor Author

/unassign tremes
/assign joelanford
/assign bparees

@timflannagan
Contributor Author

/uncc tremes

@openshift-ci openshift-ci bot removed the request for review from tremes July 6, 2022 15:46
@timflannagan
Contributor Author

FYI - I originally had an EP that covers phase 0 and phase 1 in more detail, but I felt like that was increasing the scope of the overall EP, so I trimmed this down to only cover the first phase in detail.


### User Stories

- As a cluster admin, I want to be able to install OLM-based Operators at cluster creation, and participate in the cluster's lifecycle.
Contributor

it's implied here but might be worth making it clearer that "participate in the cluster's lifecycle" means:

  • automatically upgrading to new versions of the PO when the cluster is being upgraded
  • blocking cluster upgrades when the PO can't tolerate the new version (or has another reason to block the upgrade)

Contributor

Yes, spelling those things out would be helpful.

Contributor Author

I'll add this to the summary section in the next round of edits.

Contributor Author

automatically upgrading to new versions of the PO when the cluster is being upgraded

Thinking about this some more: do we need to define this with more granularity such that we outline what happens in the following scenarios:

  • What happens when the cluster's minor version is upgraded?
  • What happens when the cluster's patch version is upgraded?
  • Do we need to care about cluster rollbacks? Are rollbacks supported behaviors in OCP?

I think what I've heard before is that when the cluster's minor version is upgraded, we'll automatically upgrade the PO packages that are currently installed. But we've also said that we'd like to enable PO teams to release on their own cadence, without being tied to the OCP payload lifecycle, and provide admins with control over approving out-of-band upgrades.

Maybe this is something we can punt on until phase 1, but I think we'd want to document and express those scenarios here to set the stage longer term.

blocking cluster upgrades when the PO can't tolerate the new version (or has another reason to block the upgrade)

This is slightly (if you squint) related: in the case of a cluster minor version upgrade, when do POs get upgraded? Does this process of upgrading POs happen after the control plane components are upgraded? Is it sufficient to upgrade all PO packages at once, or is there some sort of CVO run-level style hierarchy we'd need to expose for a layered rollout of packages?

Contributor

  • What happens when the cluster's minor version is upgraded?

I would expect the platform operators to be updated to be compatible with that minor version.

  • What happens when the cluster's patch version is upgraded?

I don't think I would expect any platform operators to be upgraded in this case. Expressing compatibility with patch versions of OCP is too finicky.

  • Do we need to care about cluster rollbacks? Are rollbacks supported behaviors in OCP?

Good questions.

I think what I've heard before is that when the cluster's minor version is upgraded, we'll automatically upgrade the PO packages that are currently installed. But we've also said that we'd like to enable PO teams to release on their own cadence, without being tied to the OCP payload lifecycle, and provide admins with control over approving out-of-band upgrades.

As a cluster admin, I want to be able to upgrade a platform operator I have installed independently of OpenShift so I can receive bug fixes in the platform operator.

As a cluster admin, I want my platform operators to stay compatible with the version of OpenShift I am running when I upgrade my cluster so the cluster continues to work smoothly after the upgrade.

"Stay compatible" doesn't automatically imply an upgrade, but in practice it probably will require that most of the time.

Maybe this is something we can punt on until phase 1, but I think we'd want to document and express those scenarios here to set the stage longer term.

blocking cluster upgrades when the PO can't tolerate the new version (or has another reason to block the upgrade)

This is slightly (if you squint) related: in the case of a cluster minor version upgrade, when do POs get upgraded? Does this process of upgrading POs happen after the control plane components are upgraded? Is it sufficient to upgrade all PO packages at once, or is there some sort of CVO run-level style hierarchy we'd need to expose for a layered rollout of packages?

I'd say keep it simple to start out and do them all at once.

It's generally going to be easier for a given version of an operator to support versions N-1 and N of OpenShift, because supporting N and N+1 would mean predicting the future (API removals or changes, etc.). So, I would say to go from OpenShift N-1 to N, upgrade the platform operators first, then upgrade OpenShift.

Contributor

I don't think I would expect any platform operators to be upgraded in this case.

I disagree on this one. Any time the cluster is being upgraded, we should be upgrading the platform operators to the latest version that is compatible w/ the cluster version (typically defined by the minor version). It doesn't matter if it's a patch or minor version upgrade.

The point here being, if i'm at 4.13.3 and my PO is at 1.2.3, and now i upgrade my cluster to 4.13.4, and there's a PO 1.2.4 available, i want you to upgrade that too while i'm taking my upgrade maintenance window.

That's what "lifecycle the operators w/ the cluster" means - make the POs behave like COs in terms of when they upgrade.

Expressing compatibility with patch versions of OCP is too finicky.

There are really only two possibilities here:

  1. we allow POs to specify their OCP dep version all the way down to the patch level (they don't have to, but we allow it). In this case, we won't upgrade the PO anyway, unless the new version is compatible w/ the new OCP version
  2. we don't allow POs to specify a dep beyond the minor version. Regardless, this is what i'd expect most POs to do. If we do this, then nothing constrains what PO version you get when you install fresh on a new OCP cluster of a given patch level anyway, so there's no reason not to also allow upgrading of the PO version.

(to be clear i think we should go with option (1), but discourage operators from specifying patch-level dependencies unless they have a good reason, such as kernel/rhcos deps like CNV has)
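For reference, a minimal sketch of how OCP version compatibility is expressed by OLM-based operators today, via the olm.maxOpenShiftVersion property on the CSV. The package name and version are placeholders, and whether POs would reuse this exact property (or honor patch-level values, per option (1)) is an assumption, not a settled design:

```yaml
# Sketch only: the legacy OLM mechanism for declaring a maximum compatible
# OpenShift version. Names/versions are illustrative placeholders; whether
# platform operators reuse this property is an open question.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v1.2.3
  annotations:
    # Most operators would stop at the minor ("4.13"); a patch-level value
    # such as "4.13.3" corresponds to option (1) above.
    olm.properties: '[{"type": "olm.maxOpenShiftVersion", "value": "4.13"}]'
```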

It's generally going to be easier for a given version of an operator to support versions N-1 and N of OpenShift, because supporting N and N+1 would mean predicting the future (API removals or changes, etc.). So, I would say to go from OpenShift N-1 to N, upgrade the platform operators first, then upgrade OpenShift.

ultimately this also gets into our desire to have preflight checking. But absent that, the answer in my mind is "both"

You may need to upgrade the PO before you upgrade the cluster (in the case that the PO's maxOCPVersion is less than the incoming OCP version). (Note that this gets slightly tricky because effectively it means the POM should report upgradeable=true, based on the fact that while the cluster can't/shouldn't be upgraded as-is, we know there is a PO upgrade that will (may?) resolve that block. This will require some careful orchestration w/ the CVO)

You may also be able to (if not required to) upgrade the PO after you upgrade the cluster (this goes back to what i said above about upgrading everything while you're taking a maintenance window), so if newer versions of the PO are now available to the cluster immediately after the upgrade (really "during" the upgrade), we should be consuming them.

Do we need to care about cluster rollbacks? Are rollbacks supported behaviors in OCP?

we support rollbacks to a previous patch level, but not to a previous minor. And it's only supported as a workaround until you can upgrade again, it's not intended that you downgrade and then stay there long term. For a first pass we may be able to exempt POs from this, but ultimately if downgrading the cluster means we need an older version of the PO (for dependency reasons), we should be able to rerun the dep resolution and "upgrade" to the older PO version (i.e. rukpak and deppy should allow "upgrading" from a newer PO version to an older one, the same way OCP allows you to "upgrade" to an older payload)
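As background for the upgradeable=false point above, a minimal sketch of how that condition is surfaced on a ClusterOperator status today; the resource name, reason, and message are illustrative placeholders, and how the POM would actually report this is exactly the orchestration question raised above:

```yaml
# Sketch: an Upgradeable=False condition as reported on a ClusterOperator.
# The name, reason, message, and timestamp are hypothetical placeholders.
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: platform-operators   # hypothetical PO manager ClusterOperator
status:
  conditions:
  - type: Upgradeable
    status: "False"
    reason: IncompatiblePlatformOperatorsInstalled
    message: "example-operator v1.1.1 declares a max OCP version below the incoming release"
    lastTransitionTime: "2022-08-05T00:00:00Z"
```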

Contributor

I don't think I would expect any platform operators to be upgraded in this case.

I disagree on this one. Any time the cluster is being upgraded, we should be upgrading the platform operators to the latest version that is compatible w/ the cluster version (typically defined by the minor version). It doesn't matter if it's a patch or minor version upgrade.

I could go along with that more readily if we limited it to patch versions.

The point here being, if i'm at 4.13.3 and my PO is at 1.2.3, and now i upgrade my cluster to 4.13.4, and there's a PO 1.2.4 available, i want you to upgrade that too while i'm taking my upgrade maintenance window.

That's what "lifecycle the operators w/ the cluster" means - make the POs behave like COs in terms of when they upgrade.

Yes, that's reasonable. I was concerned about automatically upgrading to 1.3.0 in that case, and I'm not sure we want that. It would need to be compatible to support upgrading the PO as part of a move from 4.13 to 4.14, but still. Expressing my desire to upgrade the cluster to a new patch level implies that I don't want new features.

It's generally going to be easier for a given version of an operator to support versions N-1 and N of OpenShift, because supporting N and N+1 would mean predicting the future (API removals or changes, etc.). So, I would say to go from OpenShift N-1 to N, upgrade the platform operators first, then upgrade OpenShift.

ultimately this also gets into our desire to have preflight checking. But absent that, the answer in my mind is "both"

You may need to upgrade the PO before you upgrade the cluster (in the case that the PO's maxOCPVersion is less than the incoming OCP version). (Note that this gets slightly tricky because effectively it means the POM should report upgradeable=true, based on the fact that while the cluster can't/shouldn't be upgraded as-is, we know there is a PO upgrade that will (may?) resolve that block. This will require some careful orchestration w/ the CVO)

You may also be able to (if not required to) upgrade the PO after you upgrade the cluster (this goes back to what i said above about upgrading everything while you're taking a maintenance window), so if newer versions of the PO are now available to the cluster immediately after the upgrade (really "during" the upgrade), we should be consuming them.

Both are possible to implement. I think we'll make life easier for everyone (OLM developers, operator developers, release managers, cluster admins) if we pick an order and always do it the same way. Unless you think there's ever a case where a PO couldn't be backwards compatible?

Contributor

Both are possible to implement. I think we'll make life easier for everyone (OLM developers, operator developers, release managers, cluster admins) if we pick an order and always do it the same way. Unless you think there's ever a case where a PO couldn't be backwards compatible?

upgrading POs after/during the cluster upgrade is definitely "simpler" than upgrading them beforehand, because upgrading them beforehand requires us to determine, at the time we know the admin is starting to think about upgrading, whether or not there is a newer version of the PO that is compatible w/ the intended upgrade version. And in reality we don't want the admin to find out during their maintenance weekend that there isn't one.

So requiring POs be compatible w/ the next minor avoids that problem. But of course that's also impossible to guarantee.

Suppose i'm at ocp 4.13 and i install a PO v1.1.1 that's compatible with v4.13 and v4.14.

Now I upgrade to v4.14, but as v4.15 doesn't exist yet, there is no newer version of the PO that exists yet and supports v4.15.

Now when i go to upgrade to v4.15 I will need to first upgrade my PO to a newer PO that supports v4.15 (assuming one even exists). And i need to do that before i upgrade my cluster, otherwise i'll be running a PO on v4.15 when that PO doesn't claim to support v4.15.

Ultimately I think the pragmatic thing to do here is probably to make the following assertions:

  1. If your currently installed PO is explicitly not compatible with the next anticipated minor (or even the next anticipated z), upgradeable=false and the admin must take action to manually upgrade the PO first.
  2. post any (patch or minor) OCP upgrade (as part of the OCP upgrade really), we will upgrade all your POs to the newest available/compatible versions based on the new OCP level.

This leaves a few situations where in theory we could do more for you, instead in the hands of admins, but at least for a first pass I think it strikes the right balance.

To your point about this meaning we might upgrade a PO to a new minor version as part of an OCP patch level upgrade...yes that is true, but if the PO declares compatibility I don't think that's entirely unreasonable.

If we want to get really fancy I suppose we could go down a path of:

  1. during ocp patch level upgrades, only allow POs to upgrade to a new patch level
  2. during ocp minor level upgrades, allow POs to upgrade to a new minor level

I do kind of like that constraint.

Contributor Author
@timflannagan Aug 5, 2022

This is all good discussion. I updated the EP, and clarified the "cluster lifecycle" phrasing. For phase 0, I marked upgrades as out-of-scope, and therefore blocking cluster upgrades for POs would also be out-of-scope. That would mean that for this first phase, we're really only interested in ensuring that a failed, individual PO rollout would act like an SLO today, and the overall cluster rollout would fail.


(i.e. rukpak and deppy should allow "upgrading" from a newer PO version to an older one, the same way OCP allows you to "upgrade" to an older payload)

IIRC, this should be on OLM's 1.x roadmap, but there's still some ongoing discussion around how best to handle this scenario given it's easy to get workloads into bad states. At the base rukpak layer, our current stance is that that component is very WYSIWYG, which means that instead of using the term "upgrading" we defer to using "pivoting". In this context, "pivoting" is essentially telling the rukpak controllers to progress to the desired bundle content without embedding any significant preflight checks, instead relying on higher-level components (e.g. deppy) to correctly configure the rukpak APIs.

When deppy is performing dependency resolution and a new resolution outcome is produced, rukpak will simply honor that outcome with the assumption that the aggregate set of bundle contents can be successfully persisted to the cluster. With all of this said, we should be inheriting this behavior from OLM's 1.x vision, but we may require downstream safeguards to help reduce the chance of getting workloads into bad states.
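To make the "pivoting" behavior concrete, a sketch of a rukpak BundleDeployment as the v1alpha1 API looked around the time of this discussion (the names and image ref are placeholders, and the shape may have changed since): re-pointing spec.template at a different bundle, whether newer or older, causes the provisioner to pivot the installed content to that bundle without any preflight checks.

```yaml
# Sketch: rukpak v1alpha1 BundleDeployment. Re-pointing spec.template at a
# different bundle image (newer or older) pivots the installed content; any
# safety checks live in higher-level components such as deppy.
# Names and the image ref are illustrative placeholders.
apiVersion: core.rukpak.io/v1alpha1
kind: BundleDeployment
metadata:
  name: example-platform-operator
spec:
  provisionerClassName: core-rukpak-io-plain
  template:
    metadata:
      labels:
        app: example-platform-operator
    spec:
      provisionerClassName: core-rukpak-io-plain
      source:
        type: image
        image:
          ref: quay.io/example/example-operator-bundle:v1.2.3
```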

Member

  1. during ocp patch level upgrades, only allow POs to upgrade to a new patch level
  2. during ocp minor level upgrades, allow POs to upgrade to a new minor level

What about allowing POs to upgrade to a new major level? We'll need a story for that, and I have a hard time seeing the answer being "wait until OCPv5".

But this seems right to me conceptually and it would definitely make the "support PO downgrades during cluster patch version downgrades" story more palatable.

This also highlights (if it wasn't already obvious) the fact that the PO controller will need to add/remove constraints into resolution automatically based on the steady vs. upgrading state of the cluster. It'll essentially be:

  1. Steady state: pin to range =X.Y.Z
  2. Patch upgrade: pin to range >=X.Y.Z <X.Y+1.0-0
  3. Minor upgrade: pin to range >=X.Y.Z <X+1.0.0-0

But during steady state, we need to permit manual upgrades triggered by cluster admins. In that case perhaps, we need something like an optional spec.steadyStateVersionRange field on the PO that allows the cluster admin to override the default "pin to this exact version" behavior?
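A hypothetical sketch of what that could look like on the PO API, purely to illustrate the constraint ranges above; the group/version, the package field, and steadyStateVersionRange are all illustrative, and none of this is committed phase 0 API:

```yaml
# Hypothetical sketch only: illustrates the steady-state/patch/minor constraint
# ranges described above; no field here is settled API.
apiVersion: platform.openshift.io/v1alpha1
kind: PlatformOperator
metadata:
  name: example-operator
spec:
  package:
    name: example-operator
  # Optional admin override of the default steady-state pin (=X.Y.Z), e.g. to
  # permit manual upgrades within the current minor:
  steadyStateVersionRange: ">=1.2.3 <1.3.0-0"
# During a cluster patch upgrade the PO controller would instead inject a
# ">=X.Y.Z <X.Y+1.0-0" constraint, and during a minor upgrade ">=X.Y.Z <X+1.0.0-0".
```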

Member
@mhrivnak left a comment

Generally I found the scope of this hard to follow. It started sounding like we want to:

  • make some openshift components optional but otherwise keep their lifecycle as tight as it is today
  • let a few other components that today are optional also attach themselves to the payload lifecycle

But then there's discussion later of multiple catalogs, manually upgrading these components (so they're presumably not at all tied to the payload lifecycle?), OLM choosing a "best" bundle to install, etc. By the end it sounds like a very different pattern than my first impression.

As noted elsewhere, it would help a lot to crisply define the lifecycle part. For a component that's in the payload now, does the lifecycle stay the same except that it can be removed and installed? Or does this introduce the possibility that it's no longer version-locked to the rest of the payload? It would be useful to acknowledge the cost of testing and supporting a wider matrix if that's the intent.

@bparees
Contributor

bparees commented Jul 7, 2022

make some openshift components optional but otherwise keep their lifecycle as tight as it is today

more or less. The lifecycle won't be as tight as it is today (everything won't ship together and be updated lockstep), but the administrative experience will be unified.

let a few other components that today are optional also attach themselves to the payload lifecycle

it's not about the payload lifecycle (in fact the goal is to allow more things to detach themselves from the payload lifecycle), it is about the cluster lifecycle (i.e. today you have to separately manage updating your OLM operators or let them auto-update at any time. With a PO the goal is that we will update it for you, if an appropriate update exists, when you are updating the rest of your cluster)

But then there's discussion later of multiple catalogs, manually upgrading these components (so they're presumably not at all tied to the payload lifecycle?),

POs are not tied to the payload lifecycle.

OLM choosing a "best" bundle to install, etc. By the end it sounds like a very different pattern than my first impression.

Think of it like managing packages. You can update individual packages on your own if you want, to specific levels, but also you can run the fedora updater which will update all the packages at once to whatever is current for the new version, when you are upgrading your entire machine.

As noted elsewhere, it would help a lot to crisply define the lifecycle part. For a component that's in the payload now, does the lifecycle stay the same except that it can be removed and installed? Or does this introduce the possibility that it's no longer version-locked to the rest of the payload? It would be useful to acknowledge the cost of testing and supporting a wider matrix if that's the intent.

The latter (not version-locked to the payload). This is why the doc talks about enforcing min/max kube/ocp versions as specified by the POs: there's no tight coupling, so we need POs to be able to set version compatibility requirements. And yes, there are additional test matrices to consider, though that's no more so than is already the case for any OLM operator today. So that cost is really only applicable to something that is in the payload today and would be moving out (e.g. console)

@mhrivnak
Member

mhrivnak commented Jul 7, 2022

in fact the goal is to allow more things to detach themselves from the payload lifecycle

Makes sense. That would be great to make clear in the Motivation and/or Goals sections.

With a PO the goal is that we will update it for you, if an appropriate update exists, when you are updating the rest of your cluster

Also it would be great to make this clear up-front. But even that aside, it still wasn't clear to me from reading this proposal (maybe as a draft it's still a WIP and I should have waited to read it. 👼 ) that the upgrade experience is somehow still integrated, nor especially how that would actually work. It would be great to see some discussion of: given a payload, how will POs be related to it, how will upgrade decisions be made, how much upgrade divergence would be possible between the payload vs. POs, etc.

It sounds like this is introducing a new conceptual model for delivering and applying updates, which deserves a significant level of detail in addition to all the logistical points about POs. Or maybe that model was already discussed and accepted elsewhere?

@timflannagan
Contributor Author

Thanks for all the early stage feedback everyone. I think what I've seen so far is that this proposal is lacking some of the context behind why we're pursuing this platform operators mechanism before moving this out of the WIP/draft stages. I'm open to suggestions on how to make that clearer, but it sounds like giving the top-level sections some more attention is a solid first step, which will help shape the problem better and outline what's in scope for this EP.

I tried to communicate this throughout the proposal, but my intention for this first phase was to introduce $something under tech preview guidelines given the OLM team is essentially rewriting a significant portion of OLM in the background to meet long term demand, and has spent a good chunk of this current year prototyping, and iterating on this rewrite.

As a result, this phase may include a slimmed down design, and it may require manual intervention from an admin to use it during 4.12, but it would at least build the foundational layer for subsequent phases. We're still actively designing for phase 1, and building out the component(s) that will handle dependency resolution during the next quarter, but the hope is that next phase will start to provide a real integration with catalogs for admins and platform teams that wish to install POs outside of a specific CatalogSource, providing a first-class UX at the installer level that reduces manual intervention, and all the necessities around lifecycling POs that aren't covered in phase 0.

Note: I originally organized this EP to be an overall "platform operators" EP that detailed both the phase 0 and phase 1 design(s), but I elected to cut that out earlier this week given the verbosity. I'm now wondering whether that was the right approach, and whether having an overall EP document that can be shaped over time would be more beneficial vs. creating individual EPs that are scoped to a specific phase.

@dhellmann
Contributor

Thanks for all the early stage feedback everyone. I think what I've seen so far is that this proposal is lacking some of the context behind why we're pursuing this platform operators mechanism before moving this out of the WIP/draft stages. I'm open to suggestions on how to make that clearer, but it sounds like giving the top-level sections some more attention is a solid first step, which will help shape the problem better and outline what's in scope for this EP.

+1 - a bunch of us probably don't have all of that background

I tried to communicate this throughout the proposal, but my intention for this first phase was to introduce $something under tech preview guidelines given the OLM team is essentially rewriting a significant portion of OLM in the background to meet long term demand, and has spent a good chunk of this current year prototyping, and iterating on this rewrite.

That was clear to me.

As a result, this phase may include a slimmed down design, and it may require manual intervention from an admin to use it during 4.12, but it would at least build the foundational layer for subsequent phases. We're still actively designing for phase 1, and building out the component(s) that will handle dependency resolution during the next quarter, but the hope is that next phase will start to provide a real integration with catalogs for admins and platform teams that wish to install POs outside of a specific CatalogSource, providing a first-class UX at the installer level that reduces manual intervention, and all the necessities around lifecycling POs that aren't covered in phase 0.

Phases are good things.

Note: I originally organized this EP to be an overall "platform operators" EP that detailed both the phase 0 and phase 1 design(s), but I elected to cut that out earlier this week given the verbosity. I'm now wondering whether that was the right approach, and whether having an overall EP document that can be shaped over time would be more beneficial vs. creating individual EPs that are scoped to a specific phase.

I don't think there's one right answer to how to approach that. One way we've handled that in some other places is to have non-goals for things that will come in later phases that say things like "Describing ${edge case, improvement, or extension} is left to a future enhancement". See https://github.com/openshift/enhancements/blob/master/enhancements/agent-installer/automated-workflow-for-agent-based-installer.md for some examples of what I mean. I happen to like that because it helps focus the reader on the problem and solution being described in the present, rather than worrying about the future. It's not always successful at doing that, though. :-)

Contributor
@elmiko left a comment

in general i think this is a really cool enhancement, my selfish desire is to see more language about the platform integration story in addition to the stories about admins who want to enable additional functionality.

i have a suspicion that these platform operators could become a helpful tool for people who want to write platform specific components that can supplement openshift. for example, a platform operator who wants to write custom cloud controller managers, machine-api controllers, and network ingress supplements would have a useful mechanism for bundling this all together from install time.

i realize i might be asking for things that will fit more naturally into later phase work, but i would still love to see some signaling of that direction.

@timflannagan
Contributor Author

FYI - there's a #tmp-platform-operators channel in the coreos slack workspace in case anyone is interested.

@bparees
Contributor

bparees commented Aug 9, 2022

my comments are addressed, thanks

@timflannagan timflannagan force-pushed the add-platform-operators-phase-0-proposal branch from ee66359 to 9c69d51 Compare August 9, 2022 15:41
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 9, 2022
@timflannagan
Contributor Author

(squashed all the commits, and ran the proposal through a spell checker)

@exdx

exdx commented Aug 10, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2022

- Implementing this PO mechanism using legacy OLM would provide more immediate business value. That implementation is outlined below in the alternatives section.
- Early phases will be delivered through tech preview guidelines. This leads to little customer feedback, given it precludes production clusters.
- Early phases may introduce a new OLM 1.x control plane that has little-to-no interaction with the existing, legacy OLM 0.x control plane.
Member

Is it worth expanding on this one?

One reason this is a drawback is that it can result in a sort of split-brain situation where incompatible operators end up being deployed, since Rukpak and legacy OLM are not aware of what the other is doing.

Contributor Author

I think it's worth calling out that split brain behavior as a drawback, but there's still some ambiguity here on what that interaction looks like in practice given we haven't had any concrete examples of deploying packages using APIs from both control planes.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 11, 2022
@joelanford
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 11, 2022

### Open Questions [optional]

- **Can future phase designs evaluate whether platform operators can be extended to cover things like the MAPI and CCM operators?**
Contributor

What exactly does this mean? Are you trying to say that we need to support a class of operator package that fulfills some role? Perhaps we want to expand the background here

Contributor Author
@timflannagan Aug 15, 2022

There's been a bunch of threads in this proposal around whether future PO designs can accommodate some of the install flexibility efforts. This was added as a soft tracking vehicle in the short term, and something we can re-evaluate after phase 0.

@elmiko might be able to provide some of those use cases, but I can try and take a stab at adding some background if we think it's necessary.

Contributor

At the moment, I read this and go "well why can't it do that" so a little colour may help others reading this to understand :)

Contributor

at the moment, i think our big questions are around how we could control installing POs during the bootstrap phase of cluster installation, and to that point how we could ensure that something like a Cloud Controller Manager, or Machine API actuator, could be installed at the proper times during the deployment process.

there might be additional followup questions and issues that arise from this discussion, but currently my mind goes to how we could provide the infrastructure-specific POs during an installation-type scenario in such a manner that they could be utilized during the process.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 15, 2022
@JoelSpeed
Contributor

Thanks, changes LGTM

Should we squash the commits?

@timflannagan
Contributor Author

@JoelSpeed Yep, we should definitely squash here: I was just trying to save people the UI spam when you force push constantly. I'll get back to this once we're at the finish line.

@dhellmann
Contributor

@JoelSpeed Yep, we should definitely squash here: I was just trying to save people the UI spam when you force push constantly. I'll get back to this once we're at the finish line.

/label tide/merge-method-squash will ask tide to do that for you when the PR is approved.

Signed-off-by: timflannagan <timflannagan@gmail.com>
@timflannagan timflannagan force-pushed the add-platform-operators-phase-0-proposal branch from cf1de5a to b06a202 Compare August 17, 2022 15:50
@timflannagan
Contributor Author

Ended up manually squash merging locally. IIRC, that tide merge method just combines all the commit messages into a single one, so the overall commit message can be difficult to read.

Added a "future work" section that doug had suggested, and ran the proposal through a spell checker.

@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2022

@timflannagan: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@joelanford
Member

/approve

Great work on this @timflannagan!

@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: exdx, joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2022
@tylerslaton

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2022
@openshift-merge-robot openshift-merge-robot merged commit 2549540 into openshift:master Aug 17, 2022
@timflannagan timflannagan deleted the add-platform-operators-phase-0-proposal branch August 17, 2022 20:21
