Contributor

@pablintino pablintino commented Oct 24, 2025

This commit adds the formal enhancement proposal for MCO-1914.

This feature adds the ability to run multiple OS base images simultaneously in a cluster by allowing users to select an image 'stream' on a per-MCP basis.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 24, 2025
Contributor

openshift-ci bot commented Oct 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Oct 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jupierce for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pablintino pablintino changed the title [MCO-1914] Enhacement of the MCO OS streams feature [MCO-1928] Enhacement of the MCO OS streams feature Oct 24, 2025
@pablintino pablintino changed the title [MCO-1928] Enhacement of the MCO OS streams feature MCO-1928: Enhacement of the MCO OS streams feature Oct 24, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 24, 2025

openshift-ci-robot commented Oct 24, 2025

@pablintino: This pull request references MCO-1928 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

This commit adds the formal enhancement proposal for MCO-1914.

This feature adds the ability to run multiple OS base images simultaneously in a cluster by allowing users to select an image 'stream' on a per-MCP basis.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Not applicable.

## Alternatives (Not Implemented)
Contributor

@yuqi-zhang yuqi-zhang Nov 4, 2025

To answer a question from the API review office hours, the alternatives we considered are:

  1. expanding on the existing configmap to refer to more images
  2. having this as a status field on an existing object, i.e. the MachineConfiguration cluster object

1 was deemed less optimal since it's an arbitrary map without guardrails, and also with a new API we can use VAP to wire up cross-object validation
2 was deemed less optimal since it's managed by the CVO instead of the MCO, and thus the bootstrap-time MCO doesn't understand or process it (I suppose we could set that up, though). We would like a bootstrap (install) time workflow that matches the in-cluster workflow so both can be processed in the same way, possibly eventually as an install-config field, so a new API object that the MCO manages would be easier to work with in both cases.

Also, the MachineConfiguration has introduced quite a lot of new fields in the past few versions, mostly for cluster-knobs (so configurables, instead of status-only discovery objects), so we thought a new CR would be clearer.

done

Contributor Author

Some more context around 1: it's almost impossible to implement stream discovery using the CM. The config map content can only be templated using the simple placeholder-replace logic the installer has, so any new stream would require a manual touch to it.

Contributor

@yuqi-zhang yuqi-zhang left a comment

A few general thoughts (also inline):

  1. the implementation sections have a bit too much detail and become hard to follow. If we can make some sections more concise and outline a general phased approach instead of a per-section phased approach, I think that would be clearer
  2. let's be careful about specific version naming and longer-term commitments
  3. let's be clear on the scope. Related to point 1, we can also scope this enhancement to the immediate implementation (rhel9->10) and leave details for future phases until we get to them


### User Stories

- **Specify OS stream per pool**: Set `spec.osImageStream` on a MachineConfigPool to provision nodes with a different OS version than the rest of the cluster
Contributor

As a note, these sound more like workflow and constraints than user stories. The user stories should be presented more like the motivation section above (admin use cases). The motivation can also talk about some of the struggles we faced when going from rhel8->9 instead.

got it

- **API-driven stream management**: Declarative, Kubernetes-native API for stream selection
- **Automatic stream discovery**: Populate available OS streams from release payload ImageStream metadata
- **Backward compatibility**: Existing clusters continue working; streams are opt-in via feature gate
- **Multi-source stream configuration**: Support CLI arguments, release ImageStream, and ConfigMap sources with defined precedence
Contributor

I'm not sure I understand what this point is talking about, could you clarify?

I made this section clearer


### Non-Goals

- **Supporting unlimited concurrent OS streams**: While the architecture supports
Contributor

Lean towards not considering this a non-goal for now, since we're not sure about the exact future plans.

fixed

automated migration logic or upgrade orchestration. Administrators must manually
select streams for their pools.

- **Bidirectional stream switching**: Only RHEL 9 → RHEL 10 migration is supported
Contributor

As somewhat of a general comment, I feel like part of the enhancement describes the overall goal of multi-streams, but some of the enhancement seems to want to frame this as a RHEL9->RHEL10 transition. I think we should be a bit more clear around the framing. So we would either:

  1. scope the enhancement to RHEL9->RHEL10 transition as the plan for now, and explicitly call out not including other streams in other sections
  2. take a phased approach and clearly call out the boundaries of each phase (starting with rhel9->10, adding more in the future)

Right now reading some of the background it sounds like we're introducing a whole variety of streams which is not the immediate plan, so I think it would be good to have some clarity for the readers.

gotcha. thank you for the clarification

Standard MachineConfigPool rollback mechanisms apply, but stream-specific rollback
is out of scope for the initial release.

- **Version skew enforcement**: Enforcing compatibility rules between different
Contributor

This and the following three points are out of scope enough that I don't think we should mention them here.

Also we should be careful around target versions for features not in this enhancement, so I'd lean towards removing these 4 points.

fixed

- New clusters install with RHCOS 9
- Existing clusters remain on their current stream

**OpenShift 5.0+ Releases:**
Contributor

again, let's not talk about specific openshift versions, since that has not been decided anywhere

done


3. **New MachineConfigPools created in OpenShift 5.0**: Use the new default `"rhel10-coreos"`

**Migration Process:**
Contributor

There are some considerations around having the update happen in one reboot (4.y + rhel9 -> 4.(y+1) + rhel10), as opposed to 4.y + rhel9 -> 4.y + rhel10 -> 4.(y+1) + rhel10.

Regardless, this is for a future phase, which I think we should clarify.

done

automatic side effects of platform upgrades. This aligns with the core goal of
separating platform upgrades from OS version transitions.

##### Backward Compatibility with Pre-Streams Clusters
Contributor

These following sections are quite verbose and have too many details. Would you be able to restructure and simplify?

yes

Starting in OpenShift 4.20/4.21, separate RHEL 10 release streams will be produced to
enable early testing:
- Custom RHEL 10-based payloads from nightly builds (similar to F-COS/S-COS for OKD)
- Enables testing RHEL 10 before it becomes the default in OpenShift 5.0
Contributor

Again, let's remove references to openshift 5

done

- Note: RHEL 8 → 9 transition revealed distinct bugs with FIPS and real-time kernel
variants not seen in standard testing

**Image Mode OpenShift:**
Contributor

We haven't talked about image mode at all in the previous sections. Should we have a snippet above if we're going to talk about testing?

done


## Motivation

OpenShift is transitioning from RHEL CoreOS 9 to RHEL CoreOS 10. Currently, all nodes
Contributor Author

Well, they can override the osImageURL of each pool to test a different image. I'd prefer to phrase this as: "Currently, all nodes in a cluster share a single default OS image and there's no straightforward per-MCP upgrade option" (or similar)

done

changes together, increasing risk.

This enhancement enables administrators to:
- Upgrade OpenShift platform versions without being forced to change OS versions
Contributor Author

This is partially true. At some point the stream will disappear, so we will tell the user: upgrade the OS of your pools if you want to upgrade the cluster.
In other words, imagine we ship 4.23 with two streams, rhel10-coreos (default) and rhel9-coreos. In the future, e.g. in 3 years (a timeframe taken from yesterday's conversation around defaults with the CoreOS team), the rhel9-coreos stream will disappear. We will flag the cluster as not upgradable to let the user know they must update the OS before jumping to the next OCP version.
Why this? There's no clean way to have a long-standing default, e.g. rhel-coreos, without abruptly upgrading the OS during OCP updates.

done

- Support dual RHCOS 9 and RHCOS 10 streams (`rhel9-coreos` and `rhel10-coreos`)
- Enable per-pool stream selection and gradual migration
- One-directional migration only (RHEL 9→10, no downgrade support)
- Backward compatibility: pools without explicit stream selection maintain current OS version during platform upgrades
Contributor Author

Correct, maybe it is worth mentioning that if no osImageStream is set we will default to rhel9-coreos.
How do we know the default stream? Still under discussion; it may be fed by the installer, hard-coded in our CM, etc.

done

#### Future Phases (Out of Scope for Initial Release)

- Additional stream variants (minimal OS images, hardened variants, etc.)
- Support for more than 2 concurrent streams
Contributor Author

This won't be true: we will have support for multiple streams from the start, it's just that we won't have more than 2 available.

done

### Non-Goals

- Automated migration orchestration
- Automatic rollback from failed stream changes
Contributor Author

I see this one as the same thing as Bidirectional stream switching (RHEL 10→9 downgrade), or better said, as a requirement to be able to roll back to RHEL 9.

done
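
To make the link between "no rollback" and "no bidirectional switching" concrete, here is a rough sketch of a one-directional transition check. The `streamRank` table and `transitionAllowed` function are hypothetical and only illustrate the RHEL 9 → RHEL 10 rule from the proposal; they are not taken from the enhancement or the MCO code.

```go
package main

import "fmt"

// streamRank orders the two streams named in the proposal. The numeric
// ranks are an assumption made for this illustration.
var streamRank = map[string]int{
	"rhel9-coreos":  9,
	"rhel10-coreos": 10,
}

// transitionAllowed reports whether a pool may move from one stream to
// another. Staying put or moving forward is allowed; moving backward
// (which an automatic rollback would require) is rejected.
func transitionAllowed(from, to string) (bool, error) {
	f, ok := streamRank[from]
	if !ok {
		return false, fmt.Errorf("unknown stream %q", from)
	}
	t, ok := streamRank[to]
	if !ok {
		return false, fmt.Errorf("unknown stream %q", to)
	}
	return t >= f, nil
}

func main() {
	forward, _ := transitionAllowed("rhel9-coreos", "rhel10-coreos")
	backward, _ := transitionAllowed("rhel10-coreos", "rhel9-coreos")
	fmt.Println(forward, backward) // prints true false
}
```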

Implementation is located in `pkg/controller/osimagestream/osimagestream.go` and
`BuildOSImageStreamFromSources()`.

##### Default Stream Evolution and Upgrade Behavior
Contributor Author

consolidated

Comment on lines 477 to 460
**Initial Releases (RHCOS 9 default):**
- Default stream: `"rhel9-coreos"`
- Available streams: `"rhel9-coreos"`, `"rhel10-coreos"` (Tech Preview)
- New clusters install with RHCOS 9
- Existing clusters remain on their current stream

**Later Releases (RHCOS 10 default):**
- Default stream: `"rhel10-coreos"`
- Available streams: `"rhel9-coreos"`, `"rhel10-coreos"`
- New clusters install with RHCOS 10
- Existing clusters upgrading from earlier releases remain on their current stream (typically `"rhel9-coreos"`)
Contributor Author

Great section and summary :)


This enhancement introduces two API changes: modifications to the existing
MachineConfigPool API and a new OSImageStream cluster-scoped resource.

#### MachineConfigPool API Changes
Contributor Author

Not sure if this is the correct section, but we need to mention that the spec.osImageStream field will be validated by a VAP using the OSImageStream singleton object as the source of truth for the valid streams.

done
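
The validation described in this thread would be expressed as a ValidatingAdmissionPolicy in the cluster, but the underlying check can be sketched in Go for clarity. Everything below is an assumption for illustration: `OSImageStreamStatus`, its `AvailableStreams` field, and `validatePoolStream` are hypothetical names, not the real API types.

```go
package main

import "fmt"

// OSImageStreamStatus is a hypothetical stand-in for the status of the
// OSImageStream singleton, reduced to the list of valid stream names.
type OSImageStreamStatus struct {
	AvailableStreams []string
}

// validatePoolStream mirrors the intent of the admission check: an empty
// spec.osImageStream is accepted (the default applies), and any other
// value must appear in the singleton's list of available streams.
func validatePoolStream(requested string, status OSImageStreamStatus) error {
	if requested == "" {
		return nil
	}
	for _, s := range status.AvailableStreams {
		if s == requested {
			return nil
		}
	}
	return fmt.Errorf("osImageStream %q is not one of the available streams %v",
		requested, status.AvailableStreams)
}

func main() {
	status := OSImageStreamStatus{
		AvailableStreams: []string{"rhel9-coreos", "rhel10-coreos"},
	}
	fmt.Println(validatePoolStream("rhel10-coreos", status)) // prints <nil>
	fmt.Println(validatePoolStream("rhel11-coreos", status) != nil) // prints true
}
```

Treating the singleton as the only source of truth means a newly discovered stream becomes selectable without touching the validation logic itself.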

Comment on lines 568 to 570
3. **Placeholder Replacement**: When building the release image, `oc` reads the
`image-references` file and replaces placeholders in the ConfigMap manifest with
actual image URLs from the Release ImageStream tags.
Contributor Author

As stated here, we won't be doing this. The CM will maintain the current structure and won't have a clue about streams.

done

@dkhater-redhat dkhater-redhat force-pushed the mco-1928 branch 2 times, most recently from bfcaf6e to b0fd8bc Compare November 19, 2025 18:08