Baremetal IPI Network Configuration for Day-1 #817

---
title: baremetal-ipi-network-configuration
authors:
  - "@cybertron"
  - "@hardys"
  - "@zaneb"
reviewers:
  - "@kirankt"
  - "@dtantsur"
  - "@zaneb"
approvers:
  - "@trozet"
  - "@staebler"
creation-date: 2021-05-21
last-updated: 2021-10-27
status: implementable
see-also:
  - "/enhancements/host-network-configuration.md"
  - "/enhancements/machine-config/mco-network-configuration.md"
  - "/enhancements/machine-config/rhcos/static-networking-enhancements.md"
---

# Baremetal IPI Network Configuration

This enhancement describes the user-facing API for day-1 network customizations in the IPI workflow, with a particular focus on baremetal, where such configuration is a common requirement.

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [x] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Currently in the IPI flow there is no way to provide day-1 network configuration, which is a common requirement, particularly for baremetal users. We can build on the [UPI static networking enhancements](https://github.com/openshift/enhancements/blob/master/enhancements/rhcos/static-networking-enhancements.md) to enable such configuration in the IPI flow.

## Motivation

Since the introduction of baremetal IPI, a very common user request has been how to configure day-1 networking, in particular for the following cases, which are not currently possible:

* Deploy with the OpenShift Machine network on a tagged (non-default) VLAN
* Deploy with the OpenShift Machine network using static IPs (no DHCP)

In both cases the configuration cannot be achieved via DHCP, so some means of providing it to the OS is required.

In the UPI flow this is achieved by consuming user-provided NetworkManager keyfiles as an input to `coreos-installer install --copy-network`, but there is no corresponding user interface at the openshift-install level.

> **Review comment:** As a reference, the assisted installer also uses this mechanism.

Additionally, there are other networking configurations that would be useful to set up via the same mechanism, even though they may be achievable in other ways. For example:

* Deploy with the OpenShift Machine network on a bond
* Deploy with the OpenShift Machine network on a bridge
* Configure attributes of network interfaces, such as bonding policies and MTUs

The proposed solutions should all be flexible enough to support these use cases, but they are worth noting here in case an alternative with a narrower scope is put forward.

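As a purely illustrative sketch of the last point, an nmstate document can describe a bridge and a non-default MTU together; the interface names, MTU value, and bridge options below are assumptions, not values required by this enhancement:

```yaml
# Illustrative nmstate snippet (assumed names/values): a Linux bridge over one
# NIC, with a jumbo-frame MTU set on the member interface.
interfaces:
  - name: eno1
    type: ethernet
    state: up
    mtu: 9000              # assumed non-default MTU
  - name: br0
    type: linux-bridge
    state: up
    ipv4:
      enabled: true
      dhcp: true
    bridge:
      options:
        stp:
          enabled: false
      port:
        - name: eno1
```
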
### Goals

* Define an API for day-1 network customizations
* Enable common on-premise network configurations (bond+VLAN, static IPs) via IPI

> **Member:** Perhaps we should mention whether it's a goal to provide different bond/VLAN config per host (not just static IPs, which obviously have to be per-host).
>
> **Contributor:** Yeah, I guess bond configuration could differ between hosts or groups of hosts (different NIC names etc.). I think the API described here is basically always config per host, but we can then perhaps optimize by detecting common config (to avoid duplicate images). I also wonder, if we have per-host config in the install-config.yaml, whether we can make YAML anchors/aliases work so that, e.g., in the case where you have a single bond/VLAN config for every host you don't have to copy/paste it for each host.
>
> **Author:** Yeah, this kind of relates to the de-duplication conversation below. I'll add something about that as a stretch goal.

Initially these configurations will be one per host. If there is time, an additional goal would be to provide a mechanism to apply a single config to all nodes of a particular type, for example one config for all masters and another config for all workers.

### Non-Goals

* Platforms other than `baremetal`, although the aim is a solution which could be applied to other platforms in the future if needed.

> **Review comment:** What is 'baremetal'? Is that platform=baremetal? Or is a 'baremetal' install on AWS (using a baremetal method; platform=none) acceptable?
>
> **Member:** That doc appears to be describing UPI. This is baremetal IPI (i.e. platform=baremetal).

* Enabling kubernetes-nmstate by default for day-2 networking, which is discussed via [another proposal](https://github.com/openshift/enhancements/pull/747)
* Providing a consistent (ideally common) user API for deployment and post-deployment configuration. Getting agreement on a common API for day-1 and day-2 has stalled due to lack of consensus around enabling the kubernetes-nmstate APIs (currently the only API for day-2) by default
* Configuration of the provisioning network. Users who don't want DHCP in their deployment can use virtual media, and users who want explicit control over the addresses used for provisioning can make the provisioning network unmanaged and deploy their own DHCP infrastructure.

## Proposal

### User Stories

#### Story 1

As a baremetal IPI user, I want to deploy via PXE and achieve a highly available Machine Network configuration in the most cost/space-effective way possible.

This means using two top-of-rack switches and two NICs per host, with the default VLAN used for provisioning traffic; a bond+VLAN configuration is then required for the controlplane network.

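A hedged sketch of what such a bond+VLAN configuration could look like in nmstate syntax follows; the interface names, bonding policy, and VLAN ID are illustrative assumptions rather than values prescribed by this enhancement (older nmstate releases also spell some of these fields differently):

```yaml
# Illustrative nmstate document (assumed names/values): a bond over two NICs,
# with the Machine Network carried on a tagged VLAN of that bond.
interfaces:
  - name: bond0
    type: bond
    state: up
    link-aggregation:
      mode: active-backup        # assumed bonding policy
      port:
        - eno1
        - eno2
  - name: bond0.100              # assumed VLAN ID
    type: vlan
    state: up
    vlan:
      base-iface: bond0
      id: 100
    ipv4:
      enabled: true
      dhcp: true
```
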
Currently this [is not possible](https://bugzilla.redhat.com/show_bug.cgi?id=1824331) via the IPI flow, and the existing Ignition/MachineConfig APIs are not sufficient due to the chicken-and-egg problem with accessing the MCS (networking must be up before the merged config can be retrieved).

#### Story 2

As an on-premise IPI user, I wish to use static IPs for my controlplane network. For reasons of network ownership or concerns over reliability, I can't use DHCP and therefore need to provide a static configuration for my primary network.

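For illustration only, a static-IP nmstate snippet for a single host might look like the following; the interface name, addresses, DNS server, and route are assumed values:

```yaml
# Illustrative nmstate document (assumed values): static addressing with DNS
# and a default route, replacing DHCP on the controlplane interface.
interfaces:
  - name: enp1s0
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: false
      address:
        - ip: 192.0.2.20
          prefix-length: 24
dns-resolver:
  config:
    server:
      - 192.0.2.1
routes:
  config:
    - destination: 0.0.0.0/0
      next-hop-address: 192.0.2.254
      next-hop-interface: enp1s0
```
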
There is no way to provide [machine-specific configuration in OpenShift](https://github.com/openshift/machine-config-operator/issues/1720), so I am forced to use the UPI flow, which is less automated and more prone to errors.

### API Extensions

This does not modify the API of the cluster.

### Risks and Mitigations

In some existing workflows, kubernetes-nmstate is used to do network configuration on day-2. Using a different interface for day-1 introduces the potential for mismatches and configuration errors when making day-2 changes. However, this is mitigated by the fact that the exact same configuration data can be used for both interfaces. The nmstate configuration provided to the installer can be copied directly into a NodeNetworkConfigurationPolicy for kubernetes-nmstate. While there's still the potential for user error, the process is much simpler and less error-prone than if completely different formats were used.

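As a non-authoritative sketch of that mitigation, the same nmstate content provided at install time could be wrapped in a NodeNetworkConfigurationPolicy on day-2 roughly as follows; the policy name, node selector, and interface details are assumptions, and the apiVersion may differ between kubernetes-nmstate releases:

```yaml
# Hypothetical day-2 NNCP reusing the day-1 nmstate content under desiredState.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: master-0-network               # assumed name
spec:
  nodeSelector:
    kubernetes.io/hostname: openshift-master-0
  desiredState:
    interfaces:
      - name: bond0.100                # same nmstate content as provided at install time
        type: vlan
        state: up
        vlan:
          base-iface: bond0
          id: 100
```
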
## Design Details

In the IPI flow, day-1 network configuration is required in two different cases:

* Deployment of the controlplane hosts via Terraform, using input provided to openshift-install
* Deployment of compute/worker hosts (during initial deployment and scale-out), via the Machine API providers for each platform

The sections below describe the user-facing API that contains the network configuration, and the proposed integration for each of these cases.

### User-facing API

RHCOS already provides a mechanism to specify NetworkManager keyfiles during deployment of a new node. We need to expose that functionality during the IPI install process, preferably using [nmstate](https://nmstate.io) files as the interface for a more user-friendly experience. There are a couple of options for how to do that:

> **Review comment:** I'd really recommend using nmstate files. It's simpler, and easy enough to integrate. As a bonus, it will make the experience between UPI, IPI, and assisted more consistent. Furthermore, this would make the dream of stronger integration between assisted and IPI more realistic.

* A new section in install-config.
* A secret that contains base64-encoded content for the keyfiles.

These are not mutually exclusive. If we implement the install-config option, we will still need to persist the configuration in a secret so it can be used for day-2.

The data provided by the user will need to have the following structure:

```yaml
<hostname>: <nmstate configuration>
<hostname 2>: <nmstate configuration>
etc...
```

> **Contributor:** I know you mentioned at first it will be per host, but I think that will be really annoying for some users. Assume in my scenario all my hosts are homogeneous and I have 120 of them. That's a giant config. Could we just support applying the same config to all nodes, at least initially, as well?
>
> **Author:** That sounds reasonable. I'll have to defer to the hardware folks who are doing the implementation, but I'd be in favor of something like that. Maybe we could even provide a default configuration that would automatically be applied to all nodes, unless a given node had a specific configuration provided?
>
> **Member:** For the implementation we won't care either way. It's just a question of whether we can give it an interface that makes sense.
>
> **Member:** +1 for default config / templated configs and some sort of node selector mechanism.

For example:

```yaml
openshift-master-0:
  interfaces:
    - name: eth0
      type: ethernet
      etc...
openshift-master-1:
  interfaces:
    - name: eth0
      etc...
```

> **Contributor:** So this works if you know the NIC name. What about doing something like TripleO did with introspection: identifying NICs dynamically and mapping them to networks? I get this might not be applicable to the scope of this enhancement, just thinking longer term how that would work with this API, or if that would need a new API. An example of this is I might have a scenario where I want my kapi traffic to go over the ctlplane network, but I want to define another network for all my default egress OVN traffic.
>
> **Author:** I think that would need a higher level API. We're using raw nmstate configuration data here, and it's not going to know anything about OCP networks. A higher level API might be something to consider for the cross-platform implementation of this? The one drawback is that we were trying to stay as close as possible to the kubernetes-nmstate CRD, so if someday we have kubernetes-nmstate available on day 1 we could use that CRD and avoid the custom config. A higher level interface could just intelligently populate the CRD though, so those might not be mutually exclusive goals.
>
> **Member:** This config has to be applied before introspection, because introspection happens... over the network. In the long term I think we should just think of this as a way to configure enough of the network to be able to communicate with Ironic. In the short term people will inevitably use it to put their whole network config in, because we're not providing them with another way of setting up the network config for the running cluster nodes at install time, but we shouldn't focus too much on that. This is a good point that the end solution for cluster networking configuration may not look like just sticking this same nmstate file into k8s-nmstate, but may involve some other new operator that takes in higher-level data about mapping NICs to networks, inspection data from hosts, and the initial nmstate, then combines them to produce the network config for the cluster node that it then passes to k8s-nmstate. Perhaps this also answers your other question about whether this field should be baremetal-specific: yes, because no other platforms have this pre-boot configuration problem.
>
> **Member:** Just to mention, I know that the nmstate folks (@qinqon in particular) are working on a dynamic way to generate nmstate configurations depending on the current network layout (i.e. defining the bridge on top of the default gateway, instead of sticking to the interface name). Not sure of the state, because we discussed it some time ago, but I think that would solve this issue.
>
> **Contributor:** Hi, yes, we are working on a tool for this. We have a design document with examples; here is the example you are interested in:
>
> ```yaml
> capture:
>   default-gw: routes.running.destination=="0.0.0.0/0"
>   base-iface-routes: routes.running.next-hop-interface==capture.default-gw.routes.running[0].next-hop-interface
>   base-iface: interfaces.name==capture.default-gw.routes.running[0].next-hop-interface
>   bridge-routes: capture.base-iface-routes | routes.running.next-hop-interface:="br1"
>   delete-base-iface-routes: capture.base-iface-route | routes.running.state:="absent"
>   bridge-routes-takeover: capture.delete-base-iface-routes.routes.running + capture.bridge-routes.routes.running
> desiredState:
>   interfaces:
>     - name: br1
>       description: Linux bridge with base interface as a port
>       type: linux-bridge
>       state: up
>       ipv4: {{ capture.base-iface.interfaces[0].ipv4 }}
>       bridge:
>         options:
>           stp:
>             enabled: false
>         port:
>           - name: {{ capture.base-iface.interfaces[0].name }}
>       routes:
>         config: {{ capture.bridge-routes-takeover.running }}
> ```
>
> With DHCP activated it is simpler.
>
> **Member:** I don't think that will help; nmstatectl cannot run on the host, it must run on the cluster and produce keyfiles that are incorporated into the ISO before it even boots on the host. We don't currently have a way of building e.g. a container image into the ISO, so everything but the Ignition (which contains the keyfiles) must come over the network and therefore cannot be required to set up the network.
>
> **Contributor:** The nmstate team is reimplementing nmstatectl so it's just a binary without Python dependencies; will that help with that? They are also going to implement a flag to configure directly on the kernel with netlink, bypassing NetworkManager.
>
> **Member:** Unless it ships installed by default in the CoreOS live ISO, no.
>
> **Review comment:** @zaneb also UPI can gain from having static IPs, right? This is basically what is already happening in the assisted installer.
>
> **Contributor:** @romfreiman UPI can already configure static IPs, just not via nmstate (or automated image building). If we want UPI to support nmstate syntax in the future, that should be covered via another proposal; IMO it's outside the scope of this enhancement.

In install-config this would look like:

```yaml
platform:
  baremetal:
    hosts:
      - name: openshift-master-0
        networkConfig:
          interfaces:
            - name: eth0
              type: ethernet
              etc...
```

> **Contributor:** Should we put it under baremetal if there might be plans in the future to use it on other platforms? I'm thinking here about replacing ovs-configuration with this.
>
> **Author:** The problem with doing anything cross-platform for this release is that we need image customization functionality that is currently only provided by baremetal IPI. As I understand it, there is a similar feature in development that will work on all platforms, but it wasn't going to be available soon enough to satisfy our baremetal use cases. I suppose we could put the configuration somewhere more generic, but since it won't work on anything but baremetal right now, that could be confusing.
>
> **Contributor:** Per our conversation, it seems like we could use this for baremetal to replace configure-ovs, and configure multiple NICs to be on OVS bridges. configure-ovs can still exist (we need it anyway for the other platforms), and today it just skips doing anything if br-ex is already up and configured. However, this will allow us to do some more complex types of network deployments on baremetal, so it seems like a solid first step to me.
>
> **Contributor:** This will support all types of nmstate configuration, right?
>
> **Author:** I think so? Under the covers we'll be using the nmstatectl CLI to convert this to nmconnection files that will be baked into the image. I'm not aware of any nmstate configurations that couldn't be handled that way, but that doesn't mean there aren't any.

Because the initial implementation will be baremetal-specific, we can put the network configuration data into the baremetal host field, which allows easy mapping to the machine in question.

> **Review comment:** Baremetal-specific or host-specific?
>
> **Member:** Specific to the baremetal platform.

> **Contributor:** I think the current plan is to initially enable this interface only for the […] In that case, we could instead add the config to the platform-specific […]
>
> **Review comment:** I agree with Steve. This should belong in the hosts section of the baremetal platform. […]
>
> **Author:** Okay, this sounds reasonable. I'd be more concerned about using a baremetal-specific structure if we had the common nmstate API, but since this implementation isn't going to be useful to other groups like OVNK anyway, I'm good with it.
>
> **Contributor:** I guess that raises the question (already mentioned in the alternatives) of whether we want this to be nmstate format or just inlined keyfiles? I guess we need the interface to enable future nmstate support even if we start with keyfiles. […]
>
> **Author:** If we use nmstate YAML for install-config then we need nmstate available on the provisioning host so we can convert it to keyfiles. I'm fine with doing that (it's how we'd have to process NNCP records too), but it does add a dependency. I'm not sure how controversial that will be. It's also a bit more complex implementation-wise than straight keyfiles. One option would be to replace the interface name with the filename, i.e. eth0 -> eth0.nmconnection. That's going to have to happen at some point anyway, and then we could potentially add support for eth0.yaml, which would indicate nmstate data. But maybe that's a little too magic? +1 to not overloading networkData.
>
> **Contributor:** @cgwalters this doesn't work for controlplane network configuration, as previously discussed via coreos/ignition#979 - there is a chicken/egg problem, e.g. networking has to be up before it can retrieve the merged config from the MCS. So instead we're adopting the same process as UPI here: generate an ignition config for the coreos-installer live ISO, which then contains the network configs for the live ISO and […]
>
> **Member:** I'm talking about injecting into the pointer configuration, not the MCS config. But I think that's not relevant because: Right, OK! So I think a similar case applies though - this "inject arbitrary network configuration into live ISO" really generalizes into "inject arbitrary Ignition into live ISO". I know some people in the past have disagreed with exposing this "arbitrary Ignition config" functionality, but in practice we already expose completely arbitrary things via MachineConfig, so having this generic functionality to me does not cost much in terms of support. I think some prior objections were rooted in us not having an opinionated high-level sugar for Ignition, which is partially being addressed by shipping butane. Now, we don't need to gate this enhancement on having an official way to inject extra Ignition into the installer's generated configs. I'm more arguing that that's the way to think of it - this is just "syntactic sugar" for an Ignition config which writes the provided data to […]
>
> **Contributor:** As I said, this doesn't work, because Ignition tries to resolve the […] UPI works around this limitation by allowing users to embed network configuration into the live ISO, then apply that to the deployed OS via […] This is true, but we're trying to reduce the friction around a really common baremetal case here, not expose a way to arbitrarily customize the live-ISO process. Use of the live ISO here is arguably an implementation detail, particularly given that no other IPI platforms currently use it, so exposing a generic cross-platform installer interface would not be possible?
>
> **Member:** I think it's misleading to say it that way. Ignition is the mechanism to configure the Live ISO - including networking for the Live ISO. The whole design is intended to enable exactly this. The way I'd say it instead is that configuring the Live ISO via Ignition allows handling effectively arbitrary network/install requirements. We already do expose this, to be clear; […] Well, I think it would boil down to having a "configure the pointer config" hook and a "configure the live ISO config" hook, where the latter would only apply on baremetal IPI. Anyways... hmm, I guess my bottom line is I am not opposed to the proposal of a […]
>
> **Contributor:** Ack, thanks @cgwalters - I think we're basically in agreement - I don't think anything here precludes potentially adding a more generalized live-iso-ignition "escape hatch" in the future.

### Processing user configuration

#### Deployment of the controlplane hosts via Terraform

We will map the keyfiles to their appropriate BareMetalHost using the host field of the baremetal install-config. The keyfiles will then be added to custom images for each host, built by Terraform and Ironic.

Since different configuration may be needed for each host (for example, when deploying with static IPs), a Secret per host will be created. A possible future optimization is to use a single Secret for scenarios such as VLANs, where multiple hosts can consume the same configuration, but the initial implementation will have a 1:1 Secret:BareMetalHost mapping.

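A rough sketch of what such a per-host Secret might look like follows; the Secret name, namespace, and data key are illustrative assumptions rather than names defined by this enhancement:

```yaml
# Hypothetical per-host Secret carrying the nmstate document for one BareMetalHost.
apiVersion: v1
kind: Secret
metadata:
  name: openshift-master-0-network-config-secret   # assumed naming convention
  namespace: openshift-machine-api
type: Opaque
stringData:
  nmstate: |                                        # assumed key name
    interfaces:
      - name: eth0
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: true
```
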
#### Deployment of compute/worker hosts

BareMetalHost resources for workers will be created with the Secret containing the network data referenced in the `preprovisioningNetworkData` field defined in the Metal³ [image builder integration design](https://github.com/metal3-io/metal3-docs/blob/master/design/baremetal-operator/image-builder-integration.md#custom-agent-image-controller).
This will cause the baremetal-operator to create a PreprovisioningImage resource and wait for it to become available before booting the IPA image.

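For illustration, a worker BareMetalHost referencing its network data Secret could look roughly like the following; the resource names, MAC, and BMC address are assumptions, and the exact field spelling (shown here as `preprovisioningNetworkDataName`) should be taken from the Metal³ BareMetalHost API rather than from this sketch:

```yaml
# Hypothetical BareMetalHost snippet; names, addresses, and the network data
# field spelling are assumptions for illustration only.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"                    # assumed
  preprovisioningNetworkDataName: openshift-worker-0-network-config-secret
  bmc:
    address: redfish://192.0.2.100/redfish/v1/Systems/1  # assumed BMC address
    credentialsName: openshift-worker-0-bmc-secret
```
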
An OpenShift-specific PreprovisioningImage controller will use the provided network data to build a CoreOS IPA image with the correct Ignition configuration in place. This will be accomplished by [converting the nmstate data](https://nmstate.io/features/gen_conf.html) from the Secret into NetworkManager keyfiles using `nmstatectl gc`. The baremetal-operator will then use this customised image to boot the host into IPA. The network configuration will be retained when CoreOS is installed to disk during provisioning.

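As a rough illustration of that conversion step, the controller might feed an nmstate document like the one below to `nmstatectl gc`, which generates NetworkManager keyfiles (`*.nmconnection`) from it; the interface name and addressing are assumed values:

```yaml
# Assumed example input for `nmstatectl gc <file>`; the generated keyfiles are
# then written into the image's Ignition configuration.
interfaces:
  - name: eth0
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: false
      address:
        - ip: 192.0.2.30
          prefix-length: 24
```
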
If not overridden, the contents of the same network data secret will be passed to the Ironic custom deploy step for CoreOS that installs the node, though this will be ignored at least initially.

### Test Plan

Support will be added to [dev-scripts](https://github.com/openshift-metal3/dev-scripts) for deploying the baremetal network without DHCP enabled. A CI job will populate install-config with the appropriate network configuration and verify that deployment works properly.

### Graduation Criteria

We expect to support this immediately on the baremetal IPI platform.

#### Dev Preview -> Tech Preview

N/A

#### Tech Preview -> GA

N/A

#### Removing a deprecated feature

N/A

### Upgrade / Downgrade Strategy

There should be little impact on upgrades and downgrades. Nodes are deployed with the network configuration baked into the image, which means it will persist across upgrades or downgrades. NetworkManager keyfiles are considered a stable interface, so any version of NetworkManager should be able to parse them equally. The same is true of nmstate files.

Any additions or deprecations in the keyfile interface would need to be handled per the NetworkManager policy.

### Version Skew Strategy

As this feature targets day-1 configuration, there should be no version skew. Day-2 operation will be handled by other components, which are outside the scope of this document.

### Operational Aspects of API Extensions

N/A

#### Failure Modes

Improper network configuration may cause deployment failures for some or all nodes in the cluster, depending on the nature of the misconfiguration.

#### Support Procedures

Because a networking failure is likely to make a node inaccessible, it may be necessary to access the failed node via its BMC (iDRAC, iLO, etc.) to determine why the network configuration failed.

## Implementation History

4.9: Initial implementation

## Drawbacks

This adds a dependency on NMState. However, NMState provides a strong backward-compatibility promise (much like NetworkManager itself), so this should be a stable interface.

## Alternatives

### Use Kubernetes-NMState NodeNetworkConfigurationPolicy custom resources

If we were able to install the [NNCP CRD](https://nmstate.io/kubernetes-nmstate/user-guide/102-configuration) at day-1, then we could use that as the configuration interface. This has the advantage of matching the configuration syntax and objects used for day-2 network configuration via the operator.

This is currently blocked on a resolution to [Enable Kubernetes NMstate by default for selected platforms](https://github.com/openshift/enhancements/pull/747). Without NMState content available at day-1, we do not have any way to process the NMState configuration into a format usable in the initial deployment. While we hope to eventually come up with a mechanism to make NMState available on day-1, we needed another option that did not make use of NMState in order to deliver the feature on time.

In any event, we inevitably need to store the network config data for each node in a (separate) Secret to satisfy the PreprovisioningImage interface in Metal³, so the existence of the same data in an NNCP is irrelevant. In the future, once this is available, we could provide a controller to keep them in sync.

#### Create a net-new NMState Wrapper CR

The [assisted-service](https://github.com/openshift/assisted-service/blob/0b0e3677ae83799151d11f1267cbfa39bb0c6f2e/docs/hive-integration/crds/nmstate.yaml) has created a new NMState wrapper CR.

We probably want to avoid a proliferation of different CR wrappers for nmstate data, but one option would be to convert that (or something similar) into a common OpenShift API. Could such an API be a superset of NNCP, e.g. also used for day-2?

> **Review comment:** I mentioned this in an earlier comment. There are some advantages to this wrapper, whether it uses its own CR or just reads a secret and runs validations/actions on the data. I'd love for assisted and IPI to have a common interface here; it would help with the integration of both projects and it will make it easier for users as well.

This would mean we could at least use a common configuration format based on nmstate with minimal changes (or none, if we make it _the_ way OpenShift users interact with nmstate), but unless the new API replaces NNCP there is still the risk of configuration drift between the day-1 and day-2 APIs. And we still need a Secret to generate the PreprovisioningImage.

> **Review comment:** Agreed, maybe this is a good opportunity to bump this conversation? Try to find some alignment here?
>
> **Contributor:** @flaper87 This PR has been up for nearly 4 months, and designing a new API for network config is explicitly stated as a non-goal, so I'd suggest we defer that discussion ;) Context is: we wanted to use NNCP, but because kubernetes-nmstate (and thus that CRD) isn't deployed via the release payload, we can't consume that API at install time - discussion on that stalled, ref #747. The same problem exists for the new AI nmstate CRD; it's only available via AI or when using the CIM layered-product stack. IMO we definitely need to provide a core-OCP API for network configuration, but given the historical lack of progress on that discussion, we decided instead to focus on the existing lower level interfaces (e.g. Secrets); any future CRD API can obviously be layered on top if/when we manage to achieve alignment :)
>
> **Author:** I'd rather not try to boil the ocean and solve all the networking problems at once, particularly because, as noted in this section, "...we still need a Secret to generate the PreprovisioningImage." Any higher level general purpose networking API is going to have to build on this functionality anyway, not replace it. This is step one in a longer journey that has been delayed for years already.
>
> **Review comment:** FWIW, I was not proposing to design a new config. That said, I didn't realize this had been up for so long, 😅

### Pass NetworkManager keyfiles directly

NetworkManager keyfiles are already used (directly with CoreOS) for UPI and when doing network configuration in the Machine Config Operator (MCO). However, they are harder to read, harder to produce, and they don't store well in JSON.

In addition, we hope to eventually base day-2 networking configuration on the nmstate NodeNetworkConfigurationPolicy (as KubeVirt already does), so using the nmstate format provides a path to a more consistent interface in the future than keyfiles do.

If we were eventually to decide never to use NNCP and instead to configure day-2 networking through the Machine Config Operator, then it might be better to use keyfiles, as that is what the MCO uses.