|
| 1 | +--- |
| 2 | +title: Baremetal IPI Config Enhancement |
| 3 | +authors: |
| 4 | + - "@sadasu" |
| 5 | +reviewers: |
| 6 | + - "@deads2k" |
| 7 | + - "@hardys" |
| 8 | + - "@enxebre" |
| 9 | + - "@squeed" |
| 10 | + - "@abhinavdahiya" |
| 11 | + - "@dhellmann" |
| 12 | + |
| 13 | +approvers: |
| 14 | + - "@abhinavdahiya" |
| 15 | + - "@enxebre" |
| 16 | + - "@deads2k" |
| 17 | + |
| 18 | +creation-date: 2019-11-14 |
| 19 | +last-updated: 2019-11-26 |
| 20 | +status: not implemented |
| 21 | +see-also: |
| 22 | + - |
| 23 | +replaces: |
| 24 | + - https://github.com/openshift/enhancements/pull/90 |
| 25 | +superseded-by: |
| 26 | + - |
| 27 | +--- |
| 28 | + |
| 29 | +# Config required for Baremetal IPI deployments |
| 30 | + |
| 31 | +## Release Signoff Checklist |
| 32 | + |
| 33 | +- [x] Enhancement is `implementable` |
| 34 | +- [x] Design details are appropriately documented from clear requirements |
| 35 | +- [ ] Test plan is defined |
| 36 | +- [ ] Graduation criteria for dev preview, tech preview, GA |
| 37 | +- [ ] User-facing documentation is created in [openshift/docs] |
| 38 | + |
| 39 | +## Open Questions |
| 40 | + |
| 41 | +1. Which is the preferred name for the new CR: "Metal3ProvisioningController" or |
| 42 | +"Metal3Controller"? |
| 43 | + |
| 44 | +[Closed] Based on review comments it appears that "Metal3ProvisioningController" |
| 45 | +is the preferred name since it makes the contents of the config very clear. |
| 46 | + |
| 47 | +2. There is a possibility that the MCO would also have to refer to this new CR |
| 48 | +created in operator scope. Would it be OK for the MCO to access the CR with this |
| 49 | +(limited) scope? |
| 50 | + |
| 51 | +[Closed] MCO will be able to read config from this new CR if needed. |
| 52 | + |
| 53 | +## Summary |
| 54 | + |
| 55 | +The configuration representing the provisioning network parameters used |
| 56 | +for provisioning baremetal servers needs to be made available to the Machine |
| 57 | +API Operator (MAO) in a Config Resource. We had submitted an earlier proposal |
| 58 | +[1] to augment the BareMetalInfrastructureStatus CR with these config items. |
| 59 | +Based on the feedback we received, we are revising our proposal to create a new CR |
| 60 | +for these configuration items. |
| 61 | + |
| 62 | +This new CR needs to be accessed only by the Installer and the Machine API |
| 63 | +Operator and hence does not need to be created with global scope. The installer |
| 64 | +would be responsible for instantiating the CR with user input values. The MAO is |
| 65 | +responsible for deploying the metal3 cluster. The configuration values from |
| 66 | +this CR would be read by the MAO and used to generate some other configuration |
| 67 | +parameters. All of these values (user provided and MAO derived) need to be |
| 68 | +passed in as env vars to the containers that are part of the metal3 cluster |
| 69 | +before starting the containers. In most cases, the containers will not start |
| 70 | +without these init configs available as env vars. |
| 71 | + |
| 72 | +For background on the work being done for BareMetal IPI installs, please refer |
| 73 | +to [2], which provides the necessary context for the enhancements proposed here. |
| 74 | + |
| 75 | +## Motivation |
| 76 | + |
| 77 | +Baremetal IPI deployments are different from the other platform types currently |
| 78 | +being supported by OpenShift in that there is no underlying cloud platform |
| 79 | +exposing an API, as there is in public clouds (e.g. AWS), until the Baremetal Operator (BMO) |
| 80 | +along with Ironic services are run, exposing an inventory of available hosts as |
| 81 | +custom resources. |
| 82 | + |
| 83 | +The "metal3" pod deployed by the Machine API operator (MAO) contains the BareMetal |
| 84 | +Operator (BMO) and a provisioning service (Ironic) that are together responsible for |
| 85 | +PXE booting baremetal servers and enrolling them as BareMetal Hosts. The MAO is |
| 86 | +responsible for deploying the BMO and the Ironic containers. |
| 87 | + |
| 88 | +This enhancement request proposes to add a new CR for configs required only for |
| 89 | +the provisioning service to PXE boot a baremetal server and make it available as a |
| 90 | +BareMetal host. The configurations in this baremetal provisioning CR are separate |
| 91 | +from the configuration in the BareMetalHost CR. This new CR will allow for setting |
| 92 | +some sane defaults for these configurations with the ability to override them if |
| 93 | +required. After the baremetal server is provisioned, the configurations in the |
| 94 | +BareMetalHost CR contain information to connect to the controller and also to |
| 95 | +fulfill a Machine. |
| 96 | + |
| 97 | +### Goals |
| 98 | + |
| 99 | +The goal of this enhancement request is to provide details about the configuration |
| 100 | +being added to a new CR used only in "metal3" deployments. This new CR would be |
| 101 | +within the operator scope and will only contain configuration required by the |
| 102 | +provisioning service to boot baremetal servers. |
| 103 | + |
| 104 | +### Non-Goals |
| 105 | + |
| 106 | +The provisioning network and the provisioning IP are not expected to change after |
| 107 | +deployment. However, it is possible for the DHCP range to be expanded after nodes |
| 108 | +have already been deployed. This proposal does not consider the update path for |
| 109 | +these configuration items. |
| 110 | + |
| 111 | +### Proposal |
| 112 | + |
| 113 | +Before BareMetal Hosts can be matched to Machines, they need to be connected to their |
| 114 | +provisioning network for them to be PXE booted and given an IP address. To make |
| 115 | +this happen, the provisioning service needs to know which NIC, provisioning network, |
| 116 | +IP address and image URLs to use to download and boot images on these servers. |
| 117 | + |
| 118 | +A new CR called "Metal3ProvisioningController" would be created within the operator scope. |
| 119 | + |
| 120 | +This new CR would consist of the following: |
| 121 | + |
| 122 | +1. ProvisioningInterface : This is the interface name on the underlying baremetal |
| 123 | +server which would be physically connected to the provisioning network. This |
| 124 | +configuration is needed only for the underlying provisioning service (Ironic) |
| 125 | +and could have values like "eth1" or "ens3". |
| 126 | + |
| 127 | +2. ProvisioningIP : This is the IP address used to bring up a NIC on the |
| 128 | +baremetal server for provisioning. This is also a value that is useful just to the |
| 129 | +provisioning service. This value should not be in the DHCP range and should not |
| 130 | +be used in the provisioning network for any other purpose (should be a free IP |
| 131 | +address.) It is expected to be provided in the format : 10.1.0.3. |
| 132 | + |
| 133 | +3. ProvisioningNetworkCIDR : This is the provisioning network with its CIDR. The |
| 134 | +ProvisioningIP and the ProvisioningDHCPRange are expected to be within this network. |
| 135 | +The value for this config is expected to be provided in the format : 10.1.0.0/24. |
| 136 | + |
| 137 | +4. ProvisioningDHCP : This configuration needs to convey two values: whether the DHCP |
| 138 | +service needs to be managed within the cluster and, if so, the range of IP |
| 139 | +addresses that can be used. Towards that end, this configuration would be a struct |
| 140 | +with 2 members: |
| 141 | + 1. ManagementType - The ManagementType is a string that can have any of |
| 142 | + the following values: "Managed", "Unmanaged", "Force", "Removed". In this |
| 143 | + case, we are interested only in 2 values, "Managed" and "Removed". If the |
| 144 | + ManagementType is "Removed", it means that the metal3 cluster would not be |
| 145 | + responsible for managing DHCP addresses and an external DHCP server is |
| 146 | + expected to be available and reachable by the cluster. If the ManagementType |
| 147 | + is "Managed", then the DHCP range indicates the pool of IP addresses that |
| 148 | + can be assigned to the baremetal hosts. The ManagementType cannot be changed |
| 149 | + after installation. |
| 150 | + |
| 151 | + 2. DHCPRange - The DHCPRange, when set, is a string which consists of a pair of |
| 152 | + comma-separated IP addresses representing the start and end of the IP address |
| 153 | + range. If unset, then the default IP address range (.10 to .100) would be |
| 154 | + used. The value of the DHCP range can be changed even after installation. |
| 155 | + |
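The constraints listed above (ProvisioningIP inside the provisioning network but outside the DHCP range, DHCP range inside the network, default range of .10 to .100) could be checked along the lines of the following Go sketch. The struct and function names (`ProvisioningSpec`, `Validate`, `dhcpBounds`) are hypothetical and not part of any existing API; the sketch also assumes IPv4 for the default range and rejects ManagementType values other than "Managed" and "Removed", which is a simplification of this example rather than something the proposal mandates.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"strings"
)

// ProvisioningDHCP mirrors the two-member struct described above.
type ProvisioningDHCP struct {
	ManagementType string // "Managed" or "Removed"
	DHCPRange      string // "start,end"; empty selects the default range
}

// ProvisioningSpec is a hypothetical Go shape for the Spec of the new CR.
type ProvisioningSpec struct {
	ProvisioningInterface   string
	ProvisioningIP          string
	ProvisioningNetworkCIDR string
	ProvisioningDHCP        ProvisioningDHCP
}

// dhcpBounds returns the start and end of the DHCP pool. When r is empty it
// falls back to the .10 to .100 default, computed here for IPv4 only.
func dhcpBounds(r string, ipnet *net.IPNet) (net.IP, net.IP, error) {
	if r == "" {
		base := ipnet.IP.To4()
		if base == nil {
			return nil, nil, fmt.Errorf("default range is IPv4-only in this sketch")
		}
		return net.IPv4(base[0], base[1], base[2], base[3]+10),
			net.IPv4(base[0], base[1], base[2], base[3]+100), nil
	}
	parts := strings.Split(r, ",")
	if len(parts) != 2 {
		return nil, nil, fmt.Errorf("DHCPRange %q is not \"start,end\"", r)
	}
	start := net.ParseIP(strings.TrimSpace(parts[0]))
	end := net.ParseIP(strings.TrimSpace(parts[1]))
	if start == nil || end == nil {
		return nil, nil, fmt.Errorf("DHCPRange %q contains an invalid IP", r)
	}
	return start, end, nil
}

// Validate checks the constraints stated in the proposal.
func Validate(s ProvisioningSpec) error {
	_, ipnet, err := net.ParseCIDR(s.ProvisioningNetworkCIDR)
	if err != nil {
		return fmt.Errorf("invalid ProvisioningNetworkCIDR: %v", err)
	}
	ip := net.ParseIP(s.ProvisioningIP)
	if ip == nil || !ipnet.Contains(ip) {
		return fmt.Errorf("ProvisioningIP %q is not in %s", s.ProvisioningIP, ipnet)
	}
	switch s.ProvisioningDHCP.ManagementType {
	case "Removed":
		return nil // external DHCP server; no range to check
	case "Managed":
		// fall out of the switch and validate the range below
	default:
		// Assumption of this sketch: other values are rejected.
		return fmt.Errorf("unsupported ManagementType %q", s.ProvisioningDHCP.ManagementType)
	}
	start, end, err := dhcpBounds(s.ProvisioningDHCP.DHCPRange, ipnet)
	if err != nil {
		return err
	}
	if !ipnet.Contains(start) || !ipnet.Contains(end) {
		return fmt.Errorf("DHCP range %s-%s is outside %s", start, end, ipnet)
	}
	if bytes.Compare(start.To16(), end.To16()) > 0 {
		return fmt.Errorf("DHCP range start %s is after end %s", start, end)
	}
	if bytes.Compare(ip.To16(), start.To16()) >= 0 && bytes.Compare(ip.To16(), end.To16()) <= 0 {
		return fmt.Errorf("ProvisioningIP %s must not be inside the DHCP range", ip)
	}
	return nil
}

func main() {
	spec := ProvisioningSpec{
		ProvisioningInterface:   "ens3",
		ProvisioningIP:          "10.1.0.3",
		ProvisioningNetworkCIDR: "10.1.0.0/24",
		ProvisioningDHCP:        ProvisioningDHCP{ManagementType: "Managed"},
	}
	fmt.Println(Validate(spec)) // prints <nil>: 10.1.0.3 is outside the default .10-.100 pool
}
```

Keeping the range as a single "start,end" string matches the format described above; a real API type might instead split it into two fields so that each address can be validated by the schema.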
| 156 | +### User Stories [optional] |
| 157 | + |
| 158 | +1. As a Deployment Operator, I want Baremetal IPI deployments to be customizable to |
| 159 | +hardware and network requirements. |
| 160 | + |
| 161 | +2. As an OpenShift Administrator, I want Baremetal IPI deployments to take place without |
| 162 | +manual workarounds like creating a ConfigMap for the config (which is the current approach |
| 163 | +being used in 4.2 and 4.3). |
| 164 | + |
| 165 | +## Design Details |
| 166 | + |
| 167 | +This new baremetal CR would be created in the "openshift-machine-api" namespace and as |
| 168 | +mentioned earlier would be in operator scope. Only one instance of this CR would be |
| 169 | +created by the installer and hence it is a singleton CR. |
| 170 | + |
| 171 | +Important details of the CR: |
| 172 | + |
| 173 | +- Resource name - provisioningconfig.baremetal.operator.openshift.io |
| 174 | +- Instance name - main/default |
| 175 | +- Namespace - openshift-machine-api |
| 176 | +- Version - apiextensions.k8s.io/v1 |
| 177 | + |
| 178 | +The new config items would be set by the installer and will be used by the MAO to |
| 179 | +generate more config items that are derivable from these basic parameters. Put |
| 180 | +together, these config parameters are passed in as environment variables to the various |
| 181 | +containers that make up a metal3 baremetal IPI deployment. |
| 182 | + |
| 183 | +This baremetal provisioning CR contains configuration data for the provisioning services, |
| 184 | +which are not values that should be configured by the end user via BareMetalHost objects. |
| 185 | + |
| 186 | +The configs described in this enhancement doc would be part of the Spec field of the CR. |
| 187 | +Only the ProvisioningDHCP.DHCPRange field can change after installation, so this will be |
| 188 | +marked as editable. All other config items will be marked as not editable. |
| 189 | + |
| 190 | +### Test Plan |
| 191 | + |
| 192 | +The test plan should involve making sure the openshift/installer generates |
| 193 | +all configuration items within the BaremetalPlatformStatus when the platform |
| 194 | +type is Baremetal. |
| 195 | + |
| 196 | +MAO reads this configuration and uses it to derive additional configuration |
| 197 | +required to bring up a metal3 cluster. E2e testing should make sure that MAO |
| 198 | +is able to bring up a metal3 cluster using config from this new Metal3ProvisioningController |
| 199 | +CR which has operator scope. |
| 200 | + |
| 201 | +Once metal3 is up, the next level of testing should involve bringing up worker nodes. |
| 202 | +Also, testing needs to make sure we are still able to bring up worker nodes when there |
| 203 | +is an external DHCP server and we do not bring up DHCP services within the cluster. |
| 204 | + |
| 205 | +The test plan should also include tests to dynamically increase the DHCP range after a |
| 206 | +metal3 cluster has been brought up and a few workers have come up successfully. |
| 207 | + |
| 208 | +### Upgrade / Downgrade Strategy |
| 209 | + |
| 210 | +The Baremetal platform type will be available for customers to use for the first |
| 211 | +time in OpenShift 4.3. When it is installed, it will always start as a |
| 212 | +fresh baremetal installation, at least in 4.3. There is no use case where a 4.2 |
| 213 | +installation would be upgraded to a 4.3 installation with Baremetal Platform |
| 214 | +support enabled. |
| 215 | + |
| 216 | +To ensure a hitless upgrade from 4.3 to 4.4, the implementation in 4.4 would try to |
| 217 | +read the configuration from both the new CR and the ConfigMap. If MAO is unable to find the |
| 218 | +provisioning configuration in the new CR, it will fall back to reading it from the ConfigMap. |
| 219 | +This decision will be made per config item and not based just on the presence of the |
| 220 | +new CR. |
| 221 | + |
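The per-item fallback described in the upgrade strategy could look roughly like the Go sketch below. It is only an illustration: the CR and the ConfigMap are both modeled as flat `map[string]string` values, the function names `lookup` and `resolve` are hypothetical, and the ConfigMap keys are assumed to match the CR field names, which may not be true of the 4.3 ConfigMap.

```go
package main

import (
	"fmt"
	"sort"
)

// lookup resolves a single config item. A non-empty value in the new CR wins;
// otherwise the legacy ConfigMap is consulted for that same item.
func lookup(key string, cr, configMap map[string]string) (string, bool) {
	if v, ok := cr[key]; ok && v != "" {
		return v, true
	}
	v, ok := configMap[key]
	return v, ok && v != ""
}

// resolve applies lookup to every required key and reports the keys that
// could not be found in either source.
func resolve(keys []string, cr, configMap map[string]string) (map[string]string, []string) {
	out := make(map[string]string, len(keys))
	var missing []string
	for _, k := range keys {
		if v, ok := lookup(k, cr, configMap); ok {
			out[k] = v
		} else {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return out, missing
}

func main() {
	keys := []string{"ProvisioningInterface", "ProvisioningIP", "ProvisioningNetworkCIDR"}
	// The CR exists but carries only some of the items.
	cr := map[string]string{"ProvisioningInterface": "ens3"}
	// Legacy ConfigMap; key names here are assumed for illustration.
	configMap := map[string]string{
		"ProvisioningInterface": "eth1",
		"ProvisioningIP":        "10.1.0.3",
	}
	values, missing := resolve(keys, cr, configMap)
	fmt.Println(values["ProvisioningInterface"]) // ens3 (CR value wins)
	fmt.Println(values["ProvisioningIP"])        // 10.1.0.3 (ConfigMap fallback)
	fmt.Println(missing)                         // [ProvisioningNetworkCIDR]
}
```

Because the decision is made per key, a partially populated CR does not hide values that are still present only in the ConfigMap.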
| 222 | +[1] - https://github.com/openshift/enhancements/pull/90 |
| 223 | +[2] - https://github.com/openshift/enhancements/pull/102 |