Support RAID and BIOS configuration for Baremetal Server#279
Conversation
This commit aims to add **RAIDConfig** and **BIOSConfig* as new properties into **BareMetalHostSpec**. These options allows Metal3 user to configure RAID and BIOS via **raid** and **bios** field in YAML file. Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com> Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com> Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
This commit intends to support BIOS configuraion from bare metal drivers. In particular, **GetBIOSConfigDetails** will help the controller know exactly which BIOS setting is supported by the driver. Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com> Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com> Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
dhellmann
left a comment
There was a problem hiding this comment.
It would be better if we could have the BIOS and RAID work in separate pull requests. I see that they are related, but designing the CRD changes for both of them at the same time is going to make the changes harder to review.
This implementation still leaks too much knowledge about ironic through the API. Ironic is an implementation detail, and it might be dropped in favor of another tool or we might want to support several different tools. Therefore, the Metal3 API must not be based on the Ironic API.
I have tried to leave some more specific comments about things to change, but the big one to me is to not ask the user to specify any cleaning steps at all. "Steps" are imperative ("configure RAID", "configure BIOS"). We want to provide a declarative API ("there should be 1 RAID device with 2 volumes", "this BIOS flag should be ON"), and then figure out the imperative steps needed to produce the desired results.
|
|
||
| // StateCleaning means Ironic are running the referenced CleanSteps via | ||
| // manual cleaning for RAID and BIOS configuration | ||
| StateCleaning ProvisioningState = "cleaning" |
There was a problem hiding this comment.
"Cleaning" is an ironic concept, and we do not want to expose it through the metal3 API. We should be able to use the existing provisioning state to indicate that work is happening to provision the server.
There was a problem hiding this comment.
And at least we shouldn't call it "cleaning". It's confusing enough in ironic world :)
There was a problem hiding this comment.
Thanks for you comment. I will abandon cleaning state from this prototype, then execute the cleansteps in the other provision state. But I am confusing between registering and inspecting :(
| SharePhysicalDisks *bool `json:"sharePhysicalDisks,omitempty"` | ||
|
|
||
| // If this is not specified, disk type will not be a criterion to find backing physical disks | ||
| DiskType string `json:"diskType,omitempty"` |
There was a problem hiding this comment.
What are valid values for disk type?
There was a problem hiding this comment.
It could be hdd or ssd [1]
[1] https://docs.openstack.org/ironic/latest/admin/raid.html#raid-configuration-json-format
There was a problem hiding this comment.
OK, then DiskType should have an enum defined that restricts its values.
| DiskType string `json:"diskType,omitempty"` | ||
|
|
||
| // If this is not specified, interface type will not be a criterion to find backing physical disks. | ||
| InterfaceType string `json:"interfaceType,omitempty"` |
There was a problem hiding this comment.
What are valid values for interface type?
There was a problem hiding this comment.
The valid values for interface type are: sata or scsi or sas [1]
[1] https://docs.openstack.org/ironic/latest/admin/raid.html#raid-configuration-json-format
There was a problem hiding this comment.
InterfaceType should have an enum defined that restricts its values.
| } | ||
|
|
||
| // BIOSConfig contains the configuration that are required to config BIOS in Bare Metal server | ||
| type BIOSConfig map[string]interface{} |
There was a problem hiding this comment.
As I said in an earlier review, the BIOS structure needs to have defined fields with appropriate types. If something is an on/off flag, then a boolean pointer. If something is a string, then a string. A map using interface is not going to be an acceptable API because it is impossible to validate it in the OpenAPI code and future versions of kubernetes are going to require more structural validation for CRDs.
There was a problem hiding this comment.
Because BIOS setting depend on vendor's driver, so I declare BIOSConfig as interface{} to fit with all of BIOS struct from vendors, then each vendor have to defined VendorBIOSConfigSpec [1] for their own BIOS settings.
For BIOS validation, I also defined ValueType for each BIOS setting in VendorBIOSConfigSpec, so we can validate the BIOS setting from users via ValidateAndBuildBIOSCleanStep function.
[1] https://github.com/metal3-io/baremetal-operator/pull/279/files#diff-334e60af5353ff0c1ef95211da871902R53
[2] https://github.com/metal3-io/baremetal-operator/pull/279/files#diff-bc78df50e156a9dedd77791eea452d13R1085
There was a problem hiding this comment.
I realize it is going to make it more complicated to design an API that supports changing different settings on different types of hardware, but that's the task. A pass-through API is completely untenable, because it's not an API.
We must decide on some way to describe every API parameter. That may mean abstracting some of the BIOS settings so the same inputs can apply on different types of hardware. It may mean providing a way to specify hardware-type-specific BIOS settings. We will not have an API that accepts an unspecified input and relies on validation happening after the fact.
There was a problem hiding this comment.
I think part of the conundrum is how this has been handled from the actual vendor interfaces. There is no standardization there even on the exact same API redfish interface as vendors have freely named individual settings as they wish. :\
This can vary between models as well, so I'm really not sure if there is any better way than lettering the vendor data model being represented...
There was a problem hiding this comment.
So in discussing this with @dhellmann, I think the only path forward for cross-vendor consistency with Metal3 is to define a set list of names. Such as a single VT/VTX CPU flag, such as "Virtualization", that may actually map to specific settings based upon underlying hardware types/Hardware Models. That allows for other backends at a later point in time if necessary. The conundrum is that settings can vary wildly based upon back-end hardware and the best thing to do is for the underlying specific settings to be implementation details for the supported platforms as opposed to try and pass-through the data structure that the underlying driver in ironic or that the BMC may be able to parse.
| HardwareDetails *HardwareDetails `json:"hardware,omitempty"` | ||
|
|
||
| // The executed CleanSteps on the host. | ||
| CleanSteps []nodes.CleanStep `json:"cleanSteps,omitempty"` |
There was a problem hiding this comment.
Is it possible for us to determine the steps to follow without requiring the user to specify them? Even if we always just follow the same steps, I would rather not expose them through the API.
There was a problem hiding this comment.
This is a read-only field, I'm not sure why we need it here.
There was a problem hiding this comment.
@dhellmann: This CleanSteps is include both RAID and BIOS clean steps, so if there is no valid cleanstep for both, we no need to run cleaning step
@dtantsur: I will move this cleansteps into ironicProvisioner, so this field will be removed.
| if host.Status.CleanSteps != nil { | ||
| return false | ||
| } else if len(cleanSteps) > 0 { | ||
| // Never run manual cleaning if RAID or BIOS is not configured |
There was a problem hiding this comment.
This comment is confusing. The way the CRD is defined above, cleaning steps can be configured independently of either the BIOS or RAID settings. Should the condition on line 594 be checking those parts of the struct, too?
There was a problem hiding this comment.
This BIOS struct depend on vendor driver, and it fit with BIOS struct in baremetalhost_types.go via GetBIOSConfigDetails [1]
| return iRMCBiosConfigDetails | ||
| } | ||
|
|
||
| var iRMCBiosConfigDetails = map[string]VendorBIOSConfigSpec { |
There was a problem hiding this comment.
There is a lot of good information in the comments and type specifiers in this section. We should consider including all of these things in the BIOS struct that needs to be defined in baremetalhost_types.go.
We will want to start with a small list of settings and expand it, because removing things from the API is harder than adding things. Are all of these options absolutely required? Are they all used on a regular basis?
There was a problem hiding this comment.
Should cleaning happen before inspection?
In the case of cleaning includes RAID configuration, some Storage details in HardwareDetails might change, so we need inspecting phase to refresh these information.
I would have expected to do it as part of provisioning and deprovisioning.
IMO, we should not put cleaning into provisioning phase. I think both of RAID and BIOS should be configured only one time, so if we run cleaning before inspecting phase, the latest hardware information would be updated in HardwareDetails.
But I am confusing whether cleaning should be put in registering or inspecting...
There was a problem hiding this comment.
The metal3 API is unlikely to expose every feature of ironic to the end user. The point is to simplify it, to make it easier to use and more predictable. So, let's start by assuming that every time metal3 wants to provision an image to a host, it is going to "reset" that host so the RAID and BIOS configuration matches only what is given in the CR.
Does that help answer where those cleaning steps should be performed?
| case host.WasExternallyProvisioned(): | ||
| actionName = metal3v1alpha1.StateExternallyProvisioned | ||
| case host.NeedsManualCleaning(info.cleanSteps): | ||
| actionName = metal3v1alpha1.StateCleaning |
There was a problem hiding this comment.
Should cleaning happen before inspection? I would have expected to do it as part of provisioning and deprovisioning.
There was a problem hiding this comment.
There are manual cleaning and automated cleaning (yes, we're bad at naming things). Manual cleaning is a set of "ready-state" steps, and it should run before inspection, so that inspection sees RAID. Automated cleaning is what wipes the disks, and it runs before the first and after every deployment. I hope that helps.
There was a problem hiding this comment.
Should cleaning happen before inspection?
In the case of cleaning includes RAID configuration, some Storage details in HardwareDetails might change, so we need inspecting phase to refresh these information.
I would have expected to do it as part of provisioning and deprovisioning.
IMO, we should not put cleaning into provisioning phase. I think both of RAID and BIOS should be configured only one time, so if we run cleaning before inspecting phase, the latest hardware information would be updated in HardwareDetails.
But I am confusing whether cleaning should be put in registering or inspecting...
@dtantsur: Thank you for the explanation.
There was a problem hiding this comment.
The HardwareDetails section of the host status is meant to provide information about physical hardware resources. It does not need to reflect the RAID configuration. If ironic is going to fill in RAID info, we will need to strip that part of the data out, but it would be better if we prevent it from showing up there at all.
| return utils.StringInList(host.Finalizers, metal3v1alpha1.BareMetalHostFinalizer) | ||
| } | ||
|
|
||
| func(r *ReconcileBareMetalHost) buildAndValidateCleanSteps( |
There was a problem hiding this comment.
We want to keep all of the ironic-specific logic in the ironic package. The next several functions build instructions for ironic to do cleaning that the controller should not be aware of, and should move into the ironic package as part of the ironicProvisioner.
There was a problem hiding this comment.
We want to keep all of the ironic-specific logic in the ironic package.
Ah, sorry, I misunderstood your points. I will move all of the ironic related functions into ironicProvisioner as your suggestion.
Currently, Metal3 does not support deploy baremetal server with RAID and BIOS configuration. This commit aims to support RAID and BIOS configuration in Metal3 with the belows: - Clean step builder and validator: It will check whether RAID and BIOS is supported by vendor or not - Introduce **cleaning** state as a new provisioning state to setup RAID/BIOS if needed. In this state, Metal3 can make Ironic enter Manual cleaning with clean steps to configure RAID/BIOS. Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com> Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com> Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
| Controller string `json:"controller,omitempty"` | ||
|
|
||
| // A list of physical disks to use as read by the RAID interface. | ||
| PhysicalDisks []string `json:"physicalDisks,omitempty"` |
There was a problem hiding this comment.
Do we really want to expose all advanced features of ironic RAID setup?
There was a problem hiding this comment.
I would prefer if we start small. It's not clear how that relates to the PhysicalDisks parameter. Does ironic make its own choice if no list is presented?
There was a problem hiding this comment.
Software Raid also doesn't support physical disk specification. Some of the drivers try to automatically do the right thing, other drivers (and their vendors) disagree with that model though... :(
| Args: nil, | ||
| }, | ||
| *raidCleanStep, | ||
| ) |
There was a problem hiding this comment.
For software RAID we'll also need to wipe partitions. But this is for Train.
There was a problem hiding this comment.
Thanks @dtantsur for Software RAID case. I will take care it too :)
| biosConfigDetails := accessDetails.GetBIOSConfigDetails() | ||
| if biosConfigDetails == nil { | ||
| return nil, errors.New(fmt.Sprintf( | ||
| "'%s' driver does not support BIOS configuration", accessDetails.Driver())) |
There was a problem hiding this comment.
This is incorrect. The driver might support BIOS, we (BMO) doesn't support it.
There was a problem hiding this comment.
Thanks. I will update it :)
dtantsur
left a comment
There was a problem hiding this comment.
Left some comments. More importantly, could you split BIOS from RAID patches? I'd start with RAID since its format is less controversial.
|
Can one of the admins verify this patch? |
|
Hi @dhellmann , @dtantsur , @juliakreger , @nordixinfra , |
Currently, Metal3 does not support deploy baremetal server with RAID (#206 )and BIOS (#207) configuration. This commit aims to support RAID and BIOS configuration in Metal with the belows: