Support RAID and BIOS configuration for Baremetal Server #279

Closed
longkb wants to merge 3 commits into metal3-io:master from longkb:support_raid_and_bios_configuration

Contributor

longkb commented Aug 15, 2019

Currently, Metal3 does not support deploying a baremetal server with RAID (#206) and BIOS (#207) configuration. This commit aims to support RAID and BIOS configuration in Metal3 with the following changes:

  • Extend the BareMetalHost CRD to support RAID and BIOS configuration via the raid and bios properties
  • Add BIOS configuration details to each vendor's driver
  • Add a clean step builder and validator, which checks whether RAID and BIOS are supported by the vendor's driver
  • Introduce cleaning as a new provisioning state to set up RAID/BIOS if needed. In this state, Metal3 can make Ironic enter manual cleaning with clean steps to configure RAID/BIOS.

longkb and others added 2 commits August 15, 2019 20:40
This commit aims to add **RAIDConfig** and **BIOSConfig** as new properties
into **BareMetalHostSpec**. These options allow Metal3 users to configure
RAID and BIOS via the **raid** and **bios** fields in the YAML file.

Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com>
Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
This commit intends to support BIOS configuration from bare metal drivers.
In particular, **GetBIOSConfigDetails** will help the controller know exactly
which BIOS settings are supported by the driver.

Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com>
Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>

Member

dhellmann left a comment

It would be better if we could have the BIOS and RAID work in separate pull requests. I see that they are related, but designing the CRD changes for both of them at the same time is going to make the changes harder to review.

This implementation still leaks too much knowledge about ironic through the API. Ironic is an implementation detail, and it might be dropped in favor of another tool or we might want to support several different tools. Therefore, the Metal3 API must not be based on the Ironic API.

I have tried to leave some more specific comments about things to change, but the big one to me is to not ask the user to specify any cleaning steps at all. "Steps" are imperative ("configure RAID", "configure BIOS"). We want to provide a declarative API ("there should be 1 RAID device with 2 volumes", "this BIOS flag should be ON"), and then figure out the imperative steps needed to produce the desired results.
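
To illustrate the declarative approach described in this review, here is a minimal sketch in which the user states only the desired result and the steps are derived internally. The names `DesiredConfig`, `Step`, and `planSteps` are hypothetical and do not come from this PR or from the baremetal-operator code base.

```go
package main

import "fmt"

// DesiredConfig is a hypothetical declarative description of the host.
// It is an illustration only, not the type proposed in this PR.
type DesiredConfig struct {
	RAIDVolumes    int   // number of logical volumes wanted; 0 means "no RAID requested"
	Virtualization *bool // nil means "leave the firmware setting alone"
}

// Step is a hypothetical imperative action that a provisioner backend would run.
type Step struct {
	Name string
}

// planSteps derives the imperative steps from the declarative input, so the
// user never lists steps in the API.
func planSteps(c DesiredConfig) []Step {
	var steps []Step
	if c.RAIDVolumes > 0 {
		steps = append(steps, Step{Name: "configure-raid"})
	}
	if c.Virtualization != nil {
		steps = append(steps, Step{Name: "configure-bios"})
	}
	return steps
}

func main() {
	on := true
	for _, s := range planSteps(DesiredConfig{RAIDVolumes: 2, Virtualization: &on}) {
		fmt.Println(s.Name)
	}
}
```

With this shape, the step names stay an implementation detail of the backend and can change without touching the user-facing API.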


// StateCleaning means Ironic are running the referenced CleanSteps via
// manual cleaning for RAID and BIOS configuration
StateCleaning ProvisioningState = "cleaning"
Member

"Cleaning" is an ironic concept, and we do not want to expose it through the metal3 API. We should be able to use the existing provisioning state to indicate that work is happening to provision the server.

Member

And at least we shouldn't call it "cleaning". It's confusing enough in ironic world :)

Contributor Author

Thanks for your comment. I will drop the cleaning state from this prototype and execute the clean steps in another provisioning state instead. But I am not sure whether that should be registering or inspecting :(

SharePhysicalDisks *bool `json:"sharePhysicalDisks,omitempty"`

// If this is not specified, disk type will not be a criterion to find backing physical disks
DiskType string `json:"diskType,omitempty"`
Member

What are valid values for disk type?

Contributor Author

Member

OK, then DiskType should have an enum defined that restricts its values.

DiskType string `json:"diskType,omitempty"`

// If this is not specified, interface type will not be a criterion to find backing physical disks.
InterfaceType string `json:"interfaceType,omitempty"`
Member

What are valid values for interface type?

Contributor Author

The valid values for interface type are sata, scsi, or sas [1].
[1] https://docs.openstack.org/ironic/latest/admin/raid.html#raid-configuration-json-format

Member

InterfaceType should have an enum defined that restricts its values.
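
A minimal sketch of what such an enum could look like, using the three values listed in the reply above. The `+kubebuilder:validation:Enum` marker is the controller-tools validation marker; whether the project's CRD generation picks it up in this form is an assumption, and the `Valid` helper is hypothetical. DiskType could follow the same pattern once its allowed values are confirmed.

```go
package main

import "fmt"

// InterfaceType restricts the interface type of the backing physical disks.
// +kubebuilder:validation:Enum=sata;scsi;sas
type InterfaceType string

const (
	InterfaceSATA InterfaceType = "sata"
	InterfaceSCSI InterfaceType = "scsi"
	InterfaceSAS  InterfaceType = "sas"
)

// Valid reports whether t is one of the allowed values. The empty value is
// accepted because the field is optional ("not a criterion").
func (t InterfaceType) Valid() bool {
	switch t {
	case "", InterfaceSATA, InterfaceSCSI, InterfaceSAS:
		return true
	}
	return false
}

func main() {
	for _, v := range []InterfaceType{"sas", "", "nvme"} {
		fmt.Printf("%q valid: %v\n", string(v), v.Valid())
	}
}
```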

}

// BIOSConfig contains the configuration that are required to config BIOS in Bare Metal server
type BIOSConfig map[string]interface{}
Member

As I said in an earlier review, the BIOS structure needs to have defined fields with appropriate types. If something is an on/off flag, then a boolean pointer. If something is a string, then a string. A map using interface is not going to be an acceptable API because it is impossible to validate it in the OpenAPI code and future versions of kubernetes are going to require more structural validation for CRDs.
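
As a rough illustration of "defined fields with appropriate types", the sketch below replaces the open map with a struct. The type name `BIOSSettings`, its field names, and the `parseBIOS` helper are all hypothetical; the point is only that a boolean pointer distinguishes "not specified" from "explicitly off", and that a wrongly typed value is rejected at decode time rather than after the fact.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BIOSSettings is a hypothetical typed replacement for map[string]interface{}.
// The field names are illustrative only.
type BIOSSettings struct {
	// nil means "not specified"; true and false are explicit requests.
	Virtualization *bool `json:"virtualization,omitempty"`
	HyperThreading *bool `json:"hyperThreading,omitempty"`
	// A plain string for settings that are not on/off flags.
	BootMode string `json:"bootMode,omitempty"`
}

// parseBIOS decodes user input into the typed struct and returns an error
// when a value has the wrong type.
func parseBIOS(raw []byte) (BIOSSettings, error) {
	var s BIOSSettings
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	s, err := parseBIOS([]byte(`{"virtualization": false}`))
	fmt.Println("explicitly off:", err == nil && s.Virtualization != nil && !*s.Virtualization)
	fmt.Println("hyperThreading unspecified:", s.HyperThreading == nil)

	_, err = parseBIOS([]byte(`{"virtualization": "yes"}`))
	fmt.Println("wrong type rejected:", err != nil)
}
```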

Contributor Author

Because BIOS settings depend on the vendor's driver, I declared BIOSConfig as a map of interface{} values so that it fits the BIOS struct of every vendor. Each vendor then has to define a VendorBIOSConfigSpec [1] for its own BIOS settings.

For BIOS validation, I also defined a ValueType for each BIOS setting in VendorBIOSConfigSpec, so we can validate the BIOS settings supplied by users via the ValidateAndBuildBIOSCleanStep function.

[1] https://github.com/metal3-io/baremetal-operator/pull/279/files#diff-334e60af5353ff0c1ef95211da871902R53
[2] https://github.com/metal3-io/baremetal-operator/pull/279/files#diff-bc78df50e156a9dedd77791eea452d13R1085

Member

I realize it is going to make it more complicated to design an API that supports changing different settings on different types of hardware, but that's the task. A pass-through API is completely untenable, because it's not an API.

We must decide on some way to describe every API parameter. That may mean abstracting some of the BIOS settings so the same inputs can apply on different types of hardware. It may mean providing a way to specify hardware-type-specific BIOS settings. We will not have an API that accepts an unspecified input and relies on validation happening after the fact.

Member

I think part of the conundrum is how this has been handled by the actual vendor interfaces. There is no standardization there, even on the exact same Redfish API, as vendors have freely named individual settings as they wish. :\

This can vary between models as well, so I'm really not sure if there is any better way than letting the vendor data model be represented...

Member

So in discussing this with @dhellmann, I think the only path forward for cross-vendor consistency with Metal3 is to define a set list of names, such as a single VT/VTX CPU flag called "Virtualization", that may actually map to specific settings based upon the underlying hardware types or models. That allows for other backends at a later point in time if necessary. The conundrum is that settings can vary wildly based upon back-end hardware, and the best thing to do is to treat the underlying specific settings as implementation details of the supported platforms, as opposed to trying to pass through the data structure that the underlying driver in ironic or the BMC may be able to parse.
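
A sketch of how one abstract name could map to hardware-specific settings. Everything here is hypothetical: the vendor keys, the setting names, and the on/off value spellings are placeholders and do not reflect any real BMC or ironic driver.

```go
package main

import "fmt"

// vendorSetting is a hypothetical record of how one vendor spells a setting.
type vendorSetting struct {
	Name string // vendor-specific setting name (placeholder)
	On   string // value that enables the setting
	Off  string // value that disables the setting
}

// virtualizationByVendor maps the abstract "Virtualization" flag to
// vendor-specific settings. All entries are made-up placeholders.
var virtualizationByVendor = map[string]vendorSetting{
	"vendor-a": {Name: "ExampleVTSettingA", On: "Enabled", Off: "Disabled"},
	"vendor-b": {Name: "example_vt_setting_b", On: "True", Off: "False"},
}

// translateVirtualization turns the abstract flag into the name/value pair a
// given backend would apply, keeping that detail out of the user-facing API.
func translateVirtualization(vendor string, enabled bool) (string, string, error) {
	s, ok := virtualizationByVendor[vendor]
	if !ok {
		return "", "", fmt.Errorf("no Virtualization mapping for vendor %q", vendor)
	}
	if enabled {
		return s.Name, s.On, nil
	}
	return s.Name, s.Off, nil
}

func main() {
	name, value, err := translateVirtualization("vendor-b", true)
	fmt.Println(name, value, err)
}
```

The user-facing field stays a single boolean, and adding a new hardware type only means adding a table entry.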

HardwareDetails *HardwareDetails `json:"hardware,omitempty"`

// The executed CleanSteps on the host.
CleanSteps []nodes.CleanStep `json:"cleanSteps,omitempty"`
Member

Is it possible for us to determine the steps to follow without requiring the user to specify them? Even if we always just follow the same steps, I would rather not expose them through the API.

Member

This is a read-only field, I'm not sure why we need it here.

Contributor Author

@dhellmann: CleanSteps includes both the RAID and BIOS clean steps, so if there is no valid clean step for either, we do not need to run cleaning.
@dtantsur: I will move these clean steps into ironicProvisioner, so this field will be removed.

if host.Status.CleanSteps != nil {
	return false
} else if len(cleanSteps) > 0 {
	// Never run manual cleaning if RAID or BIOS is not configured
Member

This comment is confusing. The way the CRD is defined above, cleaning steps can be configured independently of either the BIOS or RAID settings. Should the condition on line 594 be checking those parts of the struct, too?

Contributor Author

This BIOS struct depends on the vendor driver, and it is matched with the BIOS struct in baremetalhost_types.go via GetBIOSConfigDetails [1].

[1] https://github.com/metal3-io/baremetal-operator/pull/279/files#diff-bc78df50e156a9dedd77791eea452d13R1073

Comment thread pkg/bmc/irmc.go
return iRMCBiosConfigDetails
}

var iRMCBiosConfigDetails = map[string]VendorBIOSConfigSpec {
Member

There is a lot of good information in the comments and type specifiers in this section. We should consider including all of these things in the BIOS struct that needs to be defined in baremetalhost_types.go.

We will want to start with a small list of settings and expand it, because removing things from the API is harder than adding things. Are all of these options absolutely required? Are they all used on a regular basis?

Contributor Author

> Should cleaning happen before inspection?

When cleaning includes RAID configuration, some storage details in HardwareDetails might change, so we need the inspecting phase to refresh this information.

> I would have expected to do it as part of provisioning and deprovisioning.

IMO, we should not put cleaning into the provisioning phase. I think both RAID and BIOS should be configured only once, so if we run cleaning before the inspecting phase, HardwareDetails will be updated with the latest hardware information.
But I am not sure whether cleaning should be put in registering or inspecting...

Member

The metal3 API is unlikely to expose every feature of ironic to the end user. The point is to simplify it, to make it easier to use and more predictable. So, let's start by assuming that every time metal3 wants to provision an image to a host, it is going to "reset" that host so the RAID and BIOS configuration matches only what is given in the CR.

Does that help answer where those cleaning steps should be performed?

case host.WasExternallyProvisioned():
	actionName = metal3v1alpha1.StateExternallyProvisioned
case host.NeedsManualCleaning(info.cleanSteps):
	actionName = metal3v1alpha1.StateCleaning
Member

Should cleaning happen before inspection? I would have expected to do it as part of provisioning and deprovisioning.

Member

There are manual cleaning and automated cleaning (yes, we're bad at naming things). Manual cleaning is a set of "ready-state" steps, and it should run before inspection, so that inspection sees RAID. Automated cleaning is what wipes the disks, and it runs before the first and after every deployment. I hope that helps.

Contributor Author

> Should cleaning happen before inspection?

When cleaning includes RAID configuration, some storage details in HardwareDetails might change, so we need the inspecting phase to refresh this information.

> I would have expected to do it as part of provisioning and deprovisioning.

IMO, we should not put cleaning into the provisioning phase. I think both RAID and BIOS should be configured only once, so if we run cleaning before the inspecting phase, HardwareDetails will be updated with the latest hardware information.
But I am not sure whether cleaning should be put in registering or inspecting...

@dtantsur: Thank you for the explanation.

Member

The HardwareDetails section of the host status is meant to provide information about physical hardware resources. It does not need to reflect the RAID configuration. If ironic is going to fill in RAID info, we will need to strip that part of the data out, but it would be better if we prevent it from showing up there at all.

return utils.StringInList(host.Finalizers, metal3v1alpha1.BareMetalHostFinalizer)
}

func(r *ReconcileBareMetalHost) buildAndValidateCleanSteps(
Member

We want to keep all of the ironic-specific logic in the ironic package. The next several functions build instructions for ironic to do cleaning that the controller should not be aware of, and should move into the ironic package as part of the ironicProvisioner.

Contributor Author

> We want to keep all of the ironic-specific logic in the ironic package.

Ah, sorry, I misunderstood your point. I will move all of the ironic-related functions into ironicProvisioner as you suggested.

Currently, Metal3 does not support deploying a baremetal server with RAID
and BIOS configuration. This commit aims to support RAID and BIOS configuration
in Metal3 with the following changes:

- Clean step builder and validator: it checks whether RAID and BIOS are supported
by the vendor

- Introduce **cleaning** as a new provisioning state to set up RAID/BIOS if needed. In
this state, Metal3 can make Ironic enter manual cleaning with clean steps to configure RAID/BIOS.

Co-Authored-By: Dao Cong Tien <tiendc@vn.fujitsu.com>
Co-Authored-By: Nguyen Phuong An <annp@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
Controller string `json:"controller,omitempty"`

// A list of physical disks to use as read by the RAID interface.
PhysicalDisks []string `json:"physicalDisks,omitempty"`
Member

Do we really want to expose all advanced features of ironic RAID setup?

Member

I would prefer if we start small. It's not clear how that relates to the PhysicalDisks parameter. Does ironic make its own choice if no list is presented?

Member

Software RAID also doesn't support physical disk specification. Some of the drivers try to automatically do the right thing; other drivers (and their vendors) disagree with that model though... :(

Args: nil,
},
*raidCleanStep,
)
Member

For software RAID we'll also need to wipe partitions. But this is for Train.

Contributor Author

Thanks @dtantsur for pointing out the software RAID case. I will take care of it too :)

biosConfigDetails := accessDetails.GetBIOSConfigDetails()
if biosConfigDetails == nil {
	return nil, errors.New(fmt.Sprintf(
		"'%s' driver does not support BIOS configuration", accessDetails.Driver()))
Member

This is incorrect. The driver might support BIOS; it is we (BMO) who don't support it.

Contributor Author

Thanks. I will update it :)
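
One possible wording for the error discussed in this thread, as a sketch only. The helper name `biosUnsupportedError` is hypothetical, and the final message is up to the author; the example attributes the limitation to the operator rather than to the driver, and uses `fmt.Errorf` in place of `errors.New(fmt.Sprintf(...))`.

```go
package main

import "fmt"

// biosUnsupportedError is a hypothetical helper. It reports that the
// operator, not the driver itself, lacks BIOS configuration support.
func biosUnsupportedError(driver string) error {
	return fmt.Errorf(
		"BIOS configuration for the %q driver is not supported by the baremetal-operator",
		driver)
}

func main() {
	fmt.Println(biosUnsupportedError("irmc"))
}
```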

Member

dtantsur left a comment

Left some comments. More importantly, could you split BIOS from RAID patches? I'd start with RAID since its format is less controversial.

@nordixinfra

Can one of the admins verify this patch?

Contributor Author

longkb commented Aug 28, 2019

Hi @dhellmann , @dtantsur , @juliakreger , @nordixinfra ,
I have just pushed a new pull request, #292, to support RAID configuration in BM. Could you help me review it? I will close this PR now :)

@longkb longkb closed this Aug 28, 2019