Skip to content

Conversation

@imain
Copy link
Contributor

@imain imain commented Aug 2, 2019

Would love some reviews on this. This adds a configmap to support the metal3 deployment. Some options need to change/be eliminated but I wanted to get it up for review to see if iI did it right.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 2, 2019
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 2, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: imain
To complete the pull request process, please assign abhinavdahiya
You can assign the PR to them by writing /assign @abhinavdahiya in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@russellb
Copy link
Contributor

russellb commented Aug 2, 2019

/label platform/baremetal

@openshift-ci-robot openshift-ci-robot added the platform/baremetal IPI bare metal hosts platform label Aug 2, 2019
@juliakreger
Copy link

Looks straight forward to me. It feels a little nit-pickish, but the use of metal3 vs baremetal seems a little confusing in this, but ultimately I guess that is also fine.

@wking
Copy link
Member

wking commented Aug 2, 2019

Is metal3-configmap a well-known name for the machine-API operator? This feels more like something that belongs in the infrastructure config or some machine-API-specific config.

@metal3ci
Copy link

metal3ci commented Aug 2, 2019

Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/998/

@openshift-ci-robot
Copy link
Contributor

@imain: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/gofmt 92571ac link /test gofmt
ci/prow/e2e-aws-scaleup-rhel7 92571ac link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-openstack 92571ac link /test e2e-openstack

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya
Copy link
Contributor

I agree with @wking here.. these should be on the machine object / baremetal host object ... This information is not versioned object.

@russellb
Copy link
Contributor

russellb commented Aug 3, 2019

Good feedback. I think we could simplify this by putting a few pieces of info in the platform status in the infrastructure config object, and then have the machine-api-operator produce the config items needed for the baremetal operator.

For 4.2, I suggest we just make this config map something we drop in as a custom manifest from our install script. That’s good enough while this is still experimental. We can work on refactoring this config in 4.3.

@russellb
Copy link
Contributor

russellb commented Aug 3, 2019

I guess the other option would be to merge something like this, but file a bug to refactor it away in 4.3. I’m ok either way since there’s an easy enough workaround.

@abhinavdahiya
Copy link
Contributor

@russellb can you provide more context on who uses this, and why the machine object, baremetal host object and infrastructure object already do not cover this?

@dhellmann
Copy link
Contributor

@russellb can you provide more context on who uses this, and why the machine object, baremetal host object and infrastructure object already do not cover this?

We need to tell the provisioning service which network interface to listen on for DHCP requests from hosts being provisioned. That's the provisioning_interface configuration option defined here.

We also need the IP range information for the DHCP server to use to give to hosts while they are being provisioned. That's the dhcp_range option.

We also need a URL pointing to the RHCOS image that should be provisioned. We download that into the cluster before actually using it, but we have to know where to go download it from. That's the rhcos_image_url option.

It wouldn't be right to add these to Machine or Host objects, because they either don't apply to the Machine/Host or should not be set differently for different instances.

I don't think any of those things are in the infrastructure object right now. Maybe the image URL is?

The advice we've had previously is if we need configuration settings, we should define our own thing to hold them because the other configuration APIs aren't considered stable. Is that wrong, or does that just not apply to the infrastructure object? Should we add these fields there instead of using a config map?

@abhinavdahiya
Copy link
Contributor

@russellb can you provide more context on who uses this, and why the machine object, baremetal host object and infrastructure object already do not cover this?

We need to tell the provisioning service which network interface to listen on for DHCP requests from hosts being provisioned. That's the provisioning_interface configuration option defined here.

is this provisioning service running on the cluster?

We also need the IP range information for the DHCP server to use to give to hosts while they are being provisioned. That's the dhcp_range option.

Is this IP range different from MachineCIDR

We also need a URL pointing to the RHCOS image that should be provisioned. We download that into the cluster before actually using it, but we have to know where to go download it from. That's the rhcos_image_url option.

How is this not machine object's option, what if somebody wants to use the machine object to create RHEL machine.. ?

It wouldn't be right to add these to Machine or Host objects, because they either don't apply to the Machine/Host or should not be set differently for different instances.

I don't think any of those things are in the infrastructure object right now. Maybe the image URL is?

The advice we've had previously is if we need configuration settings, we should define our own thing to hold them because the other configuration APIs aren't considered stable. Is that wrong, or does that just not apply to the infrastructure object? Should we add these fields there instead of using a config map?

You are free to define the unstable api using group/v1alpha1 objects.. and be responsible for providing migration steps for api changes as you graduate them.. using a config map doesn't allow that therefore not the best way to move forward..

@dhellmann
Copy link
Contributor

@russellb can you provide more context on who uses this, and why the machine object, baremetal host object and infrastructure object already do not cover this?

We need to tell the provisioning service which network interface to listen on for DHCP requests from hosts being provisioned. That's the provisioning_interface configuration option defined here.

is this provisioning service running on the cluster?

Yes, it's part of the metal3 Deployment created when the MAO sees the "baremetal" platform.

We also need the IP range information for the DHCP server to use to give to hosts while they are being provisioned. That's the dhcp_range option.

Is this IP range different from MachineCIDR

Yes, baremetal machines have multiple NICs. The IP range for this DHCP server is on the provisioning network, not the main network used by the rest of the cluster described by MachineCIDR.

We also need a URL pointing to the RHCOS image that should be provisioned. We download that into the cluster before actually using it, but we have to know where to go download it from. That's the rhcos_image_url option.

How is this not machine object's option, what if somebody wants to use the machine object to create RHEL machine.. ?

We could support that in the future, but do not today.

We do actually have a field on the Machine to set this, but we use a URL that points to the local cache inside the cluster so that each provisioning operation doesn't need to download the image.

It wouldn't be right to add these to Machine or Host objects, because they either don't apply to the Machine/Host or should not be set differently for different instances.
I don't think any of those things are in the infrastructure object right now. Maybe the image URL is?
The advice we've had previously is if we need configuration settings, we should define our own thing to hold them because the other configuration APIs aren't considered stable. Is that wrong, or does that just not apply to the infrastructure object? Should we add these fields there instead of using a config map?

You are free to define the unstable api using group/v1alpha1 objects.. and be responsible for providing migration steps for api changes as you graduate them.. using a config map doesn't allow that therefore not the best way to move forward..

I don't understand this part of your response.

@imain
Copy link
Contributor Author

imain commented Aug 6, 2019

OK, where are we with this then? I feel like Doug and Russel have given some great responses. These are definitely unique options to facilitate deploying baremetal machines from the baremetal operator. Does that make this approach viable or should we look at another way?

PS I'm in favour of changing the name of the configmap.. baremetal-configmap?

@abhinavdahiya
Copy link
Contributor

Most of these options should move to either the host CR or machine CR...
and data in the config map is not versioned and therefore it should not be used as-is.

@dhellmann
Copy link
Contributor

Most of these options should move to either the host CR or machine CR...
and data in the config map is not versioned and therefore it should not be used as-is.

I think we've failed to communicate something clearly about what these options are. They don't belong on the host or machine, because they control how the service works at a higher level than those individual CRs.

The only option we might be able to put on the Machine is the image URL, but as I explained earlier that URL is actually being used by the metal3 service when it starts up to provide a cached copy of the image for all later provisioning operations. That URL is also not something we want the user to have to configure. It's tied to the RHCOS image that was deployed to build the cluster in the first place and we don't want them using arbitrary system images.

From the rest of the discussion, it seems like moving these things to the infrastructure config might be the right direction.

@imain
Copy link
Contributor Author

imain commented Aug 6, 2019

OK. I don't think we'll have time to implement this with a CRD before the freeze as it would mean adding another operator. Based on the discussions here and on slack I think it makes the most sense to continue to use dev-scripts to create the configmap for now. This way we can document the case and implement this properly in the next cycle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. platform/baremetal IPI bare metal hosts platform size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants