Skip to content

Conversation

@asalkeld
Copy link
Contributor

cluster-baremetal-operator is only enabled when deploying an OpenShift cluster with the baremetal
platform (via the IPI or AI workflow). Having the ability to manage baremetal nodes from
clusters without requiring the cluster to be on baremetal would be beneficial to customers.

openshift/cluster-baremetal-operator#189 (comment)
/cc @hardys

@openshift-ci openshift-ci bot requested a review from hardys August 20, 2021 02:00
@asalkeld
Copy link
Contributor Author

thanks for the great feedback @mhrivnak and @dhellmann , I have updated as requested

@asalkeld asalkeld requested review from dhellmann and mhrivnak August 22, 2021 23:23
@derekwaynecarr
Copy link
Member

what is the overhead of the baremetal operator on these additional platforms when baremetalhost is not used?

@sadasu
Copy link
Contributor

sadasu commented Aug 30, 2021

@asalkeld probably worth mentioning that the when the Provisioning CR is set to Disabled mode, workers nodes would be booted via virtual media. This removes the requirement for the Provisioning Network which can be expected to be available only in Baremetal platform types.
Also, when we are restricting metal3 functionality by restricting what is supported in the Provisioning CR, do we also need platform checks. We could enable CBO to run on all platforms and not just None, OpenStack and Baremetal.

We should also mention that in platforms other than Baremetal, centralized machine management (via MAO) would not be available so users would not be able to scale up Baremetal Machinesets. With the cluster-api based machine management expected to come later, we would be able to provide that (and leave a placeholder for the enhancement proposal for that.)

Nit: We can remove "MPINSTALL-7" for the title.

@asalkeld
Copy link
Contributor Author

@asalkeld probably worth mentioning that the when the Provisioning CR is set to Disabled mode, workers nodes would be booted via virtual media. This removes the requirement for the Provisioning Network which can be expected to be available only in Baremetal platform types.

Sure, I'll add that.

Also, when we are restricting metal3 functionality by restricting what is supported in the Provisioning CR, do we also need platform checks. We could enable CBO to run on all platforms and not just None, OpenStack and Baremetal.

I think the point is to only enable what we need and nothing else.

We should also mention that in platforms other than Baremetal, centralized machine management (via MAO) would not be available so users would not be able to scale up Baremetal Machinesets. With the cluster-api based machine management expected to come later, we would be able to provide that (and leave a placeholder for the enhancement proposal for that.)

See the Non-Goals section, I have it there.

Nit: We can remove "MPINSTALL-7" for the title.

🤷 it was suggested in the template..

@asalkeld asalkeld changed the title MPINSTALL-7: Enable Baremetal on other Platforms to support ZTP Enable Baremetal on other Platforms to support centralized host management Aug 30, 2021
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the overhead of the baremetal operator on these additional platforms when baremetalhost is not used?

We still need an answer to Derek's question. I think there's no additional overhead when the baremetal-operator is not enabled, because cluster-baremetal-operator always runs (see the question inline about cluster profiles, though). So, we would expect users to accept whatever overhead is introduced by enabling the service, but perhaps we need to give them some idea of what that overhead is.

@asalkeld
Copy link
Contributor Author

what is the overhead of the baremetal operator on these additional platforms when baremetalhost is not used?

We still need an answer to Derek's question. I think there's no additional overhead when the baremetal-operator is not enabled, because cluster-baremetal-operator always runs (see the question inline about cluster profiles, though). So, we would expect users to accept whatever overhead is introduced by enabling the service, but perhaps we need to give them some idea of what that overhead is.

@derekwaynecarr if cbo gets asked to create bmo, but there are no BMHost CRs then I assume it's just the memory overhead.
On my metal cluster (that does have BMH CRs), this is the memory used by all containers deployed by cbo
done with

kubectl exec pod/+podName -c  containerName -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes

imagecache daemonset
pod/metal3-image-cache-ktn5z -c metal3-httpd: memory 62320640
pod/metal3-image-cache-bflhj -c metal3-httpd: memory 46858240
pod/metal3-image-cache-p5grw -c metal3-httpd: memory 54792192

bmo deployment
pod/metal3-7bd97c69c7-l2dm4 -c metal3-httpd: memory 46096384
pod/metal3-7bd97c69c7-l2dm4 -c metal3-static-ip-manager: memory 2293760
pod/metal3-7bd97c69c7-l2dm4 -c metal3-mariadb: memory 92688384
pod/metal3-7bd97c69c7-l2dm4 -c metal3-ironic-conductor: memory 123207680
pod/metal3-7bd97c69c7-l2dm4 -c metal3-ironic-api: memory 411484160
pod/metal3-7bd97c69c7-l2dm4 -c metal3-baremetal-operator: (doesn't have "cat")
pod/metal3-7bd97c69c7-l2dm4 -c metal3-dnsmasq: memory 1994752
pod/metal3-7bd97c69c7-l2dm4 -c metal3-ironic-inspector: memory 88412160
pod/metal3-7bd97c69c7-l2dm4 -c ironic-inspector-ramdisk-logs: memory 1658880
pod/metal3-7bd97c69c7-l2dm4 -c ironic-deploy-ramdisk-logs: memory 1921024

Copy link

@flaper87 flaper87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this proposal and I've needed this myself recently as part of an investigation I'm doing for a valid customer use case.

#### Functional Testing

An e2e test will be written in the Assisted Installer CI that will:
1. create one of the platforms above (SNO Platform=None might be the easiest) with Assisted Service.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I have deployed this exact same scenario (with a custom-built 4.8 image and the original PR that sparked this enhancement).

1. create one of the platforms above (SNO Platform=None might be the easiest) with Assisted Service.
2. confirm that CBO is enabled
3. create a Provisioning CR and confirm that BMO is running
4. provision a baremetal cluster

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to wait for the full cluster to be deployed. Waiting for the agent's discovery phase to be over should be enough proof. This is assuming assisted will be used for this test.

Regardless, enabling CBO and BMO in an SNO node is a good, easy-enough, test

@hardys
Copy link
Contributor

hardys commented Oct 5, 2021

/approve

Overall this looks good to me, couple of minor comments added if you do any further updates, thanks!

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 5, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hardys

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2021
@asalkeld
Copy link
Contributor Author

asalkeld commented Oct 5, 2021

/approve

Overall this looks good to me, couple of minor comments added if you do any further updates, thanks!

thanks @hardys , I have updated with the tweaks..

Copy link
Member

@zaneb zaneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very thorough.
/lgtm

The specific goals of this proposal are to:

Support the centralized host management use case by partially enabling Baremetal Host API
on the following on-premise platforms:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's on-premises. There's no singular form of 'premises'.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2021
@openshift-merge-robot openshift-merge-robot merged commit e2f4d4b into openshift:master Oct 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.