
Allow BMO to watch all Namespaces#106

Merged
openshift-merge-robot merged 2 commits into openshift:master from sadasu:multi-namespace
Feb 25, 2021

Conversation

@sadasu
Contributor

@sadasu sadasu commented Feb 12, 2021

When the metal3 deployment is created, the BMO is no longer provided with
the WATCH_NAMESPACE env var, which allows it to watch all Namespaces.
Also, change the RBACs to reflect the fact that BMH objects can now
exist in multiple namespaces, not just the openshift-machine-api
namespace.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 12, 2021
@sadasu
Contributor Author

sadasu commented Feb 13, 2021

/test e2e-metal-ipi-ovn-ipv6

@asalkeld
Contributor

/retest
seems ok to me...

    },
    Env: []corev1.EnvVar{
        {
            Name: "WATCH_NAMESPACE",

Are we Ok with unconditionally watching all namespaces, or should we have a provisioning CR option to enable it just for the hub-cluster use-case?

@dhellmann I recall we discussed both options before - in the regular IPI case where BMH resources only exist in the openshift-machine-api namespace, will there be any overhead to this being unconditional?

Contributor


I don't think there would be significant overhead in always watching all namespaces in a normal cluster. There's a kubernetes API to list all resources of a type, regardless of the namespace, so I would expect the client in the controller-runtime to use that instead of the API for listing resources within a namespace.

There could be some user confusion if a user creates a host outside of the openshift-machine-api namespace and it is inspected but cannot be used for provisioning because CAPBM won't see it.


There could be some user confusion if a user creates a host outside of the openshift-machine-api namespace and it is inspected but cannot be used for provisioning because CAPBM won't see it.

I think that will be the next step, e.g. look at the work being done in the machine-api team to enable machines in a multi-namespace environment. IIUC the current plan is a machine-controller (and thus CAPBM) per namespace, so we'll need to sync up with the folks working on that and figure out how to make it work on baremetal (current testing is only on AWS, I believe).

@JoelSpeed is ^^ accurate, and can you point to any docs/scripts etc. that we can refer to regarding running multi-namespace machine controllers? (I'm thinking it's probably simplest to start without hive and just test Machine+BaremetalHost resources.)


In a normal OpenShift cluster we restrict the controllers to only watch a single namespace (openshift-machine-api), so if you were to create a machine outside of this namespace, nothing would happen, and that is expected.

I don't think there would be significant overhead in always watching all namespaces in a normal cluster.

This entirely depends on what kinds of resources you are watching. For example, in the machine controller we need to load secrets for userdata and credentials. Normally you would use the controller-runtime cached client for this so that you get notified of changes to the secrets to trigger a reconcile. If you don't restrict this cache to a certain namespace, it will cache all secrets from all namespaces, which not only could impact memory usage but also crosses a security boundary (e.g. multi-tenant clusters must not list secrets from multiple namespaces).
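A toy, stdlib-only sketch of this scoping concern (the real mechanism is controller-runtime's cache and namespace options; the type and names here are purely illustrative):

```go
package main

import "fmt"

// secret is a stand-in for a cached corev1.Secret; this is a toy
// model, not the controller-runtime cache itself.
type secret struct{ namespace, name string }

// scopedCache models an informer cache restricted to one namespace.
// The empty string means cluster-wide, mirroring the WATCH_NAMESPACE
// convention: an unscoped cache ends up holding every namespace's
// Secrets, which is the memory and security-boundary concern above.
type scopedCache struct {
	namespace string
	items     []secret
}

// add stores the secret only if it falls inside the cache's scope.
func (c *scopedCache) add(s secret) {
	if c.namespace == "" || c.namespace == s.namespace {
		c.items = append(c.items, s)
	}
}

func main() {
	unscoped := scopedCache{namespace: ""}
	scoped := scopedCache{namespace: "openshift-machine-api"}
	for _, s := range []secret{
		{"openshift-machine-api", "bmc-credentials"},
		{"tenant-a", "db-password"},
	} {
		unscoped.add(s)
		scoped.add(s)
	}
	// The unscoped cache holds both Secrets, including the tenant's.
	fmt.Println(len(unscoped.items), len(scoped.items))
}
```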

Contributor Author

@sadasu sadasu Feb 16, 2021


Are we Ok with unconditionally watching all namespaces, or should we have a provisioning CR option to enable it just for the hub-cluster use-case?

@hardys In some earlier discussions around this, I recall that we said we cannot assume Provisioning CR is going to be present in the multi-namespace scenario. Has that changed?

@JoelSpeed the issues you mentioned above are valid. I think our team is viewing this approach only for 4.8 and aligning with MAO's approach for 4.9 which might include multiple instances of BMO or even CBO (still under review). The current assumption is that we can live with these limitations/concerns listed above for 1 release.

Contributor


In a normal OpenShift cluster we restrict the controllers to only watch a single namespace (openshift-machine-api), so if you were to create a machine outside of this namespace, nothing would happen, and that is expected.

I don't think there would be significant overhead in always watching all namespaces in a normal cluster.

This entirely depends on what kinds of resources you are watching. For example, in the machine controller we need to load secrets for userdata and credentials. Normally you would use the controller-runtime cached client for this so that you get notified of changes to the secrets to trigger a reconcile. If you don't restrict this cache to a certain namespace, it will cache all secrets from all namespaces, which not only could impact memory usage but also crosses a security boundary (e.g. multi-tenant clusters must not list secrets from multiple namespaces).

We do watch Secrets as well as the hosts, but only those associated with the hosts. Based on what you're saying, it sounds like we may want to change the deployment so that our single baremetal-operator doesn't have to watch all Secrets? I'm not sure we'll actually save any RAM that way, since we'll have N copies of the operator with smaller caches, but it does address the security boundary issue.

We really can't afford to run multiple copies of Ironic, though. I think that means splitting the metal3 pod apart, which we didn't feel we would have time to do this release.



I spoke with @hardys yesterday and I think given your current plans for 4.8, it makes sense to be able to watch all namespaces for the first iteration and look into later splitting into per namespace. Security boundaries are less important if all the hosts belong to the same person for example.

But I do still question removing the namespace limiting functionality altogether, is it not useful to keep that for regular OCP clusters and just have the ability to configure whether it should be namespaced or watch everything?

We do watch Secrets

Depending on what you do with the secrets, it may be better to use a non-caching client and Get instead of List. It looks like you're creating the secrets though, so that's probably not possible here; otherwise you'd lose the event notification if they are modified out of band.

Contributor Author


The CBO is not responsible for creating the Secrets for BMC credentials and does not try to access them. I think the above comment is meant for those Secrets, and we need to make sure within BMO that we follow the above guidelines, i.e. do a Get for a specific Namespace vs. a List across all Namespaces, etc. Also, make sure we are using the right client to access the Secrets.


@hardys hardys Feb 22, 2021


I recall that we said we cannot assume Provisioning CR is going to be present in the multi-namespace scenario. Has that changed?

@sadasu I don't recall saying that - I don't see any reason we couldn't put some configuration option in the provisioning CR, so that the current single-namespace behavior is maintained, but for the hub-cluster use-case watching all namespaces could be enabled?

The main disadvantage of that approach is probably that it expands the test matrix, as we potentially have to care about the situation where we switch from single->multi namespace mode as mentioned in metal3-io/baremetal-operator#797 (comment)

Contributor Author


cb47870 is taking care of adding this config to the Provisioning CR.

@sadasu
Contributor Author

sadasu commented Feb 16, 2021

/test e2e-agnostic

    Env: []corev1.EnvVar{
        {
            Name: "WATCH_NAMESPACE",
            ValueFrom: &corev1.EnvVarSource{
Contributor


Question: what will be the impact, if any, on LeaderElectionNamespace for BMO in case of empty namespace?

Contributor Author


It appears that leader election needs a Namespace to be passed in https://github.com/kubernetes-sigs/controller-runtime/blob/0b554ebb54901a88427d4a89014af0d2dba1bbcf/pkg/manager/manager.go#L319.

With this change, BMO will not have the Namespace passed in via WATCH_NAMESPACE. But, with #107 we are passing in the POD_NAMESPACE which is the correct Namespace to be used for leader election. /cc @asalkeld

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 20, 2021
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2021
@sadasu
Contributor Author

sadasu commented Feb 22, 2021

/test e2e-agnostic

@sadasu sadasu force-pushed the multi-namespace branch 2 times, most recently from 9faa8b9 to 7eff872 Compare February 22, 2021 22:26
@sadasu
Contributor Author

sadasu commented Feb 23, 2021

/test e2e-metal-ipi

Comment thread: api/v1alpha1/provisioning_types.go (Outdated)

// EnableMultiNamespaces provides a way to explicitly allow use of this
// Provisioning configuration across multiple namespaces.
EnableMultiNamespaces bool `json:"enableMultiNamespaces,omitempty"`

@hardys hardys Feb 23, 2021


I wonder if we can use some existing API for this, e.g. for BMH resources we already set metadata/namespace:

https://github.com/openshift/installer/blob/master/pkg/asset/machines/baremetal/hosts.go#L75

However, we unfortunately didn't add that to the provisioning CR here:

https://github.com/openshift/installer/blob/master/data/data/manifests/openshift/baremetal-provisioning-config.yaml.template#L4

But I think such an interface might fit more cleanly, e.g. similar to the OLM operator namespace behavior described here:

https://docs.openshift.com/container-platform/4.6/operators/understanding/olm/olm-understanding-operatorgroups.html#olm-operatorgroups-target-namespace_olm-understanding-operatorgroups

Is there any way we could add that missing metadata on upgrade, e.g by looking at the status/generations which does show the openshift-machine-api namespace?

Alternatively, if we think modifying that existing behavior will be too difficult or ambiguous, I'd suggest renaming this interface to something terser, e.g.:

   spec:
      allNamespaces: true

That would align pretty well with the OLM docs, e.g. targetNamespaces?

Contributor Author

@sadasu sadasu Feb 24, 2021


Currently, the Provisioning CR is cluster scoped. If we changed it to be Namespace scoped, we would get this metadata change, but that would be a breaking API change during an upgrade. The solution proposed here only tells us whether multi-namespace mode is enabled; it will not allow us to specify an actual namespace. When we move to a model where there is a Provisioning CR per namespace, we could change the Provisioning CRD's scope to Namespaced.
For this first solution, I can update the PR to use "allNamespaces" to convey this config.

Contributor Author


After going through the links mentioned above, I see that the semantics of the OLM operator namespace behavior are different from what we are looking for here. If we re-use the name, would it cause some confusion?



@sadasu Ok sounds like we should go with the existing boolean, but perhaps rename the flag to allNamespaces (not blocking on that, but it seems to communicate the same as the proposed flag in a slightly terser and more accurate way, since "multi" could mean all, or some subset of all?)

Member

@stbenjam stbenjam Feb 24, 2021


  • Would we never want to watch some but not all namespaces?

  • I'm also not a fan of more knobs. If it's always going to be turned on, why do we need it? And if it can be conditionally on/off how does one do that from the installer? Or do you change it on day 2?

Contributor Author

@sadasu sadasu Feb 24, 2021


When in this mode, we ask BMO to watch all Namespaces (by not setting the WATCH_NAMESPACE env var), although in practice BMH objects would likely be created in only some of them.

Point taken about the additional knob. The best solution would be to make the Provisioning CR Namespaced, but that is coming later. This config is optional, defaults to false (including when deployed from the Installer), and can be set to true on Day 2.

Also, @kirankt's suggestion to call the knob "WatchAllNamespaces" is a better choice than what I have now, imo. I am not sure it is terse enough, but it conveys its purpose clearly.



Would we never want to watch some but not all namespaces?

Possibly, but I don't think there's any requirement for that right now.

I'm also not a fan of more knobs. If it's always going to be turned on, why do we need it? And if it can be conditionally on/off how does one do that from the installer? Or do you change it on day 2?

IMO this shouldn't always be turned on, particularly given that we want to align with the machine-api, e.g. the "In a normal OpenShift cluster we restrict the controllers to only watch a single namespace (openshift-machine-api)" comment from @JoelSpeed.

The future direction for multi-namespace at the machine-api layer seems unclear at this point, so it seems safest to leave the existing single-namespace behavior alone, and add a knob to enable the specific "hub cluster" use-case which isn't a normal OpenShift cluster.

In terms of how to configure it, I was thinking either modify the installer manifest or ensure it can be turned on after initial deployment (at least until the story around multi-namespace machine management is worked out for other controllers, when perhaps there will be some kind of installer flag).

@sadasu sadasu force-pushed the multi-namespace branch 2 times, most recently from 40ed1f7 to 56eb892 Compare February 24, 2021 23:06
@sadasu
Contributor Author

sadasu commented Feb 25, 2021

/test e2e-agnostic

@hardys

hardys commented Feb 25, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2021
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2021
Contributor

@kirankt kirankt left a comment


/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2021
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kirankt, sadasu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 1f7ff0a into openshift:master Feb 25, 2021
// will be used to provision baremetal hosts in only the
// openshift-machine-api namespace. When set to true, this provisioning
// configuration would be used for baremetal hosts across all namespaces.
WatchAllNamespaces bool `json:"watchAllNamespaces,omitempty"`
Contributor


We generally try to avoid using bool fields in the API, because it makes it difficult to expand or redefine them later. Here I would suggest an enum with 2 values Single and All, using All as the default.
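A minimal sketch of what such an enum-style field could look like in Go. The type and constant names are hypothetical, not the merged API (the merged PR used a bool); the point is that string-typed constants can gain new values later without an API break.

```go
package main

import "fmt"

// WatchNamespacesMode sketches the enum-style field suggested in
// review instead of a bool: string-typed constants can gain new
// values later (say, a "Selected" subset mode) without an API break.
// Type and constant names are illustrative, not the merged API.
type WatchNamespacesMode string

const (
	WatchNamespacesSingle WatchNamespacesMode = "Single"
	WatchNamespacesAll    WatchNamespacesMode = "All"
)

// watchAll interprets the mode, treating the empty (unset) value as
// All, per the reviewer's suggested default.
func watchAll(m WatchNamespacesMode) bool {
	return m == WatchNamespacesAll || m == ""
}

func main() {
	fmt.Println(watchAll(""))                    // unset defaults to All
	fmt.Println(watchAll(WatchNamespacesSingle)) // Single stays scoped
}
```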
