Allow BMO to watch all Namespaces #106
openshift-merge-robot merged 2 commits into openshift:master
Conversation
/test e2e-metal-ipi-ovn-ipv6
/retest
    },
    Env: []corev1.EnvVar{
        {
            Name: "WATCH_NAMESPACE",
Are we Ok with unconditionally watching all namespaces, or should we have a provisioning CR option to enable it just for the hub-cluster use-case?
@dhellmann I recall we discussed both options before - in the regular IPI case where BMH resources only exist in the openshift-machine-api namespace, will there be any overhead to this being unconditional?
I don't think there would be significant overhead in always watching all namespaces in a normal cluster. There's a kubernetes API to list all resources of a type, regardless of the namespace, so I would expect the client in the controller-runtime to use that instead of the API for listing resources within a namespace.
There could be some user confusion if a user creates a host outside of the openshift-machine-api namespace and it is inspected but cannot be used for provisioning because CAPBM won't see it.
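(To make the listing behavior above concrete, a minimal sketch using the controller-runtime client; the BareMetalHost import path has moved between BMO releases and the function name is illustrative, not the actual BMO code.)

```go
package main

import (
	"context"

	// Import path varies by BMO release; illustrative only.
	metal3v1alpha1 "github.com/metal3-io/baremetal-operator/apis/metal3.io/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listHosts shows both list shapes: with no option, the client lists
// BareMetalHosts across every namespace; with client.InNamespace it is
// scoped the same way WATCH_NAMESPACE scopes BMO today.
func listHosts(ctx context.Context, c client.Client) error {
	all := &metal3v1alpha1.BareMetalHostList{}
	if err := c.List(ctx, all); err != nil { // cluster-wide list
		return err
	}

	scoped := &metal3v1alpha1.BareMetalHostList{}
	return c.List(ctx, scoped, client.InNamespace("openshift-machine-api"))
}
```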
> There could be some user confusion if a user creates a host outside of the openshift-machine-api namespace and it is inspected but cannot be used for provisioning because CAPBM won't see it.
I think that will be the next step, e.g. look at the work being done in the machine-api team to enable machines in a multi-namespace environment. IIUC the current plan is a machine-controller (and thus CAPBM) per namespace, so we'll need to sync up with the folks working on that and figure out how to make it work on baremetal (current testing is only on AWS, I believe).
@JoelSpeed is ^^ accurate, and can you point to any docs/scripts etc. that we can refer to re: running multi-namespace machine controllers? (I'm thinking it's probably simplest to start without hive and just test Machine+BaremetalHost resources?)
In a normal OpenShift cluster we restrict the controllers to only watch a single namespace (openshift-machine-api), so if you were to create a machine outside of this namespace, nothing would happen, and that is expected.
> I don't think there would be significant overhead in always watching all namespaces in a normal cluster.
This entirely depends on what kinds of resources you are watching. For example, in the machine controller we need to load secrets for userdata and credentials. Normally you would use the controller-runtime cached client for this so that you get notified of changes to the secrets to trigger a reconcile. If you don't restrict this cache to a certain namespace, it will cache all secrets from all namespaces, which not only could impact memory usage but also crosses a security boundary (e.g. multi-tenant clusters must not list secrets from multiple namespaces).
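(A minimal sketch of the scoping being described, assuming a controller-runtime version of this era that still exposes Options.Namespace; newer releases moved this into the cache options.)

```go
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Restricting the manager restricts its informer cache too, so cached
	// Secrets stay within one namespace. An empty value ("" from an unset
	// WATCH_NAMESPACE) makes the cache cluster-wide instead.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Namespace: os.Getenv("WATCH_NAMESPACE"),
	})
	if err != nil {
		os.Exit(1)
	}
	_ = mgr.Start(ctrl.SetupSignalHandler())
}
```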
> Are we Ok with unconditionally watching all namespaces, or should we have a provisioning CR option to enable it just for the hub-cluster use-case?
@hardys In some earlier discussions around this, I recall that we said we cannot assume Provisioning CR is going to be present in the multi-namespace scenario. Has that changed?
@JoelSpeed the issues you mentioned above are valid. I think our team is viewing this approach only for 4.8 and aligning with MAO's approach for 4.9 which might include multiple instances of BMO or even CBO (still under review). The current assumption is that we can live with these limitations/concerns listed above for 1 release.
> In a normal OpenShift cluster we restrict the controllers to only watch a single namespace (openshift-machine-api), so if you were to create a machine outside of this namespace, nothing would happen, and that is expected.

> I don't think there would be significant overhead in always watching all namespaces in a normal cluster.

> This entirely depends on what kinds of resources you are watching. For example, in the machine controller we need to load secrets for userdata and credentials. Normally you would use the controller-runtime cached client for this so that you get notified of changes to the secrets to trigger a reconcile. If you don't restrict this cache to a certain namespace, it will cache all secrets from all namespaces, which not only could impact memory usage but also crosses a security boundary (e.g. multi-tenant clusters must not list secrets from multiple namespaces).
We do watch Secrets as well as the hosts, but only those associated with the hosts. Based on what you're saying, it sounds like we may want to change the deployment so that our single baremetal-operator doesn't have to watch all Secrets? I'm not sure we'll actually save any RAM that way, since we'll have N copies of the operator with smaller caches, but it does address the security boundary issue.
We really can't afford to run multiple copies of Ironic, though. I think that means splitting the metal3 pod apart, which we didn't feel we would have time to do this release.
I spoke with @hardys yesterday, and I think that given your current plans for 4.8 it makes sense to watch all namespaces for the first iteration and look into splitting per namespace later. Security boundaries are less important if all the hosts belong to the same person, for example.
But I do still question removing the namespace-limiting functionality altogether; is it not useful to keep that for regular OCP clusters and just have the ability to configure whether it should be namespaced or watch everything?
> We do watch Secrets

Depending on what you do with the secrets, it may be better to use a non-caching client and Get instead of List. It looks like you're creating the secrets, though, so that's probably not possible here; otherwise you'd lose the event notification if they are modified out of band.
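(A sketch of the non-caching read being suggested; the helper name is hypothetical. mgr.GetAPIReader() is an existing controller-runtime API, and the trade-off is as stated above: direct reads, but no cache and no watch events.)

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// getBMCSecret reads a Secret through the manager's non-caching reader,
// which goes straight to the API server, so no cluster-wide Secret
// informer/cache is built just to serve these reads.
func getBMCSecret(ctx context.Context, mgr ctrl.Manager, ns, name string) (*corev1.Secret, error) {
	secret := &corev1.Secret{}
	err := mgr.GetAPIReader().Get(ctx, types.NamespacedName{Namespace: ns, Name: name}, secret)
	return secret, err
}
```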
The CBO is not responsible for creating the Secrets for BMC credentials and does not try to access them. I think the above comment is meant for those Secrets, and we need to make sure within BMO that we follow the above guidelines, i.e. do a Get for a specific Namespace vs. a List across all Namespaces, and make sure we are using the right client to access the Secrets.
> I recall that we said we cannot assume Provisioning CR is going to be present in the multi-namespace scenario. Has that changed?
@sadasu I don't recall saying that - I don't see any reason we couldn't put some configuration option in the provisioning CR, so that the current single-namespace behavior is maintained but watching all namespaces can be enabled for the hub-cluster use-case?
The main disadvantage of that approach is probably that it expands the test matrix, as we potentially have to care about the situation where we switch from single->multi namespace mode as mentioned in metal3-io/baremetal-operator#797 (comment)
cb47870 is taking care of adding this config to the Provisioning CR.
/test e2e-agnostic
Env: []corev1.EnvVar{
    {
        Name: "WATCH_NAMESPACE",
        ValueFrom: &corev1.EnvVarSource{
Question: what will be the impact, if any, on LeaderElectionNamespace for BMO in the case of an empty namespace?
It appears that leader election needs a Namespace to be passed in: https://github.com/kubernetes-sigs/controller-runtime/blob/0b554ebb54901a88427d4a89014af0d2dba1bbcf/pkg/manager/manager.go#L319.
With this change, BMO will not have the Namespace passed in via WATCH_NAMESPACE. But with #107 we are passing in POD_NAMESPACE, which is the correct Namespace to use for leader election. /cc @asalkeld
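(Putting the two env vars together, a sketch of the manager wiring implied by this PR plus #107; field availability assumes the controller-runtime version of this era, and the election ID is illustrative.)

```go
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// May be unset ("") => watch all namespaces.
		Namespace: os.Getenv("WATCH_NAMESPACE"),
		// Leader election still needs a concrete namespace, supplied
		// via POD_NAMESPACE per #107.
		LeaderElection:          true,
		LeaderElectionID:        "baremetal-operator", // illustrative ID
		LeaderElectionNamespace: os.Getenv("POD_NAMESPACE"),
	})
	if err != nil {
		os.Exit(1)
	}
	_ = mgr.Start(ctrl.SetupSignalHandler())
}
```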
/test e2e-agnostic
/test e2e-metal-ipi
// EnableMultiNamespaces provides a way to explicitly allow use of this
// Provisioning configuration across multiple namespaces.
EnableMultiNamespaces bool `json:"enableMultiNamespaces,omitempty"`
I wonder if we can use some existing API for this, e.g. for BMH resources we already set metadata/namespace:
However, we unfortunately didn't add that to the provisioning CR here:
But I think such an interface might fit more cleanly, e.g. similar to the OLM operator namespace behavior described here:
Is there any way we could add that missing metadata on upgrade, e.g. by looking at the status/generations, which does show the openshift-machine-api namespace?
Alternatively, if we think modifying that existing behavior will be too difficult or ambiguous, I'd suggest renaming this interface to something terser, e.g.:
spec:
  allNamespaces: true
That would align pretty well with the OLM docs, e.g. targetNamespaces?
Currently, the Provisioning CR is cluster-scoped. If we changed it to be Namespace-scoped, we would get this metadata change, but that would be a breaking API change during an upgrade. The solution proposed here only tells us whether multi-namespace mode is enabled; it does not allow us to specify an actual namespace. When we move to a model with a Provisioning CR per namespace, we could change the Provisioning CRD's scope to Namespaced.
In this 1st solution, I can update PR to use "allNamespaces" to convey this config.
After going through the links mentioned above, I see that the semantics of the OLM operator namespace behavior are different from what we are looking for here. If we re-use the name, would it cause some confusion?
@sadasu OK, sounds like we should go with the existing boolean, but perhaps rename the flag to allNamespaces (not blocking on that, but it seems to communicate the same as the proposed flag in a slightly terser and more accurate way, since "multi" could mean all, or some subset of all?)
- Would we never want to watch some but not all namespaces?
- I'm also not a fan of more knobs. If it's always going to be turned on, why do we need it? And if it can be conditionally on/off, how does one do that from the installer? Or do you change it on day 2?
When in this mode, we ask BMO to watch all Namespaces (by not setting the WATCH_NAMESPACE env var), although in practice BMH objects would likely be created in only a subset of Namespaces.
Point taken about the additional knob. The best solution would be to make the Provisioning CR Namespaced, but that is coming later. This config is optional, defaults to false (including from the Installer), and can be set to true on Day 2.
Also, @kirankt's suggestion to call the knob "WatchAllNamespaces" is a better choice than what I have now, imo. I am not sure it is terse enough, but it conveys its purpose clearly.
> Would we never want to watch some but not all namespaces?

Possibly, but I don't think there's any requirement for that right now.

> I'm also not a fan of more knobs. If it's always going to be turned on, why do we need it? And if it can be conditionally on/off how does one do that from the installer? Or do you change it on day 2?

IMO this shouldn't always be turned on, particularly given that we want to align with the machine-api, e.g. the "In a normal OpenShift cluster we restrict the controllers to only watch a single namespace (openshift-machine-api)" comment from @JoelSpeed.
The future direction for multi-namespace at the machine-api layer seems unclear at this point, so it seems safest to leave the existing single-namespace behavior alone and add a knob to enable the specific "hub cluster" use-case, which isn't a normal OpenShift cluster.
In terms of how to configure it, I was thinking either modify the installer manifest or ensure it can be turned on after initial deployment (at least until the story around multi-namespace machine management is worked out for other controllers, when perhaps there will be some kind of installer flag).
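(On the earlier "some but not all namespaces" question: controller-runtime of this era already offered a middle ground via cache.MultiNamespacedCacheBuilder, since deprecated in favor of per-namespace cache configuration, so the knob could in principle grow into a namespace list later. A hedged sketch:)

```go
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
	// Cache (and thus watch) only an explicit set of namespaces,
	// between the single-namespace and cluster-wide modes.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		NewCache: cache.MultiNamespacedCacheBuilder([]string{
			"openshift-machine-api",
			"tenant-a", // illustrative extra namespace
		}),
	})
	if err != nil {
		os.Exit(1)
	}
	_ = mgr.Start(ctrl.SetupSignalHandler())
}
```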
/test e2e-agnostic
/lgtm
/retest
Please review the full test history for this PR and help us cut down flakes.
When the metal3 deployment is created, the BMO is not provided with the WATCH_NAMESPACE env var, which allows it to watch all Namespaces. Also, change the RBACs to reflect the fact that BMH objects can now exist in multiple namespaces and not just the openshift-machine-api namespace.
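(A sketch, with hypothetical function and field names, of how the deployment change described in this commit message can be expressed: WATCH_NAMESPACE is set only in single-namespace mode, and leaving it unset is what switches BMO to cluster-wide watching.)

```go
package main

import corev1 "k8s.io/api/core/v1"

// bmoEnv is a hypothetical helper: build the BMO container env, omitting
// WATCH_NAMESPACE entirely when watching all namespaces.
func bmoEnv(watchAllNamespaces bool) []corev1.EnvVar {
	env := []corev1.EnvVar{}
	if !watchAllNamespaces {
		env = append(env, corev1.EnvVar{
			Name: "WATCH_NAMESPACE",
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{
					// Resolve to the pod's own namespace at runtime.
					FieldPath: "metadata.namespace",
				},
			},
		})
	}
	return env
}
```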
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kirankt, sadasu

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
// will be used to provision baremetal hosts in only the
// openshift-machine-api namespace. When set to true, this provisioning
// configuration would be used for baremetal hosts across all namespaces.
WatchAllNamespaces bool `json:"watchAllNamespaces,omitempty"`
We generally try to avoid using bool fields in the API, because they make it difficult to expand or redefine the field later. Here I would suggest an enum with 2 values, Single and All, using All as the default.
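(A sketch of the enum shape being suggested; the type and constant names are illustrative, not the merged API.)

```go
// WatchNamespaceMode selects which namespaces BMO watches.
// +kubebuilder:validation:Enum=Single;All
type WatchNamespaceMode string

const (
	// WatchNamespaceSingle keeps today's behavior: only openshift-machine-api.
	WatchNamespaceSingle WatchNamespaceMode = "Single"
	// WatchNamespaceAll asks BMO to watch every namespace.
	WatchNamespaceAll WatchNamespaceMode = "All"
)

// In the Provisioning spec, replacing the bool:
//   WatchNamespace WatchNamespaceMode `json:"watchNamespace,omitempty"`
```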