-
Notifications
You must be signed in to change notification settings - Fork 535
Simplify OLM APIs #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/assign @jwforres |
enhancements/olm/simplify-apis.md
Outdated
|
|
||
| A Cluster Admin can generate `ServiceAccounts` with different installation privileges as needed; e.g. namespace list, pod create, etc. To allow a user to reference a `ServiceAccount`, a Cluster Admin can bind a `Role/ClusterRole` with the `use` custom verb to the user. | ||
|
|
||
| This behavior will be enforced by a `ValidatingAdmissionWebhook`. In order to configure serving certs for the webhook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "This behavior" do you mean the user's access to the ServiceAccount? Ther big difference vs OperatorGroups is the user supplying the SA for the Operator CR. With OperatorGroups admins could setup what SA is used and not allow users to manage OperatorGroups at all. For OSD, my concern is a user could reference a SA with higher privilege than dedicated-admins have by simply knowing what strings to plug into the Operator CR.
Is there any RBAC check on the SA ref?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "This behavior" do you mean the user's access to the ServiceAccount?
Yes.
For OSD, my concern is a user could reference a SA with higher privilege than dedicated-admins have by simply knowing what strings to plug into the Operator CR.
Is there any RBAC check on the SA ref?
This is what the use custom verb and ValidatingAdmissionWebhook will ensure. Operator create/update requests will be rejected if the user does not have use on the referenced ServiceAccount.
Ther big difference vs OperatorGroups is the user supplying the SA for the Operator CR. With OperatorGroups admins could setup what SA is used and not allow users to manage OperatorGroups at all.
Fair point. We've been working on an amendment to this proposal that shifts away from requiring a ServiceAccount. So, let me take the opportunity to get some early feedback.
Essentially, we want an "android-esque" permission approval flow. For us, this means that if the user that creates an Operator has the required permissions, OLM will install it with only the subset of permissions required. If an update comes through that requires greater permissions than the original subset, a user of greater or equal privileges must approve a permission expansion before OLM will install the update (and so on). The resource that will gate permission expansion will be InstallPlan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For OSD, my concern is a user could reference a SA with higher privilege than dedicated-admins have by simply knowing what strings to plug into the Operator CR.
Is there any RBAC check on the SA ref?This is what the
usecustom verb andValidatingAdmissionWebhookwill ensure.Operatorcreate/update requests will be rejected if the user does not haveuseon the referencedServiceAccount.
Do you have some docs on the use verb? I'd like to understand it in the context of this enhancment and for use in OSD where we're starting to develop some ValidatingAdmissionWebhooks. I see a discrete list of operations allowed but it doesn't include "USE": https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration
Ther big difference vs OperatorGroups is the user supplying the SA for the Operator CR. With OperatorGroups admins could setup what SA is used and not allow users to manage OperatorGroups at all.
Fair point. We've been working on an amendment to this proposal that shifts away from requiring a
ServiceAccount. So, let me take the opportunity to get some early feedback.Essentially, we want an "android-esque" permission approval flow. For us, this means that if the user that creates an
Operatorhas the required permissions, OLM will install it with only the subset of permissions required. If an update comes through that requires greater permissions than the original subset, a user of greater or equal privileges must approve a permission expansion before OLM will install the update (and so on). The resource that will gate permission expansion will beInstallPlan.
Seems reasonable assuming the following:
- Auto-approval will not auto-approve an
InstallPlanif additional permissions required are outside of permissions available to the user that created theOperatorat the time of the creation of thatOperator. - Approving an
InstallPlanchecks the approvers permissions as though that user was the one creating theOperator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have some docs on the use verb?
It's a made up RBAC verb, which would go in a (Cluster)Role; i.e not a webhook verb/action.
Seems reasonable assuming the following: ...
You've pretty much outlined the idea -- there are a few slight differences that will hopefully be clear once we push the updates to this enhancement.
| lastTransitionTime: "2019-09-16T22:26:29Z" | ||
| ``` | ||
|
|
||
| Not all components will need to have status conditions surfaced. Initially, it should be sufficient to add conditions for: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a material cost / downside to adding all components and all status conditions? If there is a good reason to limit this I'm all for it, otherwise it will be a great convenience to have everything roll up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every resource has status conditions and we'll most likely need to do a mapping when we implement this (since under the cover they are different golang types). Other than that, I was thinking that it didn't make sense to have conditions for resources like ServiceAccounts and RBAC, etc... That being said, we can always add more as we find we need them.
|
|
||
| - A new API resource is required | ||
| - Listing operators becomes a privileged operation that leaks info about existing namespaces and resources | ||
| - Implict agreement that an operator is a cluster singleton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has some drawbacks, see previous comment.
|
|
||
| ## Alternatives | ||
|
|
||
| We can define a namespace-scoped operator resource in addition to the cluster-scoped one. We can use the cluster-scoped resource to indicate an operator may span multiple namespaces, and a namespace scoped operator to indicate it should be installed only in a specific namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the use cases for operators that "span multiple namespaces"? An operator can watch resources across the cluster with sufficient RBAC, so it's not about what it can touch. And you could use multiple instances of Operator to install an operator more than once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the use cases for operators that "span multiple namespaces"? An operator can watch resources across the cluster with sufficient RBAC, so it's not about what it can touch.
We're trying to set the stage for a better operator author user story. An operator author knows best how their operator works, and it's come into question whether enforcing the use of a single namespace is a reasonable constraint. For example, some operators are written with hardcoded namespace, and others with RoleBindings in kube-system. Rather than "what it can touch", I'd say it's a "native configuration" story. Along with this we also need to add knobs for admins to map those native namespaces.
And you could use multiple instances of Operator to install an operator more than once.
I don't think that multiple instances of an operator is a tenable configuration for a cluster. This comes down to the fact that conversion webhooks and extension api-servers are effectively singletons and require exactly one backing service to operate. While we can come up with creative (and probably complex) ways of intercepting and routing traffic, I think that any real solution will require better first-class multitenancy in kubernetes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I think of the operator installation I think mostly about the Deployment and associated Roles and RoleBindings (permissions in CSV currently). Anything outside of this is managed by part of the operator's bundle or managed by the operator's controller directly. I see bundled and controller managed resources as outside of the scope of what Operator will manage. It is possible today to create a bundle with Roles and RoleBindings outside of the namespace in which the Deployment is created, but Subscription is namespaced. What is changing about the operator story that changes the scope of operator installation to imply operators are a cluster scoped resource? At the end of the day, the core of an operator is the controller deployed as a pod by a deployment, which is namespaced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the operator story is changing; the implementation is evolving to better match the reality of the platform. As @njhale points out, having multiple instances of the same controller running concurrently is not a good fit for the way k8s is evolving its CRD story. As I understand it, the outcome is that you'll have one instance of a controller running in the cluster, but then you might be able to choose whether that controller watches one namespace, a list of namespaces, or all namespaces.
enhancements/olm/simplify-apis.md
Outdated
|
|
||
| #### Constraining Installation | ||
|
|
||
| In a fashion similar `OperatorGroups`, we can constrain the installation of an operator by associating it with a `ServiceAccount`. For an `Operator` resource, the user's desired `ServiceAccount` will be specified by the `spec.install.as` field. The install namespace, where the namespaced contents of an operator will be installed to, is specified by the `install.namespace` field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is driving this use case? Most admins are used to the idea of running a package manager as "root" as long as they can:
- have an opportunity to see what it's going to do (in this case, just list the resources that will be created)
- are able to limit the sources from which content is installed to sources they trust
The far more interesting concern is what permissions the operator itself runs with over the long term. That is a better fit for your android update analogy below. It's also observable from the operator bundle, so it's mostly a solved problem (unless you want to influence auto-upgrade gating based on RBAC changes).
When installing an operator, won't that in most cases involve creating a ServiceAccount, ClusterRole, and ClusterRoleBinding for that operator? If the installing user has the ability to create those things, don't they already have the keys to the kingdom? What value will we get from tracking resource-by-resource what that user can do vs what it takes to install the operator? Would it be sufficient to just assume that installing operators takes a lot of privilege, and gate that with RBAC for the Operator resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user or SA creating a ClusterRoleBinding cannot do so unless they already have the permissions being granted. For managed services (OSD, ARO) a customer does not run as a cluster-admin. In this case we cannot allow customers to install any arbitrary operator as they could elevate privileges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does ARO stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is driving this use case? Most admins are used to the idea of running a package manager as "root"...
As @jewzaam points out, the admin installing an operator is not always a full blown cluster-admin.
The far more interesting concern is what permissions the operator itself runs with over the long term... It's also observable from the operator bundle, so it's mostly a solved problem (unless you want to influence auto-upgrade gating based on RBAC changes).
We plan on doing this. The recent proposal updates (hopefully) make this more clear.
If the installing user has the ability to create those things, don't they already have the keys to the kingdom?
kube's RBAC model prevents a user from creating/updating (Cluster)Roles/RoleBindings if they don't already have matching privileges at the same or greater scope (or have the escalate verb).
What value will we get from tracking resource-by-resource what that user can do vs what it takes to install the operator?
We prevent users from escalating their privilege via OLM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does ARO stand for?
|
|
||
| ## Alternatives | ||
|
|
||
| We can define a namespace-scoped operator resource in addition to the cluster-scoped one. We can use the cluster-scoped resource to indicate an operator may span multiple namespaces, and a namespace scoped operator to indicate it should be installed only in a specific namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the operator story is changing; the implementation is evolving to better match the reality of the platform. As @njhale points out, having multiple instances of the same controller running concurrently is not a good fit for the way k8s is evolving its CRD story. As I understand it, the outcome is that you'll have one instance of a controller running in the cluster, but then you might be able to choose whether that controller watches one namespace, a list of namespaces, or all namespaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
I think this looks good and goes into the right amount of detail. We'll probably want a couple of proposals in the OLM repo (particularly around the ServiceAccount generation and the Install approval process).
@dmesser @jwforres @shawn-hurley Please do a final pass on this so we can merge. Thanks!
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ecordell, njhale The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Proposal for simplifying OLM's APIs.