Conversation

@njhale (Contributor) commented Sep 19, 2019

Proposal for simplifying OLM's APIs.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 19, 2019
@njhale (Contributor Author) commented Sep 19, 2019

@deads2k (Contributor) commented Sep 19, 2019

/assign @jwforres

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 24, 2019

A Cluster Admin can generate `ServiceAccounts` with different installation privileges as needed; e.g. namespace list, pod create, etc. To allow a user to reference a `ServiceAccount`, a Cluster Admin can bind a `Role/ClusterRole` with the `use` custom verb to the user.

This behavior will be enforced by a `ValidatingAdmissionWebhook`. In order to configure serving certs for the webhook
Member:

By "This behavior" do you mean the user's access to the ServiceAccount? Ther big difference vs OperatorGroups is the user supplying the SA for the Operator CR. With OperatorGroups admins could setup what SA is used and not allow users to manage OperatorGroups at all. For OSD, my concern is a user could reference a SA with higher privilege than dedicated-admins have by simply knowing what strings to plug into the Operator CR.
Is there any RBAC check on the SA ref?

Contributor Author:

By "This behavior" do you mean the user's access to the ServiceAccount?

Yes.

> For OSD, my concern is a user could reference a SA with higher privilege than dedicated-admins have by simply knowing what strings to plug into the Operator CR.
> Is there any RBAC check on the SA ref?

This is what the `use` custom verb and `ValidatingAdmissionWebhook` will ensure. `Operator` create/update requests will be rejected if the user does not have `use` on the referenced `ServiceAccount`.
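As a rough sketch (names below are placeholders, not from the proposal), the webhook could decide this by posting a SubjectAccessReview for the requesting user and rejecting the create/update if the review is denied:

```yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: alice                     # requesting user, taken from the admission request
  resourceAttributes:
    group: ""                     # core API group
    resource: serviceaccounts
    namespace: my-operator-ns     # placeholder
    name: my-operator-installer   # the ServiceAccount referenced by the Operator
    verb: use                     # the custom verb described above
```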

> The big difference vs OperatorGroups is the user supplying the SA for the Operator CR. With OperatorGroups, admins could set up what SA is used and not allow users to manage OperatorGroups at all.

Fair point. We've been working on an amendment to this proposal that shifts away from requiring a ServiceAccount. So, let me take the opportunity to get some early feedback.

Essentially, we want an "android-esque" permission approval flow. For us, this means that if the user that creates an Operator has the required permissions, OLM will install it with only the subset of permissions required. If an update comes through that requires greater permissions than the original subset, a user of greater or equal privileges must approve a permission expansion before OLM will install the update (and so on). The resource that will gate permission expansion will be InstallPlan.
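If the gate looks anything like today's InstallPlan approval flow (purely illustrative; names and versions are placeholders), an update that needs extra permissions would simply sit unapproved until someone with sufficient privileges flips it:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  name: install-abc123           # placeholder
  namespace: my-operator-ns      # placeholder
spec:
  approval: Manual               # held for manual approval because the update expands permissions
  approved: false                # a user with at least the expanded permissions must set this to true
  clusterServiceVersionNames:
  - my-operator.v0.2.0           # placeholder update that requires additional permissions
```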

Member:

> For OSD, my concern is a user could reference a SA with higher privilege than dedicated-admins have by simply knowing what strings to plug into the Operator CR.
> Is there any RBAC check on the SA ref?
>
> This is what the `use` custom verb and `ValidatingAdmissionWebhook` will ensure. `Operator` create/update requests will be rejected if the user does not have `use` on the referenced `ServiceAccount`.

Do you have some docs on the use verb? I'd like to understand it in the context of this enhancement and for use in OSD, where we're starting to develop some ValidatingAdmissionWebhooks. I see a discrete list of operations allowed but it doesn't include "USE": https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration

> The big difference vs OperatorGroups is the user supplying the SA for the Operator CR. With OperatorGroups, admins could set up what SA is used and not allow users to manage OperatorGroups at all.
>
> Fair point. We've been working on an amendment to this proposal that shifts away from requiring a ServiceAccount. So, let me take the opportunity to get some early feedback.
>
> Essentially, we want an "android-esque" permission approval flow. For us, this means that if the user that creates an Operator has the required permissions, OLM will install it with only the subset of permissions required. If an update comes through that requires greater permissions than the original subset, a user of greater or equal privileges must approve a permission expansion before OLM will install the update (and so on). The resource that will gate permission expansion will be InstallPlan.

Seems reasonable assuming the following:

  • Auto-approval will not auto-approve an InstallPlan if the additional permissions required are outside the permissions available to the user who created the Operator, as of the time that Operator was created.
  • Approving an InstallPlan checks the approver's permissions as though that user were the one creating the Operator.

Contributor Author:

> Do you have some docs on the use verb?

It's a made-up RBAC verb, which would go in a (Cluster)Role; i.e., not a webhook verb/action.
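For example, a Role granting it might look like the following (names are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-operator-installer              # placeholder
  namespace: my-operator-ns                 # placeholder
rules:
- apiGroups: [""]
  resources: ["serviceaccounts"]
  resourceNames: ["my-operator-installer"]  # the ServiceAccount the bound user may reference
  verbs: ["use"]                            # not enforced by kube itself; only checked by OLM's admission webhook
```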

> Seems reasonable assuming the following: ...

You've pretty much outlined the idea -- there are a few slight differences that will hopefully be clear once we push the updates to this enhancement.

```yaml
lastTransitionTime: "2019-09-16T22:26:29Z"
```

Not all components will need to have status conditions surfaced. Initially, it should be sufficient to add conditions for:
Member:

Is there a material cost / downside to adding all components and all status conditions? If there is a good reason to limit this I'm all for it, otherwise it will be a great convenience to have everything roll up.

Contributor Author:

Not every resource has status conditions, and we'll most likely need to do a mapping when we implement this (since under the covers they are different Go types). Other than that, I was thinking that it didn't make sense to have conditions for resources like ServiceAccounts and RBAC, etc. That being said, we can always add more as we find we need them.
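To make that concrete, here's a rough sketch of the kind of rolled-up component condition I have in mind (shape is illustrative only, mirroring the excerpt above; names are placeholders):

```yaml
status:
  components:
    refs:
    - apiVersion: apps/v1
      kind: Deployment
      namespace: my-operator-ns      # placeholder
      name: my-operator-controller   # placeholder
      conditions:
      - type: Available              # mapped from the Deployment's own Available condition
        status: "True"
        reason: MinimumReplicasAvailable
        lastTransitionTime: "2019-09-16T22:26:29Z"
```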


- A new API resource is required
- Listing operators becomes a privileged operation that leaks info about existing namespaces and resources
- Implicit agreement that an operator is a cluster singleton
Member:

I think this has some drawbacks; see previous comment.


## Alternatives

We can define a namespace-scoped operator resource in addition to the cluster-scoped one. We can use the cluster-scoped resource to indicate an operator may span multiple namespaces, and a namespace-scoped operator to indicate it should be installed only in a specific namespace.
Member:

What are the use cases for operators that "span multiple namespaces"? An operator can watch resources across the cluster with sufficient RBAC, so it's not about what it can touch. And you could use multiple instances of Operator to install an operator more than once.

Contributor Author:

> What are the use cases for operators that "span multiple namespaces"? An operator can watch resources across the cluster with sufficient RBAC, so it's not about what it can touch.

We're trying to set the stage for a better operator author user story. An operator author knows best how their operator works, and it's come into question whether enforcing the use of a single namespace is a reasonable constraint. For example, some operators are written with a hardcoded namespace, and others with RoleBindings in kube-system. Rather than "what it can touch", I'd say it's a "native configuration" story. Along with this, we also need to add knobs for admins to map those native namespaces.

> And you could use multiple instances of Operator to install an operator more than once.

I don't think that multiple instances of an operator are a tenable configuration for a cluster. This comes down to the fact that conversion webhooks and extension api-servers are effectively singletons and require exactly one backing service to operate. While we can come up with creative (and probably complex) ways of intercepting and routing traffic, I think that any real solution will require better first-class multitenancy in Kubernetes.
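For context on the singleton point: a CRD's conversion webhook can only point at a single backing Service, e.g. (illustrative names):

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com          # placeholder CRD
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
  - name: v1beta1
    served: true
    storage: true
  - name: v1alpha1
    served: true
    storage: false
  conversion:
    strategy: Webhook
    webhookClientConfig:
      service:                        # exactly one Service can back conversion for this CRD
        namespace: my-operator-ns     # placeholder
        name: my-operator-conversion  # placeholder
        path: /convert
```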

Member:

When I think of operator installation, I think mostly about the Deployment and associated Roles and RoleBindings (permissions in the CSV currently). Anything outside of this is managed as part of the operator's bundle or managed by the operator's controller directly. I see bundled and controller-managed resources as outside the scope of what Operator will manage. It is possible today to create a bundle with Roles and RoleBindings outside of the namespace in which the Deployment is created, but Subscription is namespaced. What is changing about the operator story that changes the scope of operator installation to imply operators are a cluster-scoped resource? At the end of the day, the core of an operator is the controller deployed as a pod by a Deployment, which is namespaced.

Member:

I don't think the operator story is changing; the implementation is evolving to better match the reality of the platform. As @njhale points out, having multiple instances of the same controller running concurrently is not a good fit for the way k8s is evolving its CRD story. As I understand it, the outcome is that you'll have one instance of a controller running in the cluster, but then you might be able to choose whether that controller watches one namespace, a list of namespaces, or all namespaces.


#### Constraining Installation

In a fashion similar to `OperatorGroups`, we can constrain the installation of an operator by associating it with a `ServiceAccount`. For an `Operator` resource, the user's desired `ServiceAccount` will be specified by the `spec.install.as` field. The install namespace, where the namespaced contents of an operator will be installed, is specified by the `install.namespace` field.
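To make the two fields concrete, a hypothetical `Operator` might look like the following (the apiVersion and overall layout are illustrative assumptions; only the `install.as` and `install.namespace` fields come from the text above):

```yaml
apiVersion: operators.coreos.com/v2alpha1   # hypothetical group/version for the proposed resource
kind: Operator
metadata:
  name: my-operator                         # placeholder; cluster-scoped, so no namespace
spec:
  install:
    as: my-operator-installer               # ServiceAccount the installation runs as (spec.install.as)
    namespace: my-operator-ns               # where namespaced content is installed (install.namespace)
```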
Member:

What is driving this use case? Most admins are used to the idea of running a package manager as "root" as long as they can:

  1. have an opportunity to see what it's going to do (in this case, just list the resources that will be created)
  2. are able to limit the sources from which content is installed to sources they trust

The far more interesting concern is what permissions the operator itself runs with over the long term. That is a better fit for your android update analogy below. It's also observable from the operator bundle, so it's mostly a solved problem (unless you want to influence auto-upgrade gating based on RBAC changes).

When installing an operator, won't that in most cases involve creating a ServiceAccount, ClusterRole, and ClusterRoleBinding for that operator? If the installing user has the ability to create those things, don't they already have the keys to the kingdom? What value will we get from tracking resource-by-resource what that user can do vs what it takes to install the operator? Would it be sufficient to just assume that installing operators takes a lot of privilege, and gate that with RBAC for the Operator resource?

Member:

The user or SA creating a ClusterRoleBinding cannot do so unless they already have the permissions being granted. For managed services (OSD, ARO), a customer does not run as a cluster-admin. In this case we cannot allow customers to install any arbitrary operator, as they could elevate privileges.

Member:

What does ARO stand for?

@njhale (Contributor Author) commented Oct 2, 2019:

> What is driving this use case? Most admins are used to the idea of running a package manager as "root"...

As @jewzaam points out, the admin installing an operator is not always a full-blown cluster-admin.

> The far more interesting concern is what permissions the operator itself runs with over the long term... It's also observable from the operator bundle, so it's mostly a solved problem (unless you want to influence auto-upgrade gating based on RBAC changes).

We plan on doing this. The recent proposal updates (hopefully) make this more clear.

> If the installing user has the ability to create those things, don't they already have the keys to the kingdom?

kube's RBAC model prevents a user from creating/updating (Cluster)Roles/RoleBindings if they don't already have matching privileges at the same or greater scope (or have the escalate verb).
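(For reference, that escalate exception is itself an ordinary RBAC grant, e.g.:)

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rbac-escalator                      # placeholder
rules:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "clusterroles"]
  verbs: ["escalate"]                       # allows granting permissions the grantor doesn't already hold
```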

> What value will we get from tracking resource-by-resource what that user can do vs what it takes to install the operator?

We prevent users from escalating their privilege via OLM.

Member:

> What does ARO stand for?

Azure Red Hat OpenShift



@njhale njhale changed the title WIP Enhancement: Simplify OLM APIs Simplify OLM APIs Oct 2, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2019
@ecordell (Contributor) left a comment:

/approve

I think this looks good and goes into the right amount of detail. We'll probably want a couple of proposals in the OLM repo (particularly around the ServiceAccount generation and the Install approval process).

@dmesser @jwforres @shawn-hurley Please do a final pass on this so we can merge. Thanks!

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ecordell, njhale

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2019
@openshift-merge-robot openshift-merge-robot merged commit 0e06c9d into openshift:master Oct 3, 2019