
RFD 174 - Kube RBAC Bootstrap #43322

Closed

AntonAM wants to merge 28 commits into master from rfd/0174-kube-rbac-bootstrap

Conversation

@AntonAM
Contributor

@AntonAM AntonAM commented Jun 20, 2024

This PR adds an RFD for improving the RBAC setup capabilities of Teleport.

Context: Implements #39628 and also addresses Sasha's comments about improving Day 1 UX.

Rendered version

@AntonAM AntonAM added kubernetes-access rfd Request for Discussion no-changelog Indicates that a PR does not require a changelog entry labels Jun 20, 2024
@github-actions github-actions Bot requested review from atburke and jakule June 20, 2024 21:20
@AntonAM AntonAM requested review from hugoShaka, rosstimothy and tigrato and removed request for atburke and jakule June 20, 2024 21:20
Contributor

@tigrato tigrato left a comment

How do we plan to migrate customers that do not have this feature enabled and do not want it in the future because they roll out their own Kubernetes RBAC using a different process?

I think we should do something similar to how we roll out Kubernetes app discovery: it's disabled by default for existing clusters, but enabled by default for new installations.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated

### Exposing the default Kubernetes user facing cluster roles (cluster-admin, view, edit)

Kubernetes clusters have default user-facing cluster roles: cluster-admin, view, edit. By default they are not usable, because
Contributor

system:masters is the only one that is bound everywhere, but GKE doesn't allow it to be used when impersonating requests.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated
Currently, we only expose the cluster-admin role on GKE
installations, since system:masters group is not available for impersonation there. We will expand on this and when
discovering a cluster/installing kube-agent/creating a service account, we will
create cluster role bindings for those roles, accordingly linking them to the groups "default-cluster-admin", "default-view" and
Contributor

Instead of automatically linking them, we should ship a Kubernetes provision entry that does that job for us. This is good because if customers do not want it, they can simply remove the resource from Teleport and the Kubernetes agents will handle everything.

It's also good for users to be able to understand how it works.

Contributor Author

@AntonAM AntonAM Jun 25, 2024

Yep, I've described this option in the alternatives. But I think overall it's better to go with the "expose by default" route: combined with the now-available fine-tuning of Kubernetes resource access in Teleport roles, it can make things easier for new users and for those who don't need a complicated RBAC setup. We could also depend on that default in our documentation and simplify it as well (which we can't really do if we rely on the kube provision path). But ultimately it's a product decision.

Contributor

I think the option with fewer manually created resources, whether in Teleport or Kubernetes, is going to result in better UX - especially for Day 0 users. I think we should strive to make Day 0 as easy as possible, without requiring users to invoke tctl or learn about new Teleport resources if at all possible.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated

## Security

The introduction of this functionality does not directly impact the security of Teleport
Contributor

The kube agent already has the ability to impersonate all groups/users, so it can always impersonate someone in the cluster.

@rosstimothy rosstimothy requested a review from tigrato June 24, 2024 17:12
@AntonAM
Contributor Author

AntonAM commented Jun 25, 2024

How do we plan to migrate customers that do not have this feature enabled and do not want it in the future because they roll out their own Kubernetes RBAC using a different process?

KubeProvision doesn't do anything unless users create those resources, so there's no need to do anything here. And with exposing default roles I think we just do that on new install by default. The roles do not force clients to use them by themselves, we basically state that this is what default Teleport installation is. We will provide options to not create the default bindings though, as we do with the rest of Teleport RBAC setup.

@tigrato
Contributor

tigrato commented Jun 27, 2024

How do we plan to migrate customers that do not have this feature enabled and do not want it in the future because they roll out their own Kubernetes RBAC using a different process?

KubeProvision doesn't do anything unless users create those resources, so there's no need to do anything here. And with exposing default roles I think we just do that on new install by default. The roles do not force clients to use them by themselves, we basically state that this is what default Teleport installation is. We will provide options to not create the default bindings though, as we do with the rest of Teleport RBAC setup.

It's the opposite of what I believe is good user experience in this case. Our focus should be on improving the onboarding process for new users, ensuring they grasp how Teleport operates easily. At the same time, we need to offer solutions for existing customers to expand and allow Teleport to handle everything seamlessly.

My suggestion is to create a clusterRoleBinding by default that assigns the default view and cluster-admin roles to a designated teleport:xx:group. This setup would come pre-configured in Teleport, and for new Kubernetes installations, this field would be activated by default.

$ cat << EOF > prod-cluster-values.yaml
roles: kube,app,discovery
authToken: 616346f2892f82de6a58f8dd09db78b8
proxyAddr: tele.local:3080
kubeClusterName: test
enableRBACSync: true <-- new setting
labels:
    teleport.internal/resource-id: b34f651c-32de-45f6-86dc-ab5173f716c9
enterprise: true
EOF
 
$ helm install teleport-agent teleport/teleport-kube-agent -f prod-cluster-values.yaml --version 17.0.0-dev \
--create-namespace --namespace test

Implementing this will automate the synchronization of default RBAC settings, greatly simplifying the subsequent setup steps where we guide users through configuring kubernetes_users and kubernetes_groups. This automation is possible because we'll already have knowledge of the groups and users present in the cluster.

For clusters that are already enrolled, users will need to manually enable enableRBACSync to ensure we don't disrupt their existing configurations. This step is crucial to keep their setup intact and allow them to migrate to this new feature. Existing customers have deep knowledge of how Teleport and Kubernetes interconnect and work, so they know exactly what they want/need in their setup.

Making the enrollment process simpler will be a win-win situation. Check out #28276 where our @strideynet was a bit confused (lol).

@AntonAM
Contributor Author

AntonAM commented Jun 27, 2024

It's the opposite of what I believe is good user experience in this case. Our focus should be on improving the onboarding process for new users, ensuring they grasp how Teleport operates easily. At the same time, we need to offer solutions for existing customers to expand and allow Teleport to handle everything seamlessly.

Hm, I feel that I'm proposing the same thing functionally. I advocate for enabling it by default (creating clusterRoleBindings for the view, edit and cluster-admin roles) on chart installation. And yep, as you said, and as I mention in the RFD, it improves the situation for new enrollments, documentation, etc., because we can rely on those default groups being there - that's why I want it to be the default.
I'm just thinking about a new flag in the rbac field.

$ cat << EOF > prod-cluster-values.yaml
roles: kube,app,discovery
authToken: 616346f2892f82de6a58f8dd09db78b8
proxyAddr: tele.local:3080
kubeClusterName: test
rbac:
    create: true           <-- existing setting, defaults to true
    bindDefaultRoles: true <-- new setting, also defaults to true
labels:
    teleport.internal/resource-id: b34f651c-32de-45f6-86dc-ab5173f716c9
enterprise: true
EOF

So if either rbac.create or rbac.bindDefaultRoles is false, we don't expose the default roles. For new installations it will be on by default. For already enrolled clusters, users will need to enable it manually, or manually create clusterRoleBindings if they want.
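The gating described above can be sketched as a tiny predicate. This is a minimal illustration, not Teleport code: the function name and dict shape are assumptions mirroring the proposed chart values, where both settings default to true.

```python
def should_bind_default_roles(values: dict) -> bool:
    """Hypothetical sketch: bind the default cluster roles only when both
    rbac.create and rbac.bindDefaultRoles are true (both default to true)."""
    rbac = values.get("rbac", {})
    return rbac.get("create", True) and rbac.get("bindDefaultRoles", True)

# A fresh install with no explicit rbac section gets the defaults.
new_install = {}
# An enrolled cluster can opt out by flipping either flag.
opted_out = {"rbac": {"create": True, "bindDefaultRoles": False}}

print(should_bind_default_roles(new_install))  # True
print(should_bind_default_roles(opted_out))    # False
```

Either flag being false is enough to suppress the default bindings, matching the "opt out without deleting the whole RBAC setup" intent of the comment.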

Comment thread rfd/0174-kube-rbac-bootstrap.md
Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated
@AntonAM AntonAM requested review from hugoShaka and klizhentas July 5, 2024 21:14
@AntonAM
Contributor Author

AntonAM commented Jul 11, 2024

@klizhentas can you take a look? We have a couple of product decisions to make here (mainly whether to always expose default Kubernetes roles on agent installation as part of the Teleport RBAC resources we create, or to use the KubeProvision feature for it).

Comment thread rfd/0174-kube-rbac-bootstrap.md
Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated
// DeleteKubeProvision deletes a KubeProvision.
rpc DeleteKubeProvision(DeleteKubeProvisionRequest) returns (google.protobuf.Empty);

// DeleteAllKubeProvisions removes all KubeProvisions.
Contributor

DeleteAll is only required for the cache and is not recommended to be exposed in the gRPC layer.

Comment thread rfd/0174-kube-rbac-bootstrap.md
@AntonAM
Contributor Author

AntonAM commented Jul 30, 2024

@tigrato @hugoShaka @rosstimothy @klizhentas I've updated the RFD to reflect the decision we made to go with auto-provisioning Kubernetes roles from the Teleport roles, instead of default Kubernetes roles exposure, as a Day 1 improvement.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated

Presence of this field will signal that user wants Teleport to auto-provision these permissions into the Kubernetes clusters.
Teleport will translate permissions defined in that field into Kubernetes RBAC definitions and will automatically create roles and bindings
on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding
Contributor

as specified by the labels

Is this referring to the kubernetes_labels field of a role? Should we mention this also applies to kubernetes_label_expressions as well?

Contributor Author

Yep, basically we should match the targeted Kubernetes clusters - if a user can access a cluster, we should have the role provisioned there.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated
Comment on lines +197 to +198
$ kubectl get clusterRoles -o yaml > raw_cluster_roles.yaml
$ tctl create -f kube_provision_staging.yaml --import raw_cluster_roles.yaml
Contributor

Can we simplify this flow a little bit? Instead of requiring two yaml files could we pass in the name and labels for the kube_provision resource? What do you think about something like the following (naming could use some further thought)?

kubectl get clusterRoles -o yaml | tctl kube import --name=staging-rbac --labels=env=staging
kube provision "staging-rbac" had been created.

Contributor Author

Yep, I think it looks good. Should we call it tctl kube import-provision? And also add an option to provide an input file.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated

Even though we will always run the reconciliation loop, by default it will be a no-op, since there will
be no resources to provision, so users need to explicitly create KubeProvision resources to start actively using this new feature.
We also will explicitly require labels to be present for the resource to be provisioned to the Kubernetes cluster, making it
Contributor

Suggested change
We also will explicitly require labels to be present for the resource to be provisioned to the Kubernetes cluster, making it
We also will explicitly require labels or label_expressions to be present for the resource to be provisioned to the Kubernetes cluster, making it

Contributor Author

Here we talk about labels on the resource itself, not in the roles.
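The label-gated reconciliation discussed in this thread can be illustrated with a short sketch. The helper names and dict shapes are assumptions for illustration only: a KubeProvision with no labels of its own is never applied, so the loop is a no-op until a user deliberately creates and labels a resource.

```python
def matches(provision_labels: dict, cluster_labels: dict) -> bool:
    """A provision applies only if it carries labels and every one of them
    matches the cluster's labels (explicit opt-in, per the RFD)."""
    if not provision_labels:
        return False
    return all(cluster_labels.get(k) == v for k, v in provision_labels.items())

def reconcile(provisions: list, cluster_labels: dict) -> list:
    """Return the subset of KubeProvision-like dicts to apply to a cluster."""
    return [p for p in provisions if matches(p.get("labels", {}), cluster_labels)]

cluster = {"env": "staging"}
print(reconcile([], cluster))  # [] -> the default no-op case
print(reconcile([{"name": "staging-rbac", "labels": {"env": "staging"}},
                 {"name": "unlabeled", "labels": {}}], cluster))
```

Only the labeled, matching resource survives the filter; the unlabeled one is skipped even though it exists in the backend.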

Comment on lines +317 to +318
As mentioned above adding pattern matching to the `kubernetes_permissions` field is probably something we will need to add in the future, improving
fine-grained control of RBAC users can setup.
Contributor

There are some potential drawbacks and security issues that may arise from doing this, correct? Teleport will need to be "smart" and act as the arbiter of access, because Kubernetes doesn't natively support pattern matching.


We will track changes to the KubeProvision resources in the Audit log. There will be three new events:

* `kube.provision.create`
Contributor

Will this only be emitted when a KubeProvision resource is created? Should any implicit creation of an ephemeral KubeProvision resource from a role also have an audit trail?

Contributor Author

As I mentioned earlier, an ephemeral KubeProvision is just a logical abstraction - nothing is actually created. I don't think we need an additional audit trail here, because it's already audited by the role manipulation; this auto-provisioning is directly linked to roles.

* `kube.provision.delete`

Issued when a KubeProvision resource is created/updated/deleted accordingly.

Contributor

You've touched briefly on compatibility above, but we should include the section as described in RFD 0 and fill in the relevant details and expectations.

Suggested change
## Backward Compatibility
- This is opt-in...
- Older kube agents will not reconcile any KubeProvision or Roles with kubernetes_provisions defined
- Helm chart changes
- ...

Contributor Author

I can add this section so it's more explicit, but for now it's all opt-in. I've changed the approach to impersonate "system:masters", so no changes to charts are needed and all agents are capable of provisioning, as long as the user created a KubeProvision or a role with kubernetes_permissions (and the user didn't remove access to "system:masters", but that's an edge case and they probably don't want us to provision in that situation).

Contributor Author

Added 54dc758

Comment on lines +77 to +78
If `kubernetes_permissions` field is set in the `allow` part of the Teleport role, fields `kubernetes_resources`, `kubernetes_users` and `kubernetes_groups`
should not be set, we will return an error if user will try to create a role mixing those fields. In the future we might relax this rule.
Contributor

Won't this block us from adding kubernetes_permissions to the editor role?

Also, I think we are populating kubernetes_resources in v7 when it's empty. Maybe a more accurate statement would be "kubernetes_permissions requires kubernetes_resources to be the default wildcard rule".

Contributor Author

Did you mean the access role (editor is about editing Teleport resources)?
I think we probably shouldn't put kubernetes_permissions in there. kubernetes_resources is present there, yes, but that's more because, as you said, we populate it when kubernetes_labels/expressions are set on a role, and the access role has wildcard labels for Kubernetes, so you can see all clusters. If we add kubernetes_permissions by default to the access role, it means we will always create the teleport:access cluster role, which might be unexpected for users who didn't plan to use kube auto-provisioning - although it might be an interesting possibility to explore 🤔. Currently the access role doesn't actually give you access to Kubernetes resources, because you need to have some groups as well; if we add kubernetes_permissions, it would mean you have actual access to everything inside the Kubernetes clusters. Maybe start smaller and add this later?

Generally, the idea behind forbidding mixing of the fields is to reduce confusion among users - basically you have two modes:

  1. Use groups/users and kubernetes_resources - user is responsible for creating Kubernetes RBAC in that case.
  2. Use kubernetes_permissions - Teleport is responsible for Kubernetes RBAC in that case.

Maybe we can get away with the default wildcard rule, but I think it would be better if users didn't even see kubernetes_resources field if they are using kubernetes_permissions

Contributor

Maybe we can get away with the default wildcard rule, but I think it would be better if users didn't even see kubernetes_resources field if they are using kubernetes_permissions

I agree, but I think this is not how it's implemented today. When kubernetes_resources is not set, the wildcard item is added to the list, and this is what you get when you run tctl get role/... .

Contributor Author

@AntonAM AntonAM Aug 1, 2024

I think we should actually implicitly convert kubernetes_permissions to kubernetes_resources when doing the permissions check on the Teleport side. Doing wildcard kubernetes resources can mess things up if the user has other roles that rely on kubernetes_resources. I'm not even sure that defaulting to wildcard resources for a role is the right behaviour - imagine this situation: a user has kuberole1, which has powerful wide-access kubernetes_groups but is limited by kubernetes_resources. Now the admin wants to grant access to other Kubernetes clusters, so they create role kuberole2 with additional kubernetes_labels so the user can see the clusters, but they forget to add kubernetes_resources; we then create wildcard resources for them, and now the user with that role has access to everything that the powerful kubernetes_groups allow.

So we either need to forbid mixing of kubernetes_permissions roles and kubernetes_resources, which I think is not a good UX; or we translate permissions to kubernetes_resources analog just-in-time when doing permissions checks, which is more complicated, but much better UX. And we don't create default wildcard kubernetes_resources if kubernetes_permissions are present.
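The "don't mix the fields" rule under discussion is easy to sketch as a validation step. This is a hypothetical helper, not the actual Teleport validator; field names come from the RFD, everything else is an assumption.

```python
# Fields that are mutually exclusive with kubernetes_permissions in the
# proposed rule (user-managed RBAC mode vs. Teleport-managed RBAC mode).
EXCLUSIVE_FIELDS = ("kubernetes_resources", "kubernetes_users", "kubernetes_groups")

def validate_allow(allow: dict) -> None:
    """Hypothetical sketch: raise if a role's allow section mixes
    kubernetes_permissions with the user-managed RBAC fields."""
    if "kubernetes_permissions" not in allow:
        return
    mixed = [f for f in EXCLUSIVE_FIELDS if allow.get(f)]
    if mixed:
        raise ValueError(
            "kubernetes_permissions cannot be combined with: " + ", ".join(mixed)
        )

# Teleport-managed mode: fine on its own.
validate_allow({"kubernetes_permissions": [{"resources": ["pods"], "verbs": ["get"]}]})
print("ok")
```

Note that under this rule the validator must see the role as the user wrote it; if a default wildcard kubernetes_resources entry were persisted first, every kubernetes_permissions role would fail this check, which is exactly the hysteresis problem raised in the next comment.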

Contributor

@hugoShaka hugoShaka Aug 2, 2024

And we don't create default wildcard kubernetes_resources if kubernetes_permissions are present.

I missed this part initially when reading the comment. I think doing conditional defaults like this would cause inconsistencies and surprise behaviour changes.

Case 1:

  • I create a simple role v7 without kube_resources nor kube_permissions
  • the resource in the backend has the wildcard
  • I tctl edit the resource to add kube_permission
  • I now have a resource with both kube_permission and wildcard kube_resources

Case 2:

  • I create a simple role v7 without kube_resources nor kube_permissions
  • the resource in the backend has the wildcard
  • I add the kube_permission field to the local manifest and tctl create -f
  • I now have a resource with kube_permission and without kube_resources

Such "smart" default behaviours are a big issue in Teleport today because we evaluate defaults before writing the resource to the backend, so they are persisted. We end up with hysteresis, and this messes with people, especially infrastructure-as-code tooling.

Contributor Author

In the way I propose, you won't be able to tctl edit the resource to add kube_permissions without explicitly deleting the kube_resources wildcard. In case 2, as I understand it, you mean that you would overwrite the resource, which to me also seems like a deliberate action - if you first create a role without anything and then also try to create it with a new yaml file, it's a bit weird, but you end up with what you wrote in the yaml file. If you first get the yaml of the resource and edit it to add kube_permissions, you again need to explicitly delete kube_resources from there or it would return an error.

And as I mentioned, I'm not sure that creating wildcard by default for kube_resources is the right thing to do - a role with wildcard kube_resources is a dangerous role, it can screw up permissions if combined with another role that has wide-access kubernetes_groups. Though removing it would be a breaking change. But it's not linked to kube_permissions directly if we don't allow mixing of the fields, and it's something we can think about later.

a much simpler process, since user will be able to setup RBAC completely in Teleport if they want. We will also be able
to amend our documentation to make our guides simpler, which will also help users to start with Teleport more easily.

#### UX
Contributor

I think the biggest UX challenge is that this adds yet another way of configuring Kubernetes RBAC. Most users also don't understand how Teleport and kube RBAC interact.

IIUC, a user wanting to control kube RBAC would have 3 approaches:

  • create users/groups with restricted rights and use kubernetes_labels + kubernetes_users/groups
  • create admin RBAC and use kubernetes_labels + kubernetes_resources to restrict what users can do
  • use the new shiny kubernetes_permissions and let the Teleport agent create the RBAC you need

Users were already confused when we had 2 ways of configuring RBAC, but now they will have 3 😅
To reduce the confusion, I think we should have a dedicated documentation page comparing the 3 approaches and advising users to use kubernetes_permissions if they don't have an existing setup.

Contributor Author

Yep, we can do that. We can also start putting emphasis on the new kubernetes_permissions in our guides, examples, etc., so it's easier for new users, and describe the usage of groups + kubernetes_resources as a more advanced use case.

Contributor

use the new shiny kubernetes_permissions and let the Teleport agent create the RBAC you need

This won't cover all the cases that we currently support, like deny clauses, resource filtering...
I wonder if it will be simple to explain how everything works and when you should use one or both together.

Comment thread rfd/0174-kube-rbac-bootstrap.md
Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated
Comment on lines +69 to +71
Presence of this field will signal that user wants Teleport to auto-provision these permissions into the Kubernetes clusters.
Teleport will translate permissions defined in that field into Kubernetes RBAC definitions and will automatically create roles and bindings
on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding
Contributor

How do we avoid conflicts between two agents in the same cluster with identical role names? Hash the cluster name and prefix/suffix the kube RBAC resources?

This is definitely a thing people are doing and this will make adoption harder for them as they tend to have consistent role naming rules across their multiple Teleport clusters.

Contributor Author

@AntonAM AntonAM Jul 31, 2024

You mean the case when a Kubernetes cluster is enrolled in two Teleport clusters? Yeah, in that case we can have a collision 🤔
We can add a prefix, so it would be something like teleport:telecluster1:kuberole1 - I think we should just use cluster names instead of a hash. It does lengthen role names, unfortunately, but I guess that's not that bad, since they are not supposed to be used manually by users. I think it's still better for users if they are readable, though.
(edited: Removed "don't" from "I don't think we should..." 🤦 )

Existing users will not see any changes and don't need to migrate anything. To use the new feature clients will need to explicitly
set the new `kubernetes_permissions` field in the role definition.

### Improving Day 2 experience - Automatically provisioning RBAC resources to Kubernetes clusters.
Contributor

Is this needed now that we have the kubernetes_permissions field, which is terribly powerful and will solve most of our getting-started issues?

I feel like if you have a use-case requiring the kube_provision resource, ArgoCD/fleet management/provisioning tools are a better fit than Teleport anyway as your use-case is complex and you are an advanced user.

Contributor Author

This is supposed to be a use case for more advanced users. We had direct requests from some customers for it, so it looks like they see it as a better approach than using their provisioning tools 🤷

AntonAM and others added 5 commits July 31, 2024 14:18
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> (×5)
Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated

Presence of this field will signal that user wants Teleport to auto-provision these permissions into the Kubernetes clusters.
Teleport will translate permissions defined in that field into Kubernetes RBAC definitions and will automatically create roles and bindings
on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding
Contributor

If namespaces field includes '*' Teleport will create ClusterRole/ClusterRoleBinding

I believe that using * should not imply ClusterRole/ClusterRoleBinding. First, some resources are cluster-wide and need to be distinguished from resources that can be namespaced. Second, * typically means all namespaces, while some-team-* matches only a subset, yet both include a matcher.

Therefore, from my perspective, namespaces: ['*'] should indicate creating a role and rolebinding in every namespace. If namespaces is omitted, it should signify the creation of a clusterrole.
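The semantics proposed in this comment (not the final design) can be made concrete with a short sketch: omitting `namespaces` yields a single ClusterRole, while any namespace matcher, including `'*'`, yields a namespaced Role in every matching namespace. All names here are illustrative assumptions.

```python
import fnmatch

def plan(namespaces, existing_namespaces):
    """Hypothetical sketch of the proposed rule: None -> one ClusterRole;
    a list of glob patterns -> a Role/RoleBinding per matching namespace."""
    if namespaces is None:
        return {"kind": "ClusterRole", "namespaces": []}
    matched = [ns for ns in existing_namespaces
               if any(fnmatch.fnmatch(ns, pat) for pat in namespaces)]
    return {"kind": "Role", "namespaces": matched}

print(plan(None, ["default", "kube-system"]))          # cluster-wide grant
print(plan(["some-team-*"], ["some-team-a", "default"]))  # only the matching subset
```

The sketch also makes the follow-up objection visible: because `matched` is computed against the namespaces that exist right now, any later `kubectl create ns` requires re-running the plan, which is the namespace-watcher/race-condition concern raised below.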

Contributor

@hugoShaka hugoShaka Aug 1, 2024

creating a role and rolebinding in every namespace

Oh no, namespace watcher and race-conditions incoming.

I am a user that can create namespaces and create resources in the namespace *. If I kubectl create ns foo || kubectl apply -f manifest.yaml -n foo the second command will likely fail as the RBAC is not there yet.

Contributor

Oh no, namespace watcher and race-conditions incoming.

@hugoShaka, you will always need to monitor namespaces, and the issues will persist. There's no guarantee that the namespace specified in kubernetes_permissions will exist at any given moment, but it may exist at a later time. Therefore, a namespace watcher is essential, and we must reconcile based on that, even if you don't use the wildcard.

If your role specifies namespace: abc and you run kubectl create ns abc || kubectl apply ..., you will encounter the same problem. For me, translating namespace: * into a cluster role is problematic because it doesn’t align well with specifying cluster-wide resources. Cluster-wide resources don’t have a namespace, so this approach seems inconsistent.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated
on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding
pair on the Kubernetes cluster, otherwise a Role/RoleBinding pair will be created in each listed namespace.

Subject for the bindings will be a group named `teleport:*RoleName*`, e.g. for a Teleport role `kube-role` the resulting
Contributor

Suggested change

```diff
-Subject for the bindings will be a group named `teleport:*RoleName*`, e.g. for a Teleport role `kube-role` the resulting
+Subject for the bindings will be a group named `teleport:md5(clusterName,roleName)`, e.g. for a Teleport role `kube-role` the resulting
```

We should hash the role name and keep the group name under 63 characters because Kubernetes forces us to.

Contributor Author

Ok, then we can do `teleport:clusterName:roleName` by default, and if it exceeds 63 characters we switch to `teleport:hash(clusterName,roleName)`; that would leave role names human-readable most of the time.
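The fallback scheme discussed here could look roughly like this (a sketch with a hypothetical helper name, not Teleport's actual implementation; the 63-character cap follows the limit mentioned above):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// Subject group names are limited to 63 characters, so use the
// human-readable "teleport:<cluster>:<role>" form when it fits and fall
// back to a hash of the pair otherwise. Hypothetical helper for illustration.
const maxSubjectLen = 63

func kubeGroupName(clusterName, roleName string) string {
	name := fmt.Sprintf("teleport:%s:%s", clusterName, roleName)
	if len(name) <= maxSubjectLen {
		return name
	}
	sum := md5.Sum([]byte(clusterName + "," + roleName))
	// "teleport:" (9 chars) + 32 hex chars = 41 chars, always under the limit.
	return "teleport:" + hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(kubeGroupName("prod", "kube-role"))
	fmt.Println(kubeGroupName("prod-eu-west-1-main-cluster",
		"a-very-long-teleport-role-name-that-overflows-the-limit"))
}
```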

Subject for the bindings will be a group named `teleport:*RoleName*`, e.g. for a Teleport role `kube-role` the resulting
group name will be `teleport:kube-role`. This will allow Kube service to setup correct impersonation group headers based on the roles available to the user.

If `kubernetes_permissions` field is set in the `allow` part of the Teleport role, fields `kubernetes_resources`, `kubernetes_users` and `kubernetes_groups`
Contributor

Why would it be a blocker? If I have 200 roles that use kube_users, kube_groups, and kube_resources, why can't I mix them, given that kube permissions are always additive?

Contributor Author

To lessen potential confusion and to give us the option to enable it later: if we allow mixing from the beginning and then find problems, we can't roll it back. If you're an existing user with 200 roles, you already have a complicated system and probably some way of herding your Kubernetes RBAC; mixing approaches might not be the cleanest way. You can still create completely new roles with kubernetes_permissions or convert your existing ones. For newer users it should be less confusing. And as I said, we can add this functionality later once we see how it's going.

Comment on lines +84 to +86
Pattern matching for resource names that is supported by the `kubernetes_resources` field will not be supported (same as in native Kubernetes RBAC)
by the `kubernetes_permissions` in the
first implementation, but we will be able to add it later. Not allowing pattern matching, though limiting functionality compared to the `kubernetes_resources` field,
Contributor

I don't believe we can implement this correctly and safely. First, it would require polling all resources, matching them against the keyword, and updating the role to allow access to a subset. This means we would need to process everything in real time to avoid leaking resources to users, and we would likely encounter role name filter limits.

Given this, I think we will always need a combination of kubernetes_permissions and kubernetes_resources because they function differently. One provisions Kubernetes RBAC, while the other dynamically filters resources. If our intention is for kubernetes_permissions to become kubernetes_resources and enforce filters dynamically, I have concerns about its feasibility.

kubernetes_permissions and kubernetes_resources working together seems a better approach but I wonder if people will understand how to use them.

Contributor Author

Yes, I mentioned that we might need additional checks relying on `kubernetes_resources`. What I had in mind is that we could translate something like `resourceNames: ['env-*']` into `resourceNames: [""]` on the Kubernetes cluster role and add a `kubernetes_resources` filter for `env-*`, so we get all resources from the Kubernetes cluster and filter them on Teleport. We could do that transparently for the user, so they don't have to set up `kubernetes_resources` manually.

a much simpler process, since user will be able to setup RBAC completely in Teleport if they want. We will also be able
to amend our documentation to make our guides simpler, which will also help users to start with Teleport more easily.

#### UX
Contributor

use the new shiny kubernetes_permissions and let the Teleport agent create the RBAC you need

This won't cover all cases that we currently support, such as deny clauses and resource filtering.
I wonder if it will be simple to explain how everything works and when you should use one or both together.

Comment thread rfd/0174-kube-rbac-bootstrap.md Outdated
kubernetes_labels:
'env': 'staging'
kubernetes_permissions:
- namespaces: ["main-company-app"]
Contributor

will we support namespace matching?

Contributor Author

You mean something like `namespaces: ["env-*"]`? I think we can, but I'm not sure if we should include it in the first version, since we don't include pattern matching in resource names, so maybe just keep it simple at the start 🤔

#### Upgrade path

Existing users will not see any changes and don't need to migrate anything. To use the new feature clients will need to explicitly
set the new `kubernetes_permissions` field in the role definition.
Contributor

This isn't entirely accurate. Users will need to upgrade their agents to a version that supports this feature and upgrade via the Helm chart. If they use the auto-upgrader, the agent won't have the necessary permissions to create, delete, or update (cluster) roles.

Another solution is to impersonate random roles in the cluster to create them, but this isn't always possible. Some customers prevent the agent from impersonating certain groups or users in the cluster with restrictive cluster roles, so this solution may often fail.
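An explicit grant along the lines suggested here might look roughly like the following ClusterRole (a sketch; the name, resource list, and verbs are assumptions, not a documented Teleport manifest):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: teleport-rbac-provisioner  # hypothetical name
rules:
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
```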

Comment on lines +91 to +93
Kubernetes RBAC will be auto-provisioned using the KubeProvisions functionality described in the next section of this RFD. For the best UX any change to
a role that contains `kubernetes_permissions` will trigger auto-provisioning immediately to reduce latency between the moments when role was created
or changed and when clients can use Kubernetes cluster with the updated RBAC.
Contributor

Could you please clarify the reasoning behind this sentence? What is preventing us from having KubeProvisions auto-provision immediately? Can you expand on this?

A watcher in kube agents would do exactly what is needed and allow real-time synchronization in both directions: if someone changes the cluster role in Kubernetes, the Teleport version is re-enforced, and when the Teleport object changes, the new version is applied.

I also find it cumbersome to have two fields defined in completely different objects that operate on the same feature and objects. From my perspective, we should have either kubernetes_permissions or KubeProvisions, but not both. People may not realize they are the same thing and could make mistakes.

Contributor Author

Originally KubeProvisions were just about provisioning, so it made sense with the simple reconciliation-loop approach that we often use. The introduction of roles requires short latency for better UX; that's why I added the mention of an immediate reaction, and we also trigger that on any changes to KubeProvisions. But yeah, you are right, we can add watchers for changes on the Kubernetes cluster side and reapply provisions immediately, giving real-time enforcement in both directions.

I also find it cumbersome to have two fields defined in completely different objects that operate on the same feature and objects. From my perspective, we should have either kubernetes_permissions or KubeProvisions, but not both. People may not realize they are the same thing and could make mistakes.

I think they are not really the same, and I think it's fine if users logically separate them. Maybe I should remove the mention of ephemeral KubeProvisions from the RFD; it misleadingly presents them as the same thing. kubernetes_permissions logically is a different feature; it can be implemented through mostly the same code KubeProvisions use, but that's just an implementation detail.

will need to make a decision and create KubeProvisions resources to actively start using this feature.

In order to be able to reconcile the state, Teleport will need to perform CRUD operations on Kubernetes RBAC resources.
When performing CRUD operations, Teleport will impersonate the "system:masters" group for most cases, but on GKE `cluster-admin`
Contributor

This is not always feasible, as customers might block impersonation of system:masters. In my opinion, we should ask users to explicitly grant permissions to edit or update roles and cluster roles, rather than randomly impersonating system:masters. It's clear, and users know what to expect from Teleport.

Contributor Author

I think we can ignore those cases, at least in the beginning (and probably long-term as well). The goal of the first version is to provide benefits for 90% of users. Clients who have disabled system:masters probably do it because they somewhat "don't trust" Teleport with too many permissions; in that case, granting us permission to create cluster roles and bindings defeats the purpose, because it effectively gives full permissions to do anything again. Those clients are probably not the ones who would want us to auto-provision things. So for at least the first version we can avoid all the additional complexity with permissions/chart upgrades and defer the decision on whether to add it in the future.

Comment on lines +106 to +121
```yaml
kind: role
metadata:
  name: staging-kube-access
version: v7
spec:
  allow:
    kubernetes_labels:
      'env': 'staging'
    kubernetes_permissions:
      - namespaces: ["main-company-app"]
      - resources: ["pod"]
        apiGroups: [""]
        verbs: ["get", "list", "watch"]
  deny: {}
```
Contributor

Suggested change

```diff
 kind: role
 metadata:
   name: staging-kube-access
 version: v7
 spec:
   allow:
     kubernetes_labels:
       'env': 'staging'
     kubernetes_permissions:
       - namespaces: ["main-company-app"]
-      - resources: ["pod"]
+        resources: ["pod"]
         apiGroups: [""]
         verbs: ["get", "list", "watch"]
   deny: {}
```
Was this what you intended? Or did you intend to use `- namespaces: ["main-company-app"]` as a separate entry in the array? 

Contributor Author

Nope, I've updated it incorrectly; it should match the definition example above. `namespaces` is a field under `kubernetes_permissions`, and the `rules` field contains the rules. Updated in b01f73d
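If I read the correction right, the intended shape is roughly this (my reconstruction from this thread, not the updated RFD text itself):

```yaml
kubernetes_permissions:
  - namespaces: ["main-company-app"]
    rules:
      - resources: ["pod"]
        apiGroups: [""]
        verbs: ["get", "list", "watch"]
```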

Contributor

@hugoShaka hugoShaka left a comment

Globally LGTM, especially the kubernetes_permissions part.

I'm leaving on PTO soon and won't be able to do another pass; my biggest warnings regarding this proposal are:

  • Whether "Improving Day 2 experience" is really needed. I feel like building a provisioning system might not be required for Teleport to achieve its mission of easily providing access to Kube clusters. I know customers asked for this, but at this point they should just use a provisioning system which will provide better experience and support than what we'll build.
  • I think that ns wildcards are a dangerous path. This implies dynamically watching namespaces, there's no way to do this in time without a webhook (and even with a webhook we have races). The implementation will be sub-par and cause friction. I'd vote for keeping this out of the proposal for now and only supporting static namespaces.
  • The last risk is how kubernetes_permissions and kubernetes_resources interact. There are a lot of threads in the PR about this with valid points. There's both the technical complexity of how we'll combine the features, and the cognitive complexity for users to know which one they should use. Maybe naming kubernetes_permissions differently would help but I don't have any good suggestions.


Labels

kubernetes-access no-changelog Indicates that a PR does not require a changelog entry rfd Request for Discussion size/md
