Conversation
tigrato
left a comment
How do we plan to migrate customers that do not have this feature enabled and do not want it in the future because they roll out their own Kubernetes RBAC using a different process?
I think we should do something similar to how we roll out Kubernetes app discovery: by default it's disabled for existing clusters, but for new installations it's enabled by default.
|
|
||
| ### Exposing the default Kubernetes user facing cluster roles (cluster-admin, view, edit) | ||
|
|
||
| Kubernetes clusters have default user-facing cluster roles: cluster-admin, view, edit. By default they are not usable, because |
system:masters is the only one that is bound everywhere, but GKE doesn't allow it to be used when impersonating requests.
| Currently, we only expose the cluster-admin role on GKE | ||
| installations, since system:masters group is not available for impersonation there. We will expand on this and when | ||
| discovering a cluster/installing kube-agent/creating a service account, we will | ||
| create cluster role bindings for those roles, accordingly linking them to the groups "default-cluster-admin", "default-view" and |
Instead of automatically linking them, we should instead ship a Kubernetes provision entry that does that job for us. This is good because if customers do not want it, they can simply remove the resource from Teleport and the Kubernetes agents will handle everything.
It's also good for users to be able to understand how it works.
Yep, I've described this option in the alternatives. But I think overall it's better to go with the "expose by default" route: combined with the fine-tuning of Kubernetes resources access that is now available in Teleport roles, it can make things easier for new users and for those who don't need a complicated RBAC setup. We could also depend on that default in our documentation and simplify the guides as well (which we can't really do if we rely on the kube provision path). But ultimately it's a product decision.
I think the option with fewer manually created resources, whether in Teleport or Kubernetes, is going to result in the better UX - especially for Day 0 users. I think we should strive to make Day 0 as easy as possible without requiring users to invoke tctl or learn about new Teleport resources if at all possible.
|
|
||
| ## Security | ||
|
|
||
| The introduction of this functionality does not directly impact the security of Teleport |
The kube agent already has the ability to impersonate all groups/users, so it can always impersonate someone in the cluster.
KubeProvision doesn't do anything unless users create those resources, so there's no need to do anything here. And with exposing default roles I think we just do that on new installs by default. The roles do not force clients to use them by themselves; we basically state that this is what a default Teleport installation is. We will provide options to not create the default bindings though, as we do with the rest of the Teleport RBAC setup. |
It's the opposite of what I believe is good user experience in this case. Our focus should be on improving the onboarding process for new users, ensuring they easily grasp how Teleport operates. At the same time, we need to offer solutions for existing customers to expand and allow Teleport to handle everything seamlessly. My suggestion is to create a new setting:

```shell
$ cat << EOF > prod-cluster-values.yaml
roles: kube,app,discovery
authToken: 616346f2892f82de6a58f8dd09db78b8
proxyAddr: tele.local:3080
kubeClusterName: test
enableRBACSync: true   # <-- new setting
labels:
  teleport.internal/resource-id: b34f651c-32de-45f6-86dc-ab5173f716c9
enterprise: true
EOF

$ helm install teleport-agent teleport/teleport-kube-agent -f prod-cluster-values.yaml --version 17.0.0-dev \
  --create-namespace --namespace test
```

Implementing this will automate the synchronization of default RBAC settings, greatly simplifying the subsequent setup steps where we guide users through configuring Kubernetes RBAC. For clusters that are already enrolled, users will need to manually enable the new setting. Making the enrollment process simpler will be a win-win situation. Check out #28276 where our @strideynet was a bit confused (lol). |
Hm, I feel that I propose the same thing functionally. I advocate for enabling it by default on new installs (creating the default bindings):

```shell
$ cat << EOF > prod-cluster-values.yaml
roles: kube,app,discovery
authToken: 616346f2892f82de6a58f8dd09db78b8
proxyAddr: tele.local:3080
kubeClusterName: test
rbac:
  create: true             # <-- existing setting, defaults to true
  bindDefaultRoles: true   # <-- new setting, also defaults to true
labels:
  teleport.internal/resource-id: b34f651c-32de-45f6-86dc-ab5173f716c9
enterprise: true
EOF
```

So if either of these settings is set to `false`, the default role bindings will not be created. |
|
@klizhentas can you take a look, we have a couple of product decisions to make here (mainly whether to always expose the default Kubernetes roles on agent installation as part of the Teleport RBAC resources we create, or to use the KubeProvision feature for it) |
| // DeleteKubeProvision deletes a KubeProvision. | ||
| rpc DeleteKubeProvision(DeleteKubeProvisionRequest) returns (google.protobuf.Empty); | ||
|
|
||
| // DeleteAllKubeProvisions removes all KubeProvisions. |
DeleteAll is only required for the cache and is not recommended to be exposed in the gRPC layer.
|
@tigrato @hugoShaka @rosstimothy @klizhentas I've updated the RFD to reflect the decision we made to go with auto-provisioning Kubernetes roles from the Teleport roles, instead of exposing the default Kubernetes roles, as a Day 1 improvement. |
|
|
||
| Presence of this field will signal that user wants Teleport to auto-provision these permissions into the Kubernetes clusters. | ||
| Teleport will translate permissions defined in that field into Kubernetes RBAC definitions and will automatically create roles and bindings | ||
| on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding |
as specified by the labels
Is this referring to the kubernetes_labels field of a role? Should we mention this also applies to kubernetes_label_expressions as well?
Yep, basically we should match the targeted Kubernetes clusters - if a user can access a cluster, we should have the role provisioned there.
| $ kubectl get clusterRoles -o yaml > raw_cluster_roles.yaml | ||
| $ tctl create -f kube_provision_staging.yaml --import raw_cluster_roles.yaml |
Can we simplify this flow a little bit? Instead of requiring two yaml files could we pass in the name and labels for the kube_provision resource? What do you think about something like the following (naming could use some further thought)?
```shell
$ kubectl get clusterRoles -o yaml | tctl kube import --name=staging-rbac --labels=env=staging
kube provision "staging-rbac" has been created.
```
Yep, I think that looks good. Should we call it tctl kube import-provision? And also add an option to provide an input file.
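For illustration, a possible invocation combining both suggestions (the command name and flags here are assumptions, not a finalized CLI):

```shell
# import cluster roles piped from kubectl...
kubectl get clusterroles -o yaml | tctl kube import-provision --name=staging-rbac --labels=env=staging

# ...or from a previously exported file (hypothetical --file flag)
tctl kube import-provision --name=staging-rbac --labels=env=staging --file=raw_cluster_roles.yaml
```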
|
|
||
| Even though we will always run the reconciliation loop, by default it will be a no-op, since there will | ||
| be no resources to provision, so users need to explicitly create KubeProvision resources to start actively using this new feature. | ||
| We also will explicitly require labels to be present for the resource to be provisioned to the Kubernetes cluster, making it |
| We also will explicitly require labels to be present for the resource to be provisioned to the Kubernetes cluster, making it | |
| We also will explicitly require labels or label_expressions to be present for the resource to be provisioned to the Kubernetes cluster, making it |
Here we talk about labels on the resource itself, not in the roles.
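For illustration, a minimal sketch of a KubeProvision resource carrying its own labels (the kind spelling and field names here are assumptions; the RFD defines the actual schema):

```yaml
kind: kube_provision
version: v1
metadata:
  name: staging-rbac
  labels:
    env: staging   # labels on the KubeProvision resource itself select the target clusters
spec:
  # cluster roles and bindings to provision (omitted)
```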
| As mentioned above adding pattern matching to the `kubernetes_permissions` field is probably something we will need to add in the future, improving | ||
| fine-grained control of RBAC users can setup. |
There are some potential drawbacks and security issues that may arise from doing this, correct? Teleport will need to be "smart" and act as the arbiter of access because Kubernetes doesn't natively support pattern matching.
|
|
||
| We will track changes to the KubeProvision resources in the Audit log. There will be three new events: | ||
|
|
||
| * `kube.provision.create` |
Will this only be emitted when a KubeProvision resource is created? Should any implicit creation of an ephemeral KubeProvision resource from a role also have an audit trail?
As I mentioned earlier, an ephemeral KubeProvision is just a logical abstraction and nothing is actually created. I don't think we need an additional audit trail here, because it's already covered by the role manipulation audit events - this auto-provisioning is directly linked to roles.
| * `kube.provision.delete` | ||
|
|
||
| Issued when a KubeProvision resource is created/updated/deleted accordingly. | ||
|
|
You've touched briefly on compatibility above, but we should include the section as described in RFD 0 and fill in the relevant details and expectations.
| ## Backward Compatibility | |
| - This is opt-in... | |
| - Older kube agents will not reconcile any KubeProvision or Roles with kubernetes_provisions defined | |
| - Helm chart changes | |
| - ... |
I can add this section so it's more explicit, but for now it's all opt-in. I've changed the approach to impersonating "system:masters", so no chart changes are needed and all agents are capable of provisioning, as long as the user created a KubeProvision or a role with kubernetes_permissions (and the user didn't remove access to "system:masters", but that's an edge case and they probably don't want us to provision in that situation).
| If `kubernetes_permissions` field is set in the `allow` part of the Teleport role, fields `kubernetes_resources`, `kubernetes_users` and `kubernetes_groups` | ||
| should not be set, we will return an error if user will try to create a role mixing those fields. In the future we might relax this rule. |
Won't this block us from adding kubernetes_permissions to the editor role?
Also, I think we are populating kubernetes_resources in v7 when it's empty. Maybe a more accurate statement would be "kubernetes_permissions requires kubernetes_resources to be the default wildcard rule".
Did you mean the access role (editor is about editing Teleport resources)?
I think we probably shouldn't put kubernetes_permissions in there. kubernetes_resources is present there, yes, but that's more because, as you said, we populate it if you have kubernetes_labels/expressions set on a role, and the access role has wildcard labels for Kubernetes, so you could see all clusters. If we add kubernetes_permissions by default to the access role it means we will always create the teleport:access cluster role, which might be unexpected for users who didn't plan to use kube auto-provisioning - although it might be an interesting possibility to explore 🤔. Currently the access role doesn't actually give you access to Kubernetes resources, because you also need to have some groups; if we add kubernetes_permissions it will mean you would have actual access to everything inside the Kubernetes clusters. Maybe start smaller and add this later?
Generally, the idea behind forbidding mixing of the fields is to lower confusion among users - basically you have two modes (sketched below):
- Use groups/users and `kubernetes_resources` - the user is responsible for creating Kubernetes RBAC in that case.
- Use `kubernetes_permissions` - Teleport is responsible for Kubernetes RBAC in that case.
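To make the two modes concrete, a rough sketch (role names are made up; the `kubernetes_permissions` shape follows the example from this RFD and may change):

```yaml
# Mode 1: user-managed Kubernetes RBAC. Teleport maps the user to existing
# Kubernetes groups and optionally filters visible resources; the cluster-side
# RBAC for the "developers" group is created by the user.
kind: role
version: v7
metadata:
  name: kube-byo-rbac
spec:
  allow:
    kubernetes_labels:
      'env': 'staging'
    kubernetes_groups: ["developers"]
    kubernetes_resources:
      - kind: pod
        namespace: "main-company-app"
        name: "*"
---
# Mode 2: Teleport-managed Kubernetes RBAC via the proposed kubernetes_permissions
# field. Teleport provisions the matching roles/bindings on the target clusters.
kind: role
version: v7
metadata:
  name: kube-teleport-rbac
spec:
  allow:
    kubernetes_labels:
      'env': 'staging'
    kubernetes_permissions:
      - namespaces: ["main-company-app"]
        rules:
          - resources: ["pod"]
            apiGroups: [""]
            verbs: ["get", "list", "watch"]
```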
Maybe we can get away with the default wildcard rule, but I think it would be better if users didn't even see kubernetes_resources field if they are using kubernetes_permissions
Maybe we can get away with the default wildcard rule, but I think it would be better if users didn't even see kubernetes_resources field if they are using kubernetes_permissions
I agree, but I think this is not how it's implemented today. When kubernetes_resources is not set, the wildcard item is added to the list and this is what you get when you tctl get role/... .
I actually think we should implicitly convert kubernetes_permissions to kubernetes_resources when doing the permissions check on the Teleport side. Creating wildcard kubernetes_resources can mess things up if the user has other roles that rely on kubernetes_resources. I'm not even sure that defaulting to wildcard resources for a role is the right behaviour - imagine this situation: a user has kuberole1 with powerful wide-access kubernetes_groups that is then limited by kubernetes_resources. Now the admin wants to give access to other Kubernetes clusters, so they create role kuberole2 with additional kubernetes_labels so the user can see the clusters, but they forget to add kubernetes_resources; we then create wildcard resources for them and now the user with that role has access to everything that the powerful kubernetes_groups allow.
So we either need to forbid mixing of kubernetes_permissions roles and kubernetes_resources, which I think is not a good UX; or we translate permissions to a kubernetes_resources analog just-in-time when doing permissions checks, which is more complicated, but much better UX. And we don't create default wildcard kubernetes_resources if kubernetes_permissions are present.
And we don't create default wildcard kubernetes_resources if kubernetes_permissions are present.
I missed this part initially when reading the comment. I think doing conditional defaults like this would cause inconsistencies and surprise behaviour changes.
Case 1:
- I create a simple role v7 without kube_resources nor kube_permissions
- the resource in the backend has the wildcard
- I tctl edit the resource to add kube_permission
- I now have a resource with both kube_permission and wildcard kube_resources
Case 2:
- I create a simple role v7 without kube_resources nor kube_permissions
- the resource in the backend has the wildcard
- I add the kube_permission field to the local manifest and `tctl create -f`
- I now have a resource with kube_permission and without kube_resources
Such "smart" default behaviours are a big issue in Teleport today because we evaluate defaults before writing the resource to the backend, so they are persisted. We end up with an hysteresis and this messes with people, and especially infrastructure as code tooling.
In the way I propose you won't be able to tctl edit the resource to add kube_permissions without explicitly deleting the kube_resources wildcard. In case 2, as I understand it, you mean that you would overwrite the resource, which to me also seems like a deliberate action - if you first create a role without anything and then try to create it again from a new yaml file it's a bit weird, but you end up with what you wrote in the yaml file. If you first get the yaml of the resource and edit it to add kube_permissions, you again need to explicitly delete the kube_resources wildcard from there or it would return an error.
And as I mentioned, I'm not sure that creating the wildcard by default for kube_resources is the right thing to do - a role with wildcard kube_resources is a dangerous role; it can screw up permissions if combined with another role that has wide-access kubernetes_groups. Though removing it would be a breaking change. But it's not linked to kube_permissions directly if we don't allow mixing of the fields, and it's something we can think about later.
| a much simpler process, since user will be able to setup RBAC completely in Teleport if they want. We will also be able | ||
| to amend our documentation to make our guides simpler, which will also help users to start with Teleport more easily. | ||
|
|
||
| #### UX |
I think the biggest UX challenge is that this adds yet another way of configuring Kubernetes RBAC. Most users also don't understand how Teleport and kube RBAC interact.
IIUC, a user wanting to control kube RBAC would have 3 approaches:
- create users/groups with restricted rights and use `kubernetes_labels` + `kubernetes_users`/`kubernetes_groups`
- create admin RBAC and use `kubernetes_labels` + `kubernetes_resources` to restrict what users can do
- use the new shiny `kubernetes_permissions` and let the Teleport agent create the RBAC you need
Users were already confused when we had 2 ways of configuring RBAC, but now they will have 3 😅
To reduce the confusion, I think we should have a dedicated documentation page comparing the 3 approaches and advising users to use kubernetes_permissions if they don't have an existing setup.
Yep, we can do that. We can also start putting emphasis on the new kubernetes_permissions in our guides, examples, etc. so it's easier for new users, and describe usage of groups + kubernetes_resources as a more advanced use case.
use the new shiny kubernetes_permissions and let the Teleport agent create the RBAC you need
This won't cover all cases that we currently support like with deny clauses, resource filtering...
I wonder if it will be simple to explain how everything works and when you should use one or two together.
| Presence of this field will signal that user wants Teleport to auto-provision these permissions into the Kubernetes clusters. | ||
| Teleport will translate permissions defined in that field into Kubernetes RBAC definitions and will automatically create roles and bindings | ||
| on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding |
How do we avoid conflicts between two agents in the same cluster with identical role names? Hash the cluster name and prefix/suffix the kube RBAC resources?
This is definitely a thing people are doing and this will make adoption harder for them as they tend to have consistent role naming rules across their multiple Teleport clusters.
You mean the case when a Kubernetes cluster is enrolled in two Teleport clusters? Yeah, in that case we can get a collision 🤔
We can add a prefix, so it would be something like teleport:telecluster1:kuberole1; I think we should just use cluster names instead of a hash. It does make role names longer, unfortunately, but I guess it's not that bad since they are not supposed to be used manually by users. I think it's still better for users if they are readable though.
(edited: Removed "don't" from "I don't think we should..." 🤦 )
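For illustration, what a resulting binding subject might look like under this naming scheme (names are made up):

```yaml
subjects:
  - kind: Group
    # readable form while teleport:<clusterName>:<roleName> fits the 63-character limit discussed above
    name: teleport:telecluster1:kuberole1
    apiGroup: rbac.authorization.k8s.io
# if the combined name exceeded 63 characters, the fallback would be a hashed
# form, e.g. teleport:<hash of clusterName and roleName>
```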
| Existing users will not see any changes and don't need to migrate anything. To use the new feature clients will need to explicitly | ||
| set the new `kubernetes_permissions` field in the role definition. | ||
|
|
||
| ### Improving Day 2 experience - Automatically provisioning RBAC resources to Kubernetes clusters. |
Is this needed now that we have the kubernetes_permissions field, which is terribly powerful and will solve most of our getting-started issues?
I feel like if you have a use case requiring the kube_provision resource, ArgoCD/fleet management/provisioning tools are a better fit than Teleport anyway, as your use case is complex and you are an advanced user.
This is supposed to be a use case for more advanced users. We had direct requests from some customers for that, so it looks like they see it as a better approach than using their provisioning tools 🤷
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
|
|
||
| Presence of this field will signal that user wants Teleport to auto-provision these permissions into the Kubernetes clusters. | ||
| Teleport will translate permissions defined in that field into Kubernetes RBAC definitions and will automatically create roles and bindings | ||
| on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding |
If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding
I believe that using * should not imply ClusterRole/ClusterRoleBinding. First, some resources are cluster-wide and need to be distinguished from resources that can be namespaced. Second, * typically means all namespaces, but some-team-* only matches a subset, yet both include a matcher.
Therefore, from my perspective, namespaces: ['*'] should indicate creating a role and rolebinding in every namespace. If namespaces is omitted, it should signify the creation of a clusterrole.
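A sketch of the semantics proposed in this comment (illustrative fragments only; this is not what the RFD currently specifies):

```yaml
# namespaces omitted -> cluster-wide scope, provisioned as ClusterRole/ClusterRoleBinding
kubernetes_permissions:
  - rules:
      - resources: ["nodes"]
        apiGroups: [""]
        verbs: ["get", "list"]
---
# namespaces: ['*'] -> Role/RoleBinding created in every existing namespace
kubernetes_permissions:
  - namespaces: ["*"]
    rules:
      - resources: ["pod"]
        apiGroups: [""]
        verbs: ["get", "list", "watch"]
```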
creating a role and rolebinding in every namespace
Oh no, namespace watcher and race-conditions incoming.
I am a user that can create namespaces and create resources in the namespace *. If I run `kubectl create ns foo && kubectl apply -f manifest.yaml -n foo`, the second command will likely fail as the RBAC is not there yet.
Oh no, namespace watcher and race-conditions incoming.
@hugoShaka, you will always need to monitor namespaces, and the issues will persist. There's no guarantee that the namespace specified in kubernetes_permissions will exist at any given moment, but it may exist at a later time. Therefore, a namespace watcher is essential, and we must reconcile based on that, even if you don't use the wildcard.
If your role specifies namespace: abc and you run `kubectl create ns abc && kubectl apply ...`, you will encounter the same problem. For me, translating namespace: * into a cluster role is problematic because it doesn't align well with specifying cluster-wide resources. Cluster-wide resources don't have a namespace, so this approach seems inconsistent.
| on the target Kubernetes clusters (as specified by the labels). If `namespaces` field includes '*' Teleport will create ClusterRole/ClusterRoleBinding | ||
| pair on the Kubernetes cluster, otherwise Role/Binding pair will be created in the each listed namespace. | ||
|
|
||
| Subject for the bindings will be a group named `teleport:*RoleName*`, e.g. for a Teleport role `kube-role` the resulting |
| Subject for the bindings will be a group named `teleport:*RoleName*`, e.g. for a Teleport role `kube-role` the resulting | |
| Subject for the bindings will be a group named `teleport:md5(clusterName,roleName)`, e.g. for a Teleport role `kube-role` the resulting |
We should hash the role and keep it under 63 chars because kube forces us to
Ok, then we can do teleport:clusterName:roleName by default and if it exceeds 63 characters we switch to teleport:hash(clusterName,roleName); that would leave role names human-readable most of the time.
| Subject for the bindings will be a group named `teleport:*RoleName*`, e.g. for a Teleport role `kube-role` the resulting | ||
| group name will be `teleport:kube-role`. This will allow Kube service to setup correct impersonation group headers based on the roles available to the user. | ||
|
|
||
| If `kubernetes_permissions` field is set in the `allow` part of the Teleport role, fields `kubernetes_resources`, `kubernetes_users` and `kubernetes_groups` |
Why would it be a blocker? If I have 200 roles that use kube_users, kube_groups, kube_resources, why can't I mix them, given that kube permissions are always additive?
To lessen potential confusion and to give us the option to enable it later - if we allow it from the beginning and then find problems, we can't roll it back. If you're an existing user with 200 roles you already have a complicated system and probably some way of herding your Kubernetes RBAC, so mixing approaches feels like it might not be the cleanest way. You can still create completely new roles with kubernetes_permissions or convert your existing ones. For newer users it should be less confusing though. And as I said, we can add this functionality later once we see how it's going.
| Pattern matching for resource names that is supported by the `kubernetes_resources` field will not be supported (same as in native Kubernetes RBAC) | ||
| by the `kubernetes_permissions` in the | ||
| first implementation, but we will be able to add it later. Not allowing pattern matching, though limiting functionality compared to the `kubernetes_resources` field, |
I don't believe we can implement this correctly and safely. First, it would require polling all resources, matching them against the keyword, and updating the role to allow access to a subset. This means we would need to process everything in real time to avoid leaking resources to users, and we would likely encounter role name filter limits.
Given this, I think we will always need a combination of kubernetes_permissions and kubernetes_resources because they function differently. One provisions Kubernetes RBAC, while the other dynamically filters resources. If our intention is for kubernetes_permissions to become kubernetes_resources and enforce filters dynamically, I have concerns about its feasibility.
kubernetes_permissions and kubernetes_resources working together seems a better approach but I wonder if people will understand how to use them.
Yes, I mentioned that we might need additional checks relying on kubernetes_resources. What I had in mind is that we could translate something like `resourceNames: ['env-*']` into `resourceNames: ["*"]` on the Kubernetes cluster role and add a `kubernetes_resources` filter for `env-*`, so we get all resources from the Kubernetes cluster and filter them on the Teleport side. We could do that transparently for the user, so they don't have to set up `kubernetes_resources` manually.
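A sketch of that translation (the shapes below are assumptions based on the examples in this thread, not a finalized design):

```yaml
# In the Teleport role (proposed field), a name pattern requested by the user:
kubernetes_permissions:
  - namespaces: ["main-company-app"]
    rules:
      - resources: ["pod"]
        apiGroups: [""]
        verbs: ["get", "list"]
        resourceNames: ["env-*"]   # patterns are not supported by native Kubernetes RBAC
---
# What gets provisioned on the cluster: the same rule without the name restriction
# (a Kubernetes Role/ClusterRole rule).
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
# ...while the pattern is enforced on the Teleport side, roughly as an implicit filter:
kubernetes_resources:
  - kind: pod
    namespace: "main-company-app"
    name: "env-*"
```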
| kubernetes_labels: | ||
| 'env': 'staging' | ||
| kubernetes_permissions: | ||
| - namespaces: ["main-company-app"] |
will we support namespace matching?
You mean something like - namespaces: ["env-*"]? I think we can, but I'm not sure if we should include it in the first version, since we don't include pattern matching in the resource names, so maybe just keep it simple at the start 🤔
| #### Upgrade path | ||
|
|
||
| Existing users will not see any changes and don't need to migrate anything. To use the new feature clients will need to explicitly | ||
| set the new `kubernetes_permissions` field in the role definition. |
This isn't entirely accurate. Users will need to upgrade their agents to a version that supports this feature and upgrade via the Helm chart. If they use the auto-upgrader, the agent won't have the necessary permissions to create, delete, or update (cluster) roles.
Another solution is to impersonate random roles in the cluster to create them, but this isn't always possible. Some customers prevent the agent from impersonating certain groups or users in the cluster with restrictive cluster roles, so this solution may often fail.
| Kubernetes RBAC will be auto-provisioned using the KubeProvisions functionality described in the next section of this RFD. For the best UX any change to | ||
| a role that contains `kubernetes_permissions` will trigger auto-provisioning immediately to reduce latency between the moments when role was created | ||
| or changed and when clients can use Kubernetes cluster with the updated RBAC. |
Could you please clarify the reasoning behind this sentence? What is preventing us from having KubeProvisions auto-provision immediately? Can you expand on this?
A watcher in kube agents would do exactly what is needed and allow real-time synchronization in both directions, ensuring that if someone changes the cluster role in kube, the Teleport version is re-enforced, and that when the Teleport object changes the new version is enforced as well.
I also find it cumbersome to have two fields defined in completely different objects that operate on the same feature and objects. From my perspective, we should have either kubernetes_permissions or KubeProvisions, but not both. People may not realize they are the same thing and could make mistakes.
Originally KubeProvisions were just about provisioning, so the simple reconciliation-loop approach that we often use made sense. The introduction of roles requires short latency for better UX; that's why I added the mention of an immediate reaction - we also trigger that on any changes to KubeProvisions. But yeah, you are right, we can add watchers for changes on the Kubernetes cluster side and reapply provisions immediately, giving real-time enforcement in both directions.
I also find it cumbersome to have two fields defined in completely different objects that operate on the same feature and objects. From my perspective, we should have either kubernetes_permissions or KubeProvisions, but not both. People may not realize they are the same thing and could make mistakes.
I think they are not really the same, and I think it's fine if users logically separate them. Maybe I should remove the mention of ephemeral KubeProvisions from the RFD, since it misleadingly presents them as the same thing. kubernetes_provisions is logically a different feature; it can be done through mostly the same code KubeProvisions use, but that's just an implementation detail.
| will need to make decision and create KubeProvisions resources to actively start using this feature. | ||
|
|
||
| In order to be able to reconcile the state, Teleport will need to perform CRUD operations on Kubernetes RBAC resources. | ||
| When performing CRUD operations, Teleport will impersonate the "system:masters" group for most cases, but on GKE `cluster-admin` |
This is not always feasible, as customers might block impersonation of system:masters. In my opinion, we should ask users to explicitly grant permissions to edit or update roles and cluster roles, rather than randomly impersonating system:masters. It's clearer, and users know what to expect from Teleport.
I think we can ignore those cases, at least in the beginning (but probably long-term as well). The goal of the first version is to provide benefits for 90% of users. Clients who have disabled system:masters probably do it because they kinda "don't trust" Teleport with too many permissions; in that case, giving us permission to create cluster roles and bindings defeats the purpose, because it basically gives full permissions to do anything again. Those clients are probably not the ones who would want us to auto-provision stuff. So I think for at least the first version we can avoid all the additional complexity with permissions/chart upgrades and delay the decision on whether to add it in the future.
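For context, group impersonation with kubectl looks roughly like this (user and group names are illustrative; the agent would do the equivalent by setting impersonation headers on its API requests):

```shell
# check whether the impersonated identity would be allowed to manage cluster roles
kubectl auth can-i create clusterroles --as=teleport-kube-agent --as-group=system:masters
```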
| ```yaml | ||
| kind: role | ||
| metadata: | ||
| name: staging-kube-access | ||
| version: v7 | ||
| spec: | ||
| allow: | ||
| kubernetes_labels: | ||
| 'env': 'staging' | ||
| kubernetes_permissions: | ||
| - namespaces: ["main-company-app"] | ||
| - resources: ["pod"] | ||
| apiGroups: [""] | ||
| verbs: ["get", "list", "watch"] | ||
| deny: {} | ||
| ``` |
| ```yaml | |
| kind: role | |
| metadata: | |
| name: staging-kube-access | |
| version: v7 | |
| spec: | |
| allow: | |
| kubernetes_labels: | |
| 'env': 'staging' | |
| kubernetes_permissions: | |
| - namespaces: ["main-company-app"] | |
| - resources: ["pod"] | |
| apiGroups: [""] | |
| verbs: ["get", "list", "watch"] | |
| deny: {} | |
| ``` | |
| ```yaml | |
| kind: role | |
| metadata: | |
| name: staging-kube-access | |
| version: v7 | |
| spec: | |
| allow: | |
| kubernetes_labels: | |
| 'env': 'staging' | |
| kubernetes_permissions: | |
| - namespaces: ["main-company-app"] | |
| resources: ["pod"] | |
| apiGroups: [""] | |
| verbs: ["get", "list", "watch"] | |
| deny: {} |
Was this what you intended? Or did you intend to use `- namespaces: ["main-company-app"]` as a separate entry in the array?
Nope, I've updated it incorrectly - it should match the definition example above. `namespaces` is a field under `kubernetes_permissions` and the `rules` field contains the rules. Updated in b01f73d
Globally LGTM, especially the kubernetes_permissions part.
I'm leaving on PTO soon and won't be able to do another pass; my biggest warnings regarding this proposal would be:
- Whether "Improving Day 2 experience" is really needed. I feel like building a provisioning system might not be required for Teleport to achieve its mission of easily providing access to Kube clusters. I know customers asked for this, but at this point they should just use a provisioning system which will provide better experience and support than what we'll build.
- I think that ns wildcards are a dangerous path. This implies dynamically watching namespaces, there's no way to do this in time without a webhook (and even with a webhook we have races). The implementation will be sub-par and cause friction. I'd vote for keeping this out of the proposal for now and only supporting static namespaces.
- The last risk is how kubernetes_permissions and kubernetes_resources interact. There are a lot of threads in the PR about this with valid points. There's both the technical complexity of how we'll combine the features, and the cognitive complexity for users to know which one they should use. Maybe naming `kubernetes_permissions` differently would help but I don't have any good suggestions.
This PR adds an RFD for improving the RBAC setup capabilities of Teleport.
Context: implements #39628 and also addresses Sasha's comments about improving Day 1 UX.
Rendered version