[EKS]: IAM Roles for Service Accounts enhancements for usage across multiple clusters #1408
Comments
This would be very useful as it is definitely a problem we run into. Would this work cross-account? We often associate service accounts with roles in accounts other than the one in which EKS is running.
@mikestef9 thanks for raising this. When running large clusters/platforms on top of AWS and requiring IAM, there are a few more constraints than just the trust relationship payload size.
@gregoryfranklin yes, we plan to make cross-account roles work with this proposal. @ajohnstone we haven't settled on implementation yet. One idea is to use a CSI driver to retrieve a pod's OIDC token from the kubelet and exchange the token for IAM role credentials from EKS. Another is a new credential provider in all AWS SDKs, but that would be hampered by requiring all clients to update. Under this proposal you manage trust through the EKS API: the IAM trust policy is only updated once to trust the EKS service principal.
This might solve a challenge I am currently having with IRSA. There are situations where I have to replace an existing cluster with a new one and then shift traffic to the new cluster, either due to disaster recovery or a change which can't be applied without downtime. Currently this would require all developers to update their trust policy with the new OIDC provider, so that the applications can authenticate from the new cluster. Would this new solution require all roles to be associated with the new cluster, or could that be solved somehow when replacing a cluster?
You would still need to call the EKS API to add the mapping, but this can be automated using IaC tools as part of the new cluster creation process. There would be no need to update a role trust policy when creating a new cluster, and you no longer run into trust policy character limits when you need a role associated with many clusters.
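As a minimal Terraform sketch of that automation (hedged: the resource shown is the EKS Pod Identity association that eventually shipped, and every cluster, namespace, service account, and role name below is a placeholder), the mapping can be created alongside the cluster itself:

```hcl
# Sketch: create the role <-> service account mapping as part of cluster
# provisioning, so tenants never touch role trust policies when a cluster
# is replaced.
resource "aws_eks_pod_identity_association" "my_app" {
  cluster_name    = "blue-cluster"                          # placeholder cluster name
  namespace       = "team-a"                                # placeholder namespace
  service_account = "my-app"                                # placeholder service account
  role_arn        = "arn:aws:iam::111122223333:role/my-app" # placeholder role ARN
}
```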
It sounds like we would still have the same issues as we do today. In my scenario, individual development teams set up their own resources and IAM roles with Terraform; this is a separate state from the one which creates the cluster. The manifests are deployed with FluxCD into the cluster. This enables us to replace a cluster overnight without any development team being the wiser, as the new cluster will pull down the same manifests as the old one does. The issue here today is that each development team would have to update the OIDC provider in their trust policy. Correct me if I am wrong, but it sounds like there is still a need in the newly proposed solution for each team to associate their IAM roles with service accounts in the new cluster. Basically keeping the same issue that we have today, but solved with a different IaC resource.
If teams are creating the IAM roles with Terraform, can they not look up the cluster ID using a data source? We store the current list of OIDC providers in an SSM parameter when we create a cluster. Terraform is therefore able to look up the appropriate IDs in order to generate the policy.
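For illustration, a rough Terraform sketch of that lookup (the SSM parameter name, namespace, and service account are assumptions, and the parameter is assumed to hold the cluster's OIDC issuer host without the https:// prefix):

```hcl
# Sketch: the platform team writes the cluster's OIDC issuer to SSM at cluster
# creation time; tenant Terraform looks it up to build the IRSA trust policy.
data "aws_ssm_parameter" "oidc_issuer" {
  name = "/platform/eks/current/oidc-issuer" # hypothetical parameter name
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "irsa_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${data.aws_ssm_parameter.oidc_issuer.value}"]
    }

    condition {
      test     = "StringEquals"
      variable = "${data.aws_ssm_parameter.oidc_issuer.value}:sub"
      values   = ["system:serviceaccount:team-a:my-app"] # placeholder namespace and service account
    }
  }
}

resource "aws_iam_role" "my_app" {
  name               = "my-app-irsa"
  assume_role_policy = data.aws_iam_policy_document.irsa_trust.json
}
```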
With this proposal, yes, there would be an EKS API call to make for each cluster instead of a trust policy update. Additionally, there would no longer be a need to annotate a service account with the IAM role ARN. So is what you are looking for more along the lines of some way to trust an IAM role for all clusters in an account, without having to specify each cluster?
@mikestef9 yeah, that is basically it. I do not want to have to inform development teams to rerun their Terraform if I replace the cluster. It would be nice to be able to configure trust to an entire account, or some sort of regex pattern for the cluster name.
If a CSI driver is used, I hope it's not a daemonset. We're quickly running into issues with the number of daemonsets we need to run on nodes that consume resources we then cannot use to run workloads (currently at about 7 per node). At large cluster sizes this is not insignificant, where you might end up with 10k workload pods and 28k daemonset pods. If using the vpc-cni-plugin and IP allocations out of a VPC, it's even worse, since now you're consuming potentially scarce IPs.

We've also run into the trust policy size limits already.

Talking from personal experience of dealing with both the AssumeRoleWithWebIdentity and AWS SSO default credential provider chain updates to multiple language SDKs: it is a nice long-term change, but actually doing it was an absolute nightmare. None of the AWS SDK teams coordinate their work when service teams add new methods, leaving all customers in the lurch to deal with it. When you are dealing with 100s of devs and multiple languages, this was a giant time sink that AWS was not helpful with.
This is a must.
IMHO, I can't label today's integration, where EKS elements are treated as foreign elements of AWS, as "production-grade". Does anyone know a workaround, besides producing resource roles (nodes, cluster, etc.) where everyone can access everyone else's AWS resources? Do we have an ETA for this feature? We are kind of struggling.
Just faced the same issue. I just stopped an initiative to migrate all our services to the new IRSA, as the effort to migrate all assume-role policies is too high (in addition to the AWS SDK library updates...). We use CloudFormation all over the place, so it's a nightmare to get all our product teams on board to update their roles. Considering possible disaster recovery cases, when I need to re-create everything or even deploy into another region, this would not be possible as we cannot assume the roles anymore 😞; there is just no good way to keep the roles synced with possible OIDC provider changes.
We're running into this as well using ClusterAPI + AWS integration to spin up clusters - the ability to use a regex or wildcard to allow clusters to use existing IRSA roles would be great.
Yes, we also just ran head first into the 4096-character IAM policy size limit trying to scale out by adding more clusters. And we're basically stuck right now until we can do something about this.
I've also just run into this issue. So this is a thumbs up from me!
My team uses ClusterAPI to manage clusters in AWS right now, and we've been discussing an option that does not require any changes to IRSA which some here may find helpful. The general idea is to run all the clusters under a single trust domain or identity provider. The big requirement is that you ensure that namespaces are managed the same way in all clusters under the trust domain; there will be no way for IRSA to distinguish between same-named namespaces in clusters under the trust domain.

Configure the clusters to use the same service account issuer. You have the option to also make them all use the same signing key, but that's not a requirement. The service account issuer string needs to be a URL that can serve the OIDC discovery documents publicly, such as S3. You also need to set the service account JWKS URI to point to the same location so that tokens from every cluster validate against the hosted documents.

Extract the discovery documents from the clusters and merge them. Merging the JWKS files is only necessary if you continue to use cluster-specific signing keys, which we recommend since that allows you to evict individual signing keys from the trust domain without affecting other clusters. If you use the same signing key for all clusters, then you need only copy the OIDC discovery documents from one of the clusters to the hosting solution, since all clusters will have identical documents, and you never need to update the JWKS file again as you add and remove clusters under the trust domain. Buyer beware on this!

Once all of that is set up, you can define a single IAM OIDC provider to represent the trust domain. Point it to the endpoint you set up for hosting the OIDC discovery documents. Now create IAM roles as usual using the new provider. All workloads running in clusters under the same trust domain will be able to use the roles as if they run under a single cluster. Note the point about namespace handling mentioned earlier.

It would be great if EKS allowed the service account issuer to be specified, because then this same solution could be used.
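For reference, a rough Terraform sketch of the hosting side of that setup (the bucket name, local file paths, and thumbprint are placeholders; the merged discovery documents are assumed to have already been extracted from the clusters, and the bucket policy that makes the objects publicly readable is omitted):

```hcl
# Sketch: host the merged OIDC discovery documents publicly and register a
# single IAM OIDC provider for the whole trust domain.
resource "aws_s3_bucket" "trust_domain" {
  bucket = "example-cluster-trust-domain" # placeholder bucket name
}

resource "aws_s3_object" "openid_configuration" {
  bucket       = aws_s3_bucket.trust_domain.id
  key          = ".well-known/openid-configuration"
  content_type = "application/json"
  source       = "generated/openid-configuration" # extracted from a cluster
}

resource "aws_s3_object" "jwks" {
  bucket       = aws_s3_bucket.trust_domain.id
  key          = "keys.json"
  content_type = "application/json"
  source       = "generated/keys.json" # merged JWKS from all clusters
}

resource "aws_iam_openid_connect_provider" "trust_domain" {
  url             = "https://${aws_s3_bucket.trust_domain.bucket_regional_domain_name}"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["0000000000000000000000000000000000000000"] # placeholder thumbprint
}
```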
I share the same concerns that @phillebaba has expressed: even following IaC best practices, updating all the role trust policies after a new cluster is provisioned can be pretty complex at scale. EKS clusters should be able to rely on a shared OIDC provider.
We are in the same situation as @phillebaba and @GaruGaru and would love for it to be possible for a given SA in any cluster in the account to assume the IRSA role automatically, whether that is using a shared OIDC provider, or a more flexible (wildcard) trust policy syntax. It's crucial to be able to do blue/green cluster failovers without a highly coupled, hard-to-scale IaC setup.
The solution shared by @javanthropus seems to solve the problem as it's been stated, albeit with a few extra steps, such as creating the .well-known/openid-configuration and associated JWKS, and then building a method of retrieving the public half of the service account signing key from EKS control planes to place in the JWKS if you want a unique signing key per cluster (which is optimal from a security and management perspective). Currently I'm not entirely sure it is possible to get the public key portion of the signing key from EKS control planes, or to use user-managed keys as SA token signing keys. Enabling the retrieval of that information via the AWS API would make this solution possible today. For reference, the pod-identity-webhook repository does document all the steps needed for creating a provider independently of an EKS cluster. I also have a POC repository demonstrating how to generate a signing keypair and use it as the signing key, as well as some IaC to leverage this setup (granted, this was done using RKE2, not EKS, because there's no obvious way to get the public key from EKS).
@mikestef9 Any chance you'd be ready and willing to share some more details/design around how this is going to work?
Any update about this feature? Thanks
Instead of calling the EKS API (we have 30+ roles per cluster), can we keep the annotation for the same effect?
This would have some potentially problematic security implications, wouldn't it? If you have permission within Kubernetes to modify service accounts in any namespace in any cluster, you could grant access to assume that role. Something outside the cluster needs to be able to restrict by cluster/namespace at least.
Fair point, I hadn't thought of that; I was just thinking of a way to automate this for all our regional clusters.
I'm curious if the new EKS Pod Identity Associations are intended to be part of the solution/alternative for this.
Is it worth running yet another daemonset, I wonder? 🤔
And there is no support for cross-account roles, likely requiring application code changes.
Closing this issue since the IAM Roles for Service Accounts enhancements were launched yesterday. Please see the announcement for details. Please feel free to open a new issue if you have feedback/enhancement requests for the newly launched EKS Pod Identity feature.
From the docs it is not really clear to me if the Pod Identity feature is going to work in a cross-account scenario. Did someone test it already, or can someone give more clarity here?
Thank you for the feedback. EKS Pod Identity works in cross-account scenarios. We are in the process of updating the EKS user guide to give more clarity on this. We are targeting to have the updates available in the user guide next week or so.
While the EKS user guide is getting updated, we published a deep dive blog post (link) that covers multiple cross-account scenarios.
We use IRSA to provide IAM roles to workloads in a shared cluster. Tenants provision roles in their accounts, and we attach them to their containers so they appear on startup, akin to an EC2 instance role. I wouldn't call either of the two currently available modes "cross-account" support. Now, instead of OIDC providers, the tenants need to fill the trust policy of their roles with all these external roles from the EKS clusters' accounts, repeating the original limitation that spawned this issue.
@Anomander If EKS were to extend CreatePodIdentityAssociation() with an additional parameter that takes the tenant's role as input, would it meet your needs?
Once you create the above association, at runtime the pods would automatically be injected with the temporary AWS credentials of role-b, without the need for the application/SDKs to do the (double) role assumption. Would that meet your requirement?
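Purely to make that suggestion concrete, a hypothetical Terraform sketch (the target_role_arn argument is illustrative only and not part of the association API being discussed in this thread; all names are placeholders):

```hcl
# Hypothetical: associate the service account with an intermediate role in the
# cluster's account (role-a) and name the tenant's role (role-b) as the target,
# so pods receive role-b credentials without a double role assumption.
resource "aws_eks_pod_identity_association" "tenant_app" {
  cluster_name    = "shared-cluster"                        # placeholder
  namespace       = "tenant-a"                              # placeholder
  service_account = "app"                                   # placeholder
  role_arn        = "arn:aws:iam::111122223333:role/role-a" # role in the cluster's account
  target_role_arn = "arn:aws:iam::444455556666:role/role-b" # hypothetical argument: tenant's role
}
```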
We have similar challenges for cross-account. I think that suggestion would work for us. It's the double role assumption that is most problematic for us.
The need to provision an extra role in the cluster's AWS account would make this more troublesome to administer than the OIDC-provider mechanism. Tenants typically have privileges to provision IAM roles into their own AWS accounts, but not into the cluster's AWS account. So we would continue to consider CreatePodIdentityAssociation() to be non-useful.
Double assumptions also expose people to the hard 1h TTL on chained role assumption, which I've had to work around and which has caused issues many times.
What goes into the trust policy of role-b? If the plan is to put the cluster-account role (role-a) in there, tenants are back to filling their trust policies with roles from the clusters' accounts. Additionally, as for @johngmyers, provisioning a role for each unique service account on the platform side is untenable and will quickly run into the 5,000-role limit.
We tend to provision a role per <cluster, serviceaccount> tuple, so that we can scope the permissions to only those resources that need to be accessed by the particular cluster. Unfortunately, the AWS-provided policies aren't good at limiting scope like this. We maintain a Terraform module which keeps and distributes the mapping from cluster names to OIDC provider paths. That, and the need to provision exactly one OIDC provider resource per <cluster, tenant-account> tuple, are inconveniences. As we use a Kustomize-based technology for managing configuration of the…
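For context, a condensed Terraform sketch of what that per-cluster duplication looks like (the map of cluster names to OIDC provider paths would normally come from the shared module mentioned above; all names and IDs here are placeholders):

```hcl
# Sketch: one IAM role per <cluster, serviceaccount> tuple, driven by a map of
# cluster names to OIDC provider paths (shown inline here for brevity).
locals {
  oidc_providers = {
    "cluster-blue"  = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEBLUE"
    "cluster-green" = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEGREEN"
  }
}

data "aws_caller_identity" "tenant" {}

resource "aws_iam_role" "my_app" {
  for_each = local.oidc_providers

  name = "my-app-${each.key}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRoleWithWebIdentity"
      Principal = {
        Federated = "arn:aws:iam::${data.aws_caller_identity.tenant.account_id}:oidc-provider/${each.value}"
      }
      Condition = {
        StringEquals = {
          "${each.value}:sub" = "system:serviceaccount:team-a:my-app" # placeholder
        }
      }
    }]
  })
}
```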
I believe the session tags supported by Pod Identity would help alleviate the need for this, correct?
Does the recommended node role permission restrict what the node can pass as a value for that session tag? Or could someone obtaining node role credentials for one cluster access resources owned by a different cluster by passing the other cluster's name in the session tag when making an AssumeRoleForPodIdentity call? Without these session tags being supportable in self-managed clusters, it's unlikely that the recommended IAM policies for the various workloads would make good use of them. If a policy uses the…
It's not the node IAM role that is involved in this; it's the EKS Auth API.
You can see more about session tags here, along with an example policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AuthorizetoGetSecretValue",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "secretsmanager:ResourceTag/kubernetes-namespace": "${aws:PrincipalTag/kubernetes-namespace}",
          "secretsmanager:ResourceTag/eks-cluster-name": "${aws:PrincipalTag/eks-cluster-name}"
        }
      }
    }
  ]
}
```
Community Note
Tell us about your request
IAM Roles for Service Accounts (IRSA) enables you to associate an IAM role with a Kubernetes service account and follow the principle of least privilege by giving pods only the AWS API permissions they need, without sharing those permissions with all pods running on the same node. This feature works well with a smaller number of clusters, but becomes more difficult to manage as the number of EKS clusters grows, notably:
Given these pain points, EKS is considering a change to the way IRSA works, moving credential vending to the EKS control plane (similar to how ECS and Lambda work). With this change, a trust policy would only need to be updated once to trust a service principal like eks-pods.amazonaws.com; then you would call an EKS API to provide the IAM role to service account mapping (a sketch of what that one-time trust policy could look like follows below).

We are looking for your feedback on this proposal, and to hear about any additional pain points encountered with IRSA today that would not be solved by such a solution.
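For reference, a minimal sketch of what that one-time trust policy could look like (the proposal text names eks-pods.amazonaws.com; the EKS Pod Identity feature that eventually launched uses the pods.eks.amazonaws.com service principal and also requires sts:TagSession, which is what this sketch assumes):

```hcl
# Sketch: a single trust statement for the EKS service principal replaces the
# per-cluster OIDC provider entries in each role's trust policy.
data "aws_iam_policy_document" "pod_identity_trust" {
  statement {
    sid     = "AllowEksAuthToAssumeRoleForPodIdentity"
    effect  = "Allow"
    actions = ["sts:AssumeRole", "sts:TagSession"]

    principals {
      type        = "Service"
      identifiers = ["pods.eks.amazonaws.com"]
    }
  }
}
```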
Are you currently working around this issue?
Creating and managing duplicate roles to be used across multiple clusters