Webhook not injecting Env Vars or Volumes/VolumeMounts on initial deployment to a new cluster #174
Comments
I am experiencing the same issue. Our cluster is a kOps-managed cluster. We deployed a service with two replicas and noticed that one of the pods was able to access the S3 bucket but the other wasn't. On investigation, the pod that was not able to access the bucket didn't have the environment variables and volume mounts injected by the webhook.
This looks very similar to an issue we are hitting. Here was our conclusion (courtesy of @StrongMonkey):
@jsilverio22 should be able to share a WIP PR soon.
In our case, we are probably exacerbating the problem by:
In my case, I'm running on EKS, but I'm performing my deployments via Helm charts, where the ServiceAccounts and their related annotations are deployed via the chart values at the same time. I also tried pre-provisioning the ServiceAccount with the annotations, but it did not change the behavior, which led me to believe it was something else altogether. In a similar situation to @cjellick, my entire cluster provisioning process is done programmatically via Crossplane. Just to reiterate, I do not believe it's a problem with my configuration, because when I delete the pods after their first instantiation everything works fine; the pods from the initial deployment simply do not pick up their IRSA privileges.
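For context, a ServiceAccount pre-provisioned with the IRSA annotation described above looks roughly like this; the name, namespace, and role ARN are placeholders, not values from this thread:

```yaml
# Illustrative ServiceAccount annotated for IRSA; name, namespace,
# and role ARN are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-irsa-role
```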
I'm seeing this problem on simple re-deployments, e.g. when a Deployment recreates its pods for whatever reason (such as moving to other nodes) and the environment variables simply aren't put in place. I'm running the pod identity webhook on multiple nodes, so there should always be one replica online to respond.
This is a blocker for us to adopt pod identity (as a switch from IRSA). We create the IAM role and the pod identity association, install the workload, and the initial pods come up without the mutations. Waiting a minute and restarting the Deployment gives us new pods with the correct mutations.
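For reference, the association mentioned above can be expressed as a Kubernetes resource managed by the ACK EKS controller. The sketch below is illustrative only: the apiVersion and field names are assumptions modelled on the EKS `CreatePodIdentityAssociation` API, not values confirmed in this thread, so check your controller's CRD before relying on them.

```yaml
# Sketch of an ACK-managed pod identity association. apiVersion, kind,
# and spec field names are assumptions based on the EKS API.
apiVersion: eks.services.k8s.aws/v1alpha1
kind: PodIdentityAssociation
metadata:
  name: my-app
  namespace: my-namespace
spec:
  clusterName: my-cluster
  namespace: my-namespace
  serviceAccount: my-app
  roleARN: arn:aws:iam::111122223333:role/my-app-pod-identity-role  # placeholder
```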
We are using Argo CD to deploy our services and are also looking at pod identity. Our services are deployed using a Helm chart, to which we added the pod identity association resource. To try to solve the race condition, we first added an Argo CD preSync hook on the association. In a further effort to solve this issue, we added a customisation to Argo CD so that the association is only marked healthy once the ACK EKS controller has actually created it.

We verified that the Argo CD customisation works by downscaling the ACK EKS controller to zero before creating the Argo CD application of the service. When the application is added to Argo CD, all the resources are out of sync and Argo CD waits for the association to become healthy before proceeding.
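One way to implement this kind of customisation is a resource health check in the `argocd-cm` ConfigMap. This is only a sketch: the resource group/kind and the `ACK.ResourceSynced` condition are assumptions about the ACK EKS controller, not details taken from the comment above.

```yaml
# Sketch of an Argo CD health check that keeps the association
# "Progressing" until the ACK controller reports it as synced.
# Group/kind and condition name are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.eks.services.k8s.aws_PodIdentityAssociation: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for the pod identity association to be created"
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for _, condition in ipairs(obj.status.conditions) do
        if condition.type == "ACK.ResourceSynced" and condition.status == "True" then
          hs.status = "Healthy"
          hs.message = "Pod identity association is ready"
        end
      end
    end
    return hs
```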
In our case, in a big EKS cluster where the ServiceAccount, EKSPodIdentityAssociation, and Deployment are created immediately one after another, we noticed this problem happening almost every time. Adding a 5-second sleep before creating the Deployment resolved the issue in ~90% of cases, but it still happens occasionally.
What happened:
I am deploying multiple services to my cluster, such as cluster-autoscaler, external-dns, and the ebs-csi-driver. On initial deployment, the pods do not receive the environment variables, volumes, and volumeMounts.
When I manually delete the affected pods after automated deployment, I get everything as expected from the webhook.
I've followed every possible AWS document on troubleshooting IRSA. I initially thought it could be a race condition post cluster instantiation, but I tested delaying the deployments as long as 10 minutes and the results are the same.
What you expected to happen:
Environment variables, volumes, and volumeMounts are injected into the deployment's pod specs without the need to manually delete the pods.
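For reference, this is roughly what the webhook is expected to add to each container when the ServiceAccount is annotated correctly; the names and paths below follow the webhook's documented defaults and the ARN is a placeholder, so treat it as illustrative:

```yaml
# Roughly the mutation the pod identity webhook applies for IRSA.
env:
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::111122223333:role/my-app-irsa-role   # placeholder ARN
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumeMounts:
  - name: aws-iam-token
    mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
    readOnly: true
volumes:
  - name: aws-iam-token
    projected:
      sources:
        - serviceAccountToken:
            audience: sts.amazonaws.com
            expirationSeconds: 86400
            path: token
```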
How to reproduce it (as minimally and precisely as possible):
1. Create an EKS cluster.
2. Create an IAM OIDC provider.
3. Create an IAM policy.
4. Create an IAM role and attach the policy to it.
5. Add a trust relationship to the role referring to the OIDC provider and the Kubernetes service account.
6. Finally, do a Helm release with the appropriate values to specify the corresponding namespace, service account name, and the annotation required for the role ARN, tying it all together.
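As an illustration of the last step, the relevant Helm values look roughly like this; the key layout is a common chart convention rather than something taken from this issue, and the ARN is a placeholder:

```yaml
# Hypothetical Helm values for the final step; key names vary by chart.
serviceAccount:
  create: true
  name: my-app   # must match the service account named in the role's trust policy
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-irsa-role  # placeholder
```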
Anything else we need to know?:
Not that I can think of, but feel free to ask for more :).
Environment:
- Platform version (`aws eks describe-cluster --name <name> --query cluster.platformVersion`): eks.3
- Kubernetes version (`aws eks describe-cluster --name <name> --query cluster.version`): 1.24 (also tested 1.23 with the same result)