Csi-attacher Loses Connection to Driver Unix Socket #1875
Comments
Update: jsafrane@ had already filed an issue for this on the external-provisioner: The provisioner exits after 30 minutes of idle. · Issue #1099 · kubernetes-csi/external-provisioner (and even merged a fix into csi-lib-utils ❤️). Once the new sidecars that include the fix are released, we can include them in a release to
TLDR: the latest versions of csi-attacher (v4.4.2), csi-provisioner (v3.6.2), and csi-resizer (v1.9.2) will restart due to a lost connection to the driver's Unix socket when the sidecar has not been in use for ~40 minutes. If you are deploying the driver via Helm and replace the version tag for each sidecar with an older version, you should not see this issue.
You can follow along on Kubernetes Slack in the csi channel in this thread: https://kubernetes.slack.com/archives/C8EJ01Z46/p1702657617011189
Hi @dmitrii-didenko, I was able to isolate this bug to v4.4.2 of the csi-attacher (as well as the latest csi-provisioner and csi-resizer versions). I will clean up and post my logs by tomorrow. I have tried:
Reproduction steps:
|
The latest patch versions of csi-attacher, csi-provisioner, and csi-resizer only had dependency upgrades. This suggests that one of those upgrades might have caused this regression. We are almost finished releasing v1.26.0 of aws-ebs-csi-driver for Helm, which includes dependency upgrades. I will look for this restart bug there to see if it is still occurring before filing an issue on the external-attacher project. Thank you for raising this issue @dmitrii-didenko! |
Started adding logs here: AndrewSirenko/csi-sidecar-container-restart-issue-logs. You will find that if we use csi-attacher <= 4.4.1, the csi-attacher will no longer restart, but the csi-provisioner sidecar will still restart. Let me know if there is a more helpful way to present the information. I am trying to reproduce on driver v1.26.0 now. Edit: Can confirm this still occurs on driver v1.26.0. See Proof of restart issue for driver v1.26 and latest sidecars |
Thank you very much for the update! So if I got you right, the only workaround is to configure the following?
|
@dmitrii-didenko the workarounds are to either:
|
Update, jsafrane@ had already filed an issue for this on the external-provisioner: The provisioner exits after 30 minutes of idle. · Issue #1099 · kubernetes-csi/external-provisioner (and even merged a fix into csi-lib-utils). Once the new sidecars that include the fix are released, we can include them in a release to |
Sig-storage has released new versions of the sidecars. We will include them in our next patch release. |
@AndrewSirenko Thanks! Do you have an ETA for the next patch release? |
Update for Jan 4 evening: EKS-D images are still not out due to an internal blocker. I will submit the release PRs as soon as they are available (merging the release PRs takes some CI time).
Update for Jan 4 morning: @faganihajizada, EKS-D images were still not out as of last night, but hopefully they'll release today and we can push our Helm release.
Hi @faganihajizada, we're waiting on EKS Distro to release the patched versions of the sidecars during their bi-weekly release. Their ETA is today. We will start the release process as soon as those sidecar images are released. ETA for the Helm patch release is most likely tomorrow (Jan 4). We will also start the EKS add-on release process today, but our team is not in control of when that will be released (typically a few business days after the Helm release). |
The new EKS-D csi sidecar images were released 24 minutes ago. We are starting our Helm release now. |
Helm release is out on our release-1.26 branch. Add-on release will likely* come out sometime in the latter half of next week. @faganihajizada |
Thank you @AndrewSirenko 🤝 An engineer in the team tried it, and we got:
Do you have an ETA for updating https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/Chart.yaml in master? |
Done @faganihajizada, we were waiting on the team to confirm there wouldn't be merge conflicts. /close |
@AndrewSirenko: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thank you everybody! |
One last update: the v1.26.1 add-on is released in all regions. Thank you!
|
/kind bug
What happened?
We are observing an issue where the csi-attacher sidecar container exits frequently with the following message:
According to this change kubernetes-csi/external-attacher#123, I assume that the ebs-plugin container does somehow close the socket, which causes csi-attacher to restart. However, we see no suspicious logs from ebs-plugin at the time of the event:
What you expected to happen?
csi-attacher should not exit and should stay connected via the socket.
How to reproduce it (as minimally and precisely as possible)?
Unfortunately, we do not have exact steps to reproduce this.
Anything else we need to know?:
We noticed one thing: this happens only on clusters with frequent resizing events (caused by the cluster autoscaler or spot instance changes).
Do we have any options we can enable to collect more information on this?
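One low-effort way to gather more data is to record how often each sidecar restarts and how its last run ended. Below is a minimal sketch, not part of the driver or its tooling: it assumes you pass it the parsed output of `kubectl get pods -o json` for the controller pods, and that the sidecar containers use the names mentioned in this issue. The helper name `sidecar_restarts` is hypothetical; the fields it reads (`containerStatuses`, `restartCount`, `lastState.terminated`) are standard Pod status fields.

```python
from typing import Any

# Assumption: sidecar container names match the ones used in this issue.
SIDECARS = {"csi-attacher", "csi-provisioner", "csi-resizer"}


def sidecar_restarts(pod_list: dict[str, Any]) -> list[dict[str, Any]]:
    """Summarize restarts of CSI sidecar containers from a Kubernetes PodList dict."""
    rows = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "")
        for status in pod.get("status", {}).get("containerStatuses", []):
            if status.get("name") not in SIDECARS:
                continue  # skip ebs-plugin and any other non-sidecar container
            # lastState.terminated is only present if the container exited before
            terminated = status.get("lastState", {}).get("terminated") or {}
            rows.append({
                "pod": pod_name,
                "container": status["name"],
                "restarts": status.get("restartCount", 0),
                "last_exit_code": terminated.get("exitCode"),
                "last_finished_at": terminated.get("finishedAt"),
            })
    return rows
```

To use it, pipe `kubectl get pods -n <namespace> -o json` into a script that calls `sidecar_restarts(json.load(sys.stdin))`. Comparing `last_finished_at` across restarts should show whether the exits line up with idle periods or with node scaling events.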
Environment