Failing Test: [sig-storage] Flexvolume expansion tests are flaky #71470
Comments
/milestone v1.13
I am adding this to the 1.13 milestone as critical-urgent until investigated and triaged otherwise. The code freeze deadline for the fix is EOD tomorrow, 11/28.
@gnufied yes, I will look at it. At first glance it seems to be a timing issue. This is what the test does:
By("Waiting for file system resize to finish")
pvc, err = waitForFSResize(pvc, c)
Expect(err).NotTo(HaveOccurred(), "while waiting for fs resize to finish")
pvcConditions := pvc.Status.Conditions
Expect(len(pvcConditions)).To(Equal(0), "pvc should not have conditions")
Unfortunately we don't log which condition was present on the PVC, so it is unclear what was left uncleared. The kubelet updates
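One way to close the logging gap mentioned above is to put the leftover conditions into the assertion message. The following is only a minimal sketch: `pvcCondition` is a simplified, hypothetical stand-in for the entries in `pvc.Status.Conditions`, `describeConditions` is an illustrative helper that does not exist in the e2e framework, and the condition shown in `main` is an example value, since the thread does not say which condition was actually left on the PVC.

```go
package main

import (
	"fmt"
	"strings"
)

// pvcCondition is a simplified, hypothetical stand-in for the entries in
// pvc.Status.Conditions. It is NOT the real Kubernetes API type.
type pvcCondition struct {
	Type    string
	Status  string
	Message string
}

// describeConditions renders the conditions on one line so that a failed
// assertion can report which condition was left on the PVC.
func describeConditions(conds []pvcCondition) string {
	if len(conds) == 0 {
		return "no conditions"
	}
	parts := make([]string, 0, len(conds))
	for _, c := range conds {
		parts = append(parts, fmt.Sprintf("%s=%s (%s)", c.Type, c.Status, c.Message))
	}
	return strings.Join(parts, "; ")
}

func main() {
	// Illustrative value only; the thread does not record the real condition.
	leftover := []pvcCondition{
		{Type: "FileSystemResizePending", Status: "True", Message: "waiting for node"},
	}
	fmt.Println("pvc should not have conditions, found:", describeConditions(leftover))
}
```

In the test itself, the resulting string could be appended to the "pvc should not have conditions" message so that the next flake shows what was left uncleared.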
@aniket-s-kulkarni I am wondering if the kubelet logs have some clues about why the condition update failed. I do agree that if
I think the root cause of this issue is that the same PVC is being considered twice for resizing while the file system resize is still pending on the node. The flow is like this:
Here is how we know that the volume was considered twice for resize:
This flake was not introduced by the flexvolume change afaict; it existed before. I think the timing with flex is just triggering it. All resize operations are idempotent, and the expand controller rejects the duplicate resize request anyway by comparing the PV size, so it should not be too bad. In the worst case, this WILL cause a condition to be added to the PVC unnecessarily (and that is what is happening here). I will make a fix for this. But IMO this should not be a release blocker.
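The size comparison described above can be illustrated with a small model. This is a sketch under stated assumptions, not the expand controller's actual code: `volume`, `needsExpansion`, and `expand` are hypothetical names, and sizes are plain `int64` byte counts rather than Kubernetes quantities. It shows why considering the same volume a second time is harmless for the resize itself: once the PV size matches the request, the second attempt does no work.

```go
package main

import "fmt"

// volume is a hypothetical, simplified model of a PV/PVC pair.
// Sizes are byte counts; the real objects use resource quantities.
type volume struct {
	pvSize      int64 // size currently recorded on the PV
	requestSize int64 // size requested through the PVC
}

// needsExpansion reports whether the request is larger than the PV size.
func needsExpansion(v volume) bool {
	return v.requestSize > v.pvSize
}

// expand simulates one controller-side resize attempt and reports
// whether any work was done. Repeated calls are idempotent.
func expand(v *volume) bool {
	if !needsExpansion(*v) {
		return false // request rejected: the PV is already large enough
	}
	v.pvSize = v.requestSize
	return true
}

func main() {
	v := &volume{pvSize: 1 << 30, requestSize: 2 << 30}
	fmt.Println("first consideration resized:", expand(v))
	fmt.Println("second consideration resized:", expand(v))
}
```

The model covers only the size check. The unnecessary PVC condition described in the comment is a separate side effect that this sketch does not reproduce.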
Thanks for digging into this @gnufied. Based on #71434 (comment) we will move this to v1.14 and document it as a known issue for Flex resize.
/milestone v1.14
Fix tags: /kind flake
The condition exists as a message to the user about the stage of volume resizing, but no actions are taken based on it. So the user will not have any usage issues with the PVC; the messaging will just be somewhat confusing (when the resize operation is over, it will appear to the end user as if it isn't). Also, the issue itself is fairly rare. For example, this test flaked only twice between 11/13 and 11/27, so most users will most likely not run into this. Nonetheless, I am working on a fix.
Thanks @gnufied, this is now moved to 1.14.
/assign
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Hey @gnufied, in terms of the v1.14 release, how close is the fix for this / what's the current status?
We missed fixing this in v1.14. We are working on a fix that will land in 1.15.
/remove-priority critical-urgent
/milestone v1.15
@gnufied: You must be a member of the kubernetes/kubernetes-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@mariantalla: You must be a member of the kubernetes/kubernetes-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility. In response to this:
This should be fixed now.
/close
@gnufied: Closing this issue. In response to this:
This is a child issue of #71434
For example - https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-slow/21996
/sig storage
@aniket-s-kulkarni will you be able to look into this?