-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky Handle EBS Ack timer test for windows #3904
Conversation
Just curious, do we know the root cause of the timing issue with Windows exactly? Is this expected behavior, or is it concerning that for Windows the EBS ResourceAttachment may not be removed from the local map of attachments within Agent's Docker task state engine even after double the specified timeout amount? (i.e., expiresAt is set to 5ms, but the ResourceAttachment is still not removed after 10ms) |
This is for sure not the expected behavior, I suspect that while we're handling the attachment, we're also already sleeping for 10 ms (or have started to sleep). In a real world situation, I don't think 5ms would be an ideal timeout/wait duration for an Ack (mainly since it'll take a bit to actually attach an EBS volume onto the instance). Here's also an example of where the tests is being flaky -> https://github.com/aws/amazon-ecs-agent/actions/runs/6164709032/job/16731863418 Although thinking about it as well as per offline discussion with @singholt , it might be better to just increase the sleep timer to 30ms or something around that ballpark. We don't want to run into a situation where it never gets removed and we're stuck in this forever loop. |
Thanks for the explanation! Proposed change to increase sleep timer in this test sounds good to me. I suggest maybe adding a code comment adjacent to the sleep timer providing context from your explanation as to why we chose 30ms. |
Summary
The goal of this PR is to un-flake the
TestHandleEBSAckTimeout
test for windows due to timing issue. The current mitigation is to retry the windows unit test until it passes.Implementation details
TestHandleEBSAckTimeout
will now continuously sleep for 5 ms and iterate until the EBS ResourceAttachment has been removed from the local map of attachments within Agent's Docker task state engine once it hits the ack timeout.Testing
Unit test
Created a dummy PR that runs
TestHandleEBSAckTimeout
100x -> mye956#53Description for the changelog
Fix flaky EBS watcher test for windows.
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.