Global timer makes litmus create verdict sooner than the AUT is ready causing the node drain test to fail #2098
Comments
Noting the probable source of those 90s:
@martin-mat, excellent, and thanks for the investigation. Environment variables for timeouts:
Add possibility to change duration of node drain litmus chaos test. This is needed for CNFs with longer startup/shutdown. Additionally, fix litmus waiter code and timeout module. Slight refactor of LitmusManager module. Refs: cnti-testcatalog#2098 Signed-off-by: Martin Matyas <[email protected]>
After a thorough analysis, several distinct problems were found:
All fixes are in #2102.
Describe the bug
During the node drain test, a global "run-to-completion" timer of 90 seconds seems to be applied. If the AUT is not ready within 90 seconds after the node drain, the verdict is an error, and the node drain test fails even if the AUT later comes back into operation.
Note: the global timer contradicts #1838, which mentions a timer of 30 minutes.
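To make the failure mode concrete, here is a minimal sketch of a polling waiter with a fixed deadline. It is not the testsuite's or Litmus' actual code; `wait_for_ready`, `aut_ready`, and the 120-second recovery time are hypothetical and chosen only for illustration. The clock is simulated, so the example runs instantly.

```python
from typing import Callable


def wait_for_ready(is_ready: Callable[[float], bool],
                   timeout_s: float,
                   poll_interval_s: float = 5.0) -> str:
    """Poll is_ready on a simulated clock and return "pass" or "error".

    is_ready receives the elapsed time in seconds, so nothing sleeps.
    """
    elapsed = 0.0
    while elapsed <= timeout_s:
        if is_ready(elapsed):
            return "pass"
        elapsed += poll_interval_s
    # Deadline reached: a recovery after this point is never observed.
    return "error"


# Hypothetical AUT that needs 120 s to become ready after the drain.
def aut_ready(elapsed_s: float) -> bool:
    return elapsed_s >= 120.0


verdict_fixed = wait_for_ready(aut_ready, timeout_s=90.0)      # 90 s timer
verdict_relaxed = wait_for_ready(aut_ready, timeout_s=1800.0)  # 30 min timer
print(verdict_fixed, verdict_relaxed)
```

With the 90-second deadline the verdict is an error although the AUT would be ready 30 seconds later; with the 30-minute value mentioned in #1838 the same AUT passes.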
To Reproduce
Steps to reproduce the behavior:
CNTI testsuite 1.3.0
Note the TOTAL_CHAOS_DURATION env value in the ChaosEngine resource, which shows the 90-second test duration this ticket is about:
Expected behavior
The global timer of 1m30s should be configurable via an environment variable or in the CNF config, or relaxed as mentioned in #1838. Note that the node drain impacts all pods running on the drained node, not only the AUT components. Some components (e.g. clusters) need extended time for an ordered termination and start.
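One way the requested configurability could look is sketched below, assuming the duration is read from an environment variable and falls back to the current 90 seconds. The variable name `NODE_DRAIN_CHAOS_DURATION` and both helper functions are hypothetical, not the testsuite's real interface. The `name`/`value` entry mirrors the usual shape of a Kubernetes env list and is assumed here to be how `TOTAL_CHAOS_DURATION` reaches the ChaosEngine.

```python
import os
from typing import Dict, List, Mapping

DEFAULT_CHAOS_DURATION_S = 90           # the 1m30s value reported in this issue
ENV_VAR = "NODE_DRAIN_CHAOS_DURATION"   # hypothetical variable name


def chaos_duration_seconds(env: Mapping[str, str] = os.environ) -> int:
    """Return the configured duration, or the default if unset or invalid."""
    raw = env.get(ENV_VAR)
    if raw is None or not raw.strip():
        return DEFAULT_CHAOS_DURATION_S
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_CHAOS_DURATION_S
    # Reject zero and negative durations rather than passing them on.
    return value if value > 0 else DEFAULT_CHAOS_DURATION_S


def chaos_engine_env(duration_s: int) -> List[Dict[str, str]]:
    """Build the env entry that would carry the duration (assumed shape)."""
    return [{"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)}]


# A CNF with a slow ordered restart could then request five minutes.
print(chaos_engine_env(chaos_duration_seconds({ENV_VAR: "300"})))
```

Falling back to the default on unset or malformed input keeps the current behavior for existing users, so only CNFs that need a longer window have to set anything.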
Device (please complete the following information):
$ uname -a
Linux ip-10-0-17-74 6.5.0-1020-aws #20 SMP Wed May 1 16:10:50 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux