[core] Make DrainRaylet + ShutdownRaylet Fault Tolerant #57861
Signed-off-by: joshlee <[email protected]>
dayshah
left a comment
        return not node["Alive"]
    return True

wait_for_condition(node_is_dead, timeout=30)
how long does this test take to finish?
timeout=30 is no bueno! maybe need to tune some timeout env var?
The total test takes around 2.5 - 3 seconds on avg to complete. The wait_for_condition itself only takes a millisecond or two, so 30 sec even as a failsafe was a bit excessive 😅. I'll reduce it down to 1 sec.
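For context on why the large timeout did not slow the test down: a polling helper of this kind returns as soon as the predicate holds, so the timeout is only an upper bound on the failure path. The sketch below is a minimal stand-in, not Ray's actual `wait_for_condition`; the helper name `wait_for`, its `retry_interval_s` parameter, and the `node` dict are assumptions made for illustration.

```python
import time


def wait_for(predicate, timeout=1.0, retry_interval_s=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical stand-in for a test polling helper; not Ray's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(retry_interval_s)


# Hypothetical node record, mirroring the node["Alive"] check in the test.
node = {"Alive": False}


def node_is_dead():
    return not node["Alive"]


start = time.monotonic()
wait_for(node_is_dead, timeout=30)  # returns on the first poll
elapsed = time.monotonic() - start
```

Because the predicate is already true on the first poll, the call returns almost immediately regardless of the timeout value. The timeout only matters when the node never dies, which is why a smaller failsafe keeps a failing run short.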
@Sparks0219 Jiajun's PR is merged, can rebase now
Signed-off-by: joshlee <[email protected]> Signed-off-by: Kamil Kaczmarek <[email protected]>
…57861) Signed-off-by: joshlee <[email protected]> Signed-off-by: xgui <[email protected]>
Signed-off-by: joshlee <[email protected]> Signed-off-by: elliot-barn <[email protected]>
…57861) Signed-off-by: joshlee <[email protected]>
…57861) Signed-off-by: joshlee <[email protected]> Signed-off-by: Aydin Abiar <[email protected]>
Description
Makes DrainRaylet and ShutdownRaylet fault tolerant and idempotent. Adds cpp tests for DrainRaylet to verify idempotency, plus a python integration test. No cpp/python tests are added for ShutdownRaylet: it's evidently idempotent, and since its callbacks are set in main.cc, a cpp test would just re-implement that logic in node_manager_test.cc.
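To illustrate what idempotency means for a retried RPC here, the following is a minimal sketch under stated assumptions, not the raylet's actual handler. `FakeRaylet`, `DrainReply`, `handle_drain`, and the `drain_transitions` counter are all hypothetical names; the point is only that a retried request (for example, after the first reply was lost) must return the same reply without repeating the side effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrainReply:
    """Hypothetical reply type for this sketch."""
    is_accepted: bool


class FakeRaylet:
    """Hypothetical stand-in for a node that can be drained."""

    def __init__(self):
        self.draining = False
        self.drain_transitions = 0  # counts real state changes only

    def handle_drain(self) -> DrainReply:
        if self.draining:
            # Retry of a request that was already applied:
            # return the same reply and do not repeat the side effect.
            return DrainReply(is_accepted=True)
        self.draining = True
        self.drain_transitions += 1
        return DrainReply(is_accepted=True)


raylet = FakeRaylet()
first = raylet.handle_drain()   # original request
second = raylet.handle_drain()  # client retry after a lost reply
```

With a handler shaped like this, a client can safely retry on transient network failures: both calls observe the same reply, and the node enters the draining state exactly once.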