Skip to content

Conversation

@Sparks0219
Copy link
Contributor

@Sparks0219 Sparks0219 commented Oct 17, 2025

Description

Making DrainRaylet and ShutdownRaylet Fault Tolerant and Idempotent. Added cpp tests for DrainRaylet to verify idempotency and added a python integration test. Not adding cpp/python tests for ShutdownRaylet as it's evidently idempotent and not much point in cpp testing since the callbacks are set in main.cc so would just be re-implementing this logic in node_manager_test.cc.

@Sparks0219 Sparks0219 added the go add ONLY when ready to merge, run all tests label Oct 17, 2025
@Sparks0219 Sparks0219 marked this pull request as ready for review October 17, 2025 22:50
@Sparks0219 Sparks0219 requested a review from a team as a code owner October 17, 2025 22:50
@Sparks0219 Sparks0219 requested review from dayshah and edoakes October 17, 2025 22:50
@ray-gardener ray-gardener bot added the core Issues that should be addressed in Ray Core label Oct 18, 2025
Copy link
Contributor

@dayshah dayshah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but I think @jjyao's pr will conflict with this #57834 and want that to go in before 2.51 branch cut, so gonna wait on merging this until that goes in

return not node["Alive"]
return True

wait_for_condition(node_is_dead, timeout=30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how long does this test take to finish?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout=30 is no bueno! maybe need to tune some timeout env var?

Copy link
Contributor Author

@Sparks0219 Sparks0219 Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total test takes around 2.5 - 3 seconds on avg to complete? The wait_for_condition itself though is only a millisecond or two, looks like 30 sec even as a failsafe was a bit excessive 😅. I'll reduce it down to 1 sec.

@edoakes
Copy link
Collaborator

edoakes commented Oct 20, 2025

@Sparks0219 Jiajun's PR is merged, can rebase now

Signed-off-by: joshlee <[email protected]>
cursor[bot]

This comment was marked as outdated.

Signed-off-by: joshlee <[email protected]>
@dayshah dayshah merged commit 2bbd13a into ray-project:master Oct 20, 2025
6 checks passed
kamil-kaczmarek pushed a commit that referenced this pull request Oct 20, 2025
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants