Skip to content

Conversation

@bveeramani
Copy link
Member

This PR adds a test to verify that DataOpTask handles node failures correctly during execution. To enable this testing, callback seams are added to DataOpTask that allow tests to simulate preemption scenarios by killing and restarting nodes at specific points during task execution.

Summary

  • Add callback seams (block_ready_callback and metadata_ready_callback) to DataOpTask for testing purposes
  • Add has_finished property to track task completion state
  • Create create_stub_streaming_gen helper function to simplify test setup
  • Refactor existing DataOpTask tests to use the new helper function
  • Add new parametrized test test_on_data_ready_with_preemption to verify behavior when nodes fail during execution

Test plan

  • Existing tests pass with refactored code
  • New preemption test validates that on_data_ready handles node failures correctly by testing both block and metadata callback scenarios

Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani requested a review from a team as a code owner October 18, 2025 20:23
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a solid improvement to the test suite for DataOpTask. The introduction of callback seams for testing preemption scenarios is a good pattern, and the refactoring of existing tests using the new create_stub_streaming_gen helper function significantly improves clarity and maintainability. The new preemption test is well-designed, especially with the consideration of disabling object inlining to properly test metadata fetching from a failed node. I've identified a couple of minor issues: a logical error in DataOpTask where a callback receives an incorrect argument, and a redundant line in the new test. Once these are addressed, this will be an excellent contribution.

cursor[bot]

This comment was marked as outdated.

…or.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Balaji Veeramani <[email protected]>
Comment on lines 1090 to 1098
# Ray inlines small objects (including metadata) by storing them directly with
# the object reference itself rather than in the remote node's object store.
# Consequently, when the streaming executor calls `ray.get` on metadata from a
# node that has died, the call succeeds because the inlined metadata is not
# stored in the failed node's object store. To explicitly test the case where
# metadata resides in the object store (and becomes unavailable when the node
# dies), we disable inlining by setting the maximum inline size to 0. This
# simulates scenarios where metadata is too large to inline, which can occur in
# practice when schemas contain many fields.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@israbbani Can you confirm I'm not lying here

@bveeramani bveeramani changed the title Add preemption test for DataOpTask and refactor test utilities [Data] Add preemption test for DataOpTask and refactor test utilities Oct 18, 2025
@bveeramani bveeramani added the go add ONLY when ready to merge, run all tests label Oct 18, 2025
cursor[bot]

This comment was marked as outdated.

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Oct 19, 2025
@bveeramani bveeramani marked this pull request as draft October 20, 2025 16:28
Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani marked this pull request as ready for review October 20, 2025 17:16
Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani requested a review from a team as a code owner October 21, 2025 18:54
Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani enabled auto-merge (squash) October 22, 2025 00:00
@bveeramani bveeramani merged commit 222c22e into master Oct 22, 2025
7 checks passed
@bveeramani bveeramani deleted the add-test branch October 22, 2025 01:11
nrghosh pushed a commit to nrghosh/ray that referenced this pull request Oct 22, 2025
…es (ray-project#57883)

This PR adds a test to verify that DataOpTask handles node failures
correctly during execution. To enable this testing, callback seams are
added to DataOpTask that allow tests to simulate preemption scenarios by
killing and restarting nodes at specific points during task execution.

## Summary
- Add callback seams (`block_ready_callback` and
`metadata_ready_callback`) to `DataOpTask` for testing purposes
- Add `has_finished` property to track task completion state
- Create `create_stub_streaming_gen` helper function to simplify test
setup
- Refactor existing `DataOpTask` tests to use the new helper function
- Add new parametrized test `test_on_data_ready_with_preemption` to
verify behavior when nodes fail during execution

## Test plan
- Existing tests pass with refactored code
- New preemption test validates that `on_data_ready` handles node
failures correctly by testing both block and metadata callback scenarios

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…es (ray-project#57883)

This PR adds a test to verify that DataOpTask handles node failures
correctly during execution. To enable this testing, callback seams are
added to DataOpTask that allow tests to simulate preemption scenarios by
killing and restarting nodes at specific points during task execution.

## Summary
- Add callback seams (`block_ready_callback` and
`metadata_ready_callback`) to `DataOpTask` for testing purposes
- Add `has_finished` property to track task completion state
- Create `create_stub_streaming_gen` helper function to simplify test
setup
- Refactor existing `DataOpTask` tests to use the new helper function
- Add new parametrized test `test_on_data_ready_with_preemption` to
verify behavior when nodes fail during execution

## Test plan
- Existing tests pass with refactored code
- New preemption test validates that `on_data_ready` handles node
failures correctly by testing both block and metadata callback scenarios

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…es (ray-project#57883)

This PR adds a test to verify that DataOpTask handles node failures
correctly during execution. To enable this testing, callback seams are
added to DataOpTask that allow tests to simulate preemption scenarios by
killing and restarting nodes at specific points during task execution.

## Summary
- Add callback seams (`block_ready_callback` and
`metadata_ready_callback`) to `DataOpTask` for testing purposes
- Add `has_finished` property to track task completion state
- Create `create_stub_streaming_gen` helper function to simplify test
setup
- Refactor existing `DataOpTask` tests to use the new helper function
- Add new parametrized test `test_on_data_ready_with_preemption` to
verify behavior when nodes fail during execution

## Test plan
- Existing tests pass with refactored code
- New preemption test validates that `on_data_ready` handles node
failures correctly by testing both block and metadata callback scenarios

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Aydin Abiar <[email protected]>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…es (ray-project#57883)

This PR adds a test to verify that DataOpTask handles node failures
correctly during execution. To enable this testing, callback seams are
added to DataOpTask that allow tests to simulate preemption scenarios by
killing and restarting nodes at specific points during task execution.

## Summary
- Add callback seams (`block_ready_callback` and
`metadata_ready_callback`) to `DataOpTask` for testing purposes
- Add `has_finished` property to track task completion state
- Create `create_stub_streaming_gen` helper function to simplify test
setup
- Refactor existing `DataOpTask` tests to use the new helper function
- Add new parametrized test `test_on_data_ready_with_preemption` to
verify behavior when nodes fail during execution

## Test plan
- Existing tests pass with refactored code
- New preemption test validates that `on_data_ready` handles node
failures correctly by testing both block and metadata callback scenarios

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Future-Outlier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants