Contributor

@ryankert01 ryankert01 commented Nov 15, 2025

Description

I added a parameter that determines whether we check the order of results, which relaxes the restriction for flaky tests.

Related issues

Closes #58561

Additional information

I ran the test 20 times using a simple script that runs python -m pytest python/ray/data/tests/test_execution_optimizer_limit_pushdown.py::test_limit_pushdown_conservative:

  • master: Passed: 18, Failed: 2
  • feature/flaky-test_limit_pushdown_conservative: Passed: 20, Failed: 0
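
The script itself is not included in this PR. A minimal sketch of what such a repeat-runner might look like follows; the names `pytest_command` and `run_repeatedly` are hypothetical, and the only assumption is that pytest exits with code 0 when every selected test passes.

```python
import subprocess
import sys


def pytest_command(test_id):
    """Build the command line for a single pytest node ID."""
    return [sys.executable, "-m", "pytest", test_id]


def run_repeatedly(cmd, runs=20):
    """Run `cmd` `runs` times; return (passed, failed) based on exit codes."""
    passed = failed = 0
    for _ in range(runs):
        # pytest exits with code 0 only when all selected tests pass.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == 0:
            passed += 1
        else:
            failed += 1
    return passed, failed
```

Passing the test ID quoted above to `pytest_command`, handing the result to `run_repeatedly`, and printing the two counts would produce a tally in the same shape as the numbers reported here.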

@ryankert01 ryankert01 requested a review from a team as a code owner November 15, 2025 06:13
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances the limit pushdown tests by introducing an option to disable ordering checks, which helps in stabilizing flaky tests where the output order is not guaranteed. The main change is in the _check_valid_plan_and_result helper function, which now conditionally checks for order. When order is not checked, it compares the results as sorted lists. I've suggested a minor improvement to use collections.Counter for a more idiomatic and potentially more performant unordered comparison. The application of check_ordering=False to the relevant test cases seems correct and addresses the flakiness issue.
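
To illustrate the Counter-based comparison mentioned in the review, here is a rough sketch and not the code that was merged. It assumes rows are flat dicts with sortable keys and hashable values. The commits in this PR mention a `to_hashable` function, but its body is not shown in this conversation, so the version below is a guess, and `rows_match` is a hypothetical name.

```python
from collections import Counter


def to_hashable(row):
    # Dicts are unhashable, so convert each row to a sorted tuple of items.
    # Assumption: keys are sortable and values are hashable.
    return tuple(sorted(row.items()))


def rows_match(actual, expected, check_ordering=True):
    """Compare two lists of rows, optionally ignoring their order."""
    if check_ordering:
        return actual == expected
    # Compare as multisets so duplicate rows are still counted correctly.
    return Counter(map(to_hashable, actual)) == Counter(map(to_hashable, expected))


actual = [{"id": 1}, {"id": 0}]
expected = [{"id": 0}, {"id": 1}]
print(rows_match(actual, expected, check_ordering=False))  # True
print(rows_match(actual, expected, check_ordering=True))   # False
```

Unlike sorting both lists, the multiset comparison does not require rows to be orderable relative to each other, only hashable after conversion.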

@ryankert01 ryankert01 changed the title from "Enhance limit pushdown tests with ordering checks" to "[Data][Flaky] test_limit_pushdown_conservative fails due to non-deterministic task ordering" Nov 15, 2025
@ryankert01 ryankert01 changed the title from "[Data][Flaky] test_limit_pushdown_conservative fails due to non-deterministic task ordering" to "[Data][Flaky] fix test_limit_pushdown_conservative fails due to non-deterministic task ordering" Nov 15, 2025
@ray-gardener ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Nov 15, 2025
Contributor

@400Ping 400Ping left a comment

LGTM

@ryankert01 ryankert01 force-pushed the feature/flaky-test_limit_pushdown_conservative branch from d4cf9d8 to 033adb0 November 15, 2025 11:28
@ryankert01
Contributor Author

thanks for the review @400Ping !
PTAL @owenowenisme

@ryankert01 ryankert01 force-pushed the feature/flaky-test_limit_pushdown_conservative branch from c73cd4c to 838b73f November 16, 2025 19:23
@ryankert01 ryankert01 requested review from a team as code owners November 16, 2025 19:23
@ryankert01 ryankert01 force-pushed the feature/flaky-test_limit_pushdown_conservative branch from 0c388c8 to 89432eb November 16, 2025 19:32
Comment on lines +26 to +27
    if check_ordering:
        assert actual_result == expected_result
Member

Any reason we can't always use rows_same (i.e., always use check_ordering=False)? Are there any tests in this file where the ordering is guaranteed by Ray Data's interface?

Contributor Author

I think we still need to check ordering in some cases; for example, in test 6 the ordering is guaranteed.

Member

Ah, I see. I guess there are a few tests that call sort.

In a follow-up PR, would you mind updating all of the other tests in this module where the ordering isn't guaranteed? It seems like most of the tests assume ordering when it's not guaranteed.

Member

Separately, I think it'd be good to break up the different test cases into separate test functions for the reasons described here: https://learn.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices#avoid-multiple-act-tasks
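
As a rough illustration of that suggestion, and not the refactor that was eventually written, the sketch below splits two scenarios into separate test functions so that a failure points at exactly one case. `limit_rows` is a hypothetical stand-in for the pipeline under test; it does not use Ray Data.

```python
def limit_rows(rows, n):
    """Hypothetical stand-in for the operation under test."""
    return rows[:n]


def test_limit_after_sort_checks_order():
    # The input is sorted first, so the exact order is part of the contract.
    rows = sorted([3, 1, 2])
    assert limit_rows(rows, 2) == [1, 2]


def test_limit_without_sort_ignores_order():
    # No ordering guarantee is assumed here, so only the contents are compared.
    result = limit_rows([3, 1, 2], 3)
    assert sorted(result) == [1, 2, 3]
```

With one scenario per function, pytest reports each case by name, and an early failure no longer hides the cases that would have run after it.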

Contributor Author

Cool, I'll open a refactor pr soon!

ryankert01 and others added 4 commits November 17, 2025 07:17
Added check_ordering parameter to _check_valid_plan_and_result function to allow for ordering checks to be optional. Updated multiple test cases to utilize the new parameter.

Signed-off-by: Ryan Huang <[email protected]>

Update test_execution_optimizer_limit_pushdown.py

Signed-off-by: Ryan Huang <[email protected]>

Update python/ray/data/tests/test_execution_optimizer_limit_pushdown.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ryan Huang <[email protected]>

remove unnessary relaxation

Signed-off-by: Ryan Huang <[email protected]>

remove unnessary relaxation

Updated the test to not check ordering in the execution optimizer limit pushdown.

Signed-off-by: Ryan Huang <[email protected]>

Refactor to_hashable function for clarity

Signed-off-by: Ryan Huang <[email protected]>

Fix formatting in test_execution_optimizer_limit_pushdown

Signed-off-by: Ryan Huang <[email protected]>

please linter

Signed-off-by: ryankert01 <[email protected]>
Co-authored-by: You-Cheng Lin <[email protected]>
Signed-off-by: Ryan Huang <[email protected]>
Signed-off-by: ryankert01 <[email protected]>
Removed comment about comparing as multisets when ordering doesn't matter.

Signed-off-by: Ryan Huang <[email protected]>
Signed-off-by: ryankert01 <[email protected]>
Signed-off-by: ryankert01 <[email protected]>
@ryankert01 ryankert01 force-pushed the feature/flaky-test_limit_pushdown_conservative branch from 89432eb to 1a6a21f November 16, 2025 23:17
@owenowenisme owenowenisme added the go (add ONLY when ready to merge, run all tests) label Nov 17, 2025
Member

@owenowenisme owenowenisme left a comment

@bveeramani bveeramani merged commit 1c37a45 into ray-project:master Nov 17, 2025
7 checks passed
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…eterministic task ordering (ray-project#58655)

Signed-off-by: ryankert01 <[email protected]>
Signed-off-by: Ryan Huang <[email protected]>
Co-authored-by: You-Cheng Lin <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
bveeramani pushed a commit that referenced this pull request Nov 24, 2025
…ix ordering assumption (#58746)

## Description
**Split multi-case test function**
- `test_limit_pushdown_conservative` → 10 separate tests (basic fusion,
limit fusion reversed, multiple limit fusion, maprows, mapbatches,
filter, project, sort, complex interweaved operations, and between two
map operators)

**Fixed ordering assumptions**
- Added `check_ordering=False` to union tests (blocks may interleave)
- Added `check_ordering=False` to project test with
`override_num_blocks` (parallel execution)

## Related issues
Related to #58655 

## Additional information

---------

Signed-off-by: ryankert01 <[email protected]>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…eterministic task ordering (ray-project#58655)

Signed-off-by: ryankert01 <[email protected]>
Signed-off-by: Ryan Huang <[email protected]>
Co-authored-by: You-Cheng Lin <[email protected]>
Signed-off-by: YK <[email protected]>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…ix ordering assumption (ray-project#58746)

Signed-off-by: ryankert01 <[email protected]>
Signed-off-by: YK <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…eterministic task ordering (ray-project#58655)

Signed-off-by: ryankert01 <[email protected]>
Signed-off-by: Ryan Huang <[email protected]>
Co-authored-by: You-Cheng Lin <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…ix ordering assumption (ray-project#58746)

Signed-off-by: ryankert01 <[email protected]>
Labels

community-contribution: Contributed by the community
data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests

Development

Successfully merging this pull request may close these issues.

[Data][Flaky] test_limit_pushdown_conservative fails due to non-deterministic task ordering

4 participants