Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
import ray
from ray.data import Dataset
from ray.data._internal.logical.interfaces import Plan
from ray.data._internal.util import rows_same
from ray.data.block import BlockMetadata
from ray.data.datasource import Datasource
from ray.data.datasource.datasource import ReadTask
Expand All @@ -19,8 +20,13 @@ def _check_valid_plan_and_result(
expected_plan: Plan,
expected_result: List[Dict[str, Any]],
expected_physical_plan_ops=None,
check_ordering=True,
):
assert ds.take_all() == expected_result
actual_result = ds.take_all()
if check_ordering:
assert actual_result == expected_result
Comment on lines +26 to +27
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we can't always use rows_same (i.e., always use check_ordering=False)? Are there any tests in this file where the ordering is guaranteed by Ray Data's interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's still some need to check ordering, for example test 6 the ordering is guaranteed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I guess there are a few tests that call sort

In a follow-up PR, would you mind updating all of the other tests in this module where the ordering isn't guaranteed? Seems like most of the tests assume ordering when it's not guaranteed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, I think it'd be good to break up the different test cases into separate test functions for the reasons described here: https://learn.microsoft.com/en-us/dotnet/core/testing/unit-testing-best-practices#avoid-multiple-act-tasks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll open a refactor pr soon!

else:
assert rows_same(pd.DataFrame(actual_result), pd.DataFrame(expected_result))
assert ds._plan._logical_plan.dag.dag_str == expected_plan

expected_physical_plan_ops = expected_physical_plan_ops or []
Expand Down Expand Up @@ -83,6 +89,7 @@ def f2(x):
ds,
"Read[ReadRange] -> Limit[limit=5] -> Project[Project]",
[{"id": i} for i in range(5)],
check_ordering=False,
)

# Test 6: Limit should stop at Sort operations (AllToAll)
Expand Down