[Data] Make test_dataset_throughput deterministic and refactor throughput stats
#58693
Conversation
Code Review
This pull request aims to make test_dataset_throughput deterministic by increasing the workload and introducing a tolerance for the throughput assertions. The changes look good and should help improve the test's stability. I've added a couple of minor style suggestions to align a new variable name with PEP 8 conventions.
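For context, a tolerance-based throughput assertion of the kind described above might look like the following sketch. The function name, parameters, and the 20% margin are illustrative choices for this example, not the PR's actual code:

```python
import math


def assert_throughput_close(
    reported_rows_per_s: float,
    total_rows: int,
    wall_time_s: float,
    rel_tol: float = 0.2,  # hypothetical 20% margin, not from the PR
) -> None:
    """Accept the reported throughput if it is within rel_tol (relative)
    of the value computed from the observed rows and wall time."""
    expected = total_rows / wall_time_s
    assert math.isclose(reported_rows_per_s, expected, rel_tol=rel_tol)
```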
python/ray/data/tests/test_stats.py (Outdated)

```diff
  ray.init(num_cpus=2)
- f = dummy_map_batches_sleep(0.01)
+ f = dummy_map_batches_sleep(0.03)
```
A shorter sleep time is better because it reduces the execution time. However, we chose 0.03 instead of 0.02 because 0.02 resulted in 1 failure across 20 test runs.
@owenowenisme @bveeramani PTAL, thanks!
This PR decreases the likelihood that this test fails, but ultimately, the test still relies on nondeterministic timing. It's also brittle because it uses regexes that can break with minor formatting changes.
Rather than adjusting the parameters, could you rewrite this test as a unit test?
Separately, I realized that the "per node" throughputs actually represent the per-task throughput. Based on this, I think we should:
- Remove the "per node throughput" for the "Dataset throughput" section, because the average per-task throughput across all operators isn't really useful, and
- Rename "per node throughput" to "per task throughput" in the "Operator throughput" sections
The tests could look something like this:
```python
def test_dataset_throughput_calculation():
    """Test throughput calculations using mock block stats."""
    from ray.data._internal.stats import DatasetStats
    from ray.data.block import BlockStats, BlockExecStats

    # Helper to create minimal block stats with only timing fields
    def create_block_stats(start_time, end_time, num_rows):
        exec_stats = BlockExecStats()
        exec_stats.start_time_s = start_time
        exec_stats.end_time_s = end_time
        exec_stats.wall_time_s = end_time - start_time
        return BlockStats(
            num_rows=num_rows,
            size_bytes=None,
            exec_stats=exec_stats,
        )

    # Simulate 3 overlapping blocks
    blocks = [
        create_block_stats(0.0, 2.0, 100),
        create_block_stats(0.5, 2.5, 100),
        create_block_stats(1.0, 3.0, 100),
    ]

    stats = DatasetStats(metadata={"Map": blocks}, parent=None)
    summary = stats.to_summary()

    # Throughput: total rows / total execution duration
    # Total rows = 300
    # Duration = max end_time - min start_time = 3.0s
    # 300 rows / 3s = 100 rows/s
    # TODO: You'll need to expose this as a property so that it's testable.
    assert summary.num_rows_per_s == 100


def test_operator_throughput_calculation():
    ...  # A similar unit test. You might need to do some refactoring.
    # summary is an OperatorStatsSummary here, not DatasetStatsSummary.
    # TODO: You'll need to similarly expose this property.
    assert summary.num_rows_per_s == 100
    assert summary.num_rows_per_task_s == 100
```
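For reference, the dataset-level calculation that the first TODO above asks to expose can be sketched standalone. `FakeBlockStats` below is a simplified stand-in for the timing fields on Ray Data's `BlockStats`, not the actual type:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeBlockStats:
    """Simplified stand-in for the timing fields on Ray Data's BlockStats."""
    num_rows: int
    start_time_s: float
    end_time_s: float


def dataset_rows_per_s(blocks: List[FakeBlockStats]) -> float:
    """Observed dataset throughput: total rows / overall wall-clock span.

    The span runs from the earliest block start to the latest block end,
    so overlapping (concurrent) blocks are not double-counted.
    """
    total_rows = sum(b.num_rows for b in blocks)
    span_s = max(b.end_time_s for b in blocks) - min(b.start_time_s for b in blocks)
    return total_rows / span_s


# The three overlapping blocks from the test above: 300 rows over a 3 s span.
blocks = [
    FakeBlockStats(100, 0.0, 2.0),
    FakeBlockStats(100, 0.5, 2.5),
    FakeBlockStats(100, 1.0, 3.0),
]
assert dataset_rows_per_s(blocks) == 100.0
```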
@dancingactor lemme know if you have any questions.
Thanks for your detailed feedback! I have two questions:
1. My understanding is that we should remove the original …
2. May I confirm that this means we should modify the current …
@dancingactor that's right!
Just to confirm, I need to do the following things in this PR: …
That sounds right. One note of warning -- …
Hey @dancingactor, just following up here. Lemme know if I can provide any info or help to move this along!
Hi @bveeramani, since I am new to Ray, I spent some time understanding the context and the codebase. I have almost completed the …
Hi @bveeramani, could you please advise on how to correctly test the new …? Currently, I am trying to directly execute …
Ah, this sounds like your Ray Core version is out-of-date. Are you building Ray Core from source, or using the …?
Thanks! I will try the …
Awesome! Lemme know if you run into any problems. Happy to help you out!
Hi @bveeramani, here are the local test results:

```
tests % pytest /Users/ryanchen/github/ray/python/ray/data/tests/test_stats.py
Test session starts (platform: darwin, Python 3.10.19, pytest 7.4.4, pytest-sugar 0.9.5)
rootdir: /Users/ryanchen/github/ray
configfile: pytest.ini
plugins: docker-tools-3.1.3, sphinx-0.5.1.dev0, forked-1.4.0, anyio-4.11.0, asyncio-0.17.2, sugar-0.9.5, timeout-2.1.0, shutil-1.8.1, lazy-fixtures-1.1.2, rerunfailures-11.1.2, pytest_httpserver-1.1.3, virtualenv-1.8.1, mock-3.14.0, aiohttp-1.1.0
asyncio: mode=auto
timeout: 180.0s
timeout method: signal
timeout func_only: False
collecting ...
python/ray/data/tests/test_stats.py ss✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓s✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 100% ██████████

Results (729.00s):
      44 passed
       3 skipped
```
bveeramani left a comment
Thanks for the PR! Left a few comments
```python
out += "\nDataset memory:\n"
out += "* Spilled to disk: {}MB\n".format(dataset_mb_spilled)
# ...
# For throughput, we compute both an observed Ray Data dataset throughput
```
These comments were moved to https://github.com/ray-project/ray/pull/58693/files#diff-4dba40d789c60bfba4ae769f109b39979aa7d6977390329e7e2bb0e666569009R1221-R1226
The comment for "estimated single node" was removed since we removed this part from class `Dataset`.
```python
    node_count_stats["count"],
)
if output_num_rows_stats and self.time_total_s and wall_time_stats:
    # For throughput, we compute both an observed Ray Data operator throughput
```
These comments were moved to
- https://github.com/ray-project/ray/pull/58693/files#diff-4dba40d789c60bfba4ae769f109b39979aa7d6977390329e7e2bb0e666569009R1386-R1388
- https://github.com/ray-project/ray/pull/58693/files#diff-4dba40d789c60bfba4ae769f109b39979aa7d6977390329e7e2bb0e666569009R1396-R1399
…as unit tests 2. Remove the "per node throughput" for the "Dataset throughput" section 3. Rename "per node throughput" to "per task throughput" in the "Operator throughput" sections

Signed-off-by: dancingactor <[email protected]>
bveeramani left a comment
LGTM! 🚢
Signed-off-by: Balaji Veeramani <[email protected]>
[Data] Make test_dataset_throughput deterministic and refactor throughput stats
Thanks! Really appreciate your time and guidance for this issue!
[Data] Make test_dataset_throughput deterministic and refactor throughput stats (ray-project#58693)

This PR makes three improvements to Ray Data's throughput statistics:

1. **Makes `test_dataset_throughput` deterministic**: The original test was flaky because it relied on actual task execution timing. This PR rewrites it as unit tests (`test_dataset_throughput_calculation` and `test_operator_throughput_calculation`) using mocked `BlockStats` objects, making the tests fast and reliable.
2. **Removes "Estimated single node throughput" from Dataset-level stats**: This metric was misleading at the dataset level since it summed wall times across all operators, which doesn't accurately represent single-node performance. The "Ray Data throughput" metric (total rows / total wall time) remains and provides the meaningful dataset-level throughput.
3. **Renames "Estimated single node throughput" to "Estimated single task throughput"**: At the operator level, this metric divides total rows by the sum of task wall times. The new name more accurately reflects what it measures: the throughput if all work were done by a single task serially.

Signed-off-by: dancingactor <[email protected]>
Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: Balaji Veeramani <[email protected]>
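To make the renamed metric concrete, here is a small illustrative sketch contrasting the two operator-level numbers. It is plain Python reusing the figures from the unit test earlier in the thread, not Ray Data's actual implementation:

```python
# Three concurrent tasks, each producing 100 rows over a 2 s wall time,
# with starts staggered so the operator spans 0.0 s to 3.0 s overall.
rows = [100, 100, 100]
wall_times_s = [2.0, 2.0, 2.0]
span_s = 3.0  # latest end time - earliest start time

# Observed operator throughput: rows divided by the wall-clock span.
# Concurrency makes the span shorter than the summed task time.
observed_rows_per_s = sum(rows) / span_s  # 300 / 3.0 = 100 rows/s

# Estimated single task throughput: rows divided by the *sum* of task
# wall times, i.e. the rate if one task did all the work serially.
single_task_rows_per_s = sum(rows) / sum(wall_times_s)  # 300 / 6.0 = 50 rows/s
```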