-
Notifications
You must be signed in to change notification settings - Fork 7k
[Data] Fix obj_store_mem_max_pending_output_per_task reporting #58864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Srinath Krishnamachari <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly adjusts the memory estimation for pending task outputs by incorporating MAX_SAFE_BLOCK_SIZE_FACTOR when no block samples are available. The accompanying tests are thorough and validate the new logic across various scenarios. I have one suggestion to refactor the implementation for better code clarity and to reduce duplication.
| if context.target_max_block_size is None: | ||
| return None | ||
| bytes_per_output = context.target_max_block_size | ||
| bytes_per_output = ( | ||
| DEFAULT_TARGET_MAX_BLOCK_SIZE * MAX_SAFE_BLOCK_SIZE_FACTOR | ||
| ) | ||
| else: | ||
| bytes_per_output = ( | ||
| context.target_max_block_size * MAX_SAFE_BLOCK_SIZE_FACTOR | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for determining bytes_per_output when a sample is not available can be simplified to reduce code duplication. Both branches of the if/else statement perform the same multiplication with MAX_SAFE_BLOCK_SIZE_FACTOR. You can determine the target_block_size first and then perform the multiplication once.
target_block_size = context.target_max_block_size
if target_block_size is None:
target_block_size = DEFAULT_TARGET_MAX_BLOCK_SIZE
bytes_per_output = target_block_size * MAX_SAFE_BLOCK_SIZE_FACTORSigned-off-by: Srinath Krishnamachari <[email protected]>
python/ray/data/_internal/execution/interfaces/op_runtime_metrics.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Srinath Krishnamachari <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Tests not updated for MAX_SAFE_BLOCK_SIZE_FACTOR change
The code change multiplies bytes_per_output by MAX_SAFE_BLOCK_SIZE_FACTOR when estimating pending task outputs, but three existing test assertions in test_task_pool_resource_reporting, test_task_pool_resource_reporting_with_bundling, and test_actor_pool_resource_reporting still expect the old calculation without the factor. These tests will fail because the actual memory usage will be 1.5x larger than their expected values.
python/ray/data/tests/test_executor_resource_management.py#L222-L228
ray/python/ray/data/tests/test_executor_resource_management.py
Lines 222 to 228 in 523bb08
| assert op.metrics.obj_store_mem_pending_task_inputs == pytest.approx(1600, rel=0.5) | |
| assert op.metrics.obj_store_mem_pending_task_outputs == pytest.approx( | |
| 2 # Number of active tasks | |
| * ctx._max_num_blocks_in_streaming_gen_buffer | |
| * ctx.target_max_block_size, | |
| rel=0.5, | |
| ) |
python/ray/data/tests/test_executor_resource_management.py#L282-L288
ray/python/ray/data/tests/test_executor_resource_management.py
Lines 282 to 288 in 523bb08
| assert op.metrics.obj_store_mem_pending_task_inputs == pytest.approx(2400, rel=0.5) | |
| assert op.metrics.obj_store_mem_pending_task_outputs == pytest.approx( | |
| 1 # Number of active tasks | |
| * ctx._max_num_blocks_in_streaming_gen_buffer | |
| * ctx.target_max_block_size, | |
| rel=0.5, | |
| ) |
python/ray/data/tests/test_executor_resource_management.py#L355-L361
ray/python/ray/data/tests/test_executor_resource_management.py
Lines 355 to 361 in 523bb08
| assert op.metrics.obj_store_mem_pending_task_inputs == pytest.approx(3200, rel=0.5) | |
| assert op.metrics.obj_store_mem_pending_task_outputs == pytest.approx( | |
| 2 # We launched 4 tasks across 2 actor, but only 2 tasks run at a time | |
| * ctx._max_num_blocks_in_streaming_gen_buffer | |
| * ctx.target_max_block_size, | |
| rel=0.5, | |
| ) |
…roject#58864) > Thank you for contributing to Ray! 🚀 > Please review the [Ray Contribution Guide](https://docs.ray.io/en/master/ray-contribute/getting-involved.html) before opening a pull request. >⚠️ Remove these instructions before submitting your PR. > 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete. ## Description > Briefly describe what this PR accomplishes and why it's needed. ### [Data] Fix obj_store_mem_max_pending_output_per_task reporting Fix `obj_store_mem_max_pending_output_per_task` when sample is unavailable to factor in, - `bytes_per_output` = `MAX_SAFE_BLOCK_SIZE_FACTOR` * `target_max_block_size`. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Srinath Krishnamachari <[email protected]> Signed-off-by: YK <[email protected]>
…roject#58864) > Thank you for contributing to Ray! 🚀 > Please review the [Ray Contribution Guide](https://docs.ray.io/en/master/ray-contribute/getting-involved.html) before opening a pull request. >⚠️ Remove these instructions before submitting your PR. > 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete. ## Description > Briefly describe what this PR accomplishes and why it's needed. ### [Data] Fix obj_store_mem_max_pending_output_per_task reporting Fix `obj_store_mem_max_pending_output_per_task` when sample is unavailable to factor in, - `bytes_per_output` = `MAX_SAFE_BLOCK_SIZE_FACTOR` * `target_max_block_size`. ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Srinath Krishnamachari <[email protected]>
Description
[Data] Fix obj_store_mem_max_pending_output_per_task reporting
Fix
obj_store_mem_max_pending_output_per_taskwhen sample is unavailable to factor in,bytes_per_output=MAX_SAFE_BLOCK_SIZE_FACTOR*target_max_block_size.Related issues
Additional information