[core] Admission control for pulling objects to the local node #13514
Merged
stephanie-wang
merged 18 commits into
ray-project:master
from
stephanie-wang:admission-control
Jan 22, 2021
Commits:
91a1d1f Admission control, TODO: tests, object size (stephanie-wang)
136574c Unit tests for admission control and some bug fixes (stephanie-wang)
4f38ef4 Add object size to object table, only activate pull if object size is… (stephanie-wang)
cc34ea2 Some fixes, reset timer on eviction (stephanie-wang)
e1edc19 doc (stephanie-wang)
70cf694 Merge remote-tracking branch 'upstream/master' into admission-control (stephanie-wang)
add49bd update (stephanie-wang)
1e9d091 Trigger OOM from the pull manager (stephanie-wang)
987d710 don't spam (stephanie-wang)
b522cd4 doc (stephanie-wang)
03ffed7 Update src/ray/object_manager/pull_manager.cc (stephanie-wang)
2bc1cee Remove useless tests (stephanie-wang)
0070cfa Merge branch 'admission-control' of github.com:stephanie-wang/ray int… (stephanie-wang)
9114c94 Fix test (stephanie-wang)
7662e14 osx build (stephanie-wang)
73375ce Skip broken test (stephanie-wang)
ce6d79b tests (stephanie-wang)
68c9efd Skip failing tests (stephanie-wang)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -296,6 +296,89 @@ def driver(): | |
| ray.get(driver.remote()) | ||
|
|
||
|
|
||
| @pytest.mark.skip( | ||
| reason="This hangs due to a deadlock between a worker getting its " | ||
| "arguments and the node pulling arguments for the next task queued.") | ||
| @pytest.mark.timeout(30) | ||
| def test_pull_bundles_admission_control(shutdown_only): | ||
| cluster = Cluster() | ||
| object_size = int(6e6) | ||
| num_objects = 10 | ||
| num_tasks = 10 | ||
| # Head node can fit all of the objects at once. | ||
|
||
| cluster.add_node( | ||
| num_cpus=0, | ||
| object_store_memory=2 * num_tasks * num_objects * object_size) | ||
| cluster.wait_for_nodes() | ||
| ray.init(address=cluster.address) | ||
|
|
||
| # Worker node can only fit 1 task at a time. | ||
| cluster.add_node( | ||
| num_cpus=1, object_store_memory=1.5 * num_objects * object_size) | ||
| cluster.wait_for_nodes() | ||
|
|
||
| @ray.remote | ||
| def foo(*args): | ||
| return | ||
|
|
||
| args = [] | ||
| for _ in range(num_tasks): | ||
| task_args = [ | ||
| ray.put(np.zeros(object_size, dtype=np.uint8)) | ||
| for _ in range(num_objects) | ||
| ] | ||
| args.append(task_args) | ||
|
|
||
| tasks = [foo.remote(*task_args) for task_args in args] | ||
| ray.get(tasks) | ||
|
|
||
|
|
||
| @pytest.mark.skip( | ||
| reason="This hangs due to a deadlock between a worker getting its " | ||
| "arguments and the node pulling arguments for the next task queued.") | ||
| @pytest.mark.timeout(30) | ||
| def test_pull_bundles_admission_control_dynamic(shutdown_only): | ||
| # This test is the same as test_pull_bundles_admission_control, except that | ||
|
Contributor: This is a nice test! |
||
| # the object store's capacity starts off higher and is later consumed | ||
| # dynamically by concurrent workers. | ||
| cluster = Cluster() | ||
| object_size = int(6e6) | ||
| num_objects = 10 | ||
| num_tasks = 10 | ||
| # Head node can fit all of the objects at once. | ||
| cluster.add_node( | ||
| num_cpus=0, | ||
| object_store_memory=2 * num_tasks * num_objects * object_size) | ||
| cluster.wait_for_nodes() | ||
| ray.init(address=cluster.address) | ||
|
|
||
| # Worker node can fit 2 tasks at a time. | ||
| cluster.add_node( | ||
| num_cpus=1, object_store_memory=2.5 * num_objects * object_size) | ||
| cluster.wait_for_nodes() | ||
|
|
||
| @ray.remote | ||
| def foo(*args): | ||
| return | ||
|
|
||
| @ray.remote | ||
| def allocate(*args): | ||
| return np.zeros(object_size, dtype=np.uint8) | ||
|
|
||
| args = [] | ||
| for _ in range(num_tasks): | ||
| task_args = [ | ||
| ray.put(np.zeros(object_size, dtype=np.uint8)) | ||
| for _ in range(num_objects) | ||
| ] | ||
| args.append(task_args) | ||
|
|
||
| tasks = [foo.remote(*task_args) for task_args in args] | ||
| allocated = [allocate.remote() for _ in range(num_objects)] | ||
| ray.get(tasks) | ||
| del allocated | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| import pytest | ||
| import sys | ||
|
|
||
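The comments in the two tests above ("Worker node can only fit 1 task at a time", "Worker node can fit 2 tasks at a time") follow from simple arithmetic on the sizes used. The sketch below works that arithmetic through a toy FIFO admission queue. It is an illustration only, not the pull manager's implementation: the class `PullAdmissionQueue`, its methods, and the `simulate` helper are hypothetical names, the queue is assumed to be strictly first-in-first-out, and per-object metadata overhead in the object store is ignored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PullAdmissionQueue:
    """Toy model (hypothetical): admit a bundle only if all its objects fit."""

    capacity_bytes: int
    used_bytes: int = 0
    pending: deque = field(default_factory=deque)
    active: dict = field(default_factory=dict)

    def submit(self, bundle_id, object_sizes):
        # Queue the bundle with its total size, then try to admit from the front.
        self.pending.append((bundle_id, sum(object_sizes)))
        self._activate()

    def release(self, bundle_id):
        # A finished task frees its arguments, which may admit queued bundles.
        self.used_bytes -= self.active.pop(bundle_id)
        self._activate()

    def _activate(self):
        # Assumed strict FIFO: stop at the first bundle that does not fit.
        while self.pending:
            bundle_id, size = self.pending[0]
            if self.used_bytes + size > self.capacity_bytes:
                break
            self.pending.popleft()
            self.active[bundle_id] = size
            self.used_bytes += size


def simulate(capacity_factor, num_tasks=10, num_objects=10, object_size=int(6e6)):
    """Return how many task bundles are admitted at once for a given capacity."""
    capacity = int(capacity_factor * num_objects * object_size)
    queue = PullAdmissionQueue(capacity_bytes=capacity)
    for task in range(num_tasks):
        queue.submit(task, [object_size] * num_objects)
    return len(queue.active)


print(simulate(1.5))  # 90 MB capacity, 60 MB per task
print(simulate(2.5))  # 150 MB capacity, 60 MB per task
```

Each task needs 10 objects of 6e6 bytes, or 60 MB of arguments. With a factor of 1.5 the worker has 90 MB, so a second bundle (120 MB total) does not fit; with 2.5 it has 150 MB, so two bundles fit and a third (180 MB) does not.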
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -397,6 +397,12 @@ class ReferenceCounter : public ReferenceCounterInterface, | |
| absl::optional<absl::flat_hash_set<NodeID>> GetObjectLocations( | ||
| const ObjectID &object_id) LOCKS_EXCLUDED(mutex_); | ||
|
|
||
| /// Get an object's size. This will return 0 if the object is out of scope. | ||
| /// | ||
| /// \param[in] object_id The object whose size to get. | ||
| /// \return Object size, or 0 if the object is out of scope. | ||
| size_t GetObjectSize(const ObjectID &object_id) const; | ||
|
Contributor: Can you write a comment? (and explain when 0 is returned?) |
||
|
|
||
| /// Handle an object has been spilled to external storage. | ||
| /// | ||
| /// This notifies the primary raylet that the object is safe to release and | ||
|
|
||
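The doc comment on `GetObjectSize` above says the call returns 0 when the object is out of scope. A minimal Python sketch of that contract follows. `ToyReferenceTable` and its methods are hypothetical stand-ins, not the `ReferenceCounter` API, and the dictionary-backed storage is an assumption made purely for illustration.

```python
from typing import Dict


class ToyReferenceTable:
    """Hypothetical stand-in for a table that tracks in-scope object sizes."""

    def __init__(self) -> None:
        self._sizes: Dict[str, int] = {}

    def add_object(self, object_id: str, size: int) -> None:
        # Record the size of an object that is currently in scope.
        self._sizes[object_id] = size

    def remove_object(self, object_id: str) -> None:
        # Drop the entry once the object goes out of scope.
        self._sizes.pop(object_id, None)

    def get_object_size(self, object_id: str) -> int:
        # Mirrors the documented contract: 0 if the object is out of scope.
        return self._sizes.get(object_id, 0)


table = ToyReferenceTable()
table.add_object("obj-1", 6_000_000)
print(table.get_object_size("obj-1"))
table.remove_object("obj-1")
print(table.get_object_size("obj-1"))
```

One consequence of this contract, at least in the toy version, is that a caller cannot tell an out-of-scope object from one whose recorded size is genuinely 0, so 0 is best read as "size unknown".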
Why don't we just use ray_start_cluster?
I was too lazy to figure out how to parametrize it properly :D Also, I was running into trouble where the non-head node would connect first, so the rest of the test wouldn't run properly.
oh lol. I think you can just do like