[RLlib] Decentralized multi-agent learning; PR #01 #21421
Changes from all commits
@@ -23,6 +23,15 @@
         "buffer_size": 2000000,
         # TODO(jungong) : update once Apex supports replay_buffer_config.
         "replay_buffer_config": None,
+        # Whether all shards of the replay buffer must be co-located
+        # with the learner process (running the execution plan).
+        # This is preferred b/c the learner process should have quick
+        # access to the data from the buffer shards, avoiding network
+        # traffic each time samples from the buffer(s) are drawn.
+        # Set this to False for relaxing this constraint and allowing
+        # replay shards to be created on node(s) other than the one
+        # on which the learner is located.
+        "replay_buffer_shards_colocated_with_driver": True,
         "learning_starts": 50000,
         "train_batch_size": 512,
         "rollout_fragment_length": 50,

@@ -31,6 +40,7 @@
         "worker_side_prioritization": True,
         "min_iter_time_s": 30,
     },
+    _allow_unknown_configs=True,
 )
Member: Why is `_allow_unknown_configs=True` needed here?

Contributor (Author): We are using Trainer's merge utility. It requires that you set this to True if the second config (APEX-DDPG's) contains new keys.

Member: 👌 👌
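For context, the merge being discussed presumably follows the usual RLlib pattern sketched below. This is a sketch only: the import path and symbol names (`DDPGTrainer`, `DDPG_DEFAULT_CONFIG`, `merge_trainer_configs`) are assumptions based on the conversation, not taken verbatim from this diff.

```python
# Sketch (assumed names/paths): a trainer's default config is built by merging
# a base config with algorithm-specific overrides.
from ray.rllib.agents.ddpg.ddpg import DDPGTrainer, DEFAULT_CONFIG as DDPG_DEFAULT_CONFIG

APEX_DDPG_DEFAULT_CONFIG = DDPGTrainer.merge_trainer_configs(
    DDPG_DEFAULT_CONFIG,
    {
        # The overrides introduce keys that do not exist in the base DDPG
        # config (e.g. the new flag from this PR), so the merge utility must
        # be told to accept unknown keys.
        "replay_buffer_shards_colocated_with_driver": True,
        "learning_starts": 50000,
        "train_batch_size": 512,
    },
    _allow_unknown_configs=True,
)
```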
Member: Actually, I wonder: why do they need to be on the same node?
Contributor (Author): For APEX, the local node is the learner, so data (once in the buffer shards) never has to travel again. I think that's the sole intention here.
Member: I see, I see. To be honest, this doesn't feel like a requirement to me, more like an optimization. Since we don't have a viability guarantee from Ray core, if it were up to me, I would do this as a best-effort thing: try to colocate everything, and if that fails, schedule the remaining replay-buffer shards anywhere. Then we don't need the while loop, and the scheduling can finish in at most two steps. That is obviously too big a change for this PR; maybe just add a note/TODO somewhere? As written, I am a little worried that a stack may fail with a mysterious error message like "failed to schedule RB actors" even though the cluster has enough CPUs overall and only the head node is small.
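A minimal sketch of the two-step, best-effort placement being suggested here. The helpers `try_create_on_node` and `create_anywhere` are hypothetical placeholders, not existing RLlib or Ray APIs; they stand in for whatever actor-creation utilities would do the actual scheduling.

```python
# Hypothetical sketch of the reviewer's "best-effort colocation" idea:
# first try to place every replay-buffer shard on the learner's node, then
# fall back to unconstrained placement for whatever could not be colocated.
def create_replay_shards_best_effort(num_shards, learner_node,
                                     try_create_on_node, create_anywhere):
    # Step 1: best-effort colocation of shard actors with the learner node.
    colocated = try_create_on_node(num_shards, node=learner_node)
    remaining = num_shards - len(colocated)
    if remaining == 0:
        return colocated
    # Step 2: schedule whatever is left on any node with free resources.
    # No retry/while loop needed -- placement always finishes in two steps.
    return colocated + create_anywhere(remaining)
```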
Contributor (Author): Just a note: this is nothing new that I introduced here for APEX. We have always forced all replay shards to be located on the driver. This change actually allows users (by setting the new flag to False) to relax that constraint.
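To illustrate how a user would relax the constraint: a hedged usage sketch, where the `ApexDDPGTrainer` import path, the environment name, and the worker count are assumptions chosen for the example rather than taken from this PR.

```python
# Hedged usage sketch: relax the colocation constraint so replay-buffer
# shards may be scheduled on nodes other than the learner/driver node.
from ray.rllib.agents.ddpg.apex import ApexDDPGTrainer  # assumed import path

trainer = ApexDDPGTrainer(
    env="Pendulum-v1",  # illustrative environment
    config={
        "num_workers": 8,  # illustrative value
        # New flag from this PR: default is True (hard colocation with the
        # driver); False lets replay shards be placed on any node.
        "replay_buffer_shards_colocated_with_driver": False,
    },
)
```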
Contributor (Author): I can add a comment to explain this more. ...
Contributor (Author): Done.
Member: Appreciate it!