-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Allow multiple (nested) action, reward, done keys in env
,vec_env
and collectors
#1462
Conversation
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts: # test/mocking_classes.py # test/test_env.py
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
env
module
Signed-off-by: Matteo Bettini <[email protected]>
Up to here benchmark runs have shown all performance has been maintained |
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
env
moduleenv
,vec_env
and collectors
This looks great! I have a few questions though:
|
Basically the idea is that at every step when some reset flag will be true, The other possibility is that you only put the dones you catually need in your done spec. i.e. if your agent can be done, but you want to reset only when some groups are done, than your done spec should have group granularity and not agent granularity.
That snippet i posted is just an example pushing the flexibility of a composite done spec to the extreme. I do not see a use for that in multiagent. In multiagent we suggest to stick to what is outlined in #1463 |
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
Benchmarks up to here
|
Signed-off-by: Matteo Bettini <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
@vmoens PR is ready for review, here the last benchmark results (i dunno how attendibili they are ahaha)
|
This reverts commit e8e410e.
Signed-off-by: Matteo Bettini <[email protected]>
# Conflicts: # torchrl/data/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of this PR, I think we should rename the "_action_spec" in "full_action_spec" etc.
For the rest I just have some minor comments, nice work!
Signed-off-by: Matteo Bettini <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Co-authored-by: Vincent Moens <[email protected]>
Signed-off-by: Matteo Bettini <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for this!
…`vec_env` and `collectors` (pytorch#1462) Signed-off-by: Matteo Bettini <[email protected]> Co-authored-by: vmoens <[email protected]>
…`vec_env` and `collectors` (pytorch#1462) Signed-off-by: Matteo Bettini <[email protected]> Co-authored-by: vmoens <[email protected]>
Depends on #512
This PR is an important milestone in the journey to #1463.
It will also allow single-agent users to have more than one action, reward, and done keys.
This is very important when your agents has multiple action (e.g., some discrete, some continuous ).
Main design problem: multiple dones
One major design choice here though is that if we allow multiple done keys, we need a "_reset" key for each (same for "truncated" , "is_init" and all the other done-based keys).
This is because we need to independently reset each done (e.g., in multiagent we want to reset only some done agents).
So the question is, what is the best way to do this? because we can imagine having a done_spec like
With a spec of this type, we place a "_reset" key for each done present (and that has any trues).
Same will be valid for "truncated" and "is_init".
A _reset td will look like
This solution is already implemented in the PR
cc @hyerra