Support PettingZoo Parallel API and action mask #305
Just checking: do we need to re-normalize the probabilities here so that they add up to 1?
Or does torch.multinomial do this internally?
Technically, it seems the input values do not need to add up to 1, according to the docs:
https://pytorch.org/docs/stable/generated/torch.multinomial.html
But honestly I'm not so sure (I'm new to RL), so please feel free to fix it if you see something wrong with the code.
Actually, as far as I understand, the values do need to be normalized with softmax, so I implemented that.
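For reference, here is a minimal sketch of the idea discussed above; it is not the code from this PR. The torch.multinomial docs say that input rows do not need to sum to one, but they must be non-negative, finite, and have a non-zero sum. Raw logits can be negative, so passing them through softmax first is a safe way to satisfy that. The function name `sample_masked_action` and the example tensors are hypothetical, and the sketch assumes every row has at least one legal action.

```python
import torch


def sample_masked_action(logits: torch.Tensor, action_mask: torch.Tensor) -> torch.Tensor:
    """Sample one action index per row, skipping actions whose mask is 0.

    Hypothetical helper: assumes `logits` and `action_mask` share the shape
    (batch, num_actions) and that each row has at least one legal action.
    """
    # Set the logits of illegal actions to -inf so softmax gives them probability 0.
    masked_logits = logits.masked_fill(~action_mask.bool(), float("-inf"))
    # softmax maps arbitrary (possibly negative) logits to non-negative
    # weights that sum to 1 along the action dimension.
    probs = torch.softmax(masked_logits, dim=-1)
    # multinomial treats each row as sampling weights; zero-weight entries are never drawn.
    return torch.multinomial(probs, num_samples=1).squeeze(-1)


# Example values (made up for illustration): actions 1 and 3 are illegal.
logits = torch.tensor([[2.0, -1.0, 0.5, 0.0]])
action_mask = torch.tensor([[1, 0, 1, 0]])
actions = torch.stack([sample_masked_action(logits, action_mask) for _ in range(200)])
```

With this mask, only indices 0 and 2 can ever be sampled. If a row could be fully masked, softmax would produce NaN for that row, so that case would need separate handling.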
I think it would be a bit cleaner to add special treatment for action_mask inside make_actor_critic_func, but I'm fine with this solution too 👍
I assumed you meant doing it inside default_make_actor_critic_func. I've fixed it that way anyway 🙏