[Feature Request] Action Masking #1404
Comments
Please! I really need this feature. Could anyone point to an RL library that supports action masking?
You could do this in a TensorDictModule:

```python
class ActionMask(TensorDictModule):
    ...

    def forward(self, td):
        # Zero out the masked action entry based on the observation.
        if td['observation'] == 1:
            td['action'][:, 0] = 0
        return td

policy = Sequential(EGreedyWrapper(Sequential(mlp, QValueModule())), ActionMask())
```
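For intuition, the masking idea can also be sketched in plain PyTorch, independent of any TorchRL API: invalid actions have their Q-values pushed to -inf before the argmax, so they can never be selected. This is only an illustration with made-up tensors, not the module suggested above.

```python
import torch

# Illustrative tensors only: a batch of Q-values for 4 actions and a boolean
# feasibility mask (True = action allowed in the current state).
q_values = torch.tensor([[1.0, 2.5, 0.3, 4.0]])
action_mask = torch.tensor([[True, True, False, False]])

# Push masked actions to -inf so the argmax can never pick them.
masked_q = q_values.masked_fill(~action_mask, -float("inf"))
action = masked_q.argmax(dim=-1)  # tensor([1]); actions 2 and 3 are excluded
```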
@1030852813, TensorFlow Agents supports action masking. It allows passing a mask to their …
The good thing about their implementation is that the policies are designed to use the same form of …
Thanks for reporting this! We should enable this use case, you are totally right. What is the domain of the actions? Should this work for continuous and discrete, or just discrete? Can you provide a small example of what you'd like to see as a transform for that? In the example you gave, the mask changes continuously; I guess that should be an input to the transform too?
Thank you for your response. I am considering a discrete action space at this point, but it would be useful if it were possible to define masks for a continuous action space as well. I have not come up with the full list of transforms I need, so I am currently following the DQN example (https://pytorch.org/rl/tutorials/coding_dqn.html) without a transform (in other words, an empty …).
Environment Setting
Here is a simplified version of my environment that needs an action mask (not the actual environment):
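As an illustrative stand-in only (not the author's actual environment, and with all names hypothetical), a toy sketch of the card-drawing state and the mask bookkeeping it implies might look like this:

```python
import torch

# Hypothetical toy state: four card types, each with a limited number of copies.
remaining = torch.tensor([2, 1, 3, 1])  # copies left of each card / action

def action_mask(remaining: torch.Tensor) -> torch.Tensor:
    """A card can only be chosen while copies of it remain in the deck."""
    return remaining > 0

def draw(remaining: torch.Tensor, action: int) -> torch.Tensor:
    """Remove one copy of the chosen card and return the updated counts."""
    assert action_mask(remaining)[action], "infeasible action"
    remaining = remaining.clone()
    remaining[action] -= 1
    return remaining

remaining = draw(remaining, action=1)
print(action_mask(remaining))  # tensor([ True, False,  True,  True])
```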
Modifications to make the action mask work
Given this environment, here is the list of modifications I needed to make to run the DQN example with action masking. I am new to Torch, so feel free to let me know if there are better ways to implement this.
The action mask module is attached after the …
Question and/or Suggestion
Based on the observation, I believe it would be helpful to have variants of …
Also, in my case, the action mask is similar to a one-hot discrete tensor (it should have at least one 1, except for the terminal state) but may have multiple 1's, so I wonder if there is a spec that satisfies these requirements. I am using …
For a specific type of actor (including a Q-actor), it would be too much to implement action masking features, since different algorithms may utilize the action mask differently. I believe it is reasonable to add an argument …
As I mentioned, I would appreciate it if there are better (or more desirable) ways to implement this.
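As a rough, plain-PyTorch illustration of why exploration also needs to respect the mask (the helper name and tensors here are made up, and this is not TorchRL's EGreedy implementation): an epsilon-greedy step has to draw its random action from the feasible set only.

```python
import torch

def masked_epsilon_greedy(q_values: torch.Tensor, mask: torch.Tensor, eps: float) -> torch.Tensor:
    """Hypothetical helper: pick one action per batch row, respecting the mask
    in both the greedy and the random branch (assumes at least one feasible
    action per row)."""
    greedy = q_values.masked_fill(~mask, -float("inf")).argmax(dim=-1)
    # Sample uniformly among feasible actions for the exploration branch.
    probs = mask.float() / mask.float().sum(dim=-1, keepdim=True)
    random = torch.multinomial(probs, num_samples=1).squeeze(-1)
    explore = torch.rand(q_values.shape[0]) < eps
    return torch.where(explore, random, greedy)
```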
Could you look at the PR above and let me know if it helps solve your problem?
Hi there, dropping by since it would be a very useful feature for our RL4CO library :)
@vmoens, I re-installed torchrl (the masked_action branch) and tensordict based on your notebook, and copy-pasted the script in the section "Masked actions in env: ActionMask transform" without modification. I confirmed that …
I get the same attribute error when I change action_spec from …
I can see the point. A self-explanatory name and a clear purpose are great.
@vmoens, is there anything I can check to address the …
@vmoens, thank you for your help, which made the implementation cleaner. I made several modifications to make the action_mask work with the DQN tutorial. I am still validating whether there are any unhandled exceptions, and I would appreciate more efficient implementations.
Modified Locations (Note: …)
The corresponding location is as follows: …
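As a generic sketch only (not necessarily the modification described above, and with a hypothetical helper name), masking typically also enters DQN-style training when the target takes the max over next-state Q-values, which should be restricted to feasible actions:

```python
import torch

def masked_td_target(reward, done, next_q_values, next_action_mask, gamma=0.99):
    """Hypothetical helper: Bellman target whose max runs over feasible next
    actions only."""
    next_q = next_q_values.masked_fill(~next_action_mask, -float("inf"))
    next_v = next_q.max(dim=-1).values
    # If every next action is masked (e.g. a terminal state), fall back to 0.
    next_v = torch.where(torch.isfinite(next_v), next_v, torch.zeros_like(next_v))
    return reward + gamma * (1.0 - done.float()) * next_v
```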
Thanks for this.
No problem. I will let you know if I have any updates about this.
The PR should be fixed now, and the notebook works on my side. If you're happy with it, we can merge it.
I confirmed that the notebook works on my side as well. If possible, I suggest … I still need to make the same modification to … I see that there is an ongoing discussion on … I think the PR can be merged when …
As discussed with @matteobettini, we think EGreedy and DQN compatibility is a separate issue that deserves to be addressed in independent PRs. Happy to tackle these asap.
Motivation
I recently started learning TorchRL and am creating a custom environment (using torchrl.envs.EnvBase) based on the documentation (https://pytorch.org/rl/reference/envs.html). For my environment, I would like to apply an action mask, such that the environment does not allow infeasible actions based on the observation (for example, suppose the action is to choose a trump card; the number of A's is limited, such that A cannot be chosen once all the A's are drawn). So far, I could not find a way to implement action masking for the environment, but it would be a convenient feature for implementing similar environments.
Solution
It would be convenient if I could include a mask as a part of the observation_spec, such that the environment can tell feasible and infeasible actions apart based on the observation (even when a random action is chosen). Currently, my environment cannot pass torchrl.envs.utils.check_env_specs() since infeasible actions are chosen. If it is not reasonable to implement this feature, any alternative way to implement an action mask would be appreciated.
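A minimal sketch of what this could look like, assuming the spec classes exposed by torchrl.data at the time (class names and signatures may differ across versions, so treat this as an approximation rather than the library's exact API):

```python
import torch
from torchrl.data import BinaryDiscreteTensorSpec, CompositeSpec, DiscreteTensorSpec

n_actions = 4

# Hypothetical specs: the boolean mask is advertised next to the observation,
# so anything consuming the observation (including random rollouts) can read
# which actions are currently feasible.
observation_spec = CompositeSpec(
    observation=DiscreteTensorSpec(n_actions),
    action_mask=BinaryDiscreteTensorSpec(n_actions, dtype=torch.bool),
)
action_spec = DiscreteTensorSpec(n_actions)
```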
Checklist
I searched with the keyword 'Action Masking', but could not find relevant issues. Sorry if I missed something.