[Question] Why does MaskablePPO not use the last observation when computing action masks? #234
Open
4 tasks done
Labels
question
❓ Question
In the `MaskablePPO` class, the masks are obtained by asking the environment for them through the function `get_action_mask`. I can see that `get_action_mask` only receives the environment object as input, but at that point we also have the `self._last_obs` variable. Wouldn't it be useful to pass the last observation to that method, so that the action mask has more information about the state the agent is facing? I am thinking of a game with rules that we want to enforce in code, to prevent the agent from taking certain actions.

I assume I am not the first to think of this, so: would doing it this way hurt performance? Does it have something to do with environment vectorization?
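To make the idea concrete, here is a minimal sketch of the workaround I can see today: the environment caches its own most recent observation and derives the mask from it, so nothing has to be passed in. Everything here is hypothetical and for illustration only (`ToyLineEnv`, its fields, and its simplified `reset`/`step` signatures). I am also assuming that the method the mask helper calls on the environment is named `action_masks`, and that a real environment would subclass the Gym `Env` class and return NumPy arrays rather than plain lists.

```python
from typing import List, Tuple


class ToyLineEnv:
    """Hypothetical 1-D walk: the agent moves left/right on cells 0..size-1."""

    LEFT, RIGHT = 0, 1

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self._last_obs = 0  # cached copy of the most recent observation

    def reset(self) -> int:
        self._last_obs = 0
        return self._last_obs

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Reject actions that the current mask forbids.
        if not self.action_masks()[action]:
            raise ValueError(
                f"action {action} is masked at position {self._last_obs}"
            )
        self._last_obs += -1 if action == self.LEFT else 1
        done = self._last_obs == self.size - 1
        reward = 1.0 if done else 0.0
        return self._last_obs, reward, done

    def action_masks(self) -> List[bool]:
        # The mask is computed from the cached observation, so the caller
        # does not need to pass the observation in (assumed method name).
        pos = self._last_obs
        return [pos > 0, pos < self.size - 1]


env = ToyLineEnv(size=3)
env.reset()
masks_at_start = env.action_masks()   # LEFT is invalid at the left wall
obs, reward, done = env.step(ToyLineEnv.RIGHT)
masks_in_middle = env.action_masks()  # both moves are valid in the middle
```

If I understand the vectorized case correctly, each sub-environment would answer from its own cached observation, which may be why the observation is not passed in explicitly. I would appreciate confirmation of that.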