Creative action space support: contains method, action interpolation. #4837

ragavvenkatesan · 2019-05-22T19:00:03Z

Description of the problem

There are two ways of creatively engineering action spaces:

1. OpenAI Gym action spaces provide two methods that agents should use internally: sample and contains. RL agents should adhere to these methods. If, for instance, the RL agent only produces actions that the contains method accepts, then it becomes easy to define an arbitrarily complicated action space through the contains method alone.
2. In parametric environments, along with action_masking, there must be a provision to report back to the agent which action was actually used. If the agent provides an invalid action and the step method decodes, translates, or interpolates it to a valid one, the step method should return the used action, which the agent should then consume for backpropagation. By default, this could be the same action that the agent rolled out.

Either of these facilities would make it much easier to engineer complex action spaces creatively, including for Safe RL techniques.

ericl · 2019-05-24T00:21:47Z

> OpenAI Gym action spaces provide two methods that agents should use internally: sample and contains. RL agents should adhere to these methods. If, for instance, the RL agent only produces actions that the contains method accepts, then it becomes easy to define an arbitrarily complicated action space through the contains method alone.

IIUC, the proposal is to use .contains() as a boolean function to see if the action is valid, and if not, try sampling a new action? One issue I can see is that this could be very slow to sample if the containment is restrictive.
By the way, for any built-in action space we already clip actions to the space so that contains() should always be true (clip_actions=True by default).
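
To make the sampling-cost concern concrete, here is a minimal rejection-sampling sketch. It is not RLlib code: `sample_until_valid`, the seeded `sample` stand-in, and the 1%-wide constraint are all hypothetical. With independent draws and a valid fraction p of the space, the expected number of draws is 1/p, so about 100 for this constraint.

```python
import random


def sample_until_valid(sample, contains, max_tries=10_000):
    """Rejection sampling: redraw until contains(action) is true.

    Returns the accepted action and the number of draws it took.
    """
    for tries in range(1, max_tries + 1):
        action = sample()
        if contains(action):
            return action, tries
    raise RuntimeError(f"no valid action found in {max_tries} draws")


rng = random.Random(0)  # seeded so the run is repeatable


def sample():
    # Hypothetical stand-in for space.sample(): uniform over [0, 1).
    return rng.random()


def contains(action):
    # Hypothetical restrictive constraint: only 1% of the interval is valid.
    return 0.42 <= action < 0.43


action, tries = sample_until_valid(sample, contains)
print(f"accepted {action:.4f} after {tries} draws")
```

The tighter the constraint, the more draws each environment step needs, which is the slowdown described above.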

> In parametric environments, along with action_masking, there must be a provision to report back to the agent which action was actually used. If the agent provides an invalid action and the step method decodes, translates, or interpolates it to a valid one, the step method should return the used action, which the agent should then consume for backpropagation. By default, this could be the same action that the agent rolled out.

This sounds pretty useful. Perhaps it could be an extra method of the environment, preprocess_action(), which can return a modified action. If the action is modified, then we use that one for learning instead of the sampled one.

ragavvenkatesan · 2019-06-04T19:53:11Z

I don't know what 'IIUC' stands for.
Even though the built-in action spaces clip, clipping is still different from the contains method. Consider a case where the underlying space is a Box over (0, 1), but contains returns false if the action lies in (0.5, 0.6). This is a complex space that can't be implemented with a simple Box.
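
A small sketch of that example, under stated assumptions: `Interval` is a hypothetical, dependency-free stand-in for a one-dimensional gym Box, `IntervalWithHole` overrides contains to exclude the band, and `clip_to_bounds` mimics bound clipping. None of these names come from Gym or RLlib.

```python
class Interval:
    """Hypothetical stand-in for a 1-D gym Box with bounds [low, high]."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def contains(self, x):
        return self.low <= x <= self.high


class IntervalWithHole(Interval):
    """Same bounds, but actions inside (hole_low, hole_high) are invalid."""

    def __init__(self, low, high, hole_low, hole_high):
        super().__init__(low, high)
        self.hole_low, self.hole_high = hole_low, hole_high

    def contains(self, x):
        in_hole = self.hole_low < x < self.hole_high
        return super().contains(x) and not in_hole


def clip_to_bounds(space, x):
    # Clipping only knows about the outer bounds, not about the hole.
    return min(max(x, space.low), space.high)


space = IntervalWithHole(0.0, 1.0, 0.5, 0.6)
clipped_high = clip_to_bounds(space, 1.3)  # pulled back to the upper bound
clipped_mid = clip_to_bounds(space, 0.55)  # unchanged, still inside the hole

print(clipped_high, space.contains(clipped_high))
print(clipped_mid, space.contains(clipped_mid))
```

Clipping repairs the out-of-bounds action but leaves the in-band one untouched, so contains can still return false after clipping.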

The way I suggest implementing the second method is not on the environment but on the action class. The environment should ideally be agnostic of how actions are handled. The way I am visualizing this is to have a decode method on the action class. Just as the contains method returns a bool, the decode method should consume an action and return another action. By default, the decode method simply returns its input action unchanged, but you can also use it as an interpolator for solving Safe RL problems, for instance.

RLlib can internally use the action class's decode method to infer which action was actually used and backpropagate only that.
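
One way the decode idea could look, as a sketch only: `SafeInterval`, its nearest-edge projection rule, and `record_step` are hypothetical names chosen for illustration, and nothing here is an existing Gym or RLlib API. The point is that the training record keeps the decoded action next to the sampled one.

```python
class SafeInterval:
    """Hypothetical 1-D action space: [low, high] minus the open band
    (hole_low, hole_high)."""

    def __init__(self, low, high, hole_low, hole_high):
        self.low, self.high = low, high
        self.hole_low, self.hole_high = hole_low, hole_high

    def contains(self, x):
        in_bounds = self.low <= x <= self.high
        in_hole = self.hole_low < x < self.hole_high
        return in_bounds and not in_hole

    def decode(self, x):
        """Map any action to a valid one; valid actions pass through."""
        x = min(max(x, self.low), self.high)
        if self.hole_low < x < self.hole_high:
            # Project onto the nearer edge of the forbidden band.
            midpoint = (self.hole_low + self.hole_high) / 2
            return self.hole_low if x < midpoint else self.hole_high
        return x


def record_step(space, sampled_action):
    # Keep both values so a learner can train on the action actually used.
    used_action = space.decode(sampled_action)
    return {"sampled_action": sampled_action, "used_action": used_action}


space = SafeInterval(0.0, 1.0, 0.5, 0.6)
batch = [record_step(space, a) for a in (0.3, 0.52, 0.58, 1.4)]
used = [row["used_action"] for row in batch]
print(used)
```

A learner following this proposal would read `used_action` rather than `sampled_action` when computing its update.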

ragavvenkatesan · 2019-06-04T19:59:12Z

I am willing to contribute these things. I am new to the library so still getting the hang of it. If you can provide some insight on where this will fit design-wise and if we agree on the design itself, I am willing to do this myself.

I need this anyway for the research I am doing now. so.. :D

ericl · 2019-06-05T09:50:47Z

@ragavvenkatesan it would be great to add this.

> The way I suggest implementing the second method is not on the environment but on the action class.

Makes sense. How would you do this? Would the user be responsible for subclassing the right gym action space to add decode?

> Ideally, the decode method simply returns the input argument action back, but you can use the decode method as an interpolator for solving Safe RL problems, for instance.

Sounds good. Btw the decode can probably be done right after where clip_action() is called in sampler.py.

ps: IIUC == if i understand correctly

ragavvenkatesan · 2019-06-05T17:46:11Z

Yes, I suppose there will be a new method called decode, or even __call__, added to sub-classed action classes. The problem in that case is that it will not be backward compatible with existing Gym classes unless we look for a method called decode: if it doesn't exist, behave as before; otherwise, call it.
If I am correct, this logic should go somewhere here.
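
The look-before-calling logic described here might be sketched as follows. `decode_if_supported`, `PlainSpace`, and `DecodingSpace` are hypothetical names, and the clamping decode is only a placeholder for whatever mapping a user would supply.

```python
class PlainSpace:
    """Stand-in for an existing space class that has no decode method."""

    def contains(self, action):
        return 0.0 <= action <= 1.0


class DecodingSpace(PlainSpace):
    """Stand-in for a user subclass that opts in by defining decode."""

    def decode(self, action):
        # Placeholder mapping: clamp into [0, 1].
        return min(max(action, 0.0), 1.0)


def decode_if_supported(space, action):
    """Call space.decode when it exists; otherwise leave the action as is."""
    decode = getattr(space, "decode", None)
    if callable(decode):
        return decode(action)
    return action


legacy = decode_if_supported(PlainSpace(), 1.7)      # no decode: unchanged
decoded = decode_if_supported(DecodingSpace(), 1.7)  # decode applied
print(legacy, decoded)
```

Spaces that never define decode behave exactly as before, which is the backward compatibility being asked for.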

Alternatively, if we want smoother backward compatibility with all gym classes, the parametric environment could return, along with avail_actions, a key called used_actions. This is a little uglier because the environment has to get its hands dirty with actions, and the decoding mechanism ends up tied to the environment instead of the actions. I am swinging between these two options.
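
For comparison, the environment-side alternative could look roughly like this. `ToyParametricEnv` is a hypothetical toy, not a Gym or RLlib class, and the nearest-valid-index rule and reward are arbitrary choices for illustration. Only the avail_actions and used_actions keys come from the description above.

```python
class ToyParametricEnv:
    """Hypothetical toy environment with discrete actions, some blocked."""

    def __init__(self, num_actions=4, blocked=(2,)):
        self.num_actions = num_actions
        self.blocked = set(blocked)

    def _avail_actions(self):
        return [a for a in range(self.num_actions) if a not in self.blocked]

    def step(self, action):
        avail = self._avail_actions()
        if action in avail:
            used = action
        else:
            # Snap to the nearest valid index; ties go to the lower index.
            used = min(avail, key=lambda a: (abs(a - action), a))
        # The environment itself reports which action it actually applied.
        obs = {"avail_actions": avail, "used_actions": used}
        reward = float(used)  # arbitrary placeholder reward
        done = False
        return obs, reward, done, {}


env = ToyParametricEnv()
obs_invalid, _, _, _ = env.step(2)  # blocked, so the env substitutes
obs_valid, _, _, _ = env.step(3)    # valid, passes through unchanged
print(obs_invalid["used_actions"], obs_valid["used_actions"])
```

Here the substitution rule lives inside step, which is the coupling between environment and action handling that this option trades for compatibility.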

ericl · 2019-06-05T23:53:06Z

It seems ok to check the method exists before trying to call it. I think that would be cleaner than putting it inside the observation data which seems more limiting.

ericl added the P3 Issue moderate in impact or severity label Mar 7, 2020
