Creative action space support: contains method, action interpolation. #4837
Comments
IIUC, the proposal is to use
This sounds pretty useful. Perhaps it could be an extra method of the environment,
I don't know what 'IIUC' stands for. The way I suggest implementing the second method is not on the environment but on the action class. The environment should ideally be agnostic of how actions are handled. The way I am visualizing this is to have a … RLlib can internally use the action's …
I am willing to contribute these things. I am new to the library, so I am still getting the hang of it. If you can provide some insight on where this will fit design-wise, and if we agree on the design itself, I am willing to do this myself. I need this anyway for the research I am doing now, so... :D
@ragavvenkatesan it would be great to add this.
Makes sense. How would you do this, would the user be responsible for subclassing the right gym action space to add
Sounds good. Btw, the decode can probably be done right after where clip_action() is called in sampler.py. PS: IIUC == if I understand correctly
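Yes, I suppose there will be a new method added to sub-classed action classes called decode or even __call__. The problem in that case is that it is not going to be backward compatible with existing Gym classes unless we look for a method called decode and, if it doesn't exist, behave as before; otherwise, call it. Alternatively, if we want a smoother backward compatibility with all gym classes, the Parametric Environment could be made to return, along with avail_actions, a key called used_actions. This is a little uglier because the environment has to get dirty with actions and has the decoding mechanism tied into it instead of in actions. I am swinging between these two options.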
It seems ok to check the method exists before trying to call it. I think
that would be cleaner than putting it inside the observation data which
seems more limiting.
…On Thu, Jun 6, 2019, 1:46 AM Ragav Venkatesan ***@***.***> wrote:
Yes, I suppose there will be a new method added to sub-classed action
classes called decode or even __call__. The problem in that case is that it
is not going to be backward compatible with existing Gym classes unless we
look for a method called decode and, if it doesn't exist, behave as before;
otherwise, call it.
Alternatively, if we want backward compatibility with all gym classes, the
Parametric Environment could be made to return, along with avail_actions,
a key called used_actions. This is a little uglier because the
environment has to get dirty with actions and has the decoding mechanism
tied into it instead of in actions. I am swinging between these two options.
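For reference, the "check that the method exists before trying to call it" idea from this thread could look roughly like the sketch below. It is only an illustration under assumptions: SnapToGrid, PlainSpace and decode_action are hypothetical names, not part of Gym or RLlib, and where such a hook would actually be called inside the sampler is not shown.

```python
class SnapToGrid:
    """Hypothetical action space whose decode() snaps a raw action to a 0.5 grid."""

    def decode(self, action):
        return round(action * 2) / 2


class PlainSpace:
    """Stand-in for an existing space that defines no decode() method."""


def decode_action(action_space, action):
    """Use action_space.decode(action) when it exists, else pass the action through.

    Looking the method up with getattr keeps spaces without decode() working
    unchanged, which is the backward-compatibility concern raised above.
    """
    decode = getattr(action_space, "decode", None)
    if callable(decode):
        return decode(action)
    return action


if __name__ == "__main__":
    print(decode_action(SnapToGrid(), 0.74))  # decoded by the space
    print(decode_action(PlainSpace(), 0.74))  # passed through untouched
```

With this shape, existing spaces behave exactly as before, and only spaces that opt in by defining decode() change what the environment receives.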
Description of the problem
There are two ways of creatively engineering action spaces:
- OpenAI Gym action spaces provide two methods that agents should use internally: sample and contains. RL agents should adhere to these methods. If, for instance, an RL agent only produces actions that the contains method accepts, then it becomes easy to define any complicated action space through the contains method.
- In parametric environments, along with action_masking, there must be a provision to report back to the agent which action was actually used. If the agent provides an invalid action and the step method decodes, translates, or interpolates it into a valid one, the step method should return the used action, which the agent should consume for further backpropagation. By default, this could be the same action that the agent rolled out.
Either of these facilities would provide a lot of capability for creatively engineering complex action spaces, including for Safe RL techniques.
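A minimal, dependency-free sketch of both ideas may help make the request concrete. Everything here is an assumption for illustration: EvenIntegers and ToyEnv are hypothetical stand-ins (not Gym or RLlib classes), the project helper is made up, and reporting the used action through the info dict under a "used_action" key is just one possible place to put it.

```python
import random


class EvenIntegers:
    """Hypothetical space: even integers in [low, high].

    Assumes the range holds at least one even integer.
    """

    def __init__(self, low, high, seed=None):
        self.low, self.high = low, high
        self._rng = random.Random(seed)

    def contains(self, action):
        return (
            isinstance(action, int)
            and self.low <= action <= self.high
            and action % 2 == 0
        )

    def sample(self):
        # Rejection sampling: draw from the bounding range until contains() accepts.
        while True:
            candidate = self._rng.randint(self.low, self.high)
            if self.contains(candidate):
                return candidate

    def project(self, action):
        # Map any number to some valid action (not necessarily the closest one).
        clamped = min(max(int(action), self.low), self.high)
        if clamped % 2 == 0:
            return clamped
        return clamped - 1 if clamped - 1 >= self.low else clamped + 1


class ToyEnv:
    """Hypothetical one-step environment that reports the action it actually used."""

    def __init__(self):
        self.action_space = EvenIntegers(0, 10, seed=0)

    def step(self, action):
        if self.action_space.contains(action):
            used = action  # default: the action the agent rolled out
        else:
            used = self.action_space.project(action)
        obs, reward, done = used, float(used), True
        return obs, reward, done, {"used_action": used}


if __name__ == "__main__":
    env = ToyEnv()
    print(env.step(7)[3])  # invalid action, replaced inside step()
    print(env.step(4)[3])  # valid action, returned unchanged
```

The first class shows how an agent that respects contains can sample from an arbitrarily shaped space, and the second shows how an agent could read back the action that was really applied.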