action and state offsets? #2
The action and state spaces of a lot of environments are NOT normalised between (-1, 1), but we still need to somehow bound the output values of the neural network, whose output lies in (-1, 1). So the actions given to the environment are modified accordingly: action = bound * (network output) + offset, where the offset is the mean of the range and the bound is half its width. For example, in the mountain car continuous env the action space is between (-1, 1), and as the mean value [ (1 + (-1)) / 2 ] is 0 we do not require an offset, and the value of bound = 1, since our network only outputs values between (-1, 1).
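Roughly, that rescaling can be sketched like this (a minimal illustration, not the repo's exact code; it only assumes the actor output already lies in (-1, 1)):

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")

# offset = midpoint of the range, bound = half the width of the range
low, high = env.action_space.low, env.action_space.high
action_offset = (high + low) / 2.0   # [0.0] for MountainCarContinuous
action_bound = (high - low) / 2.0    # [1.0] for MountainCarContinuous

# the network output lies in (-1, 1); rescale it into the env's action range
network_output = np.array([0.37])    # e.g. some actor output
action = action_bound * network_output + action_offset
```

The same two formulas work for any Box space, so for a new environment you only need its low/high values.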
In HAC, the higher level policy also needs to output a goal state, so we bound that in a similar way. But the state space of the mountain car continuous env is defined with position in [-1.2, 0.6] and velocity in [-0.07, 0.07].
For the position variable: offset = (0.6 + (-1.2)) / 2 = -0.3 and bound = (0.6 - (-1.2)) / 2 = 0.9. Similarly, for the velocity variable: offset = 0 and bound = 0.07. So the net action (i.e. the goal state output by the higher level policy) is bounded between min value = [-1.2, -0.07] and max value = [0.6, 0.07]. The clip high/low are simply the max and min values of the action (or state) space. We use them to clip the output after adding noise, to ensure that the values after adding noise do not exceed the environment bounds. These can be obtained easily by going through the documentation of the environment.
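Putting that together for the goal state and the clipping, a minimal sketch (the names echo the ones in train.py, but the noise scale here is arbitrary and only for illustration):

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")

# goal states are bounded the same way as actions, using the observation space
state_low, state_high = env.observation_space.low, env.observation_space.high
state_offset = (state_high + state_low) / 2.0   # [-0.3, 0.0 ]
state_bounds = (state_high - state_low) / 2.0   # [ 0.9, 0.07]

# the clip values are simply the min / max of the space itself
state_clip_low, state_clip_high = state_low, state_high   # [-1.2, -0.07] and [0.6, 0.07]

# after adding exploration noise, clip so the goal never leaves the env bounds
raw_goal = state_bounds * np.tanh(np.array([0.5, -2.0])) + state_offset
noisy_goal = raw_goal + np.random.normal(0.0, 0.02, size=raw_goal.shape)
goal = np.clip(noisy_goal, state_clip_low, state_clip_high)
```

The same pattern gives the action-side clip values from env.action_space.low/high.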
Thanks, that helps a lot! What about the
No, the
Thanks.
I am just curious: what do the action/state offset values mean?
https://github.com/nikhilbarhate99/Hierarchical-Actor-Critic-HAC-PyTorch/blob/master/train.py#L36
I can't seem to figure it out. How do you determine them, for example, for a new environment?
Similarly, how do you determine the clip low/high values for both actions and states? If you could explain those as well, I would appreciate it.
Thank you.