
action and state offsets? #2

Closed
drozzy opened this issue Sep 29, 2019 · 4 comments

Comments


drozzy commented Sep 29, 2019

I am just curious: what do the action/state offset values mean?

https://github.com/nikhilbarhate99/Hierarchical-Actor-Critic-HAC-PyTorch/blob/master/train.py#L36

I can't seem to figure it out. How do you determine them, for example, for a new environment?

Similarly, what about the clip low/high values for both actions and states? If you could explain those as well, I would appreciate it.

Thank you.

@nikhilbarhate99 (Owner)

The action and state spaces of many environments are NOT normalised to (-1, 1), but we still need to somehow bound the output values of the neural network. A Tanh activation at the end of the network is not enough on its own, because it only produces values in (-1, 1) while the spaces are not normalised to that range.

So the actions given to the environment are modified accordingly:
action = ( network output (Tanh) * bounds ) + offset

For example, in mountain car continuous env:

the action space is (-1, 1): its mean value [ (1 + (-1)) / 2 ] is 0, so we do not require an offset, and bounds = [ (1 - (-1)) / 2 ] = 1, since our network already outputs values in (-1, 1). So,

action = ( network output (Tanh) * bounds ) + offset
i.e. action = (network output * 1) + 0
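
As a rough sketch of this rescaling in code (the names here are illustrative, not the ones used in train.py):

```python
import numpy as np

# Illustrative sketch: rescale a Tanh network output into the environment's
# action range. The result lies in (offset - bounds, offset + bounds).
def rescale(tanh_output, bounds, offset):
    return tanh_output * bounds + offset

# MountainCarContinuous action space is (-1, 1): bounds = 1, offset = 0
print(rescale(np.array([0.5]), bounds=np.array([1.0]), offset=np.array([0.0])))  # [0.5]
```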

In HAC, the higher level policy also needs to output a goal state, so we bound that in a similar way (the output goal state is treated as the action of the high level policy).

But the state space of mountain car continuous env is defined as [position, velocity] between min value = [-1.2, -0.07] and max value = [0.6, 0.07],

here the position variable (-1.2, 0.6) is NOT normalised to (-1, 1): its mean value [ (0.6 + (-1.2)) / 2 ] is 0.3 and its bounds value [ (0.6 - (-1.2)) / 2 ] is 0.9, so,

action = ( network output (Tanh) * bounds ) + offset

for position variable:
action = (network output * 0.9) + 0.3
this bounds the value of the action to (-1.2, 0.6)

similarly, the velocity variable (-0.07, 0.07) is NOT normalised to (-1, 1): its mean value [ (0.07 + (-0.07)) / 2 ] is 0 and its bounds value [ (0.07 - (-0.07)) / 2 ] is 0.07, so,

for velocity variable:
action = (network output * 0.07) + 0
this bounds the value of the action to (-0.07, 0.07)

So, the net action is bounded between min value = [-1.2, -0.07] and max value = [0.6, 0.07].
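
A rough sketch of how these per-dimension bounds and offsets can be computed from the state limits (variable names and values here are illustrative, not taken from the repo):

```python
import numpy as np

# MountainCarContinuous state = [position, velocity]
state_low  = np.array([-1.2, -0.07])
state_high = np.array([ 0.6,  0.07])

# bounds = half the range, offset = the mean of the range
state_bounds = (state_high - state_low) / 2.0   # [0.9, 0.07]
state_offset = (state_high + state_low) / 2.0   # [0.3, 0.0]

# A Tanh output in (-1, 1) is mapped back into the original state range:
raw = np.tanh(np.array([0.2, -0.8]))
goal = raw * state_bounds + state_offset        # stays within [-1.2, 0.6] x [-0.07, 0.07]
print(goal)
```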

The clip high/low values are simply the max and min values of the action space. We use them to clip the output after adding noise, to ensure that the noisy action values do not exceed the environment bounds. These can be obtained easily from the environment's documentation.
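
A rough sketch of this clipping step (illustrative values, not the repo's hyperparameters):

```python
import numpy as np

# clip_low / clip_high are the action space min / max; the noise scale is a
# tuned hyperparameter, not derived from the spaces.
clip_low, clip_high = np.array([-1.0]), np.array([1.0])
exploration_action_noise = np.array([0.1])

action = np.array([0.95])
noisy_action = action + np.random.normal(0.0, exploration_action_noise)
noisy_action = np.clip(noisy_action, clip_low, clip_high)  # never leaves [-1, 1]
print(noisy_action)
```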

drozzy closed this as completed Sep 30, 2019

drozzy commented Sep 30, 2019

Thanks, that helps a lot!

What about the exploration_action_noise and exploration_state_noise values?
Are they derived from action/state spaces somehow?

drozzy reopened this Sep 30, 2019
@nikhilbarhate99 (Owner)

No, exploration_action_noise and exploration_state_noise are hyperparameters that need to be tuned by experimentation.


drozzy commented Oct 2, 2019

Thanks.

drozzy closed this as completed Oct 2, 2019