Query regarding 'advantages' in a2c #43

akileshbadrinaaraayanan · 2017-06-21T07:23:11Z

The actor net takes state as input and outputs a policy containing the probability of each action. In train_model(), the ground truth for training actor net is 'advantages' which is not a probability distribution over possible actions. So, how does the categorical cross-entropy computation between the predicted output of actor net and 'advantages' work?

Thanks,
Akilesh

dnddnjs · 2017-06-24T08:37:58Z

The advantage is r + gamma * V(s') - V(s).
The loss for the policy network is -log(action_prob) * advantage, which has the same form as cross-entropy.

For example, action_prob is [p1, p2], and the target is [0, advantage] if the agent actually took action 2.
The cross-entropy then reduces to -log(p2) * advantage.
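
To make the point above concrete, here is a minimal NumPy sketch. It assumes the loss is computed with the textbook formula -sum(target * log(pred)) and does not call Keras itself; the helper name and all numeric values are made up for illustration.

```python
import numpy as np

def categorical_crossentropy(target, pred, eps=1e-7):
    """Textbook form: -sum_i target_i * log(pred_i).

    Nothing in this formula requires `target` to sum to 1.
    """
    pred = np.clip(pred, eps, 1.0)  # avoid log(0)
    return -np.sum(target * np.log(pred), axis=-1)

# Hypothetical values, chosen only for illustration
action_prob = np.array([0.3, 0.7])  # actor output [p1, p2]
advantage = 1.5                     # r + gamma * V(s') - V(s)
action = 1                          # zero-based index of "action 2"

# Target passed to the actor: zero everywhere except the taken action
target = np.zeros_like(action_prob)
target[action] = advantage          # [0, advantage]

loss = categorical_crossentropy(target, action_prob)
policy_gradient_loss = -advantage * np.log(action_prob[action])

# Both expressions give the same number
assert np.isclose(loss, policy_gradient_loss)
```

Under that assumption the zero entries drop out of the sum, so only the term for the action actually taken survives, weighted by the advantage.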

akileshbadrinaaraayanan · 2017-06-27T09:11:21Z

Thanks a lot!

akileshbadrinaaraayanan · 2017-07-11T16:24:59Z

Hi,

Categorical cross-entropy in Keras with the TF backend expects a probability distribution. However, advantages is not a probability distribution in this case. I have observed that it works fairly well, but could you please explain how exactly it works? Categorical cross-entropy is generally defined between two probability distributions p(x) and q(x).

Thanks,
Akilesh

fredcallaway · 2017-07-11T17:47:49Z

#54

akileshbadrinaaraayanan closed this as completed Jun 27, 2017

akileshbadrinaaraayanan reopened this Jul 11, 2017

akileshbadrinaaraayanan closed this as completed Jul 11, 2017
