Can anyone explain why we have self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1) in Deep Q learning with Doom.ipynb? #65

ParmpalGill · 2019-07-28T13:11:05Z

Why multiply by the action and use reduce_sum instead of argmax?

yonigottesman · 2019-09-26T08:25:09Z

I think it's because actions_ is a one-hot vector, with a 1 only at the chosen action.
So multiplying gives you a vector that is all zeros except for one place, which holds the Q-value of that action.
The reduce_sum just pulls that number out, because all the other entries are zeros.
What do you think?
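
To make the masking concrete, here is a minimal sketch of the same arithmetic in plain Python lists, so it runs without TensorFlow. The numbers are made up for illustration; q_values stands in for self.output and actions stands in for self.actions_, both assumed to have shape (batch, number of actions).

```python
# Hypothetical batch of 2 states with 3 possible actions each.
# q_values stands in for self.output, actions for self.actions_ (one-hot rows).
q_values = [[1.5, -0.2, 0.7],
            [0.1, 2.0, -1.0]]
actions = [[0, 0, 1],   # the action at index 2 was taken in the first state
           [0, 1, 0]]   # the action at index 1 was taken in the second state

# Element-wise product (what tf.multiply does), then a sum over the
# action axis (what tf.reduce_sum(..., axis=1) does).
q_taken = [sum(q * a for q, a in zip(q_row, a_row))
           for q_row, a_row in zip(q_values, actions)]

# For comparison: the largest Q-value in each row, which ignores
# the action that was actually taken.
q_max = [max(q_row) for q_row in q_values]

print(q_taken)  # [0.7, 2.0]
print(q_max)    # [1.5, 2.0]
```

Note that in the first row the taken action is not the one with the highest Q-value, so the masked sum and the row maximum differ. That is probably why the notebook does not use argmax or a max here: argmax would return an index rather than a Q-value, and the action stored with a transition may have been an exploratory one, so the Q-value needed is the one for the action actually taken.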
