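I think it's because `actions` is a one-hot vector, with a 1 only at the chosen action.
So multiplying it by the Q-values gives a vector of zeros except at one position, which holds the Q-value of that action.
The `reduce_sum` just pulls this number out, because all the other entries are zeros. `argmax`, by contrast, would give the index of the largest Q-value, not the Q-value of the action that was actually taken.
What do you think?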
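A minimal sketch of the masking trick, in plain Python and for a single sample. It assumes the code in question computes something like `tf.reduce_sum(q_values * actions, axis=1)`. The Q-values and the chosen action below are hypothetical example numbers.

```python
# Hypothetical Q-values predicted for 3 actions, and a one-hot vector
# marking the action that was actually taken (action 2).
q_values = [1.5, 3.0, 0.5]
actions_one_hot = [0.0, 0.0, 1.0]

# Element-wise multiply: every entry except the taken action becomes 0.
masked = [q * a for q, a in zip(q_values, actions_one_hot)]

# Summing collapses the vector to the single surviving Q-value
# (the per-row equivalent of reduce_sum over the action axis).
q_taken = sum(masked)

# argmax instead returns the *index* of the largest Q-value (action 1),
# which is neither a Q-value nor the action that was taken.
best_index = max(range(len(q_values)), key=lambda i: q_values[i])

print(masked)      # [0.0, 0.0, 0.5]
print(q_taken)     # 0.5
print(best_index)  # 1
```

Here the taken action (2) is not the greedy one (1), so `argmax` would pick the wrong entry. The multiply-and-sum picks the Q-value of the action actually stored with the transition, which is what the loss needs to compare against its target.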
Why multiply by the action and use `reduce_sum` instead of `argmax`?