
Intrinsic reward calculation, sum or mean? #33

Open
aklein1995 opened this issue Jul 30, 2021 · 2 comments

Comments

@aklein1995

Hi!

I have a question related to how the intrinsic rewards are calculated.
Why do you use the sum(1) instead of mean(1)?

intrinsic_reward = (target_next_feature - predict_next_feature).pow(2).sum(1) / 2

That would calculate the sum along the 512 output neurons, which is different from calculating the mean along those outputs.

In the original TensorFlow release they use reduce_mean, and I'm a little bit confused.
https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241
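For illustration, a minimal sketch (with hypothetical random tensors, assuming the 512-dimensional feature output used in this repo) of how the two expressions differ only by a constant factor of 512 / 2 = 256:

```python
import torch

# hypothetical target / predictor features, shape (batch, 512)
target_next_feature = torch.randn(4, 512)
predict_next_feature = torch.randn(4, 512)

sq_err = (target_next_feature - predict_next_feature).pow(2)

reward_sum = sq_err.sum(1) / 2   # what this repo computes
reward_mean = sq_err.mean(1)     # what the TensorFlow release (reduce_mean) computes

# mean(1) == sum(1) / 512, so the two differ only by the constant 512 / 2 = 256
print(torch.allclose(reward_sum, reward_mean * 256))  # True
```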

I hope you can clear this up for me.
Thank you in advance

@FlaminG0

Have you got any idea now? I am also confused here; it is different from calculating the MSE. I also wonder why it is divided by 2 here, not by n as in the MSE.

Thanks in advance

@cangozpi

cangozpi commented Mar 5, 2024

Could it be that this difference does not matter because we are using reward_rms to normalize the intrinsic rewards?
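A minimal sketch of that intuition (using a plain running-std division in place of the repo's actual RunningMeanStd / RewardForwardFilter machinery, purely for illustration): dividing by the standard deviation cancels any constant scale factor, so the sum- and mean-based rewards end up identical after normalization.

```python
import numpy as np

def normalize(rewards):
    # illustrative stand-in for reward_rms: divide by the std of the rewards
    return rewards / rewards.std()

sq_err = np.random.rand(1000, 512)   # hypothetical per-feature squared errors
r_sum = sq_err.sum(1) / 2            # sum-based intrinsic reward
r_mean = sq_err.mean(1)              # mean-based intrinsic reward (differs by factor 256)

# the constant factor cancels when dividing by the std
print(np.allclose(normalize(r_sum), normalize(r_mean)))  # True
```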
