
Intrinsic reward calculation, sum or mean? #33

Open
aklein1995 opened this issue Jul 30, 2021 · 2 comments

Comments

@aklein1995

Hi!

I have a question related to how the intrinsic rewards are calculated.
Why do you use the sum(1) instead of mean(1)?

intrinsic_reward = (target_next_feature - predict_next_feature).pow(2).sum(1) / 2

That would calculate the sum along the 512 output neurons, which is different from calculating the mean along those outputs.

In the original TensorFlow release they use reduce_mean, and I'm a little bit confused.
https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241
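For illustration, a minimal sketch (with hypothetical random tensors, assuming the 512-dimensional feature output used in this repo) of how the two expressions differ only by a constant factor of 512 / 2 = 256:

```python
import torch

# hypothetical target / predictor features, shape (batch, 512)
target_next_feature = torch.randn(4, 512)
predict_next_feature = torch.randn(4, 512)

sq_err = (target_next_feature - predict_next_feature).pow(2)

reward_sum = sq_err.sum(1) / 2   # what this repo computes
reward_mean = sq_err.mean(1)     # what the TensorFlow release (reduce_mean) computes

# mean(1) == sum(1) / 512, so the two differ only by the constant 512 / 2 = 256
print(torch.allclose(reward_sum, reward_mean * 256))  # True
```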

I hope you can clear this up for me.
Thank you in advance

@FlaminG0

Have you got any idea now? I am also confused here; it is different from calculating the MSE. I also wonder why it is divided by 2 here, not by n as in the MSE.

Thanks in advance

@cangozpi

cangozpi commented Mar 5, 2024

Could it be that this difference does not matter because we are using reward_rms to normalize the intrinsic rewards?
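A minimal sketch of that intuition (using a plain running-std division in place of the repo's actual RunningMeanStd / RewardForwardFilter machinery, purely for illustration): dividing by the standard deviation cancels any constant scale factor, so the sum- and mean-based rewards end up identical after normalization.

```python
import numpy as np

def normalize(rewards):
    # illustrative stand-in for reward_rms: divide by the std of the rewards
    return rewards / rewards.std()

sq_err = np.random.rand(1000, 512)   # hypothetical per-feature squared errors
r_sum = sq_err.sum(1) / 2            # sum-based intrinsic reward
r_mean = sq_err.mean(1)              # mean-based intrinsic reward (differs by factor 256)

# the constant factor cancels when dividing by the std
print(np.allclose(normalize(r_sum), normalize(r_mean)))  # True
```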
