Are you actually using the learned intrinsic reward for the agent? #9
Comments
Also new to the repo, but here the loss is composed of both intrinsic and extrinsic reward:
Thanks @ruoshiliu. Yes, I saw the loss. But in addition to optimizing that loss, you also need to feed the intrinsic rewards (i.e., the output of the learned intrinsic-reward model) to the agent as its reward signal, as stated in the paper. Only optimizing the loss is not the same as actually using the resulting intrinsic reward for the agent.
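To make the distinction explicit, here is a rough sketch of what I mean (all names are placeholders I made up, not this repo's actual code, and the objective is a bare REINFORCE-style term just for illustration):

```python
import torch

torch.manual_seed(0)
T = 5
obs = torch.randn(T, 4)                          # dummy observations
int_model = torch.nn.Linear(4, 1)                # stand-in for the learned intrinsic-reward model
log_probs = torch.randn(T, requires_grad=True)   # log pi(a_t | s_t) from the policy
ext_rewards = torch.rand(T)                      # rewards coming from the environment
eta = 0.01                                       # weight of the intrinsic term

# (1) Use the model's output as part of the reward the agent is trained on:
int_rewards = int_model(obs).squeeze(-1)
total_rewards = ext_rewards + eta * int_rewards.detach()   # agent's reward = extrinsic + intrinsic
policy_loss = -(log_probs * total_rewards).sum()

# (2) Separately, optimize the intrinsic-reward model with its own loss
#     (a dummy placeholder term here, not the paper's actual loss):
intrinsic_loss = int_rewards.pow(2).mean()

(policy_loss + intrinsic_loss).backward()
```

Step (2) alone is what I currently see in the code; my question is about step (1).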
@ferreirafabio What do you mean by "use the intrinsic rewards"? Can you point out which section of the paper states that?
Yes |
Hi,
I can only see that you optimize the intrinsic loss in your code. Can you point me to the line where you add the intrinsic rewards to the actual environment/extrinsic rewards?
In some areas of your code I can see comments like
`# total reward = int reward`
which would, according to the original paper, be wrong, no?
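In other words, the difference I am asking about is roughly this (dummy numbers and a hypothetical weight `eta`, not referring to actual lines in your repo):

```python
ext_reward, int_reward, eta = 1.0, 0.2, 0.01   # placeholder values, just to show the two options

# what the comment seems to describe:
total_reward = int_reward

# what I would expect following the paper:
total_reward = ext_reward + eta * int_reward
```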
Thank you.