On page 15 of the RND paper, it says that extrinsic rewards are clipped to [-1, 1].
But the official RND code, in atari_wrappers.py, clips extrinsic rewards using ClipRewardEnv, which does:
    """Bin reward to {+1, 0, -1} by its sign."""
    return float(np.sign(reward))
I believe the implementation and the paper's description differ slightly: clipping to [-1, 1] would leave a reward of 0.5 unchanged, whereas taking the sign maps it to 1.
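To make the difference concrete, here is a minimal sketch in plain Python. The helper names `clip_to_range` and `bin_by_sign` are my own and do not appear in either codebase. They are meant to mirror `np.clip(reward, -1, 1)` and `float(np.sign(reward))` respectively.

```python
def clip_to_range(reward, low=-1.0, high=1.0):
    """Clip the reward to [low, high], as the paper's wording suggests.

    Hypothetical helper; intended to behave like np.clip(reward, low, high).
    """
    return max(low, min(high, float(reward)))


def bin_by_sign(reward):
    """Bin the reward to {+1, 0, -1} by its sign.

    Hypothetical helper; intended to behave like float(np.sign(reward)).
    """
    # (reward > 0) - (reward < 0) evaluates to 1, 0, or -1.
    return float((reward > 0) - (reward < 0))


# Compare the two schemes on a few sample rewards.
for r in (-5.0, -0.5, 0.0, 0.25, 3.0):
    print(r, clip_to_range(r), bin_by_sign(r))
```

Under these assumptions the two schemes agree whenever the reward is zero or has magnitude of at least 1, so they should be indistinguishable if the environment only emits integer rewards. They differ only for fractional rewards such as 0.25 or -0.5.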
In your implementation (jcwleo) you are clipping by doing:
I believe this is different from the official implementation. Does anyone have an explanation for this discrepancy, and which approach should be used?