divide by zero error #4
Comments
It's caused by this and it's actually raising from the
I've forked the repo, partially fixed the issues, and made a couple of other changes (plus made the PER memory more configurable).
I'm happy to PR my changes into this repo if @rlcode wants them. |
As referenced in (Schaul et al., 2015), as the TD error approaches 0 we will get divide-by-zero errors. They fix this via $p_i = |\delta_i| + \epsilon$, where $\epsilon$ is a small positive value that prevents this. I think this is missing from your algorithm? I am fairly confident that if you have been testing on CartPole you will never run into this issue; however, in discrete state spaces (like mazes) it becomes a real problem. |
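The epsilon fix mentioned in the comment above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `priority_from_td_error` and the default values of `epsilon` and `alpha` are assumptions picked for the example. `alpha` is the prioritization exponent from the same paper.

```python
def priority_from_td_error(td_error, epsilon=0.01, alpha=0.6):
    """Map a TD error to a strictly positive priority.

    epsilon keeps the priority above zero even when td_error == 0;
    alpha controls how strongly the TD error shapes the priority.
    Both default values are illustrative assumptions.
    """
    return (abs(td_error) + epsilon) ** alpha
```

With a nonzero `epsilon`, a transition whose TD error is exactly 0 still gets a small positive priority, so its sampling probability never reaches 0.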
Also, according to the paper, when a new transition (s, a, r, s_) is stored in the memory, its priority should be the maximum priority among the leaf nodes, right? But the code uses the TD error computed from s and s_, which differs from the paper. I am wondering whether this is a bug. |
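One way to follow the paper on this point is to give every new transition the current maximum priority. The sketch below is an assumption-laden toy: it uses a flat list of priorities instead of the sum tree, and the class name `MaxPriorityMemory` and the fallback priority of 1.0 for an empty memory are hypothetical.

```python
class MaxPriorityMemory:
    """Toy memory: new transitions get the current maximum priority."""

    def __init__(self, capacity, default_priority=1.0):
        self.capacity = capacity
        self.default_priority = default_priority  # used only while the memory is empty
        self.priorities = []
        self.data = []

    def add(self, transition):
        # A new sample inherits the largest priority among the stored samples,
        # so it is likely to be replayed at least once.
        priority = max(self.priorities, default=self.default_priority)
        if len(self.data) >= self.capacity:
            # Drop the oldest entry once the memory is full.
            self.priorities.pop(0)
            self.data.pop(0)
        self.priorities.append(priority)
        self.data.append(transition)

    def update(self, index, priority):
        # Called after a training step with the freshly computed priority.
        self.priorities[index] = priority
```

The TD error would then only be used in `update`, after the transition has actually been sampled and trained on.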
Hi there! I faced the same issue. What I did was sample another value from the same interval until the sampled data is not an integer (given that the storage is initialized with np.zeros). In the prioritized memory I added the following:
This did the trick for me. Hope it does the same for you. |
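The snippet from the comment above is not preserved in the thread. A resampling loop in that spirit might look like the sketch below. Everything here is an assumption: `get` stands in for a sum-tree lookup returning `(idx, priority, data)`, `low` and `high` are the bounds of the sampling segment, and unfilled slots are assumed to hold the integer 0 (which is what `np.zeros(capacity, dtype=object)` produces).

```python
import random


def sample_filled(get, low, high, max_tries=100):
    """Resample within [low, high] until the lookup returns a real transition.

    `get` is a hypothetical callable mapping a value s to (idx, priority, data).
    """
    for _ in range(max_tries):
        s = random.uniform(low, high)
        idx, priority, data = get(s)
        # Unfilled slots still hold the integer placeholder 0.
        if not isinstance(data, int):
            return idx, priority, data
    raise RuntimeError("no filled slot found in this segment")
```

The `max_tries` cap is an addition for safety, so that a segment covering only empty slots raises an error instead of looping forever.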
If anyone is still wondering why it pulls 0 from the replay memory: the sampled location had not been filled yet, so it still contained the values the replay buffer was initialized with, i.e., zeros. If you add a condition that training does not start until the buffer is completely filled, you never encounter this issue. |
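The condition suggested above can be sketched with a small counter. The class name `WarmupGate` and its methods are hypothetical and not part of the repository; a memory class that already tracks its number of entries could do the same check directly.

```python
class WarmupGate:
    """Tracks how many slots are filled and reports when training may start."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.n_entries = 0

    def record_add(self):
        # Count stored transitions, saturating at the buffer capacity.
        self.n_entries = min(self.n_entries + 1, self.capacity)

    def ready(self):
        # True only once every slot holds a real transition.
        return self.n_entries >= self.capacity
```

In a training loop, `record_add()` would be called each time a transition is stored, and the training step would run only when `ready()` returns True.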
Brilliant! |
Hello, thank you for the work. I am getting a divide-by-zero error on the line below when calling the sample function to sample from memory. Any idea why?
is_weight /= is_weight.max()
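One possible cause, stated as an assumption because the surrounding code is not shown here: if a sampled priority is 0, its sampling probability is 0, and raising 0 to a negative power yields inf, after which the normalization produces nan. The sketch below guards against that by flooring the probabilities. The function name `importance_weights`, its parameters, and the `min_prob` floor are hypothetical.

```python
import numpy as np


def importance_weights(priorities, total_priority, n_entries, beta, min_prob=1e-8):
    """Compute normalized importance-sampling weights without dividing by zero."""
    probs = np.asarray(priorities, dtype=np.float64) / total_priority
    # Floor the probabilities so that 0 is never raised to a negative power.
    probs = np.maximum(probs, min_prob)
    weights = np.power(n_entries * probs, -beta)
    # Normalize so the largest weight is 1.
    return weights / weights.max()
```

Note that a floored sample gets a very large raw weight and therefore dominates the normalization, so this only hides the symptom. Keeping priorities strictly positive and not sampling unfilled slots addresses the cause.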