TargetNetwork #966
@jeremiahpslewis I tried to reproduce the error in the experiments at a8acb2d. It seems to happen only with very low probability. It looks like an action "0" is occasionally sampled and leads to an out-of-bounds error. I think this 0 is a dummy action used for padding in RLTrajectories, so there's a sneaky bug to find. Edit: this happens because the prioritized replay experiment uses NStepBatchSampler, which has not been adapted to the EpisodesBuffer yet.
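A minimal, hypothetical Python sketch of the failure mode and one way to avoid it. It shows an n-step sampler that never uses a padded slot as a start index and that truncates its return at a terminal step or at padding. `pad_mask`, `PAD_ACTION`, `nstep_return` and `sample_nstep_batch` are illustrative names, not the `NStepBatchSampler` or `EpisodesBuffer` API from RLTrajectories, and the flat-list buffer layout is an assumption.

```python
import random

PAD_ACTION = 0  # hypothetical dummy action written into padded slots


def nstep_return(rewards, terminals, pad_mask, t, n, gamma):
    """Discounted sum of up to n rewards starting at step t.

    Stops after a terminal step and never reads a padded slot.
    Returns (return, number_of_steps_used).
    """
    total, k = 0.0, 0
    while k < n and t + k < len(rewards) and not pad_mask[t + k]:
        total += gamma ** k * rewards[t + k]
        k += 1
        if terminals[t + k - 1]:
            break  # do not accumulate rewards across the end of an episode
    return total, k


def sample_nstep_batch(actions, rewards, terminals, pad_mask, n, gamma, batch_size, rng):
    # Only real (non-padded) steps are eligible start indices, so the
    # dummy padding action can never reach the learner.
    valid = [t for t, pad in enumerate(pad_mask) if not pad]
    starts = rng.choices(valid, k=batch_size)
    batch = []
    for t in starts:
        g, used = nstep_return(rewards, terminals, pad_mask, t, n, gamma)
        batch.append((t, actions[t], g, used))
    return batch


# Two episodes stored back to back, separated by one padded slot (index 3).
actions = [1, 2, 1, PAD_ACTION, 2, 1]
rewards = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0]
terminals = [False, False, True, False, False, False]
pad_mask = [False, False, False, True, False, False]
```

A sampler that drew start indices uniformly from all six slots would occasionally return the padded action; with 1-based action indexing, as in Julia, that 0 is an invalid index, which is consistent with the rare out-of-bounds error described above.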
@HenriDeh I can review things again once I've submitted my master's thesis at the end of next week. Looking forward to digging into this!
This PR started when I wanted to clean up the RLCore structure a bit. Then I noticed that TargetNetworks are currently named TwinNetworks (an incorrect name; see #961). I went down the rabbit hole of refactoring it rather than simply renaming it. I created a doc page; I invite you to read it to understand the point of this PR.
I've refactored all DQN algorithms; they now work the same way whether or not a target network is used. MPO also uses the new struct (there, it is enforced by type restrictions).
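A minimal Python sketch of the idea, under stated assumptions: the algorithm always asks the wrapper for target values, and the wrapper answers with a lagged copy when one exists or with the online network otherwise. `TargetNetwork`, `use_target`, `sync_freq`, `rho`, `target_value` and `dqn_td_target` are hypothetical names chosen for illustration, not the RLCore API introduced in this PR. The soft-update convention (target = rho * target + (1 - rho) * online, so rho = 0 is a hard copy) is also an assumption, and a toy linear model stands in for a real neural network.

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class LinearNet:
    """Toy stand-in for a neural network: y = w . x"""
    weights: List[float]

    def __call__(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))


class TargetNetwork:
    """Hypothetical wrapper: an online network plus an optional lagged copy."""

    def __init__(self, network, use_target=True, sync_freq=1, rho=0.0):
        self.network = network
        # Without a target, target queries fall back to the online network.
        self.target = copy.deepcopy(network) if use_target else None
        self.sync_freq = sync_freq  # sync the target every `sync_freq` updates
        self.rho = rho              # 0.0 = hard copy, closer to 1.0 = slower soft update
        self.n_updates = 0

    def __call__(self, x):
        # Online prediction, used for acting and for the loss.
        return self.network(x)

    def target_value(self, x):
        net = self.target if self.target is not None else self.network
        return net(x)

    def step(self, grads, lr):
        # Plain gradient descent on the online weights.
        self.network.weights = [w - lr * g for w, g in zip(self.network.weights, grads)]
        self.n_updates += 1
        if self.target is not None and self.n_updates % self.sync_freq == 0:
            # target <- rho * target + (1 - rho) * online
            self.target.weights = [
                self.rho * t + (1 - self.rho) * w
                for t, w in zip(self.target.weights, self.network.weights)
            ]


def dqn_td_target(q, reward, next_features, gamma, done):
    # The algorithm code is identical whether or not `q` keeps a target copy.
    return reward + gamma * (1 - done) * q.target_value(next_features)
```

Because the TD target is always computed through `target_value`, the same DQN-style update runs unchanged with or without a target network; choosing between a lagged copy and the online network becomes a constructor argument rather than a separate algorithm.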
I have yet to write tests for the new struct, but I'd like your opinion on what I've converged to.
PR Checklist