Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TargetNetwork #966

Merged
merged 30 commits into from
Sep 14, 2023
Merged

TargetNetwork #966

merged 30 commits into from
Sep 14, 2023

Conversation

HenriDeh
Copy link
Member

@HenriDeh HenriDeh commented Aug 17, 2023

This PR started when I wanted to clean up a bit the RLCore structure. Then I noticed that TargetNetworks are currently named TwinNetworks (incorrect naming see #961). I went down the rabbit hole of refactoring it in addition to simply renaming it. I created a doc page, I invite you to read it to understand the point of this PR.

I've refactored all DQNs algorithms, they are now agnostic to the use of a target or not. MPO also uses the new struct (here it is enforced by type restrictions).

I have yet to create some test for the new struct but I'd like your opinion on what I converged to.

PR Checklist

  • Update NEWS.md?
  • Unit tests for all structs / functions?
  • Integration and correctness tests using a simple env?
  • PR Review?
  • Add or update documentation?
  • Write docstrings for new methods?

@HenriDeh
Copy link
Member Author

HenriDeh commented Aug 18, 2023

@jeremiahpslewis I tried to replicate the error of the experiments in a8acb2d. This seems to be an error that happens with very low probability.

It looks like an action "0" is occasionally sampled an leads to an out of bounds error. This 0 I think is a dummy action used in RLTrajectories padding... So there's a sneaky bug to find.

Edit: this is because the prioritized replay experiment uses NStepBatchSampler, which is not adapted to the EpisodesBuffer yet.

Copy link
Member

@jeremiahpslewis jeremiahpslewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenriDeh I can review things again once I've submitted my master's thesis end of next week. Looking forward to digging into this!

@HenriDeh HenriDeh merged commit 3af7512 into main Sep 14, 2023
11 of 12 checks passed
@HenriDeh HenriDeh deleted the cleaning branch September 14, 2023 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants