TargetNetwork #966

HenriDeh · 2023-08-17T10:38:12Z

This PR started when I wanted to clean up the RLCore structure a bit. Then I noticed that TargetNetworks are currently named TwinNetworks (an incorrect name, see #961). Instead of simply renaming the struct, I went down the rabbit hole of refactoring it as well. I created a doc page; I invite you to read it to understand the point of this PR.

I've refactored all the DQN algorithms; they are now agnostic to whether a target network is used or not. MPO also uses the new struct (there it is enforced by type restrictions).
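To illustrate what "agnostic" means here, a minimal sketch of the idea follows. It is written in Python purely for illustration and is not the package's Julia API: `TargetNetworkSketch`, `sync_freq`, `rho`, and `use_target` are hypothetical names, and the parameters are a plain list of floats rather than a real model. The learner always asks for `target()`, and gets either the slowly updated copy or the online parameters themselves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetNetworkSketch:
    """Hypothetical wrapper: online parameters plus an optional slow copy."""

    params: List[float]
    sync_freq: int = 1       # sync the target every `sync_freq` updates
    rho: float = 0.0         # share of the old target kept at each sync (0.0 = hard copy)
    use_target: bool = True  # False: no separate target is stored
    target_params: Optional[List[float]] = None
    n_updates: int = 0

    def __post_init__(self) -> None:
        if self.use_target:
            # The target starts as a copy of the online parameters.
            self.target_params = list(self.params)

    def target(self) -> List[float]:
        # Callers never branch on `use_target`; they just call `target()`.
        if self.use_target:
            return self.target_params
        return self.params

    def update(self, new_params: List[float]) -> None:
        # Stand-in for an optimiser step on the online parameters.
        self.params = list(new_params)
        self.n_updates += 1
        if self.use_target and self.n_updates % self.sync_freq == 0:
            # Polyak-style averaging between old target and online parameters.
            self.target_params = [
                self.rho * t + (1.0 - self.rho) * p
                for t, p in zip(self.target_params, self.params)
            ]
```

With `use_target=False`, `target()` simply returns the online parameters, so the same learner code covers both cases.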

I have yet to write tests for the new struct, but I'd like your opinion on the design I converged to.

PR Checklist

Update NEWS.md?
Unit tests for all structs / functions?
Integration and correctness tests using a simple env?
PR Review?
Add or update documentation?
Write docstrings for new methods?

HenriDeh · 2023-08-18T08:59:10Z

@jeremiahpslewis I tried to replicate the error in the experiments from a8acb2d. It seems to be an error that occurs with very low probability.

It looks like an action "0" is occasionally sampled and leads to an out-of-bounds error. I think this 0 is a dummy action used for padding in RLTrajectories... So there's a sneaky bug to find.

Edit: this happens because the prioritized replay experiment uses NStepBatchSampler, which is not yet adapted to the EpisodesBuffer.

jeremiahpslewis

@HenriDeh I can review things again once I've submitted my master's thesis end of next week. Looking forward to digging into this!

HenriDeh added 28 commits August 11, 2023 11:39

move learner stuff to learners.jl

13c9305

regroup includes

1b09e95

move approximator

06fc9b2

docstring

d0b0abf

move export

479427d

rm duplicate line

e3e26b5

refactor twinnetwork

776b29a

typos

a456076

bump compats and versions

e963e5e

bump compats

175cdef

dqn

42104cc

change include order

402f575

dqn mpe

16fb409

Merge branch 'main' into cleaning

127382f

dqn gpu

43ae4cd

finalise DQN

706456b

export forward

c897849

IQN

758eeb9

rainbow

d863700

prioritized

44b33ef

QR and REM

9269b71

mpo

e676f1d

activate all tests

6437730

Documentation

1c010fc

rename

8353282

fix cartpole gpu

c23c354

Merge branch 'main' into cleaning

46fb7d2

add tests

a8acb2d

HenriDeh requested a review from jeremiahpslewis August 17, 2023 14:11

change constructor

05c90b9

fix

f98a6fb

jeremiahpslewis reviewed Aug 18, 2023

HenriDeh mentioned this pull request Aug 23, 2023

NStepBatchSampler JuliaReinforcementLearning/ReinforcementLearningTrajectories.jl#56

Merged

HenriDeh mentioned this pull request Sep 8, 2023

add retrace #972

Closed

6 tasks

jeremiahpslewis approved these changes Sep 14, 2023

HenriDeh merged commit 3af7512 into main Sep 14, 2023
11 of 12 checks passed

HenriDeh deleted the cleaning branch September 14, 2023 12:04
