Missing features in RLCore #961

Open · 1 of 5 tasks
HenriDeh opened this issue Aug 11, 2023 · 0 comments

HenriDeh (Member) commented Aug 11, 2023

Given the discussion in #960, I'd like to make a list of components of RLCore that I feel are missing or incomplete.

  • Target networks: they are currently named TwinNetwork, which is incorrect; twin networks refer to another concept in RL (see the next point). A TargetNetwork would wrap an approximator, hold a target version of it, and handle the updating of both in a single optimise! method (a rough sketch follows the list).
  • Double Q Network, a.k.a. the actual "twin networks" (as in TD3). Currently these must be implemented manually in each algorithm. They are typically optional, though, so for the sake of experimentation it would be nice to have a wrapper around a Q-network approximator that automatically creates a double of it and manages forwarding and training.
    Ideally, this Double Q Network wrapper could take either a standard Approximator or a TargetNetwork; in the latter case it would seamlessly encapsulate four neural networks in one struct.
  • State-Value-Approximator and Action-Value-Approximator: specialized versions of Approximator that would handle the operations common to training these types of networks. They would provide an interface for the user to, for example, overload the loss function (see the old Add Retrace and a QNetwork abstraction #615).
  • A clean collection of Bellman target operators for value-approximator updates: n-step TD, GAE, Retrace, Vtrace. These should be complemented with documentation on their usage (which trajectory and sampler to use); a sketch of an n-step target is included at the bottom of this comment.
  • A clear distinction between Policies and Learners. It occurred to me that DQN-based algorithms use a QBasedPolicy(::Learner) because each algorithm learns differently but uses the learner's q_network in the same way. Policy-gradient algorithms, on the other hand, each define a custom XXXPolicy struct and nothing more. The ActorCritic struct is a candidate for some policy-gradient algorithms (e.g. PPO, MPO, SAC), but there should be underlying Actor and Critic objects; currently the struct is not very general.
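
To make the first two points more concrete, here is a rough sketch of what a TargetNetwork wrapper could look like. None of this is existing RLCore API: the names (TargetNetwork, sync_freq), the hard-copy sync, and the Flux explicit-gradient style are assumptions on my part.

```julia
using Flux

# Sketch only: wrap a trainable approximator together with a frozen target copy
# and keep both up to date through a single optimise! call.
mutable struct TargetNetwork{M,O}
    model::M            # online (trainable) network
    target::M           # frozen copy used to compute bootstrap targets
    optimiser_state::O
    sync_freq::Int      # copy the online weights into the target every `sync_freq` steps
    step::Int
end

function TargetNetwork(model; optimiser = Adam(1e-3), sync_freq = 100)
    TargetNetwork(model, deepcopy(model), Flux.setup(optimiser, model), sync_freq, 0)
end

# `loss` is a closure over the online model only; gradients never flow into the target.
function optimise!(tn::TargetNetwork, loss)
    grads = Flux.gradient(loss, tn.model)
    Flux.update!(tn.optimiser_state, tn.model, grads[1])
    tn.step += 1
    if tn.step % tn.sync_freq == 0
        tn.target = deepcopy(tn.model)   # hard sync; a Polyak/soft update could go here instead
    end
    return tn
end

# Usage inside a learner (states and td_targets are whatever the sampler provides):
# q = TargetNetwork(Chain(Dense(4 => 32, relu), Dense(32 => 2)))
# optimise!(q, m -> Flux.mse(m(states), td_targets))   # td_targets computed with q.target
```

A Double Q Network wrapper would then just be a thin layer on top: hold two of these (or two plain Approximators), forward to both, and take the elementwise minimum when computing targets.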

I'm kind of using this issue as a notepad but I figured it'd be nice to share this.
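
To also make the Bellman-operator point more concrete, here is a dependency-free sketch of the simplest of them, an n-step TD target. The name and signature are made up; GAE, Retrace and Vtrace could follow the same pattern of pure functions from trajectory batches to targets.

```julia
# Sketch only: n-step TD target for every step of a trajectory, truncated at
# episode boundaries. `next_values[t]` is the critic's estimate of V(s_{t+1}).
function nstep_td_target(rewards::AbstractVector, terminals::AbstractVector{Bool},
                         next_values::AbstractVector; γ = 0.99, n = 3)
    T = length(rewards)
    targets = zeros(Float64, T)
    for t in 1:T
        G, discount, last = 0.0, 1.0, t
        for k in t:min(t + n - 1, T)
            G += discount * rewards[k]
            last = k
            terminals[k] && break
            discount *= γ
        end
        # bootstrap with V(s_{last+1}) unless the lookahead ended on a terminal transition
        terminals[last] || (G += discount * next_values[last])
        targets[t] = G
    end
    return targets
end

# nstep_td_target([1.0, 0.0, 0.0, 1.0], [false, false, false, true],
#                 [0.5, 0.4, 0.3, 0.0]; γ = 0.9, n = 2)
```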

HenriDeh mentioned this issue Aug 17, 2023
jeremiahpslewis added this to the v0.12 milestone Mar 27, 2024