Given the discussion in #960, I'd like to make a list of the components of RLCore that I feel are missing or incomplete.
Target networks: they are currently named TwinNetwork, which is incorrect; "twin networks" refers to a different concept in RL (the double critics of TD3). A TargetNetwork would wrap an approximator, hold a target version of it, and handle the updating of both in a single optimise! method.
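A minimal sketch of what I have in mind, assuming Flux-style models; `TargetNetwork`, its fields, and the `optimise!` signature are placeholders for discussion, not existing API:

```julia
using Flux

# Hypothetical TargetNetwork wrapper: holds the online approximator plus a
# frozen copy, and keeps both in sync through a single optimise! entry point.
mutable struct TargetNetwork{M}
    network::M       # online network, updated by gradient steps
    target::M        # target network, updated by soft or periodic hard copies
    ρ::Float32       # Polyak factor; ρ == 1 means periodic hard copies instead
    sync_freq::Int   # hard-copy period when ρ == 1
    step::Int
end

TargetNetwork(m; ρ = 0.995f0, sync_freq = 100) =
    TargetNetwork(m, deepcopy(m), Float32(ρ), sync_freq, 0)

# One call trains the online network and updates the target.
function optimise!(tn::TargetNetwork, opt_state, loss_fn)
    grads = Flux.gradient(loss_fn, tn.network)
    Flux.update!(opt_state, tn.network, grads[1])
    tn.step += 1
    if tn.ρ < 1                           # soft (Polyak) update every step
        for (pt, p) in zip(Flux.params(tn.target), Flux.params(tn.network))
            pt .= tn.ρ .* pt .+ (1 - tn.ρ) .* p
        end
    elseif tn.step % tn.sync_freq == 0    # periodic hard copy
        Flux.loadmodel!(tn.target, tn.network)
    end
    return tn
end
```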
Double Q network, i.e. the actual "twin networks" (as in TD3). Currently they must be implemented manually in each algorithm. They are typically optional, so for the sake of experimenting it would be nice to have a wrapper around a Q-network approximator that automatically creates the second copy and manages forwarding and training.
Ideally, this Double Q Network wrapper could wrap either a standard Approximator or a TargetNetwork; in the latter case it would seamlessly encapsulate four neural networks in one struct.
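Something along these lines, as a hedged sketch (the name `DoubleQNetwork`, the `min` forwarding, and the `optimise!` signature are illustrative only):

```julia
# Hypothetical wrapper holding two independent Q approximators, as in TD3/SAC.
# Each field may itself be a TargetNetwork, which yields the four networks
# mentioned above inside a single struct.
struct DoubleQNetwork{Q1,Q2}
    q1::Q1
    q2::Q2
end

# Forwarding: return the element-wise minimum of both critics, the usual
# clipped double-Q estimate used when computing targets.
(dq::DoubleQNetwork)(s, a) = min.(dq.q1(s, a), dq.q2(s, a))

# Training applies the same loss to each critic with its own optimiser state
# (assuming plain Flux models here for simplicity).
function optimise!(dq::DoubleQNetwork, opt_states, loss_fn)
    for (q, st) in ((dq.q1, opt_states[1]), (dq.q2, opt_states[2]))
        grads = Flux.gradient(loss_fn, q)
        Flux.update!(st, q, grads[1])
    end
    return dq
end
```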
State-value approximator and action-value approximator: specialized versions of Approximator that would handle the common machinery involved in training these common kinds of networks. They would give the user an interface to, for example, overload the loss function (see the old issue "Add Retrace and a QNetwork abstraction" #615).
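As an illustration of the kind of interface meant, the loss could be a method the user overloads for their own approximator type; `ActionValueApproximator` and `q_loss` are made-up names:

```julia
using Flux

# Hypothetical specialization of Approximator for Q(s, a) networks.
struct ActionValueApproximator{M,O}
    model::M
    optimiser_state::O
end

# Default loss: Huber between predicted Q(s, a) and the Bellman target.
# Users could overload q_loss (e.g. to add Retrace-style corrections)
# without touching the rest of the training loop.
q_loss(::ActionValueApproximator, q_sa, target) = Flux.Losses.huber_loss(q_sa, target)

function optimise!(app::ActionValueApproximator, states, actions, targets)
    grads = Flux.gradient(app.model) do m
        q = m(states)                                   # (n_actions, batch)
        q_sa = q[CartesianIndex.(actions, axes(q, 2))]  # gather Q(s, a) per column
        q_loss(app, q_sa, targets)
    end
    Flux.update!(app.optimiser_state, app.model, grads[1])
    return app
end
```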
A clean collection of Bellman target operators for value-approximator updates: n-step TD, GAE, Retrace, V-trace, complemented with documentation on their usage (which trajectory and sampler to use with each).
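For example, an n-step TD target operator could be a small pure function over a trajectory slice; this is only a sketch of the intended kind of interface, the name and signature are not settled:

```julia
# Hypothetical n-step TD target: G_t = Σ_{k=0}^{n-1} γ^k r_{t+k} + γ^n v_boot,
# where v_boot is the bootstrap value (e.g. from a target network) and a
# terminal transition cuts the bootstrapping.
function nstep_td_target(rewards::AbstractVector, terminals::AbstractVector{Bool},
                         v_boot::Real; γ = 0.99)
    G = float(v_boot)
    for k in length(rewards):-1:1
        G = rewards[k] + γ * (1 - terminals[k]) * G
    end
    return G
end

# Example: 3-step target with bootstrap value 1.5 and γ = 0.9
nstep_td_target([1.0, 0.0, 0.5], [false, false, false], 1.5; γ = 0.9)  # ≈ 2.4985
```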
A clear distinction between policies and learners. It occurred to me that DQN-based algorithms use a QBasedPolicy wrapping a ::Learner, because each algorithm learns differently but uses the learner's q_network in the same way. Policy-gradient algorithms, on the other hand, each define a custom XXXPolicy struct and nothing more. The ActorCritic struct is a candidate for some policy-gradient algorithms (e.g. PPO, MPO, SAC), but there should be underlying Actor and Critic objects; currently the struct is not very general.
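One possible shape for a more general decomposition, with every name below being a placeholder for discussion:

```julia
# Hypothetical decomposition: the policy owns an Actor and a Critic,
# each wrapping its own approximator (possibly a TargetNetwork).
struct Actor{A}
    approximator::A      # maps states to action distributions / actions
end

struct Critic{C}
    approximator::C      # state-value or action-value approximator
end

struct ActorCriticPolicy{A<:Actor,C<:Critic}
    actor::A
    critic::C
end

# Acting only touches the actor; learners (PPO, SAC, MPO, ...) decide how
# actor and critic are optimised, keeping the policy/learner split clean.
(p::ActorCriticPolicy)(state) = p.actor.approximator(state)
```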
I'm kind of using this issue as a notepad but I figured it'd be nice to share this.