Missing features in RLCore #961

Open · 1 of 5 tasks
HenriDeh opened this issue Aug 11, 2023 · 0 comments

HenriDeh (Member) commented Aug 11, 2023

Given the discussion in #960, I'd like to make a list of components of RLCore that I feel are missing or incomplete.

  • Target networks: they are currently named TwinNetwork, which is incorrect; twin networks refer to another concept in RL (see the next point). A TargetNetwork would wrap an approximator, hold a target version of it, and handle the updating of both in a single optimise! method (a rough sketch follows the list).
  • Double Q Network, a.k.a. the actual "twin networks" (as in TD3). Currently these must be implemented manually in each algorithm. They are typically optional, though, so for the sake of experimentation it would be nice to have a wrapper around a Q-network approximator that automatically creates a double of it and manages forwarding and training.
    Ideally, this Double Q Network wrapper could take either a standard Approximator or a TargetNetwork; in the latter case it would seamlessly encapsulate four neural networks in one struct.
  • State-Value-Approximator and Action-Value-Approximator: specialized versions of Approximator that would handle the operations common to training these types of networks. They would provide an interface for the user to, for example, overload the loss function (see the old Add Retrace and a QNetwork abstraction #615).
  • A clean collection of Bellman target operators for value-approximator updates: n-step TD, GAE, Retrace, Vtrace. These should be complemented with documentation on their usage (which trajectory and sampler to use); a sketch of an n-step target is included at the bottom of this comment.
  • A clear distinction between Policies and Learners. It occurred to me that DQN-based algorithms use a QBasedPolicy(::Learner) because each algorithm learns differently but uses the learner's q_network in the same way. Policy-gradient algorithms, on the other hand, each define a custom XXXPolicy struct and nothing more. The ActorCritic struct is a candidate for some policy-gradient algorithms (e.g. PPO, MPO, SAC), but there should be underlying Actor and Critic objects; currently the struct is not very general.
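
To make the first two points more concrete, here is a rough sketch of what a TargetNetwork wrapper could look like. None of this is existing RLCore API: the names (TargetNetwork, sync_freq), the hard-copy sync, and the Flux explicit-gradient style are assumptions on my part.

```julia
using Flux

# Sketch only: wrap a trainable approximator together with a frozen target copy
# and keep both up to date through a single optimise! call.
mutable struct TargetNetwork{M,O}
    model::M            # online (trainable) network
    target::M           # frozen copy used to compute bootstrap targets
    optimiser_state::O
    sync_freq::Int      # copy the online weights into the target every `sync_freq` steps
    step::Int
end

function TargetNetwork(model; optimiser = Adam(1e-3), sync_freq = 100)
    TargetNetwork(model, deepcopy(model), Flux.setup(optimiser, model), sync_freq, 0)
end

# `loss` is a closure over the online model only; gradients never flow into the target.
function optimise!(tn::TargetNetwork, loss)
    grads = Flux.gradient(loss, tn.model)
    Flux.update!(tn.optimiser_state, tn.model, grads[1])
    tn.step += 1
    if tn.step % tn.sync_freq == 0
        tn.target = deepcopy(tn.model)   # hard sync; a Polyak/soft update could go here instead
    end
    return tn
end

# Usage inside a learner (states and td_targets are whatever the sampler provides):
# q = TargetNetwork(Chain(Dense(4 => 32, relu), Dense(32 => 2)))
# optimise!(q, m -> Flux.mse(m(states), td_targets))   # td_targets computed with q.target
```

A Double Q Network wrapper would then just be a thin layer on top: hold two of these (or two plain Approximators), forward to both, and take the elementwise minimum when computing targets.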

I'm kind of using this issue as a notepad but I figured it'd be nice to share this.
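
To also make the Bellman-operator point more concrete, here is a dependency-free sketch of the simplest of them, an n-step TD target. The name and signature are made up; GAE, Retrace and Vtrace could follow the same pattern of pure functions from trajectory batches to targets.

```julia
# Sketch only: n-step TD target for every step of a trajectory, truncated at
# episode boundaries. `next_values[t]` is the critic's estimate of V(s_{t+1}).
function nstep_td_target(rewards::AbstractVector, terminals::AbstractVector{Bool},
                         next_values::AbstractVector; γ = 0.99, n = 3)
    T = length(rewards)
    targets = zeros(Float64, T)
    for t in 1:T
        G, discount, last = 0.0, 1.0, t
        for k in t:min(t + n - 1, T)
            G += discount * rewards[k]
            last = k
            terminals[k] && break
            discount *= γ
        end
        # bootstrap with V(s_{last+1}) unless the lookahead ended on a terminal transition
        terminals[last] || (G += discount * next_values[last])
        targets[t] = G
    end
    return targets
end

# nstep_td_target([1.0, 0.0, 0.0, 1.0], [false, false, false, true],
#                 [0.5, 0.4, 0.3, 0.0]; γ = 0.9, n = 2)
```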

HenriDeh mentioned this issue Aug 17, 2023
jeremiahpslewis added this to the v0.12 milestone Mar 27, 2024