Adds PBT algorithm to rl games #3399
Conversation
Thank you for this feature! It would be good to add the images that we are working on for PBT. Is PBT just an add-on to the library? Are the assumptions in there specific to rl-games, or can we make it generic enough to use with RSL-RL too?
Yes, PBT should be an add-on to the library; the current design is specific to rl-games, attaching to the rl-games module. It is possible to make it generic, but that will take more time and work.
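For context, here is a hypothetical framework-agnostic surface that a generic PBT could target. This is a sketch only, not the PR's API, and the names are illustrative: the point is that supporting RSL-RL alongside rl-games would mainly require a small per-framework adapter exposing fitness reporting and checkpoint exchange.

```python
# Hypothetical sketch (not this PR's API): the minimal surface PBT needs
# from any RL framework. A generic PBT loop would talk only to this
# interface; rl-games and RSL-RL would each provide one adapter.
from abc import ABC, abstractmethod


class PbtPolicyAdapter(ABC):
    """Minimal per-framework adapter for a generic PBT implementation."""

    @abstractmethod
    def get_fitness(self) -> float:
        """Return the metric used to rank policies (e.g. mean episode reward)."""

    @abstractmethod
    def save_checkpoint(self, path: str) -> None:
        """Serialize weights and hyperparameters so another worker can load them."""

    @abstractmethod
    def load_checkpoint(self, path: str) -> None:
        """Restore weights and hyperparameters copied from a stronger policy."""
```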
Description
This PR introduces the Population Based Training (PBT) algorithm originally implemented in
Petrenko, Aleksei, et al. "DexPBT: Scaling Up Dexterous Manipulation for Hand-Arm Systems with Population Based Training." arXiv preprint arXiv:2305.12127 (2023).
The PBT algorithm offers an alternative to scaling when increasing the number of environments has only a marginal effect.
It borrows the idea of natural selection and exploits the stochasticity of RL training: the top-performing agents are always kept, while weak agents are replaced with copies of the top performers, which helps overcome catastrophic failures and improves exploration.
Training view: underperformers are rescued by the best performers, later surpass them, and become the best performers themselves.
![Training view of the PBT population](https://github.com/user-attachments/assets/34434bf1-5cb6-4956-a344-49c9969d4861)
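To make the exploit/explore step concrete, here is a minimal sketch of one PBT iteration. It is not this PR's code: it assumes each policy's state is a plain dict with an `"hparams"` mapping, whereas the actual implementation exchanges rl-games checkpoints between workers.

```python
# Minimal PBT step sketch (illustrative; the PR operates on rl-games
# checkpoints rather than in-memory dicts).
import copy
import random


def pbt_step(fitnesses, checkpoints, frac=0.3, mutation_scale=0.2):
    """One PBT iteration over a population.

    fitnesses[i] ranks policy i (higher is better); checkpoints[i] is a
    dict holding that policy's weights and an "hparams" mapping.
    """
    n = len(fitnesses)
    ranked = sorted(range(n), key=lambda i: fitnesses[i], reverse=True)
    k = max(1, int(frac * n))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = random.choice(top)
        # Exploit: the weak agent restarts from a strong agent's checkpoint.
        checkpoints[loser] = copy.deepcopy(checkpoints[winner])
        # Explore: multiplicatively perturb each inherited hyperparameter.
        for name, value in checkpoints[loser]["hparams"].items():
            checkpoints[loser]["hparams"][name] = value * (
                1.0 + random.uniform(-mutation_scale, mutation_scale)
            )
    return checkpoints
```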
Note:
PBT is still in beta and has the limitations below (see the sketch after the list):
1. In theory it can work with any RL algorithm, but the current implementation only works with rl-games.
2. The API could be further simplified so that `num_policies` or `policy_idx` need not be passed explicitly, which would allow a dynamic `max_population`; this is left for future work.
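For illustration of the second limitation, here is a sketch of the per-worker inputs the current design requires. The field names follow the parameters named above; the real config keys in the PR may differ.

```python
# Hypothetical sketch of the per-worker PBT inputs (names illustrative,
# not necessarily the PR's actual config keys).
from dataclasses import dataclass


@dataclass
class PbtCfg:
    enabled: bool = True
    num_policies: int = 4             # fixed population size, known up front
    policy_idx: int = 0               # this worker's slot in the population
    directory: str = "pbt_workspace"  # shared folder for checkpoint exchange


# Each of the num_policies training processes gets its own index; a dynamic
# max_population would remove the need to fix these values in advance.
workers = [PbtCfg(num_policies=4, policy_idx=i) for i in range(4)]
```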
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there