Conversation

@ooctipus (Collaborator) commented Sep 9, 2025

Description

This PR introduces the Population Based Training (PBT) algorithm originally implemented in:

Petrenko, Aleksei, et al. "DexPBT: Scaling up dexterous manipulation for hand-arm systems with population based training." arXiv preprint arXiv:2305.12127 (2023).

The PBT algorithm offers an alternative scaling axis when increasing the number of environments yields diminishing returns. It borrows the idea of natural selection and exploits the stochasticity of RL training: the top-performing agents are always kept, while weak agents are replaced with copies of the top performers, which mitigates catastrophic failures and improves exploration.
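To make the selection mechanics concrete, here is a minimal sketch of one exploit/explore step, assuming a hypothetical `Policy` container; the names, mutation rule, and fractions are illustrative and are not the API introduced by this PR:

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of one PBT exploit/explore step; `Policy`, `pbt_step`,
# and the mutation rule are illustrative, not this PR's actual implementation.

@dataclass
class Policy:
    weights: dict
    hparams: dict
    score: float = 0.0

def pbt_step(population: list[Policy], frac: float = 0.3) -> None:
    # rank the population by the chosen fitness metric (e.g. episodic return)
    ranked = sorted(population, key=lambda p: p.score, reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for weak in bottom:
        donor = random.choice(top)
        # exploit: copy a top performer's weights into the underperformer
        weak.weights = dict(donor.weights)
        # explore: perturb the donor's hyperparameters to keep the population diverse
        weak.hparams = {name: v * random.uniform(0.8, 1.2) for name, v in donor.hparams.items()}
```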

Training curves: underperformers are rescued by the best performers, later surpass them, and become the best performers themselves.
(Screenshot from 2025-09-09 00-55-11)

Note:
PBT is still in beta and has the following limitations:

  1. In theory it can work with any RL algorithm, but the current implementation only supports rl-games.
  2. The API could be further simplified so that num_policies and policy_idx no longer need to be passed in explicitly, which would allow a dynamic max_population; this is left for future work (see the launch sketch after this list).
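For context on limitation 2, this is roughly what the explicit-index workflow looks like today: each population member is launched as its own training process and is told the population size and its own slot up front. The script path and the `agent.pbt.*` flag names below are assumptions for illustration, not the exact CLI of the merged code:

```python
import subprocess

# Hypothetical launch pattern: one training process per population member,
# each given the total population size and its own index. The script path and
# flag names are assumptions, not the exact CLI merged in this PR.
NUM_POLICIES = 4

for idx in range(NUM_POLICIES):
    subprocess.Popen([
        "./isaaclab.sh", "-p", "scripts/reinforcement_learning/rl_games/train.py",
        f"agent.pbt.num_policies={NUM_POLICIES}",
        f"agent.pbt.policy_idx={idx}",
    ])
```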

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

- [x] I have run the pre-commit checks with ./isaaclab.sh --format
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
- [x] I have added my name to the CONTRIBUTORS.md or my name already exists there

@Mayankm96 (Contributor) commented:

Thank you for this feature! It would be good to add the images that we are working on for PBT.

Is PBT just an add-on to the library? Are the assumptions in there specific to RL-Games, or can we make it generic enough to use with RSL-RL too?

@ooctipus (Collaborator, Author) commented Sep 9, 2025

Yes, PBT should be an add-on to the library; the current design is specific to rl-games, since it attaches to the rl-games module. It is possible to make it generic, but that will take more time and work.

@kellyguo11 merged commit 40c8d16 into isaac-sim:main Sep 9, 2025 (8 checks passed)
kellyguo11 pushed a commit that referenced this pull request Sep 9, 2025
ooctipus added a commit to ooctipus/IsaacLab that referenced this pull request Sep 20, 2025
@ooctipus deleted the pbt branch October 22, 2025 21:38
george-nehma pushed a commit to george-nehma/IsaacLab-Dreamerv3 that referenced this pull request Oct 24, 2025