
Conversation

@Toni-SM (Contributor) commented Nov 28, 2025

Description

Reopening a pending PR that was closed at the time the cleanup and removal of the internal repository was performed.

This PR updates the agent configurations for the Isaac-Humanoid-v0 task (making them as similar as possible across libraries) to ensure a more accurate comparison of the RL libraries when generating the Training Performance table.

To this end:

  1. A common training-time line (e.g. Training time: XXX.YY seconds) is printed when running the existing train.py scripts, since the RL libraries currently output training information in different formats and to different extents. A minimal sketch of this instrumentation is included after this list.
  2. A note is added to the involved agent configurations stating that any modification must be propagated to the other agent configuration files.
  3. The commands used to benchmark the RL libraries are added to the docs, for clarity and reproducibility.
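
A minimal sketch of such timing instrumentation, assuming a generic train.py layout (runner.learn() and the surrounding names are placeholders rather than the actual Isaac Lab or library APIs):

    import time

    # ... parse CLI args, create and wrap the environment, build the agent/runner ...

    start_time = time.time()  # start the clock right before training begins
    runner.learn()            # placeholder for the library-specific training entry point
    elapsed = time.time() - start_time

    # the same line is printed by every train.py so the docs table can be filled in consistently
    print(f"Training time: {elapsed:.2f} seconds")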

Screenshots

Difference between the current agent configuration (red) and the new agent configuration (green), showing that the new configuration does not represent a radical change in learning.

[Screenshot from 2025-11-28 13-19-14]

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

github-actions bot added the documentation and isaac-lab labels on Nov 28, 2025
@greptile-apps bot commented Nov 28, 2025

Greptile Overview

Greptile Summary

This PR standardizes RL library benchmarking by adding consistent training time reporting and aligning agent configurations for the Isaac-Humanoid-v0 task.

Key Changes:

  • Added standardized training time instrumentation to all 4 training scripts (rsl_rl, rl_games, sb3, skrl)
  • Added notice headers to agent configs linking them to the performance comparison table (an illustrative header is sketched after this list)
  • Updated agent configs to be more consistent (observation normalization, value loss coefficients)
  • Updated documentation with new GPU model (RTX 4090), updated performance numbers, and reproducible benchmark commands
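
As an illustration of such a notice header (the exact wording used in the PR may differ), a comment block along these lines at the top of each config, e.g. in rsl_rl_ppo_cfg.py:

    # NOTE: this configuration is used to generate the Training Performance comparison
    # table in the documentation. Any modification made here should be propagated to the
    # corresponding rl_games, sb3, and skrl agent configuration files so that the
    # benchmark remains a fair comparison across libraries.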

Issues Found:

  • Critical: sb3_ppo_cfg.yaml has n_timesteps: 3.2e4, which is 2000x too small (should be ~6.6e7 for 500 iterations). While the --max_iterations CLI flag overrides this, the default config value is misleading (see the arithmetic sketch after this list).
  • Critical: skrl_ppo_cfg.yaml has timesteps: 32000, which is similarly too small.
  • Critical: rl_games_ppo_cfg.yaml has critic_coef: 4 while all other configs use value_loss_coef: 2.0, creating a 2x difference that undermines a fair performance comparison.
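
A quick check of the expected default, assuming the 4096 parallel environments and 32 rollout steps per iteration stated above:

    # total environment steps for a 500-iteration benchmark run
    iterations = 500
    num_envs = 4096
    rollout_steps = 32

    total_timesteps = iterations * num_envs * rollout_steps
    print(total_timesteps)  # 65_536_000, i.e. ~6.6e7, far larger than the 3.2e4 / 32000 defaults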

Confidence Score: 2/5

  • This PR has critical configuration inconsistencies that compromise the fairness of performance benchmarking
  • The training scripts are correctly instrumented, but three agent configs have critical issues: (1) rl_games uses 2x higher value loss coefficient than others, (2) sb3 and skrl have misleading default timestep values that are 2000x too small. These issues undermine the stated goal of "ensuring accurate comparison of RL libraries"
  • The agent configuration files need immediate attention: rl_games_ppo_cfg.yaml, sb3_ppo_cfg.yaml, and skrl_ppo_cfg.yaml all have critical configuration errors

Important Files Changed

File Analysis

  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/rl_games_ppo_cfg.yaml (score 2/5): Added notice header, disabled mixed_precision, fixed reward_shaper scale to 1.0, but critic_coef: 4 is still 2x higher than the other configs
  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/rsl_rl_ppo_cfg.py (score 5/5): Added notice header, enabled observation normalization, changed value_loss_coef from 1.0 to 2.0, updated save_interval to 100
  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml (score 2/5): Added notice header, changed vf_coef from 0.5 to 2.0, but n_timesteps: 3.2e4 is critically too small (should be ~6.6e7)
  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/skrl_ppo_cfg.yaml (score 2/5): Added notice header, removed rewards_shaper_scale, updated checkpoint intervals, but timesteps: 32000 is critically too small

Sequence Diagram

sequenceDiagram
    participant User
    participant TrainScript as train.py
    participant EnvWrapper as VecEnvWrapper
    participant Agent as RL Agent
    participant Runner as Agent Runner
    
    User->>TrainScript: Execute with --task Isaac-Humanoid-v0 --max_iterations 500
    TrainScript->>TrainScript: Load agent config from YAML/PY
    TrainScript->>TrainScript: Override config with CLI args
    Note over TrainScript: Start timing (time.time())
    TrainScript->>EnvWrapper: Create and wrap environment
    EnvWrapper-->>TrainScript: Wrapped environment
    TrainScript->>Agent: Instantiate RL agent with config
    Agent-->>TrainScript: Agent instance
    TrainScript->>Runner: Create runner/trainer
    Runner-->>TrainScript: Runner instance
    TrainScript->>Runner: Start training (500 iterations)
    loop For each iteration (1-500)
        Runner->>EnvWrapper: Collect rollouts (4096 envs × 32 steps)
        EnvWrapper-->>Runner: Experience data
        Runner->>Agent: Update policy (5 epochs, 4 mini-batches)
        Agent-->>Runner: Updated weights
    end
    Runner-->>TrainScript: Training complete
    Note over TrainScript: End timing (time.time())
    TrainScript->>TrainScript: Print "Training time: XXX.YY seconds"
    TrainScript->>User: Training complete with timing info

@greptile-apps bot left a comment

Additional Comments (1)

  1. source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/skrl_ppo_cfg.yaml, line 91

    logic: timesteps: 32000 is way too small. For 500 iterations with 4096 envs and 32 rollout steps, this should be ~6.6e7. While --max_iterations overrides this, the default config value should still be reasonable.

9 files reviewed, 3 comments


@kellyguo11 (Contributor) commented:

@ClemensSchwarke @Mayankm96 @ooctipus could you take a quick look at the training config changes?

kellyguo11 moved this to In review in Isaac Lab on Dec 3, 2025
@ClemensSchwarke (Collaborator) commented:

Looks good to me, but were these configs properly tuned before? If that is the case, then I would also check that the behavior is roughly the same with these changes.

@Toni-SM (Contributor, Author) commented Dec 4, 2025

@ClemensSchwarke the Screenshots section of the PR contains a comparison between the old tuned config and the new one. The charts show that the behavior is the same.

@ClemensSchwarke (Collaborator) commented:

Sorry, I was referring to the locomotion behavior, not the learning behavior. Again, if these were never tuned for good locomotion it doesn't really matter.
