
Conversation

@Toni-SM (Contributor) commented Nov 28, 2025

Description

Reopening a pending PR that was closed at the time the cleanup and removal of the internal repository was performed.

This PR updates the agent configurations for the Isaac-Humanoid-v0 task (making them as similar as possible across libraries) to ensure a more accurate comparison of the RL libraries when generating the Training Performance table.

To this end:

  1. A common training-time line (e.g. Training time: XXX.YY seconds) is printed when running the existing train.py scripts, since the RL libraries currently output training information in different formats and to different extents. A minimal sketch of this instrumentation is included after this list.
  2. A note is added to the involved agent configurations stating that any modification must be propagated to the other agent configuration files.
  3. The commands used to benchmark the RL libraries are added to the docs, for clarity and reproducibility.
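
A minimal sketch of such timing instrumentation, assuming a generic train.py layout (runner.learn() and the surrounding names are placeholders rather than the actual Isaac Lab or library APIs):

    import time

    # ... parse CLI args, create and wrap the environment, build the agent/runner ...

    start_time = time.time()  # start the clock right before training begins
    runner.learn()            # placeholder for the library-specific training entry point
    elapsed = time.time() - start_time

    # the same line is printed by every train.py so the docs table can be filled in consistently
    print(f"Training time: {elapsed:.2f} seconds")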

Screenshots

Difference between the current agent configuration (red) and the new agent configuration (green), showing that the new configuration does not represent a radical change in learning.

[Screenshot from 2025-11-28 13-19-14]

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

github-actions bot added the documentation and isaac-lab labels on Nov 28, 2025
@greptile-apps bot commented Nov 28, 2025

Greptile Overview

Greptile Summary

This PR standardizes RL library benchmarking by adding consistent training time reporting and aligning agent configurations for the Isaac-Humanoid-v0 task.

Key Changes:

  • Added standardized training time instrumentation to all 4 training scripts (rsl_rl, rl_games, sb3, skrl)
  • Added notice headers to agent configs linking them to the performance comparison table (an illustrative header is sketched after this list)
  • Updated agent configs to be more consistent (observation normalization, value loss coefficients)
  • Updated documentation with new GPU model (RTX 4090), updated performance numbers, and reproducible benchmark commands
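
As an illustration of such a notice header (the exact wording used in the PR may differ), a comment block along these lines at the top of each config, e.g. in rsl_rl_ppo_cfg.py:

    # NOTE: this configuration is used to generate the Training Performance comparison
    # table in the documentation. Any modification made here should be propagated to the
    # corresponding rl_games, sb3, and skrl agent configuration files so that the
    # benchmark remains a fair comparison across libraries.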

Issues Found:

  • Critical: sb3_ppo_cfg.yaml has n_timesteps: 3.2e4, which is 2000x too small (should be ~6.6e7 for 500 iterations). While the --max_iterations CLI flag overrides this, the default config value is misleading (see the arithmetic sketch after this list).
  • Critical: skrl_ppo_cfg.yaml has timesteps: 32000, which is similarly too small.
  • Critical: rl_games_ppo_cfg.yaml has critic_coef: 4 while all other configs use value_loss_coef: 2.0, creating a 2x difference that undermines a fair performance comparison.
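
A quick check of the expected default, assuming the 4096 parallel environments and 32 rollout steps per iteration stated above:

    # total environment steps for a 500-iteration benchmark run
    iterations = 500
    num_envs = 4096
    rollout_steps = 32

    total_timesteps = iterations * num_envs * rollout_steps
    print(total_timesteps)  # 65_536_000, i.e. ~6.6e7, far larger than the 3.2e4 / 32000 defaults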

Confidence Score: 2/5

  • This PR has critical configuration inconsistencies that compromise the fairness of performance benchmarking
  • The training scripts are correctly instrumented, but three agent configs have critical issues: (1) rl_games uses 2x higher value loss coefficient than others, (2) sb3 and skrl have misleading default timestep values that are 2000x too small. These issues undermine the stated goal of "ensuring accurate comparison of RL libraries"
  • The agent configuration files need immediate attention: rl_games_ppo_cfg.yaml, sb3_ppo_cfg.yaml, and skrl_ppo_cfg.yaml all have critical configuration errors

Important Files Changed

File Analysis

  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/rl_games_ppo_cfg.yaml (score 2/5): Added notice header, disabled mixed_precision, fixed reward_shaper scale to 1.0, but critic_coef: 4 is still 2x higher than the other configs
  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/rsl_rl_ppo_cfg.py (score 5/5): Added notice header, enabled observation normalization, changed value_loss_coef from 1.0 to 2.0, updated save_interval to 100
  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml (score 2/5): Added notice header, changed vf_coef from 0.5 to 2.0, but n_timesteps: 3.2e4 is critically too small (should be ~6.6e7)
  • source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/skrl_ppo_cfg.yaml (score 2/5): Added notice header, removed rewards_shaper_scale, updated checkpoint intervals, but timesteps: 32000 is critically too small

Sequence Diagram

sequenceDiagram
    participant User
    participant TrainScript as train.py
    participant EnvWrapper as VecEnvWrapper
    participant Agent as RL Agent
    participant Runner as Agent Runner
    
    User->>TrainScript: Execute with --task Isaac-Humanoid-v0 --max_iterations 500
    TrainScript->>TrainScript: Load agent config from YAML/PY
    TrainScript->>TrainScript: Override config with CLI args
    Note over TrainScript: Start timing (time.time())
    TrainScript->>EnvWrapper: Create and wrap environment
    EnvWrapper-->>TrainScript: Wrapped environment
    TrainScript->>Agent: Instantiate RL agent with config
    Agent-->>TrainScript: Agent instance
    TrainScript->>Runner: Create runner/trainer
    Runner-->>TrainScript: Runner instance
    TrainScript->>Runner: Start training (500 iterations)
    loop For each iteration (1-500)
        Runner->>EnvWrapper: Collect rollouts (4096 envs × 32 steps)
        EnvWrapper-->>Runner: Experience data
        Runner->>Agent: Update policy (5 epochs, 4 mini-batches)
        Agent-->>Runner: Updated weights
    end
    Runner-->>TrainScript: Training complete
    Note over TrainScript: End timing (time.time())
    TrainScript->>TrainScript: Print "Training time: XXX.YY seconds"
    TrainScript->>User: Training complete with timing info

@greptile-apps bot left a comment

Additional Comments (1)

  1. source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/skrl_ppo_cfg.yaml, line 91

    logic: timesteps: 32000 is way too small. For 500 iterations with 4096 envs and 32 rollout steps, this should be ~6.6e7. While --max_iterations overrides this, the default config value should still be reasonable.

9 files reviewed, 3 comments


@kellyguo11 (Contributor) commented:

@ClemensSchwarke @Mayankm96 @ooctipus could you take a quick look at the training config changes?

kellyguo11 moved this to In review in Isaac Lab on Dec 3, 2025
@ClemensSchwarke (Collaborator) commented:

Looks good to me, but were these configs properly tuned before? If that is the case, then I would also check that the behavior is roughly the same with these changes.

@Toni-SM (Contributor, Author) commented Dec 4, 2025

@ClemensSchwarke the Screenshots section of the PR contains a comparison between the old tuned config and the new one. The charts show that the behavior is the same.

@ClemensSchwarke (Collaborator) commented:

Sorry, I was referring to the locomotion behavior, not the learning behavior. Again, if these were never tuned for good locomotion it doesn't really matter.
