RL libraries training performance comparison #4109
base: main
Conversation
Greptile Summary

This PR standardizes RL library benchmarking by adding consistent training time reporting and aligning agent configurations for the Isaac-Humanoid-v0 task.

Confidence Score: 2/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant TrainScript as train.py
    participant EnvWrapper as VecEnvWrapper
    participant Agent as RL Agent
    participant Runner as Agent Runner
    User->>TrainScript: Execute with --task Isaac-Humanoid-v0 --max_iterations 500
    TrainScript->>TrainScript: Load agent config from YAML/PY
    TrainScript->>TrainScript: Override config with CLI args
    Note over TrainScript: Start timing (time.time())
    TrainScript->>EnvWrapper: Create and wrap environment
    EnvWrapper-->>TrainScript: Wrapped environment
    TrainScript->>Agent: Instantiate RL agent with config
    Agent-->>TrainScript: Agent instance
    TrainScript->>Runner: Create runner/trainer
    Runner-->>TrainScript: Runner instance
    TrainScript->>Runner: Start training (500 iterations)
    loop For each iteration (1-500)
        Runner->>EnvWrapper: Collect rollouts (4096 envs × 32 steps)
        EnvWrapper-->>Runner: Experience data
        Runner->>Agent: Update policy (5 epochs, 4 mini-batches)
        Agent-->>Runner: Updated weights
    end
    Runner-->>TrainScript: Training complete
    Note over TrainScript: End timing (time.time())
    TrainScript->>TrainScript: Print "Training time: XXX.YY seconds"
    TrainScript->>User: Training complete with timing info
```
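The timing shown in the diagram boils down to wrapping the training call with two time.time() calls. A minimal sketch of that pattern follows; the runner object and its learn() method are placeholders, since each library's train.py uses its own runner/trainer API:

```python
import time


def run_training(runner) -> None:
    """Time the training loop and print a consistent summary line.

    `runner` is a placeholder for the library-specific runner/trainer;
    the method actually called differs between RL libraries.
    """
    start_time = time.time()  # start timing just before training begins
    runner.learn()            # assumed entry point for the training loop
    elapsed = time.time() - start_time
    # One uniform line across all libraries so results are easy to compare.
    print(f"Training time: {elapsed:.2f} seconds")
```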
Additional Comments (1)
- source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/skrl_ppo_cfg.yaml, line 91 (link), logic: `timesteps: 32000` is way too small. For 500 iterations with 4096 envs and 32 rollout steps, this should be ~6.6e7. While `--max_iterations` overrides this, the default config value should still be reasonable.
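The reviewer's figure follows directly from the settings in the sequence diagram above; a quick sanity check:

```python
iterations = 500      # --max_iterations
num_envs = 4096       # parallel environments
rollout_steps = 32    # rollout steps per environment per iteration

total_timesteps = iterations * num_envs * rollout_steps
print(total_timesteps)  # 65536000, i.e. ~6.6e7 as noted in the review
```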
9 files reviewed, 3 comments
Resolved review threads:
- ...ce/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/rl_games_ppo_cfg.yaml
- source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml (outdated)
@ClemensSchwarke @Mayankm96 @ooctipus could you take a quick look at the training config changes?

Looks good to me, but were these configs properly tuned before? If that is the case, then I would also check that the behavior is roughly the same with these changes.

@ClemensSchwarke in the Screenshots section of the PR, there is a comparison between the old tuned config and the new one. The charts show that the behavior is the same.

Sorry, I was referring to the locomotion behavior, not the learning behavior. Again, if these were never tuned for good locomotion, it doesn't really matter.
Description
This PR updates the agent configuration (to be as similar as possible) for the Isaac-Humanoid-v0 task to ensure a more accurate comparison of the RL libraries when generating the Training Performance table (a rough mapping of the shared settings across libraries is sketched below). To this end:

- A training time summary (Training time: XXX.YY seconds) is printed when running the existing train.py scripts. Currently, the RL libraries output training information in different formats and to different extents.
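For context, the shared PPO settings from the sequence diagram (4096 environments, 32 rollout steps, 5 learning epochs, 4 mini-batches, 500 iterations) appear under different key names in each library's config. The mapping below is an illustrative sketch only; the key names follow each library's usual conventions and are not copied from this PR's config files:

```python
# Shared PPO settings used for the comparison (values from the sequence diagram).
shared = {"num_envs": 4096, "rollout_steps": 32, "epochs": 5, "mini_batches": 4, "iterations": 500}

# Rough per-library equivalents (illustrative key names, not taken from this PR).
per_library = {
    "rsl_rl":   {"num_steps_per_env": 32, "num_learning_epochs": 5, "num_mini_batches": 4, "max_iterations": 500},
    "rl_games": {"horizon_length": 32, "mini_epochs": 5, "minibatch_size": 4096 * 32 // 4, "max_epochs": 500},
    "skrl":     {"rollouts": 32, "learning_epochs": 5, "mini_batches": 4},
    "sb3":      {"n_steps": 32, "n_epochs": 5, "batch_size": 4096 * 32 // 4},  # n_steps is per environment
}
```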
Screenshots

Difference between the current agent configuration (red) and the new agent configuration (green), showing that the new configuration does not represent a radical change in learning.
Checklist

- I have run the pre-commit checks with ./isaaclab.sh --format
- I have updated the changelog and the corresponding version in the extension's config/extension.toml file
- I have added my name to CONTRIBUTORS.md or my name already exists there