Description
Problem
I need to use TensorBoard to log rewards and other metrics while training a custom task with the rl_games library. The task involves a UR5e robot arm with a processing tool mounted on its flange, which must reach multiple waypoints. I copied rl_games_ppo_cfg.yaml from the Isaac-Reach-UR10-v0 task. Training runs successfully, but there are issues with the logging frequency when using the rl_games PPO configuration. The same issues do not occur with the rsl_rl or skrl libraries.
Description
These are the logs from one training run with rl_games PPO configuration:
I have altered the following parameters and set them to save_best_after: 10 and save_frequency: 10:
Lines 70 to 71 in 75b6715
save_best_after: 200
save_frequency: 100
For some reason, the logged metrics change only periodically, at epoch 313. These are the terminal logs from around that change. The best reward appears to remain constant until epoch 313 is reached:
This change is also shown in this video at timestep 626:
record_sudden_change.mp4
Attempts to fix this error
I also checked whether this logging issue occurs with the rsl_rl library, using rsl_rl_ppo_cfg.py with max_iterations=500. There, logging works as intended:
The same holds for the skrl library with skrl_ppo_cfg.yaml:
UR10 Reach task
This is the TensorBoard logging output from the Isaac-Reach-UR10-v0 task, showing two metrics. If I zoom in on one logged metric, I can see that the update happens only after every 15 epochs:
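To put a number on the update interval instead of reading it off the zoomed plot, a rough sketch like the following could help. It finds the steps at which a logged scalar actually changes value and the gaps between those changes. The helper names (`change_steps`, `change_intervals`, `load_scalar`) are my own and not part of rl_games or Isaac Lab, and the log directory and tag passed to `load_scalar` are placeholders that depend on the run.

```python
from typing import List, Sequence, Tuple


def change_steps(series: Sequence[Tuple[int, float]], tol: float = 0.0) -> List[int]:
    """Return the steps at which the logged value differs from the previous entry."""
    steps = []
    for (_, prev_value), (step, value) in zip(series, series[1:]):
        if abs(value - prev_value) > tol:
            steps.append(step)
    return steps


def change_intervals(series: Sequence[Tuple[int, float]], tol: float = 0.0) -> List[int]:
    """Return the gaps (in steps) between consecutive value changes."""
    steps = change_steps(series, tol)
    return [later - earlier for earlier, later in zip(steps, steps[1:])]


def load_scalar(log_dir: str, tag: str) -> List[Tuple[int, float]]:
    """Read one scalar tag from a TensorBoard event directory as (step, value) pairs."""
    # Imported lazily so the two helpers above also work without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    # A size guidance of 0 keeps every scalar event instead of a downsampled subset.
    accumulator = EventAccumulator(log_dir, size_guidance={"scalars": 0})
    accumulator.Reload()
    return [(event.step, event.value) for event in accumulator.Scalars(tag)]
```

Calling `change_intervals(load_scalar("<log_dir>", "<tag>"))` on the rl_games run and on the rsl_rl or skrl runs should make the difference in update intervals directly comparable. The tags available in a given run can be listed with `EventAccumulator.Tags()`.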
System Info
- Commit: 75b6715
- Isaac Sim Version: 5.0.0
- OS: Ubuntu 22.04.5 LTS
- GPU: Quadro RTX 6000
- CUDA: 13.0
- GPU Driver: 580.65.0
Checklist
- I have checked that there is no similar issue in the repo
- I have checked that the issue is not in running Isaac Sim itself and is related to the repo