print_statistics() output of ContinuousA2CBase() might be wrong due to frame update implementation? #275

yutaizhou · 2024-03-06T19:05:17Z

Hello!

Thank you for the excellent library. I may have found a bug in how frame is tracked across training. It comes from where the frame = self.frame // self.num_agents update is placed, which differs between ContinuousA2CBase.train() and DiscreteA2CBase.train().

In ContinuousA2CBase.train(), the update is placed before self.frame += curr_frames, which I believe is wrong: the printed frame count lags one iteration behind. In DiscreteA2CBase.train(), the update is placed after self.frame += curr_frames, which I believe is correct.

After one iteration of PPO training using num_envs=512 and horizon_length=16, ContinuousA2CBase.train() prints:

fps step: 6744 fps step and policy inference: 6571 fps total: 6360 epoch: 1/500 frames: 0

After moving the update to match DiscreteA2CBase.train(), the printout is:

fps step: 6744 fps step and policy inference: 6571 fps total: 6360 epoch: 1/500 frames: 8192
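To make the ordering issue concrete, here is a minimal standalone sketch of the two update orders. It is not the rl_games code: the function name frames_reported and the update_first flag are made up for illustration, and it assumes a single agent (num_agents=1) with curr_frames equal to num_envs * horizon_length, which matches the 8192 seen above.

```python
def frames_reported(num_envs, horizon_length, num_agents, epochs, update_first):
    """Return the frame count that would be printed after each epoch.

    Hypothetical helper for illustration only; not part of rl_games.
    """
    total_frames = 0  # stands in for self.frame
    # Assumption: frames collected per iteration = num_envs * horizon_length.
    curr_frames = num_envs * horizon_length
    reported = []
    for _ in range(epochs):
        if update_first:
            # Order described for ContinuousA2CBase.train():
            # compute the printed value, then accumulate.
            frame = total_frames // num_agents
            total_frames += curr_frames
        else:
            # Order described for DiscreteA2CBase.train():
            # accumulate, then compute the printed value.
            total_frames += curr_frames
            frame = total_frames // num_agents
        reported.append(frame)
    return reported


before = frames_reported(512, 16, 1, epochs=3, update_first=True)
after = frames_reported(512, 16, 1, epochs=3, update_first=False)
print(before)  # [0, 8192, 16384]
print(after)   # [8192, 16384, 24576]
```

Under these assumptions, the "update first" order always reports a value one iteration stale, which would explain frames: 0 after the first epoch.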

yutaizhou changed the title ~~print_statistics() of ContinuousA2CBase() might be wrong?~~ print_statistics() output of ContinuousA2CBase() might be wrong due to frame update implementation? Mar 6, 2024
