yutaizhou changed the title from "print_statistics() of ContinuousA2CBase() might be wrong?" to "print_statistics() output of ContinuousA2CBase() might be wrong due to frame update implementation?" on Mar 6, 2024
Hello!
Thank you for the excellent library. I may have found a bug in how `frame` is tracked across training. It comes from where the `frame = self.frame // self.num_agents` update is placed, which differs between `ContinuousA2CBase.train()` and `DiscreteA2CBase.train()`.

In `ContinuousA2CBase.train()`, the update is placed before `self.frame += curr_frames`, which I believe is wrong. In `DiscreteA2CBase.train()`, the update is placed after `self.frame += curr_frames`, which I believe is correct.

After one iteration of PPO training using `num_envs=512` and `horizon_length=16`, `ContinuousA2CBase.train()` prints:

After modifying the update to be more similar to `DiscreteA2CBase.train()`, the printout is:
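
Separately from the library's actual console output, the ordering difference described in this issue can be illustrated with a small standalone sketch. This is not the rl_games code: the function `reported_frames` and the `update_before` flag are hypothetical, and the sketch assumes that each iteration collects `num_envs * horizon_length` frames and that `num_agents` is 1.

```python
def reported_frames(num_envs, horizon_length, num_agents=1,
                    iterations=2, update_before=True):
    """Return the frame count that would be reported after each iteration.

    Hypothetical, simplified model of the two orderings discussed in the issue.
    """
    total_frames = 0  # stands in for self.frame
    # Assumption: frames gathered per iteration = num_envs * horizon_length
    curr_frames = num_envs * horizon_length
    reported = []
    for _ in range(iterations):
        if update_before:
            # Ordering attributed to ContinuousA2CBase.train():
            # read the counter first, then add the new frames.
            frame = total_frames // num_agents
            total_frames += curr_frames
        else:
            # Ordering attributed to DiscreteA2CBase.train():
            # add the new frames first, then read the counter.
            total_frames += curr_frames
            frame = total_frames // num_agents
        reported.append(frame)
    return reported


before = reported_frames(512, 16, update_before=True)
after = reported_frames(512, 16, update_before=False)
print(before)  # [0, 8192]
print(after)   # [8192, 16384]
```

Under these assumptions, reading the counter before the increment makes the reported value lag by one iteration (8192 frames here), which matches the discrepancy the issue describes.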