
PPO code has a problem with the critic loss growing without bound #10

Open
BroOfBallSis opened this issue May 14, 2023 · 4 comments

Comments

@BroOfBallSis

When using the PPO code here, the critic loss keeps growing during training, sometimes reaching the order of 1e18.
Comparing against PPO implementations elsewhere, I suspect the cause is that this code uses the current critic network to compute the values of the batch states when computing target_value,
so the value estimates get pushed higher and higher.
After changing the code to store each state's value estimate in the replay buffer at the time the transition is recorded, instead of computing it at the point where target_value is computed,
the unbounded growth of the critic loss went away.
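
For reference, here is a minimal sketch of the fix described above: record the critic's value estimate at collection time, then build target_value from those stored values instead of re-querying the critic during the update. The names (Critic, RolloutBuffer, compute_returns) and hyperparameters are illustrative assumptions, not the repo's actual code.

```python
# Sketch (assumed PyTorch setup): store value estimates when collecting data,
# then compute fixed targets from them so they do not chase the updating critic.
import torch
import torch.nn as nn


class Critic(nn.Module):
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, state):
        return self.net(state)


class RolloutBuffer:
    """Stores transitions plus the value estimate made at collection time."""
    def __init__(self):
        self.states, self.rewards, self.dones, self.values = [], [], [], []

    def add(self, state, reward, done, value):
        self.states.append(state)
        self.rewards.append(reward)
        self.dones.append(done)
        self.values.append(value)  # critic value recorded when the data was collected


def compute_returns(rewards, dones, values, last_value, gamma=0.99, lam=0.95):
    """GAE-style targets computed once from the stored (frozen) values."""
    advantages, gae = [], 0.0
    values = values + [last_value]
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] * (1 - dones[t]) - values[t]
        gae = delta + gamma * lam * (1 - dones[t]) * gae
        advantages.insert(0, gae)
    returns = [adv + v for adv, v in zip(advantages, values[:-1])]
    return torch.tensor(returns, dtype=torch.float32)


# --- collection time: record the value estimate together with the transition ---
state_dim = 4
critic = Critic(state_dim)
buffer = RolloutBuffer()
state = torch.zeros(state_dim)
for step in range(8):  # dummy rollout for illustration
    with torch.no_grad():
        value = critic(state).item()  # frozen estimate stored in the buffer
    reward, done = 1.0, float(step == 7)
    buffer.add(state.clone(), reward, done, value)
    state = torch.randn(state_dim)

# --- update time: target_value comes from the stored values, not a fresh critic pass ---
with torch.no_grad():
    last_value = critic(state).item()
target_values = compute_returns(buffer.rewards, buffer.dones, buffer.values, last_value)

states = torch.stack(buffer.states)
optimizer = torch.optim.Adam(critic.parameters(), lr=3e-4)
for _ in range(4):  # multiple PPO epochs over the same batch, targets stay fixed
    critic_loss = (critic(states).squeeze(-1) - target_values).pow(2).mean()
    optimizer.zero_grad()
    critic_loss.backward()
    optimizer.step()
```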

@Ethan21435

Hello, did you save your results?

@RisingAuroras

@BroOfBallSis May I ask whether performance improved after your change? The critic loss does sometimes behave this way, but as long as the expected return is trending upward I think it's still fine.

@futalemontea

Yes, the update function of PPO_discrete has a problem.

@Starlight0798

Hello, could you share your corrected code?

