Using the PPO code here, I keep seeing the critic loss grow without bound during training, eventually reaching magnitudes around 1e18. After comparing with PPO implementations elsewhere, I suspect the cause is that this code uses the current critic network to compute the values of the batch states when computing target_value, which pushes the value estimates ever higher. After changing the code to store each state's value estimate in the replay buffer at the time the transition is recorded, instead of computing it when target_value is computed, the unbounded growth of the critic loss went away.
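For reference, a minimal sketch of the kind of change described above, assuming a generic Python PPO setup; the buffer class, function, and parameter names (`RolloutBuffer`, `compute_gae_targets`, `gamma`, `lam`) are illustrative and not this repo's actual API:

```python
# Sketch of the fix: record V(s) when the transition is collected, then build
# target_value from those stored values instead of re-querying the critic that
# is currently being trained.

class RolloutBuffer:
    def __init__(self):
        self.states, self.actions, self.rewards = [], [], []
        self.dones, self.values = [], []          # values recorded at collection time

    def store(self, state, action, reward, done, value):
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        self.dones.append(done)
        self.values.append(value)                  # V(s) from the critic at rollout time


def compute_gae_targets(rewards, dones, values, last_value, gamma=0.99, lam=0.95):
    """GAE advantages and value targets built from the stored value snapshot.

    Because `values` is fixed at collection time, target_value no longer shifts
    as the critic is updated, which removes the self-reinforcing loss growth.
    """
    values = list(values) + [last_value]
    advantages, gae = [], 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - float(dones[t])               # zero out bootstrap at episode end
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages.insert(0, gae)
    targets = [adv + v for adv, v in zip(advantages, values[:-1])]
    return advantages, targets
```

During the update, the critic then regresses toward these fixed targets (e.g. an MSE loss between `critic(states)` and `targets`), so the regression target no longer moves as the critic itself is trained.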
Hello, did you save your training results?
@BroOfBallSis After your change, did performance actually improve? The critic loss does sometimes behave like this, but as long as the expected return is trending upward, I think it is still acceptable.
Yes, the update function of PPO_discrete has this problem.
Hi, could you share your corrected code?