About target entropy #15
Although the SAC-Discrete paper proposes setting the target entropy to "maximum entropy multiplied by 0.98", which is slightly lower than the entropy of the (discrete) uniform distribution, you can use any target. The author of the paper (@p-christ) and I discussed target entropy here, and it may help you. Thanks.
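To make that heuristic concrete, here is a minimal sketch (the function name and the `ratio` parameter are just for illustration):

```python
import numpy as np

def discrete_target_entropy(n_actions: int, ratio: float = 0.98) -> float:
    """Target entropy heuristic from the SAC-Discrete paper.

    The maximum entropy of a distribution over n_actions outcomes is
    log(n_actions), attained by the uniform distribution, so scaling
    by 0.98 places the target slightly below it.
    """
    return -np.log(1.0 / n_actions) * ratio
```

For example, with 4 actions the target is 0.98 * log(4), roughly 1.36 nats.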
I don't understand why the original SAC paper uses a minimum target entropy but the SAC-Discrete paper uses a maximum target entropy. In the original paper, alpha decreases during training, so the entropy term's effect and the amount of exploration gradually decrease. But in the SAC-Discrete paper, alpha increases, so the entropy term's effect and the amount of exploration gradually increase. This is really different from the original SAC paper, so I think it is not correct. Am I right?
SAC (for continuous control) doesn't use a "minimum" as a target.
Alpha doesn't necessarily increase. I don't see what you want to know, so please make your points clear. Anyway, thanks.
Thanks. I have a question: why does the original SAC paper use -dim(A)? I saw in the paper: "where H is a desired minimum expected entropy". Why is the minimum entropy -dim(A) for a system with a continuous action space?
I don't know why they use -|A| as "the desired minimum entropy". Thanks.
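For reference, the continuous-control heuristic amounts to using minus the action dimensionality; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def continuous_target_entropy(action_shape) -> float:
    """Common heuristic for continuous SAC: target = -dim(A).

    E.g. a 6-dimensional action space gives a target of -6.0. A
    negative value is admissible here because the differential
    entropy of a continuous distribution can be below zero.
    """
    return -float(np.prod(action_shape))
```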
Why does SAC for discrete actions use a positive target with -np.log(1.0 / action_space.size()) * 0.98, so that log_alpha increases to greater than 1.0, while SAC for continuous actions uses a negative target with -np.prod(action_space.size()), so that log_alpha decreases?
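One way to see why the sign of the target matters: with the standard SAC temperature loss J(alpha) = alpha * (H(pi) - H_target), gradient descent raises alpha whenever the policy entropy is below the target and lowers it when above, regardless of the target's sign. A minimal sketch of that sign argument (the function is illustrative, not from either paper's code):

```python
def alpha_update_direction(policy_entropy: float, target_entropy: float) -> float:
    """Direction of the gradient-descent step on alpha for the loss
    J(alpha) = alpha * (policy_entropy - target_entropy).

    dJ/dalpha = policy_entropy - target_entropy, so the descent step
    is proportional to (target_entropy - policy_entropy); a positive
    return value means alpha increases.
    """
    return target_entropy - policy_entropy
```

A near-uniform discrete policy over n actions has entropy log(n), which sits above the 0.98 * log(n) target, so alpha would initially shrink; it grows only once the policy becomes more deterministic than the target. This is consistent with the comment above that alpha doesn't necessarily increase.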
@ku2482