About target entropy #15

Closed
lidongke opened this issue Nov 17, 2020 · 5 comments

Comments

@lidongke

Why does SAC for discrete actions use a positive target entropy, -np.log(1.0 / action_space.size()) * 0.98, which makes log_alpha increase to more than 1.0, while SAC for continuous actions uses a negative target entropy, -np.prod(action_space.size()), which makes log_alpha decrease?
@ku2482

@toshikwa
Owner

@lidongke.

Although the SAC-Discrete paper proposes setting the target entropy to "maximum entropy multiplied by 0.98", which is slightly lower than the entropy of the (discrete) uniform distribution, you can use any target.
I assume this target is so large for some environments that alpha would keep increasing. However, categorical distributions approximated by neural networks collapse easily and become deterministic, so setting a low target entropy may harm training.
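
To make the two targets concrete, here is a minimal sketch (not taken from the repository; the function names are hypothetical) that computes both values and compares the discrete target with the largest entropy a categorical policy can reach. It assumes the discrete target is 0.98 * log(number of actions) and the continuous target is minus the number of action dimensions, as in the expressions quoted in the question.

```python
import numpy as np


def discrete_target_entropy(num_actions, ratio=0.98):
    # Hypothetical helper: a fraction of the maximum categorical entropy,
    # which is log(num_actions) for the uniform distribution.
    return -np.log(1.0 / num_actions) * ratio


def continuous_target_entropy(action_shape):
    # Hypothetical helper: minus the number of action dimensions.
    return -float(np.prod(action_shape))


def categorical_entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    probs = np.asarray(probs, dtype=np.float64)
    nonzero = probs[probs > 0]
    return float(-np.sum(nonzero * np.log(nonzero)))


num_actions = 6
target = discrete_target_entropy(num_actions)
uniform = categorical_entropy(np.full(num_actions, 1.0 / num_actions))
greedy = categorical_entropy([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

print("discrete target:", target)
print("uniform policy entropy:", uniform)
print("deterministic policy entropy:", greedy)
print("continuous target for 3 dims:", continuous_target_entropy((3,)))
```

The discrete target sits just below the uniform-policy entropy, so a policy has to stay close to uniform to reach it. That is why alpha can keep growing in environments where a good policy must be far from uniform.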

The author of the paper (@p-christ) and I discussed target entropy here, and that discussion may help you.

Thanks.

@lidongke
Author

I don't understand why the original SAC paper uses a minimum target entropy but the SAC-Discrete paper uses a maximum target entropy. In the original paper, alpha decreases under gradient descent, so the entropy's effect and the exploration gradually decrease. But in the SAC-Discrete paper, alpha increases, so the entropy's effect and the exploration gradually increase. This is really different from the original SAC paper, and I think it is not correct. Am I right?
@ku2482

@toshikwa
Owner

toshikwa commented Nov 23, 2020

@lidongke

> I don't understand why the original SAC paper uses a minimum target entropy but the SAC-Discrete paper uses a maximum target entropy.

SAC (for continuous controls) doesn't use a "minimum" as a target.

> But in the SAC-Discrete paper, alpha increases, so the entropy's effect and the exploration gradually increase. This is really different from the original SAC paper, and I think it is not correct. Am I right?

Alpha doesn't necessarily increase.
Although this setting doesn't have a strong theoretical basis, it's not necessarily wrong.
You can use any other setting if you think it's not correct.

I'm not sure what you want to know, so please make your question clearer.
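
As a rough illustration of why alpha doesn't necessarily increase, here is a sketch of one gradient step on log_alpha. It assumes the commonly used temperature loss log_alpha * (policy_entropy - target_entropy), which may differ in detail from any particular implementation. The function name and learning rate are hypothetical.

```python
import numpy as np


def update_log_alpha(log_alpha, policy_entropy, target_entropy, lr=0.01):
    # Assumed loss: log_alpha * (policy_entropy - target_entropy).
    # Its gradient with respect to log_alpha is the entropy gap itself.
    grad = policy_entropy - target_entropy
    return log_alpha - lr * grad


target = 0.98 * np.log(4)  # discrete target for 4 actions

# Policy entropy below the target: log_alpha goes up.
below = update_log_alpha(0.0, policy_entropy=0.5, target_entropy=target)

# Policy entropy above the target (uniform policy): log_alpha goes down.
above = update_log_alpha(0.0, policy_entropy=np.log(4), target_entropy=target)

print("entropy below target ->", below)
print("entropy above target ->", above)
```

Under this assumption, the direction of the update depends only on whether the current policy entropy is below or above the target, not on whether the target itself is positive or negative.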

Anyway, thanks.

@lidongke
Author

Thanks. I have a question: why does the original SAC paper use -dim(A)? I saw this in the paper: "where H is a desired minimum expected entropy". Why is the minimum entropy -dim(A) for a system with a continuous action space?
@ku2482

@toshikwa
Owner

@lidongke

I don't know why they use -|A| as "the desired minimum entropy".
I assume that it also doesn't have a theoretical basis, and that SAC (for continuous controls) is robust to a wide range of values of H.
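
One point that may help with the sign: differential entropy of a continuous distribution can be negative, so a negative target is still a valid lower bound. The sketch below is only an illustration, not the paper's reasoning. It assumes a plain diagonal Gaussian policy and ignores the tanh squashing that SAC implementations usually apply. The function name is hypothetical.

```python
import numpy as np


def gaussian_entropy(std):
    # Differential entropy (in nats) of a diagonal Gaussian:
    # sum over dimensions of 0.5 * log(2 * pi * e * std^2).
    std = np.asarray(std, dtype=np.float64)
    return float(np.sum(0.5 * np.log(2.0 * np.pi * np.e * std ** 2)))


action_dim = 3
target = -float(action_dim)

# Per-dimension std at which the entropy is exactly -1 nat per dimension.
std_at_target = np.exp(-1.5) / np.sqrt(2.0 * np.pi)

print("std giving entropy -dim(A):", std_at_target)
print("entropy at that std:", gaussian_entropy(np.full(action_dim, std_at_target)))
print("entropy at std = 1:", gaussian_entropy(np.ones(action_dim)))
print("entropy at std = 0.01:", gaussian_entropy(np.full(action_dim, 0.01)))
```

Under these assumptions, a target of -dim(A) still leaves the policy a small but nonzero spread in each action dimension, while narrower policies fall below it.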

Thanks.
