About target entropy #15

lidongke · 2020-11-17T11:19:45Z

Why does SAC for discrete actions use a positive target entropy, -np.log(1.0 / action_space.size()) * 0.98, so that log_alpha increases to more than 1.0, while SAC for continuous actions uses a negative one, -np.prod(action_space.size()), so that log_alpha decreases?
@ku2482

toshikwa · 2020-11-18T09:19:28Z

@lidongke.

Although the SAC-Discrete paper proposes setting the target entropy to "maximum entropy multiplied by 0.98", which is slightly lower than the entropy of the (discrete) uniform distribution, you can use any target.
I assume this target is so large for some environments that alpha keeps increasing. However, categorical distributions approximated by neural networks collapse easily to near-deterministic ones, so setting a low target entropy may harm training.
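To make the two targets concrete, here is a minimal sketch in plain Python. The function names (`discrete_target_entropy`, `continuous_target_entropy`, `categorical_entropy`) and the 4-action example are assumptions made for illustration, not code from this repository.

```python
import math


def discrete_target_entropy(num_actions: int, ratio: float = 0.98) -> float:
    """Fraction of the maximum categorical entropy, log(num_actions)."""
    # Equivalent to -log(1 / num_actions) * ratio.
    return ratio * math.log(num_actions)


def continuous_target_entropy(action_dim: int) -> float:
    """Heuristic target for continuous actions: -dim(A)."""
    return -float(action_dim)


def categorical_entropy(probs) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


num_actions = 4  # hypothetical environment with 4 discrete actions
target = discrete_target_entropy(num_actions)
uniform = categorical_entropy([1.0 / num_actions] * num_actions)
near_greedy = categorical_entropy([0.97, 0.01, 0.01, 0.01])

print(f"target={target:.4f} uniform={uniform:.4f} near_greedy={near_greedy:.4f}")
```

In this example the target sits just below the uniform entropy and far above the entropy of a near-greedy policy, so any policy that has started to commit to an action is below the target.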

The author of the paper (@p-christ) and I discussed target entropy here, which may help you.

Thanks.

lidongke · 2020-11-23T04:16:36Z

I don't understand why the original SAC paper uses a minimum target entropy while the SAC-Discrete paper uses a maximum target entropy. In the original paper, alpha decreases under gradient descent, so the effect of the entropy term and the amount of exploration gradually decrease. But in the SAC-Discrete paper, alpha increases, so the effect of the entropy term and the exploration gradually increase. This is really different from the original SAC paper, and I think it is not correct. Am I right?
@ku2482

toshikwa · 2020-11-23T04:39:22Z

@lidongke

> I don't understand why the original SAC paper uses a minimum target entropy while the SAC-Discrete paper uses a maximum target entropy.

SAC (for continuous controls) doesn't use a "minimum" as a target.

> But in the SAC-Discrete paper, alpha increases, so the effect of the entropy term and the exploration gradually increase. This is really different from the original SAC paper, and I think it is not correct. Am I right?

Alpha doesn't necessarily increase.
Although this setting lacks a solid theoretical basis, it's not necessarily wrong.
If you think it's incorrect, you are free to use a different setting.
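As a rough illustration of why alpha can move in either direction, the sketch below applies one plain gradient-descent step to a commonly used form of the temperature loss, `-log_alpha * (target_entropy - policy_entropy)`. The helper `update_log_alpha`, the learning rate, and the entropy values are assumptions chosen for the example. Real implementations typically use Adam and average over a batch.

```python
def update_log_alpha(log_alpha: float,
                     policy_entropy: float,
                     target_entropy: float,
                     lr: float = 0.01) -> float:
    """One gradient-descent step on -log_alpha * (target - entropy)."""
    # d(loss)/d(log_alpha) = -(target_entropy - policy_entropy)
    grad = -(target_entropy - policy_entropy)
    return log_alpha - lr * grad


target = 1.36  # roughly 0.98 * log(4), i.e. 4 discrete actions

# Policy entropy below the target: log_alpha goes up (more entropy bonus).
raised = update_log_alpha(0.0, policy_entropy=0.5, target_entropy=target)

# Policy entropy above the target: log_alpha goes down.
lowered = update_log_alpha(0.0, policy_entropy=1.38, target_entropy=target)

print(raised, lowered)
```

The direction depends only on whether the current policy entropy is below or above the target, so a high target makes increases more likely but does not force them.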

I don't see what you want to know, so please make your points clear.

Anyway, thanks.

lidongke · 2020-11-23T07:47:47Z

Thanks. I have a question: why does the original SAC paper use -dim(A)? The paper says "where H is a desired minimum expected entropy". Why is the minimum entropy -dim(A) for a system with a continuous action space?
@ku2482

toshikwa · 2020-11-23T09:28:27Z

@lidongke

I don't know why they use -|A| as "the desired minimum entropy".
I assume that it also has no theoretical basis, and that SAC (for continuous controls) is robust across a wide range of values of H.
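One observation that may help with the sign: unlike the entropy of a categorical distribution, which lies between 0 and log(n), the differential entropy of a continuous distribution can be negative. The sketch below is only an illustration, not the SAC authors' stated rationale. It assumes a diagonal Gaussian policy and ignores the tanh squashing used in SAC. The function names are hypothetical.

```python
import math


def gaussian_entropy(sigma: float) -> float:
    """Differential entropy (nats) of a 1-D Gaussian with std sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def sigma_for_entropy(entropy: float) -> float:
    """Std of the 1-D Gaussian whose differential entropy equals `entropy`."""
    return math.exp(entropy - 0.5 * math.log(2.0 * math.pi * math.e))


# For a diagonal Gaussian the entropy is the sum over action dimensions,
# so a total of -dim(A) corresponds to -1 nat per dimension on average.
sigma = sigma_for_entropy(-1.0)

print(f"sigma for -1 nat per dimension: {sigma:.4f}")
print(f"entropy of a unit-variance Gaussian: {gaussian_entropy(1.0):.4f}")
```

Under these assumptions, -1 nat per dimension corresponds to a fairly narrow Gaussian (a standard deviation a little below 0.1), so a negative target is a meaningful entropy level for continuous actions rather than an impossible one.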

Thanks.

toshikwa closed this as completed Nov 27, 2020

Howuhh mentioned this issue Sep 23, 2022

SAC-discrete implementation vwxyzjn/cleanrl#270

