BC Problems with Actor Critic (PPO) getting action probabilities #8

benspoek · 2022-02-22T10:07:22Z

I am implementing the Stable Baselines3 "Pretraining with Behavior Cloning" example for a PPO agent with a discrete action space. However, I cannot retrieve the logits using the method proposed in the code:

latent_pi, _, _ = model._get_latent(data)
logits = model.action_net(latent_pi)
action_prediction = logits

This fails with AttributeError: 'ActorCriticPolicy' object has no attribute '_get_latent'. How can I work around that? Is there another way to get the action probabilities?

araffin · 2022-02-22T10:20:54Z

Hello,
the PPO code was updated, but apparently the BC example was not...
The PPO policy now has a get_distribution() method, from which you should be able to extract the logits ;)
See https://github.com/DLR-RM/stable-baselines3/blob/52c29dc497fa2eb235d0476b067bed8ac488fe64/stable_baselines3/common/policies.py#L650

A PR that solves this issue is welcome ;)
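
The suggestion above can be sketched roughly as follows. This is a minimal sketch, not the fix that was eventually merged: the helper name get_action_logits is hypothetical, and it assumes that for a discrete action space the object returned by get_distribution() wraps a torch.distributions.Categorical in its .distribution attribute. Note that Categorical stores normalized logits (log-probabilities), not the raw output of action_net.

```python
def get_action_logits(policy, obs):
    """Return (logits, probs) for a discrete-action actor-critic policy.

    Hypothetical helper. Assumptions:
    - `policy` exposes get_distribution(obs), as mentioned in the reply above;
    - the returned distribution wraps a torch.distributions.Categorical
      in its `.distribution` attribute (discrete action space);
    - `obs` is already a batched tensor on the policy's device.
    """
    dist = policy.get_distribution(obs)
    categorical = dist.distribution
    # Categorical keeps normalized logits (log-probabilities) and
    # the matching probabilities, each of shape (batch, n_actions).
    return categorical.logits, categorical.probs
```

In the snippet from the question, where model is the ActorCriticPolicy and data is a batch of observations, the three failing lines would then become something like logits, probs = get_action_logits(model, data) followed by action_prediction = logits. Because the logits are already normalized, passing them to a cross-entropy loss should give the same value as passing the raw network output.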

araffin added the bug label on Feb 22, 2022

araffin closed this as completed in 103fdeb Mar 22, 2022
