
Update value function with different action types, why? #32

Open
tessavdheiden opened this issue Sep 28, 2020 · 1 comment


@tessavdheiden

Hi Shariq,

In your code you update the value function with actions computed by:

  1. gumbel_softmax
  2. onehot_from_logits

As far as I know, 1) has the gradient attached, while 2) does not.

Why did you implement it this way?
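The gradient difference between the two helpers can be checked numerically. The sketch below is a minimal illustration under stated assumptions: `onehot_from_logits` and `gumbel_softmax_soft` are simplified stand-ins written for this example (plain Python lists, finite differences instead of autograd), not the repository's implementations. A one-hot argmax is piecewise constant in the logits, so its derivative is zero almost everywhere, while a soft Gumbel-softmax sample with fixed noise changes smoothly with the logits.

```python
import math
import random


def onehot_from_logits(logits):
    """Hard one-hot of the argmax: piecewise constant in the logits."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return [1.0 if k == best else 0.0 for k in range(len(logits))]


def gumbel_softmax_soft(logits, noise, temperature=1.0):
    """Relaxed (soft) Gumbel-softmax sample for a fixed draw of Gumbel noise."""
    scores = [(z + g) / temperature for z, g in zip(logits, noise)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(n, rng):
    """Standard Gumbel noise: -log(-log(U)) with U ~ Uniform(0, 1)."""
    return [-math.log(-math.log(rng.uniform(1e-10, 1.0 - 1e-10))) for _ in range(n)]


def finite_diff(fn, logits, k, eps=1e-5):
    """Central difference of fn(logits)[k] with respect to logits[k]."""
    up = list(logits)
    up[k] += eps
    down = list(logits)
    down[k] -= eps
    return (fn(up)[k] - fn(down)[k]) / (2 * eps)


rng = random.Random(0)
logits = [0.5, 1.5, -0.3]
noise = sample_gumbel(len(logits), rng)

d_hard = finite_diff(onehot_from_logits, logits, 1)
d_soft = finite_diff(lambda z: gumbel_softmax_soft(z, noise), logits, 1)
print(d_hard)  # 0.0: the one-hot output does not move
print(d_soft)  # positive: the soft sample responds to the logit
```

With `hard=True`, a straight-through Gumbel-softmax typically returns the one-hot vector in the forward pass while backpropagating through the soft sample, so a derivative like `d_soft` is the kind of signal that can reach the policy, whereas the argmax path provides none.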

@uhlajs

uhlajs commented Dec 29, 2020

Hi @tessavdheiden,
I believe that in a single call of update, you only want to update the actor (policy) of one agent, namely the actor attached to agent_i. That is why you pass one action sample with the gradient attached (the action of agent_i) and the other agents' actions without it.
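The explanation above can be illustrated with a toy version of the critic input used when updating agent_i's policy. This is a hedged sketch, not the repository's code: the state-independent policies (logits used directly as parameters), the linear critic `q_value`, and the helpers `critic_actions` and `grad_q_wrt_logits` are hypothetical, and finite differences stand in for autograd. It reflects the idea behind MADDPG's policy gradient, in which agent i's update differentiates the critic only with respect to agent i's own action.

```python
import math
import random


def onehot_from_logits(logits):
    """Hard one-hot of the argmax (no useful gradient)."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return [1.0 if k == best else 0.0 for k in range(len(logits))]


def gumbel_softmax_soft(logits, noise, temperature=1.0):
    """Soft Gumbel-softmax sample for a fixed draw of Gumbel noise."""
    scores = [(z + g) / temperature for z, g in zip(logits, noise)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def critic_actions(all_logits, agent_i, noise):
    """Hypothetical: joint action fed to the critic when updating agent_i."""
    actions = []
    for j, logits in enumerate(all_logits):
        if j == agent_i:
            # Differentiable relaxed sample: the policy gradient flows here.
            actions.append(gumbel_softmax_soft(logits, noise))
        else:
            # Other agents' actions behave as constants for this update.
            actions.append(onehot_from_logits(logits))
    return actions


def q_value(actions, critic_weights):
    """Hypothetical linear critic: Q = sum_j <w_j, a_j>."""
    return sum(
        w * a
        for w_j, a_j in zip(critic_weights, actions)
        for w, a in zip(w_j, a_j)
    )


def grad_q_wrt_logits(all_logits, agent_i, owner, noise, critic_weights, eps=1e-5):
    """Central-difference gradient of Q w.r.t. agent `owner`'s logits
    while agent_i's policy is the one being updated."""
    grads = []
    for k in range(len(all_logits[owner])):
        up = [list(row) for row in all_logits]
        down = [list(row) for row in all_logits]
        up[owner][k] += eps
        down[owner][k] -= eps
        q_up = q_value(critic_actions(up, agent_i, noise), critic_weights)
        q_down = q_value(critic_actions(down, agent_i, noise), critic_weights)
        grads.append((q_up - q_down) / (2 * eps))
    return grads


rng = random.Random(1)
noise = [-math.log(-math.log(rng.uniform(1e-10, 1.0 - 1e-10))) for _ in range(3)]
all_logits = [[0.2, 1.0, -0.5], [1.3, -0.4, 0.1]]  # policy outputs of agents 0 and 1
critic_weights = [[1.0, -2.0, 0.5], [0.3, 0.7, -1.1]]

agent_i = 0
grad_own = grad_q_wrt_logits(all_logits, agent_i, 0, noise, critic_weights)
grad_other = grad_q_wrt_logits(all_logits, agent_i, 1, noise, critic_weights)
print(grad_own)    # not all zero: learning signal for agent 0's policy
print(grad_other)  # [0.0, 0.0, 0.0]: agent 1 receives no gradient
```

The other agents' actions still come from their current policies, so the critic sees a realistic joint action, but only agent_i's policy output carries a gradient path in this call; each other agent gets its own gradient when update is called for it.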
