def _build_inputs(self, batch, t):
    # Assumes homogenous agents with flat observations.
    # Other MACs might want to e.g. delegate building inputs to each agent
    bs = batch.batch_size
    inputs = []
    inputs.append(batch["obs"][:, t])  # b1av
    if self.args.obs_last_action:
        if t == 0:
            inputs.append(th.zeros_like(batch["actions_onehot"][:, t]))
        else:
            inputs.append(batch["actions_onehot"][:, t-1])
    if self.args.obs_agent_id:
        inputs.append(th.eye(self.n_agents, device=batch.device).unsqueeze(0).expand(bs, -1, -1))

    inputs = th.cat([x.reshape(bs*self.n_agents, -1) for x in inputs], dim=1)
    return inputs
I have been reading the pseudocode in the paper "Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning".
There, the inputs of the agent network are τᵃₜ and uᵃₜ. According to the pseudocode, τ is a list of (oₜ, uₜ₋₁) pairs. The paper introduces τᵃ as agent a's action-observation history and uᵃ as its action.
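To make the notation concrete, here is a minimal sketch of τ as a list of (oₜ, uₜ₋₁) pairs for a single agent. The helper name `build_history`, the use of `None` for the missing previous action at t = 0, and the toy values are assumptions for illustration; none of this is code from the paper or from pymarl.

```python
from typing import List, Optional, Sequence, Tuple

Obs = Tuple[float, ...]
Step = Tuple[Obs, Optional[int]]


def build_history(obs: Sequence[Obs], actions: Sequence[int], t: int) -> List[Step]:
    """Return tau_t = [(o_0, None), (o_1, u_0), ..., (o_t, u_{t-1})] for one agent."""
    history: List[Step] = []
    for k in range(t + 1):
        # There is no previous action at k = 0 (hypothetical convention: None).
        prev_action = actions[k - 1] if k > 0 else None
        history.append((obs[k], prev_action))
    return history


# Toy trajectory: three observations and the actions taken after each one.
obs = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]
actions = [2, 0, 1]

tau_2 = build_history(obs, actions, t=2)  # the whole history up to t = 2
step_input_2 = tau_2[-1]                  # only the pair (o_2, u_1)
```

In this sketch `tau_2` holds three pairs, while `step_input_2` is only the last of them, which is the kind of per-timestep input this issue is asking about.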
However, in the pymarl code, the inputs of the agent network seem to be o and u rather than τ and u:
https://github.com/oxwhirl/pymarl/blob/73960e11c5a72e7f9c492d36dbfde02016fde05a/src/controllers/basic_controller.py#L77-92
In your implementation, `inputs` is constructed with `batch["obs"][:, t]` and `batch["actions_onehot"][:, t-1]` rather than the action-observation history and action.
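One possible way to relate the two views, assuming the agent network is recurrent (the paper describes its agent networks as recurrent; whether that settles the question for this code is for the maintainers to confirm): if a hidden state is carried across timesteps, feeding only (oₜ, uₜ₋₁) at each step still makes the result depend on the whole history. The sketch below uses a toy scalar recurrence in plain Python. The functions `step` and `run`, the weights, and the inputs are all hypothetical and are not pymarl's agent.

```python
import math
from typing import Sequence, Tuple

StepInput = Tuple[float, float]  # (observation, previous action) as toy scalars


def step(h: float, x: StepInput, w_h: float = 0.5, w_o: float = 1.0, w_u: float = 0.3) -> float:
    """One toy recurrent update mixing the old hidden state with the step input."""
    o, u_prev = x
    return math.tanh(w_h * h + w_o * o + w_u * u_prev)


def run(inputs: Sequence[StepInput]) -> float:
    """Fold the per-step inputs into a hidden state, starting from h = 0."""
    h = 0.0
    for x in inputs:
        h = step(h, x)
    return h


# Two trajectories that differ only at t = 0 and share the same final step input.
traj_a = [(0.1, 0.0), (0.3, 1.0), (0.5, 0.0)]
traj_b = [(0.9, 0.0), (0.3, 1.0), (0.5, 0.0)]

h_a = run(traj_a)
h_b = run(traj_b)
same_last_input = traj_a[-1] == traj_b[-1]
states_differ = abs(h_a - h_b) > 1e-6
```

Both trajectories end with the same (oₜ, uₜ₋₁) pair, yet the final hidden states differ, because the state carries information from the earlier steps. Under that assumption, a per-step input plus a recurrent state can stand in for τ.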