Policy sampling issue in PSRO rollout module #63

zkengz · 2024-04-09T13:21:44Z

I greatly appreciate your contribution! The lib helps a lot, especially considering the scarcity of open-source code related to PSRO in large games.

There seems to be a bug in the rollout module of PSRO, specifically at the following line:

malib/malib/rollout/inference/ray/server.py

Line 115 in ea37d5d

spec_policy_id = spec.sample()

It appears that the policy of an agent is sampled every timestep, whereas I believe it should be sampled every episode.
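
To make the difference concrete, here is a minimal sketch of the two behaviours. It is not MALib code: `MixedStrategy`, `rollout_per_step`, and `rollout_per_episode` are hypothetical names I made up for illustration, and `MixedStrategy` only stands in for whatever `spec` is, assuming `spec.sample()` draws a policy id from the meta-strategy distribution.

```python
import random


class MixedStrategy:
    """Hypothetical stand-in for `spec`: a distribution over policy ids."""

    def __init__(self, probs, seed=None):
        self.policy_ids = list(probs)
        self.weights = [probs[pid] for pid in self.policy_ids]
        self.rng = random.Random(seed)

    def sample(self):
        # Draw one policy id according to the mixture weights.
        return self.rng.choices(self.policy_ids, weights=self.weights, k=1)[0]


def rollout_per_step(spec, episode_len):
    # Behaviour described in this issue: a policy id is drawn at every timestep,
    # so the acting policy can change in the middle of an episode.
    return [spec.sample() for _ in range(episode_len)]


def rollout_per_episode(spec, episode_len):
    # Behaviour I would expect: draw once when the episode starts,
    # then keep the same policy id until the episode ends.
    policy_id = spec.sample()
    return [policy_id] * episode_len


spec = MixedStrategy({"policy-0": 0.5, "policy-1": 0.5}, seed=0)
per_step = rollout_per_step(spec, 50)
per_episode = rollout_per_episode(spec, 50)

# Number of distinct policies that acted within a single episode.
print(len(set(per_step)), len(set(per_episode)))
```

With per-episode sampling, exactly one policy acts within an episode, so each trajectory corresponds to a single pure strategy drawn from the mixture. With per-step sampling, the agent effectively plays a different, per-step-mixed policy.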

By the way, the code combines Ray with multi-threading, which makes it difficult to debug (a naive ray.init(local_mode=True) doesn't work). Could you suggest a good way to debug it?
