Performance gap in Predator-Prey #9

pengzhenghao · 2021-03-16T02:13:08Z

Hi there! Thanks for this excellent repo! The code is really nice and great lifesaver for other researchers!

I am trying to reproduce the result in the paper "Deep Multi-Agent Reinforcement Learning for Decentralised Continuous Cooperative Control" and find that there is some performance gap between my result and the reported results. I think this might due to my carelessness on the hyper-parameters, so I am looking for help in this issue.

Apart from using the default comix.yaml and particle.yaml configs, I additionally introduces these parameters in the config:

# According to paper F.1
batch_size: 1024
gamma: 0.85
lr: 0.01
rnn_hidden_dim: 64
t_max: 2000000
test_interval: 2000
save_model: True
save_model_interval: 200000

And this is the result of COMIX in Continuous Predator-Prey environment with 8 repetitions (no difference in the config, just repeat):

For reference, this is the learning curve in original paper:

For details, I find that final performance (episode return) of 8 trials varying drastically:

so I guess there is some hyper-parameters that I have ignored, which leads to the failure of some trials.

Could anyone help to provide some suggestions on this issue? Thanks!!!

The text was updated successfully, but these errors were encountered:

pengzhenghao · 2021-03-16T02:18:02Z

More information: some trials fail after long running

beipeng · 2021-03-19T16:32:25Z

Hi, Thanks for raising the issue! The hyperparameters you are using for continuous predator-prey look very similar to what we used (assuming you are using batch_size_run=1). We ran 10 different seeds before for this task and didn't see this performance degradation issue. But we did see a similar problem when we use gamma=0.99. We think this is probably due to the q-value overestimation bias in QMIX (can be more severe due to the mixing network), which can be a problem in certain tasks and causes catastrophic performance degradation. So maybe the performance gap you are seeing here is due to some random seeds. COVDN shouldn't have this problem (tend to be quite stable). Maybe you can run COVDN to see if you can get similar result to us to double check if you are using the same hyperparameter setting. Hope that helps!

pengzhenghao · 2021-03-20T07:38:14Z

Thanks @beipeng ! I am using gamma = 0.85 and batch_size_run = 1.

COVDN indeed performs much stable than COMIX and the result is perfectly match the one in paper.

May I ask how to actually repeating the experiment? I can't find clear place to insert the random seed. In my experiment, I just run the same script for multiple time with random seed set globally (like np.random.seed(xx), xx = 0, 100, 200, ..., 700).

Thanks a lot for your reply!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance gap in Predator-Prey #9

Performance gap in Predator-Prey #9

pengzhenghao commented Mar 16, 2021

pengzhenghao commented Mar 16, 2021

beipeng commented Mar 19, 2021 •

edited

Loading

pengzhenghao commented Mar 20, 2021

Performance gap in Predator-Prey #9

Performance gap in Predator-Prey #9

Comments

pengzhenghao commented Mar 16, 2021

pengzhenghao commented Mar 16, 2021

beipeng commented Mar 19, 2021 • edited Loading

pengzhenghao commented Mar 20, 2021

beipeng commented Mar 19, 2021 •

edited

Loading