
Reproducing the paper result of doom_deadly_corridor in VizDoom #296

Open
sjaelee25 opened this issue Mar 10, 2024 · 5 comments

@sjaelee25

I have tried to reproduce the experimental result for the Deadly Corridor scenario from the VizDoom basic tasks.

The command follows the documentation: https://www.samplefactory.dev/09-environment-integrations/vizdoom/#reproducing-paper-results
python -m sf_examples.vizdoom.train_vizdoom --train_for_env_steps=500000000 --algo=APPO --env=doom_deadly_corridor --env_frameskip=4 --use_rnn=True --num_workers=36 --num_envs_per_worker=8 --num_policies=1 --batch_size=2048 --wide_aspect_ratio=False --experiment=doom_basic_envs

However, the performance looks like this:
[screenshot: training reward curve for doom_deadly_corridor]
while the paper result is as follows:
[screenshot: corresponding reward curve from the paper]

Could you check this issue? Thanks!

@alex-petrenko
Owner

Hi @sjaelee25

I think you might be dealing with two separate issues here.

  1. The reward scale reported in the paper most likely matches the baseline "A2C" paper. I'm guessing you're training with a reward scale of 0.1, so you're seeing very different numbers because WandB logs metrics as observed by the learning algorithm, i.e. after reward scaling, not the original env rewards (see the sketch after this list). Check your configs for reward scaling; I'm not sure where it's applied, the last time I looked was ~3 years ago.

  2. As far as I remember, "Deadly Corridor" is a hard-exploration environment, and standard RL algorithms like PPO don't do very well on it. It's not a good environment for measuring the relative performance of different RL algorithms because results have a lot of variance and are very sensitive to initial conditions.
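
To illustrate point 1, here is a minimal sketch (not Sample Factory's actual code; the wrapper and its default scale are assumptions) of the kind of reward-scaling wrapper that makes logged returns differ from the raw env rewards:

```python
# Hedged sketch, not Sample Factory's actual wrapper: something along these
# lines is typically how reward scaling is applied before the learner sees it.
import gymnasium as gym


class ScaleReward(gym.RewardWrapper):
    """Multiplies every env reward by a constant factor."""

    def __init__(self, env, scale=0.01):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        # The learner (and therefore the episodic return logged to WandB)
        # only ever sees the scaled reward, so logged curves are off from
        # the raw env rewards by a factor of 1 / scale.
        return reward * self.scale
```

In other words, if a wrapper like this with scale=0.01 sits between the env and the learner, every return curve in WandB will be 100x smaller than the numbers in the paper even when the underlying behavior is identical.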

A good test of whether your agent can learn anything better is to look at reward_max. It looks like, under the initial random policy, the agent almost never sees rewards higher than what it eventually converges to, and RL can't learn a behavior it never gets to observe.
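
If you want to sanity-check this outside of training, a rough diagnostic like the following (using the raw ViZDoom API; the scenario config path and episode count are assumptions) estimates the best return a purely random policy ever stumbles into:

```python
# Rough diagnostic, assuming a local ViZDoom install: roll out a random policy
# in deadly_corridor and record the best episode return it ever reaches.
import itertools
import random

import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/deadly_corridor.cfg")  # assumed path to the scenario config
game.set_window_visible(False)
game.init()

# Enumerate all on/off combinations of the available buttons as discrete actions.
n_buttons = game.get_available_buttons_size()
actions = [list(a) for a in itertools.product([0, 1], repeat=n_buttons)]

best_return = float("-inf")
for _ in range(100):  # number of random episodes is arbitrary
    game.new_episode()
    while not game.is_episode_finished():
        game.make_action(random.choice(actions), 4)  # frameskip=4, as in training
    best_return = max(best_return, game.get_total_reward())

print(f"Best random-policy return over 100 episodes: {best_return:.1f}")
game.close()
```

If the best random-policy return is far below what the paper reports, pure PPO has very little signal to bootstrap from.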

Why there's a discrepancy between the current version of the codebase and the 2020 version from when this experiment was done, I can't say; a thousand different things have changed since then. But I remember this scenario always being very high-variance.
If you want a simple test, use doom_basic or defend_the_center.
If you want a challenging environment where you can compare different ideas, use battle or battle2.
If you want this one to reliably produce good results, you'll probably have to spend some time on it specifically and add some reward shaping or an exploration heuristic (a rough sketch of the former is below).
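
For example, a minimal shaping wrapper might reward forward progress along the corridor. This is only a sketch, not code from this repo: it assumes the wrapped env exposes the underlying DoomGame as `env.game` (that attribute name is hypothetical), and the bonus coefficient is arbitrary.

```python
# Illustrative shaping wrapper, not code from this repo: add a small bonus for
# forward progress along the corridor. Assumes the wrapped env exposes the
# underlying DoomGame as `env.game` (attribute name is hypothetical).
import gymnasium as gym
import vizdoom as vzd


class CorridorProgressShaping(gym.Wrapper):
    def __init__(self, env, bonus_per_unit=0.01):
        super().__init__(env)
        self.bonus_per_unit = bonus_per_unit
        self.prev_x = None

    def _agent_x(self):
        return self.env.game.get_game_variable(vzd.GameVariable.POSITION_X)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.prev_x = self._agent_x()
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        x = self._agent_x()
        # Dense bonus for moving toward the end of the corridor; note this
        # changes the optimization objective, so report unshaped returns too.
        reward += self.bonus_per_unit * (x - self.prev_x)
        self.prev_x = x
        return obs, reward, terminated, truncated, info
```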

@alex-petrenko
Owner

Also, I recommend playing the scenario by hand to see what kinds of rewards are achievable with human-level play. That will give you a better idea of what's possible and what the agent needs to do.

@sjaelee25
Author

sjaelee25 commented Mar 12, 2024

Thank you for your reply and advice!

As you mentioned, the reward scale for deadly_corridor is set to 0.01, while the other VizDoom basic tasks use a scale of 1.0. However, changing the scale to 1.0 or 0.1 does not seem to affect the results significantly, and the variance is also very low across multiple random seeds.

While I am also considering other tasks, I would like to demonstrate the advantages of my proposed method on deadly_corridor. I apologize for asking about code developed over 3 years ago, but I am also curious about reward shaping (as opposed to scaling).
For tasks such as health gathering and battle, there are separate reward shaping functions. I wonder whether deadly_corridor also had an additional shaping reward that was later removed, or some other case where the training reward != the episode return.

Thank you again!

@alex-petrenko
Owner

I don’t think there was any special reward shaping function. You can check the icml2020 release of the codebase to make sure.

This is an exploration task and it’s just very sensitive to initial conditions. Most likely your agent hits an early local minimum and is unable to improve.

@sjaelee25
Author

Okay, thank you for your reply!
