
Incorrect Results from get_last_reward() when Replaying LMP in Mode.PLAYER #412

Closed · mhe500 opened this issue Oct 20, 2019 · 8 comments
Labels: bug

mhe500 (Contributor) commented Oct 20, 2019

Issue: Playback of a .lmp file using Mode.PLAYER results in game.get_last_reward() sometimes combining the rewards of two time steps.

ViZDoom version: 1.1.7
Platform: Ubuntu 18.04, Python 3.7.3

To reproduce:
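
The original reproduction script is not included above, so the following is a minimal sketch of one, following the pattern of ViZDoom's record_episodes.py example. The config path (scenarios/basic.cfg, whose game variable AMMO2 starts at 50 and whose living reward is -1), the recording file name episode.lmp, and the no-op action are all assumptions:

import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/basic.cfg")  # assumed config path
game.set_mode(vzd.Mode.PLAYER)
game.init()

# Record one episode to episode.lmp by doing nothing until timeout.
game.new_episode("episode.lmp")
while not game.is_episode_finished():
    game.make_action([0, 0, 0])  # no-op; basic.cfg defines 3 buttons

# Re-init so the .lmp file is flushed to disk before replaying.
game.close()
game.init()

print("REPLAY OF EPISODE")
print("************************")

# Replay the recorded episode and print the per-step reward.
game.replay_episode("episode.lmp")
while not game.is_episode_finished():
    state = game.get_state()
    game.advance_action()
    print("State #{}".format(state.number))
    print("Game variables:", state.game_variables[0])
    print("Reward:", game.get_last_reward())
    print("=====================")

print("Episode finished.")
print("total reward:", game.get_total_reward())
game.close()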

This implies either a bug in the code, or that the documentation example, which states that replay can be used in any mode, should be updated to indicate that Mode.SPECTATOR is required (and possibly that the code should validate and disallow replaying LMP files in Mode.PLAYER).

Separately, the example also states that game.get_last_action() is not supported during replay, but it is supported now.

Thank you!

Example output when using Mode.PLAYER:

REPLAY OF EPISODE
************************

State #1
Game variables: 50.0
Reward: -1.0
=====================
State #2
Game variables: 50.0
Reward: -2.0
=====================
State #3
Game variables: 50.0
Reward: -2.0
=====================
State #4
Game variables: 50.0
Reward: -2.0
=====================
...
=====================
State #299
Game variables: 50.0
Reward: -1.0
=====================
Episode finished.
total reward: -355.0
************************

Miffyli (Collaborator) commented Oct 21, 2019

Nice catch. Tested on Windows 10 with Python 3.6.8 and ViZDoom 1.1.8pre, and I am getting the same results (it does not work in PLAYER mode but works in SPECTATOR mode). The sleep trick from #354 does not help here, although the two issues might be related.

Miffyli added the bug label Oct 21, 2019

mhe500 (Contributor, Author) commented Oct 21, 2019

This feels like a race condition, and I think my assertion that "it works in Mode.SPECTATOR" is not correct.

If I insert a sleep in the loop and print state.tic in addition to state.number, the results in Mode.SPECTATOR become very far off: the tic increments by more than 1 on each iteration, the loop runs fewer than 300 times, and the rewards are duplicated (see the sketch and example output below).
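
A minimal sketch of the modified replay loop, assuming the reproduction script above; the sleep length and print format are inferred from the output below:

from time import sleep

game.replay_episode("episode.lmp")
while not game.is_episode_finished():
    state = game.get_state()
    game.advance_action()
    sleep(0.1)  # slow the Python loop; Doom's own thread keeps running
    print("State #{} state.tic={}".format(state.number, state.tic))
    print("Game variables:", state.game_variables[0])
    print("Reward:", game.get_last_reward())
    print("=====================")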

Example output with Mode.SPECTATOR after adding sleep(0.1) to the loop and also printing state.tic:

...
=====================
State #80 state.tic=289
Game variables: 50.0
Reward: -5.0
=====================
State #81 state.tic=293
Game variables: 50.0
Reward: -4.0
=====================
State #82 state.tic=296
Game variables: 50.0
Reward: -5.0
=====================
State #83 state.tic=300
Game variables: 50.0
Reward: -4.0
=====================
State #84 state.tic=303
Game variables: 50.0
Reward: -5.0
=====================
State #85 state.tic=307
Game variables: 50.0
Reward: -4.0
=====================
Episode finished.
total reward: -380.0
************************

However, if I do the same in Mode.PLAYER, the result is still wrong but "less wrong": I get about the correct number of iterations (300), and state.tic increments by about 1 each time, though sometimes by +2 and sometimes by 0.

Net net, this feels like a race condition, though the effects are somewhat more limited in Mode.PLAYER; the living reward is still effectively doubled. For training, this means that transitions from an expert trajectory replayed with this method actually look less appealing than those found randomly!

Miffyli (Collaborator) commented Oct 21, 2019

Thanks for pointing this out! I initially found it odd that SPECTATOR mode worked better, since it was designed for human players (and hence is closer to asynchronous than synchronized behavior). I will fix the FAQ according to this discovery :)

mhe500 (Contributor, Author) commented Oct 21, 2019

Well, to be clear: Mode.SPECTATOR works better only when the Python script's loop runs quickly enough -- put in a sleep and the result is worse than with Mode.PLAYER. In all cases I think there is a race condition, and neither SPECTATOR nor PLAYER produces the right result.

I've experimented a bit, and if I change ViZDoomGame.cpp as follows:

diff --git a/src/lib/ViZDoomGame.cpp b/src/lib/ViZDoomGame.cpp
index 54d1764..cae7529 100644
--- a/src/lib/ViZDoomGame.cpp
+++ b/src/lib/ViZDoomGame.cpp
@@ -215,7 +215,10 @@ namespace vizdoom {
         this->summaryReward += reward;
         this->lastReward = reward;
 
-        this->lastMapTic = this->doomController->getMapTic();
+        if (this->doomController->isRunDoomAsync())
+            this->lastMapTic = this->doomController->getMapTic();
+        else
+            this->lastMapTic = this->doomController->getMapLastTic();
 
         /* Update state */
         if (!this->isEpisodeFinished()) {

My belief is that, in the synchronous cases, it is more correct for the DoomGame to take the last map tic executed by its user from the value the DoomController holds, rather than from DoomController::getMapTic, which queries shared memory that may have been updated by Doom running in a separate thread.

Testing human gameplay (SPECTATOR), .lmp playback (PLAYER), and agent play (PLAYER), I get the results I expect. However, since I don't really know the internals of ViZDoom, I'm not sure whether this is a reasonable fix, but for now I will use it as a workaround.

Miffyli (Collaborator) commented Oct 21, 2019

In the words of @mwydmuch, the whole project is a hack on top of ZDoom, so solutions tend not to be too pretty. If this solution works reliably for you, it would be quite dandy if you could make a PR out of it :)

mhe500 (Contributor, Author) commented Oct 21, 2019

Let me play with it a bit and if it works I will gladly submit a PR. Thanks!

mhe500 (Contributor, Author) commented Oct 21, 2019

One more update:

I think the change below addresses an additional problem: the missing waitForDoomWork() causes the controller and ViZDoom to become out of sync, and subsequent TIC/UPDATE/TIC+UPDATE requests from the controller receive an acknowledgement that should have been paired with the prior message.

The two changes I made seem to be working well for me but need additional testing.

diff --git a/src/lib/ViZDoomController.cpp b/src/lib/ViZDoomController.cpp
index daf6bba..faafbc8 100644
--- a/src/lib/ViZDoomController.cpp
+++ b/src/lib/ViZDoomController.cpp
@@ -441,6 +441,7 @@ namespace vizdoom {
             // Workaround for some problems
             this->sendCommand(std::string("map ") + this->map);
             this->MQDoom->send(MSG_CODE_TIC);
+            this->waitForDoomWork();
 
             this->sendCommand(std::string("playdemo ") + prepareLmpFilePath(demoPath));

mhe500 added a commit to multitude-ai/ViZDoom that referenced this issue on Oct 23, 2019: "Else DoomController becomes out of sync with Doom. See: Farama-Foundation#412"

mhe500 added a commit to multitude-ai/ViZDoom that referenced this issue on Oct 23, 2019

mwydmuch (Member) commented Nov 3, 2019

Fixed by #415

mwydmuch closed this as completed Nov 3, 2019