"Unexpected ViZDoom instance crash" during multi-threading training #169

Open
pengsun opened this issue Jan 15, 2017 · 13 comments

Comments

@pengsun

pengsun commented Jan 15, 2017

Hi,

I ran A3C-style training and ViZDoom crashed occasionally (the same Lua/Torch code ran well with the ALE environment across a wide range of games). The more threads I started, the sooner it crashed. The error output looks like this:

*** Fatal Error ***
Address not mapped to object (signal 11)
Address: 0x21

Generating vizdoom-crash.log and killing process 55494, please wait... tid = 1, error:
ViZDoomErrorException: Unexpected ViZDoom instance crash.
stack traceback:
        [C]: in function 'makeAction'
        ./env/EnvVizdoomTTT2.lua:162: in function 'takeAction'
        ./A3CTrainer.lua:112: in function 'doLoop'
        train-async.lua:266: in function <train-async.lua:265>
        [C]: in function 'xpcall'
        train-async.lua:89: in function <train-async.lua:88>
        [C]: in function 'xpcall'
        ...xxx/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
        .../xxx/torch/install/share/lua/5.1/threads/queue.lua:65: in function <.../xxx/torch/install/share/lua/5.1/threads/queue.lua:41>
        [C]: in function 'pcall'
        .../xxx/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
        [string "  local Queue = require 'threads.queue'..."]:13: in main chunk
sh: xmessage: command not found
Segmentation fault

and vizdoom-crash.log looks like this:

*** Fatal Error ***
Address not mapped to object (signal 11)
Address: 0x33a640

System: Linux TENCENT64.site 3.10.104-1-tlinux2-0041.tl2 #1 SMP Fri Oct 28 20:58:27 CST 2016 x86_64 x86_64 x86_64 GNU/Linux

ViZDoom version 1.1.1 (ZDOOM 2.8.1) ()
Compiler version: 4.8.5 20150623 (Red Hat 4.8.5-4)

Command line: /xxx/torch/install/lib/lua/5.1/vizdoom/vizdoom -iwad /xxx/torch/install/lib/lua/5.1/vizdoom/freedoom2.wad -config _vizdoom.ini -skill 3 -width 160 -height 120 +vid_aspect 0 +fullscreen 0 +viz_controlled 1 +viz_instance_id DHnwSkInWu +use_mouse 0 -rngseed 3106951485 +viz_render_mode 3661 +viz_noconsole 1 +viz_screen_format 1 +viz_window_hidden 1 +viz_noxserver 1 -noidle -nojoy -nosound +viz_nosound 1 -host 1 -deathmatch +timelimit 2.0 +sv_forcerespawn 1 +sv_noautoaim 1 +sv_respawnprotect 1 +sv_spawnfarthest 1 +name WhoAmI +colorset 0 -file env/vizdoomScenarios/ttt2.wad

Wad 0: vizdoom.pk3
Wad 1: freedoom2.wad
Wad 2: ttt2.wad

Current map: MAP01

viewx = -12212431
viewy = 21605803
viewz = 2686976
viewangle = ca000000

Executing: gdb --quiet --batch --command=gdb-respfile-M3jxmD
[New LWP 47155]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f75832d0ca9 in waitpid () from /lib64/libpthread.so.0

* Loaded Libraries
From                To                  Syms Read   Shared Object Library
0x00007f75839161e0  0x00007f75839177e8  Yes         /lib64/libonion.so
0x00007f75834f0d30  0x00007f75835c5dd0  Yes         /usr/local/lib/libSDL2-2.0.so.0
0x00007f75832c78a0  0x00007f75832d2514  Yes (*)     /lib64/libpthread.so.0
0x00007f75830bc2c0  0x00007f75830bf0bc  Yes (*)     /lib64/librt.so.1
0x00007f7582ea6170  0x00007f7582eb26f0  Yes (*)     /lib64/libz.so.1
0x00007f7582c52c90  0x00007f7582c89730  Yes (*)     /lib64/libjpeg.so.62
0x00007f7582a437b0  0x00007f7582a49994  Yes (*)     /lib64/libboost_thread-mt.so.1.53.0
0x00007f75828352d0  0x00007f7582835e44  Yes (*)     /lib64/libboost_system-mt.so.1.53.0
0x00007f758262b710  0x00007f758262f8c0  Yes (*)     /lib64/libboost_date_time-mt.so.1.53.0
0x00007f758241e610  0x00007f7582420590  Yes (*)     /lib64/libboost_chrono-mt.so.1.53.0
0x00007f7582217ed0  0x00007f75822189d0  Yes (*)     /lib64/libdl.so.2
0x00007f7581f6a510  0x00007f7581fd159a  Yes (*)     /lib64/libstdc++.so.6
0x00007f7581c124b0  0x00007f7581c7c9e8  Yes (*)     /lib64/libm.so.6
0x00007f75819f9af0  0x00007f7581a09298  Yes (*)     /lib64/libgcc_s.so.1
0x00007f75816553e0  0x00007f7581798c40  Yes (*)     /lib64/libc.so.6
0x00007f7583804ae0  0x00007f758381f27a  Yes (*)     /lib64/ld-linux-x86-64.so.2
(*): Shared library is missing debugging information.

* Threads
  Id   Target Id         Frame 
  2    Thread 0x7f7581635700 (LWP 47155) "SDLTimer" 0x00007f75832cf790 in sem_wait () from /lib64/libpthread.so.0
* 1    Thread 0x7f75838ef740 (LWP 47154) "vizdoom" 0x00007f75832d0ca9 in waitpid () from /lib64/libpthread.so.0

* FPU Status
  R7: Empty   0x00000000000000000000
  R6: Empty   0x00000000000000000000
  R5: Empty   0x00000000000000000000
  R4: Empty   0x00000000000000000000
  R3: Empty   0x00000000000000000000
  R2: Empty   0x00000000000000000000
  R1: Empty   0x00000000000000000000
=>R0: Empty   0x00000000000000000000

Status Word:         0x0000                                            
                       TOP: 0
Control Word:        0x027f   IM DM ZM OM UM PM
                       PC: Double Precision (53-bits)
                       RC: Round to nearest
Tag Word:            0xffff
Instruction Pointer: 0x00:0x00000000
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000

* Registers
rax            0xfffffffffffffe00	-512
rbx            0xb858	47192
rcx            0xffffffffffffffff	-1
rdx            0x0	0
rsi            0xbb2760	12265312
rdi            0xb858	47192
rbp            0xbb2760	0xbb2760
rsp            0xbb2740	0xbb2740
r8             0x0	0
r9             0x1	1
r10            0x0	0
r11            0x246	582
r12            0xb	11
r13            0x1090	4240
r14            0xbafc40	12254272
r15            0xa	10
rip            0x7f75832d0ca9	0x7f75832d0ca9 <waitpid+105>
eflags         0x246	[ PF ZF IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0

* Backtrace

Thread 2 (Thread 0x7f7581635700 (LWP 47155)):
#0  0x00007f75832cf790 in sem_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f75835c0d15 in SDL_SemWait_REAL (sem=0x1e5b830) at /xxx/libs-for-vizdoom/SDL2-2.0.5/src/thread/pthread/SDL_syssem.c:97
        retval = <optimized out>
#2  SDL_SemWaitTimeout_REAL (sem=0x1e5b830, timeout=4294967295) at /xxx/libs-for-vizdoom/SDL2-2.0.5/src/thread/pthread/SDL_syssem.c:126
        retval = <optimized out>
        ts_timeout = {tv_sec = 0, tv_nsec = 140142691766213}
#3  0x00007f7583566419 in SDL_TimerThread (_data=_data@entry=0x7f7583802660 <SDL_timer_data>) at /xxx/libs-for-vizdoom/SDL2-2.0.5/src/timer/SDL_timer.c:200
        data = 0x7f7583802660 <SDL_timer_data>
        pending = <optimized out>
        current = <optimized out>
        freelist_head = 0x0
        freelist_tail = 0x0
        tick = 236
        now = <optimized out>
        interval = <optimized out>
        delay = <optimized out>
#4  0x00007f7583565d8d in SDL_RunThread (data=0x1e5b860) at /xxx/libs-for-vizdoom/SDL2-2.0.5/src/thread/SDL_thread.c:283
        args = 0x1e5b860
        userfunc = 0x7f75835662c0 <SDL_TimerThread>
        userdata = 0x7f7583802660 <SDL_timer_data>
        thread = 0x1e704d0
        statusloc = 0x1e704e0
#5  0x00007f75835c08e9 in RunThread (data=<optimized out>) at /xxx/libs-for-vizdoom/SDL2-2.0.5/src/thread/pthread/SDL_systhread.c:74
No locals.
#6  0x00007f75832c9dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#7  0x00007f758172c29d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7f75838ef740 (LWP 47154)):
#0  0x00007f75832d0ca9 in waitpid () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x000000000051cc3c in ?? ()
No symbol table info available.
#2  <signal handler called>
No symbol table info available.
#3  0x0000000002035ecc in ?? ()
No symbol table info available.
#4  0x000000000056a306 in GC::Step() ()
No symbol table info available.
#5  0x000000000056d65e in DThinker::TickThinkers(FThinkerList*, FThinkerList*) ()
No symbol table info available.
#6  0x000000000056d7bc in DThinker::RunThinkers() ()
No symbol table info available.
#7  0x000000000063f2fe in P_Ticker() ()
No symbol table info available.
#8  0x000000000057b85e in G_Ticker() ()
No symbol table info available.
#9  0x000000000055b72e in TryRunTics() ()
No symbol table info available.
#10 0x000000000055301b in D_DoomLoop() ()
No symbol table info available.
#11 0x00000000005557f2 in D_DoomMain() ()
No symbol table info available.
#12 0x00000000005049a3 in main ()
No symbol table info available.

@mwydmuch
Member

mwydmuch commented Jan 15, 2017

How often does it happen? I cannot replicate it.
Also, there seems to be a memory leak when using getState() inside torch.Threads. Can you confirm that, or am I doing something wrong here?

@pengsun
Author

pengsun commented Jan 16, 2017

It always happens when I train long enough. How can we tell there is a memory leak from the information I posted?

@mwydmuch
Member

I see that you use your own wad file. Have you checked other scenarios/wads?
I've created a test based on multiple_instances.lua and observed the memory leak.
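
One rough way to check for a leak like this is to sample the process's peak resident memory while repeating the suspect call. The sketch below is only an illustration in Python and is not the test mentioned above: `peak_rss` and `rss_growth` are hypothetical helper names, the `step` callable stands in for whatever call is suspected (for example a state fetch), and it measures only the calling process, not the separate vizdoom engine process. It relies on the Unix-only `resource` module, and the units of `ru_maxrss` are platform-dependent (kilobytes on Linux).

```python
import resource


def peak_rss():
    """Peak resident set size of this process (units are platform-dependent)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(step, iterations, sample_every):
    """Call step() repeatedly; return peak-RSS samples taken every sample_every calls."""
    samples = [peak_rss()]
    for i in range(1, iterations + 1):
        step()
        if i % sample_every == 0:
            samples.append(peak_rss())
    return samples


# Stand-in for a leaking call: every step keeps another 1 MiB buffer alive.
chunk = bytes(range(256)) * 4096
leaked = []
samples = rss_growth(lambda: leaked.append(bytearray(chunk)),
                     iterations=50, sample_every=10)
print(samples)
```

If the samples keep climbing for as long as the loop runs, the call is probably retaining memory. A curve that flattens after warm-up suggests ordinary allocation instead.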

@pengsun
Author

pengsun commented Jan 17, 2017

Okay, I will check more wad files. I checked cig.wad, which also crashed. My own wad file "ttt2.wad" is just "multi_duel.wad" renamed, with both players changed to "deathmatch player" using SLADE.

@mwydmuch
Member

mwydmuch commented Jan 17, 2017

What about some singleplayer wads? Based on the location of the crash, I suspect that bots may be the cause.

As the ZDoom wiki says (https://zdoom.org/wiki/Bots), bots are officially unsupported in current versions of the engine and unfortunately don't work perfectly with all ViZDoom features (recording functionality). On the other hand, I believe that bots were used frequently during last year's competition and no one reported a similar problem.

I'll examine this for sure, but I'm a little busy lately, so it may take a week.
Fortunately, you can simply catch that exception and restart ViZDoom (sorry if it's obvious).
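
The catch-and-restart workaround could look roughly like the following Python sketch. It is an illustration under stated assumptions, not part of ViZDoom: `RestartingEnv`, `make_game`, and `FlakyGame` are hypothetical names, and the fake game exists only so the example runs without ViZDoom installed. With the real Python binding, `crash_exc` would presumably be `vizdoom.ViZDoomErrorException` (the exception named in the tracebacks in this thread) and `make_game` a function that builds and initialises a new `DoomGame`.

```python
class RestartingEnv:
    """Wraps a game object and recreates it when a step raises crash_exc."""

    def __init__(self, make_game, crash_exc, max_restarts=3):
        self._make_game = make_game        # hypothetical factory returning a ready game
        self._crash_exc = crash_exc        # exception type that signals an engine crash
        self._max_restarts = max_restarts
        self.restarts = 0
        self.game = make_game()

    def make_action(self, action, tics=1):
        """Return (reward, restarted); a crashed step yields reward 0.0."""
        try:
            return self.game.make_action(action, tics), False
        except self._crash_exc:
            if self.restarts >= self._max_restarts:
                raise                      # give up once the restart budget is spent
            self.restarts += 1
            self.game = self._make_game()  # drop the crashed instance, start a new one
            return 0.0, True


class FlakyGame:
    """Stand-in for a game instance that crashes on its third step."""

    def __init__(self):
        self.steps = 0

    def make_action(self, action, tics=1):
        self.steps += 1
        if self.steps == 3:
            raise RuntimeError("Unexpected ViZDoom instance crash.")
        return 1.0


env = RestartingEnv(FlakyGame, RuntimeError)
results = [env.make_action([1, 0, 0], 4) for _ in range(4)]
print(results)
```

When `restarted` comes back true, the caller would likely want to treat the current episode as finished and begin a new one, since the fresh instance has no memory of the interrupted episode.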

@GoingMyWay

GoingMyWay commented Jun 23, 2017

I ran into a similar issue:

*** Fatal Error ***
Address not mapped to object (signal 11)
Address: 0x4ecee8230

Generating vizdoom-crash.log and killing process 25541, please wait... 40	../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display: 

I used basic.wad from the provided scenarios directory.

@mwydmuch
Member

Hello @GoingMyWay,
can you provide some simple code to replicate this problem, along with the vizdoom-crash.log file?

@GoingMyWay

GoingMyWay commented Jun 23, 2017

@mwydmuch Here is the detailed output:

*** Fatal Error ***
Address not mapped to object (signal 11)
Address: 0x4ecee8230

Generating vizdoom-crash.log and killing process 25541, please wait... 40	../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display: 


*** Fatal Error ***
Segmentation fault (signal 11)
Address: (nil)

Generating vizdoom-crash.log and killing process 25553, please wait... 40	../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display: 


*** Fatal Error ***
Segmentation fault (signal 11)
Address: (nil)

Generating vizdoom-crash.log and killing process 25535, please wait... 40	../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Error: Can't open display: 
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 56, in <lambda>
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
  File "/home/doom/Code/A3C/agent.py", line 138, in train
    r = self.env.make_action(self.actions[a_index], 4) / 400.0
vizdoom.vizdoom.ViZDoomErrorException: Unexpected ViZDoom instance crash.

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 56, in <lambda>
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
  File "/home/doom/Code/A3C/agent.py", line 138, in train
    r = self.env.make_action(self.actions[a_index], 4) / 400.0
vizdoom.vizdoom.ViZDoomErrorException: Unexpected ViZDoom instance crash.

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 56, in <lambda>
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
  File "/home/doom/Code/A3C/agent.py", line 138, in train
    r = self.env.make_action(self.actions[a_index], 4) / 400.0
vizdoom.vizdoom.ViZDoomErrorException: Unexpected ViZDoom instance crash.

Starting worker 0
Starting worker 1
Starting worker 2
Starting worker 3
Episode count 250, saved Model
Episode count 500, saved Model
Episode count 750, saved Model
Episode count 1000, saved Model
Episode count 1250, saved Model
Episode count 1500, saved Model
Episode count 1750, saved Model
Episode count 2000, saved Model
Episode count 2250, saved Model
Episode count 2500, saved Model
Episode count 2750, saved Model
Episode count 3000, saved Model
Episode count 3250, saved Model
Episode count 3500, saved Model
Episode count 3750, saved Model
Episode count 4000, saved Model
Episode count 4250, saved Model
Stop training name:worker_2

Since there are 4 agents training simultaneously, the maximum number of episodes is 120000; however, it stopped at 4250.

Show vizdoom-crash.log

cat vizdoom-crash.log 
*** Fatal Error ***
Segmentation fault (signal 11)
Address: (nil)

System: Linux ubuntu 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

ViZDoom version 1.1.1 (ZDOOM 2.8.1) ()
Compiler version: 4.8.4

Command line: /usr/local/lib/python3.5/site-packages/vizdoom/vizdoom -iwad /usr/local/lib/python3.5/site-packages/vizdoom/freedoom2.wad -config _vizdoom.ini -skill 3 -width 160 -height 120 +vid_aspect 3 +fullscreen 0 +viz_controlled 1 +viz_instance_id buZO9fDhLb +use_mouse 0 -rngseed 3260863032 +viz_labels 1 +viz_render_mode 1604 +viz_noconsole 1 +viz_screen_format 8 +viz_window_hidden 1 +viz_noxserver 1 -noidle -nojoy -nosound +viz_nosound 1 -file ./scenarios/basic.wad

Wad 0: vizdoom.pk3
Wad 1: freedoom2.wad
Wad 2: basic.wad

Not in a level.

Executing: gdb --quiet --batch --command=gdb-respfile-TiqBru
[New LWP 25536]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f9d7679bed9 in __libc_waitpid (pid=26139, stat_loc=0xbb9e20, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40

* Loaded Libraries
From                To                  Syms Read   Shared Object Library
0x00007f9d77271e40  0x00007f9d774bbaae  Yes (*)     /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
0x00007f9d76fc22a0  0x00007f9d76ff00c6  Yes (*)     /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
0x00007f9d76cc98c0  0x00007f9d76d3cb2a  Yes (*)     /lib/x86_64-linux-gnu/libglib-2.0.so.0
0x00007f9d769c2280  0x00007f9d76a707da  Yes (*)     /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
0x00007f9d767919f0  0x00007f9d7679e471  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
0x00007f9d76586350  0x00007f9d7658933c  Yes         /lib/x86_64-linux-gnu/librt.so.1
0x00007f9d7636ce00  0x00007f9d7637cbf8  Yes (*)     /lib/x86_64-linux-gnu/libz.so.1
0x00007f9d76119d90  0x00007f9d76150520  Yes (*)     /usr/lib/x86_64-linux-gnu/libjpeg.so.8
0x00007f9d75f073c0  0x00007f9d75f1308f  Yes (*)     /lib/x86_64-linux-gnu/libbz2.so.1.0
0x00007f9d75cfaf40  0x00007f9d75d00883  Yes (*)     /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
0x00007f9d75aed250  0x00007f9d75aeddc3  Yes (*)     /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0
0x00007f9d7588a0c0  0x00007f9d758cae7c  Yes (*)     /usr/lib/x86_64-linux-gnu/libsndfile.so.1
0x00007f9d75680ed0  0x00007f9d756819ce  Yes         /lib/x86_64-linux-gnu/libdl.so.2
0x00007f9d753d7620  0x00007f9d7543a803  Yes (*)     /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00007f9d7507b610  0x00007f9d750ea056  Yes         /lib/x86_64-linux-gnu/libm.so.6
0x00007f9d74e62ab0  0x00007f9d74e72985  Yes (*)     /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007f9d74aba520  0x00007f9d74bff183  Yes         /lib/x86_64-linux-gnu/libc.so.6
0x00007f9d74805910  0x00007f9d7485a0eb  Yes (*)     /usr/lib/x86_64-linux-gnu/libgdk-x11-2.0.so.0
0x00007f9d745e5150  0x00007f9d745e6015  Yes (*)     /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0
0x00007f9d743db8d0  0x00007f9d743e0216  Yes (*)     /usr/lib/x86_64-linux-gnu/libpangocairo-1.0.so.0
0x00007f9d740ba7f0  0x00007f9d7413dbfb  Yes (*)     /usr/lib/x86_64-linux-gnu/libX11.so.6
0x00007f9d73e9d530  0x00007f9d73e9f766  Yes (*)     /usr/lib/x86_64-linux-gnu/libXfixes.so.3
0x00007f9d73c83990  0x00007f9d73c8f183  Yes (*)     /usr/lib/x86_64-linux-gnu/libatk-1.0.so.0
0x00007f9d73980940  0x00007f9d73a3c016  Yes (*)     /usr/lib/x86_64-linux-gnu/libcairo.so.2
0x00007f9d73753e30  0x00007f9d73765d10  Yes (*)     /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0
0x00007f9d7340e790  0x00007f9d734d4a2b  Yes (*)     /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0
0x00007f9d731cd360  0x00007f9d731d554b  Yes (*)     /usr/lib/x86_64-linux-gnu/libpangoft2-1.0.so.0
0x00007f9d72f86e50  0x00007f9d72fa5f26  Yes (*)     /usr/lib/x86_64-linux-gnu/libpango-1.0.so.0
0x00007f9d72d43cc0  0x00007f9d72d60819  Yes (*)     /usr/lib/x86_64-linux-gnu/libfontconfig.so.1
0x00007f9d72b36650  0x00007f9d72b3ad38  Yes (*)     /usr/lib/x86_64-linux-gnu/libffi.so.6
0x00007f9d728f87b0  0x00007f9d7291ecdf  Yes (*)     /lib/x86_64-linux-gnu/libpcre.so.3
0x00007f9d72630f80  0x00007f9d726b59a2  Yes (*)     /usr/lib/x86_64-linux-gnu/libasound.so.2
0x00007f9d72404550  0x00007f9d72405733  Yes (*)     /usr/lib/x86_64-linux-gnu/libpulse-simple.so.0
0x00007f9d721c5170  0x00007f9d721efcd8  Yes (*)     /usr/lib/x86_64-linux-gnu/libpulse.so.0
0x00007f9d71fab580  0x00007f9d71fb4c7f  Yes (*)     /usr/lib/x86_64-linux-gnu/libXext.so.6
0x00007f9d71da0420  0x00007f9d71da4e20  Yes (*)     /usr/lib/x86_64-linux-gnu/libXcursor.so.1
0x00007f9d71b9baf0  0x00007f9d71b9c3ec  Yes (*)     /usr/lib/x86_64-linux-gnu/libXinerama.so.1
0x00007f9d7198d1e0  0x00007f9d71996f12  Yes (*)     /usr/lib/x86_64-linux-gnu/libXi.so.6
0x00007f9d71782c00  0x00007f9d71788632  Yes (*)     /usr/lib/x86_64-linux-gnu/libXrandr.so.2
0x00007f9d7157dcd0  0x00007f9d7157ea8c  Yes (*)     /usr/lib/x86_64-linux-gnu/libXss.so.1
0x00007f9d71377f40  0x00007f9d7137a6f6  Yes (*)     /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1
0x00007f9d711756e0  0x00007f9d71175896  Yes (*)     /usr/lib/x86_64-linux-gnu/libwayland-egl.so.1
0x00007f9d70f6cb90  0x00007f9d70f70bcf  Yes (*)     /usr/lib/x86_64-linux-gnu/libwayland-client.so.0
0x00007f9d70d61200  0x00007f9d70d62c4c  Yes (*)     /usr/lib/x86_64-linux-gnu/libwayland-cursor.so.0
0x00007f9d70b29f10  0x00007f9d70b411d5  Yes (*)     /usr/lib/x86_64-linux-gnu/libxkbcommon.so.0
0x00007f9d77845ae0  0x00007f9d77860490  Yes         /lib64/ld-linux-x86-64.so.2
0x00007f9d708fd300  0x00007f9d7091b98d  Yes (*)     /usr/lib/x86_64-linux-gnu/libFLAC.so.8
0x00007f9d70439a40  0x00007f9d7043bff4  Yes (*)     /usr/lib/x86_64-linux-gnu/libvorbisenc.so.2
0x00007f9d701fbdd0  0x00007f9d702132ed  Yes (*)     /usr/lib/x86_64-linux-gnu/libvorbis.so.0
0x00007f9d6fff1a70  0x00007f9d6fff5c25  Yes (*)     /usr/lib/x86_64-linux-gnu/libogg.so.0
0x00007f9d6fde7a60  0x00007f9d6fded728  Yes (*)     /usr/lib/x86_64-linux-gnu/libXrender.so.1
0x00007f9d6fbe3c40  0x00007f9d6fbe4618  Yes (*)     /usr/lib/x86_64-linux-gnu/libXcomposite.so.1
0x00007f9d6f9e0b90  0x00007f9d6f9e149b  Yes (*)     /usr/lib/x86_64-linux-gnu/libXdamage.so.1
0x00007f9d6f748b90  0x00007f9d6f7b427d  Yes (*)     /usr/lib/x86_64-linux-gnu/libfreetype.so.6
0x00007f9d6f527620  0x00007f9d6f5335e5  Yes (*)     /usr/lib/x86_64-linux-gnu/libxcb.so.1
0x00007f9d6f2804e0  0x00007f9d6f305c2c  Yes (*)     /usr/lib/x86_64-linux-gnu/libpixman-1.so.0
0x00007f9d6f053ab0  0x00007f9d6f06d003  Yes (*)     /lib/x86_64-linux-gnu/libpng12.so.0
0x00007f9d6ee4dd80  0x00007f9d6ee4e5f3  Yes (*)     /usr/lib/x86_64-linux-gnu/libxcb-shm.so.0
0x00007f9d6ec47430  0x00007f9d6ec49edf  Yes (*)     /usr/lib/x86_64-linux-gnu/libxcb-render.so.0
0x00007f9d6ea26b00  0x00007f9d6ea38a26  Yes (*)     /lib/x86_64-linux-gnu/libselinux.so.1
0x00007f9d6e809ad0  0x00007f9d6e818eb9  Yes         /lib/x86_64-linux-gnu/libresolv.so.2
0x00007f9d6e5b6f20  0x00007f9d6e5ea52a  Yes (*)     /usr/lib/x86_64-linux-gnu/libharfbuzz.so.0
0x00007f9d6e3a9c20  0x00007f9d6e3ad011  Yes (*)     /usr/lib/x86_64-linux-gnu/libthai.so.0
0x00007f9d6e181b60  0x00007f9d6e19a6c9  Yes (*)     /lib/x86_64-linux-gnu/libexpat.so.1
0x00007f9d6df27180  0x00007f9d6df6144a  Yes (*)     /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so
0x00007f9d6dd0e800  0x00007f9d6dd133cb  Yes (*)     /lib/x86_64-linux-gnu/libjson-c.so.2
0x00007f9d6dacd840  0x00007f9d6daf5d54  Yes (*)     /lib/x86_64-linux-gnu/libdbus-1.so.3
0x00007f9d6d8c3e50  0x00007f9d6d8c4acc  Yes (*)     /usr/lib/x86_64-linux-gnu/libXau.so.6
0x00007f9d6d6be350  0x00007f9d6d6bfd6c  Yes (*)     /usr/lib/x86_64-linux-gnu/libXdmcp.so.6
0x00007f9d6d499550  0x00007f9d6d4b548c  Yes (*)     /usr/lib/x86_64-linux-gnu/libgraphite2.so.3
0x00007f9d6d291170  0x00007f9d6d294274  Yes (*)     /usr/lib/x86_64-linux-gnu/libdatrie.so.1
0x00007f9d6d088d70  0x00007f9d6d08c798  Yes (*)     /lib/x86_64-linux-gnu/libwrap.so.0
0x00007f9d6ce81370  0x00007f9d6ce838e8  Yes (*)     /usr/lib/x86_64-linux-gnu/libasyncns.so.0
0x00007f9d6cc6a160  0x00007f9d6cc76ea3  Yes         /lib/x86_64-linux-gnu/libnsl.so.1
0x00007f9d6c78f010  0x00007f9d6c797750  Yes (*)     /lib/x86_64-linux-gnu/libudev.so.1
0x00007f9d6c574610  0x00007f9d6c5866ec  Yes (*)     /lib/x86_64-linux-gnu/libcgmanager.so.0
0x00007f9d6c35d760  0x00007f9d6c36a12a  Yes (*)     /lib/x86_64-linux-gnu/libnih.so.1
0x00007f9d6c151ce0  0x00007f9d6c1554c6  Yes (*)     /lib/x86_64-linux-gnu/libnih-dbus.so.1
0x00007f9d6b6fc2e0  0x00007f9d6b72b3a4  Yes (*)     /usr/lib/x86_64-linux-gnu/libopenal.so
(*): Shared library is missing debugging information.

* Threads
  Id   Target Id         Frame 
  2    Thread 0x7f9d6c14e700 (LWP 25536) "SDLTimer" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
* 1    Thread 0x7f9d77a209c0 (LWP 25535) "vizdoom" 0x00007f9d7679bed9 in __libc_waitpid (pid=26139, stat_loc=0xbb9e20, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40

* FPU Status
  R7: Empty   0x00000000000000000000
  R6: Empty   0x00000000000000000000
  R5: Empty   0x00000000000000000000
  R4: Empty   0x00000000000000000000
  R3: Empty   0x00000000000000000000
  R2: Empty   0x00000000000000000000
  R1: Empty   0x00000000000000000000
=>R0: Empty   0x00000000000000000000

Status Word:         0x0000                                            
                       TOP: 0
Control Word:        0x027f   IM DM ZM OM UM PM
                       PC: Double Precision (53-bits)
                       RC: Round to nearest
Tag Word:            0xffff
Instruction Pointer: 0x7f9d:0x7542ceae
Operand Pointer:     0x7fff:0xd2cc4448
Opcode:              0x0000

* Registers
rax            0xfffffffffffffe00	-512
rbx            0x661b	26139
rcx            0xffffffffffffffff	-1
rdx            0x0	0
rsi            0xbb9e20	12295712
rdi            0x661b	26139
rbp            0xbb9e20	0xbb9e20
rsp            0xbb9e00	0xbb9e00
r8             0x0	0
r9             0x1	1
r10            0x0	0
r11            0x246	582
r12            0xb	11
r13            0x1090	4240
r14            0xbb7300	12284672
r15            0x9	9
rip            0x7f9d7679bed9	0x7f9d7679bed9 <__libc_waitpid+105>
eflags         0x246	[ PF ZF IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0

* Backtrace

Thread 2 (Thread 0x7f9d6c14e700 (LWP 25536)):
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
No locals.
#1  0x00007f9d76a6b52e in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#2  0x00007f9d76a6b675 in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#3  0x00007f9d76a20ba1 in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#4  0x00007f9d76a2073d in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#5  0x00007f9d76a6b279 in ?? () from /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#6  0x00007f9d76794184 in start_thread (arg=0x7f9d6c14e700) at pthread_create.c:312
        __res = <optimized out>
        pd = 0x7f9d6c14e700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140314099902208, 6620885331285598523, 0, 0, 140314099902912, 140314099902208, -6568284515656048325, -6568235264282107589}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7  0x00007f9d74b9537d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.

Thread 1 (Thread 0x7f9d77a209c0 (LWP 25535)):
#0  0x00007f9d7679bed9 in __libc_waitpid (pid=26139, stat_loc=0xbb9e20, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
        resultvar = 26139
        oldtype = 0
#1  0x000000000051fb2c in ?? ()
No symbol table info available.
#2  <signal handler called>
No locals.
#3  __strncpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:296
No locals.
#4  0x0000000000808551 in VIZ_GameStateUpdateLabels() ()
No symbol table info available.
#5  0x000000000080aa4a in VIZ_Update() ()
No symbol table info available.
#6  0x000000000080b495 in VIZ_Tic() ()
No symbol table info available.
#7  0x000000000055709e in D_DoomLoop() ()
No symbol table info available.
#8  0x0000000000559b59 in D_DoomMain() ()
No symbol table info available.
#9  0x000000000050774f in main ()
No symbol table info available.

Show simple code

for ag in agents:
    agent_train = lambda: ag.train(max_episode_length, gamma, sess, coord, saver)
    thd = threading.Thread(target=(agent_train))
    thd.start()
    time.sleep(0.5)
    agent_threads.append(thd)
coord.join(worker_threads, stop_grace_period_secs=60)

In ag.train():

...
a_index = np.argmax(a_policy_value == a)
r = self.env.make_action(self.actions[a_index], 4) / 400.0
...

Thank you!
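
As a side note on the launch loop above, a lambda created inside a `for` loop looks up its loop variable only when it is called, and an exception raised inside a worker ends that thread without stopping the others. A minimal sketch that avoids both is shown below. It passes each agent to the thread explicitly and records per-worker exceptions. `launch_workers` and `DummyAgent` are hypothetical names used only for illustration and are not taken from the reporter's code.

```python
import threading
import time


def launch_workers(agents, train_args, stagger_secs=0.0):
    """Start one thread per agent; return (threads, errors)."""
    errors = {}   # worker index -> exception raised inside that worker
    threads = []

    def run(index, agent):
        try:
            agent.train(*train_args)
        except Exception as exc:  # keep the failure visible to the main thread
            errors[index] = exc

    for index, agent in enumerate(agents):
        # args binds this iteration's agent, so no closure over the loop variable
        thd = threading.Thread(target=run, args=(index, agent))
        thd.start()
        threads.append(thd)
        if stagger_secs:
            time.sleep(stagger_secs)
    return threads, errors


class DummyAgent:
    """Stand-in agent; one of them fails to show how errors are collected."""

    def __init__(self, name, fail=False):
        self.name, self.fail, self.trained = name, fail, False

    def train(self, episodes):
        if self.fail:
            raise RuntimeError("crash in " + self.name)
        self.trained = True


agents = [DummyAgent("worker_0"), DummyAgent("worker_1", fail=True)]
threads, errors = launch_workers(agents, train_args=(10,))
for thd in threads:
    thd.join()   # join the same list that was filled in the loop
print(errors)
```

Inspecting `errors` after the join shows which workers stopped early, which would make a run that ends far below the planned episode count easier to diagnose.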

@mwydmuch
Member

Has it happened again? I can't replicate this (using exactly the same settings as yours), so either I'm missing some significant factor in my environment or this is really, really rare.

@GoingMyWay

@mwydmuch, yes, almost every time. Would you mind if I sent my code to you?

@mwydmuch
Member

That would be very helpful; please send it to [email protected]
Thank you

@GoingMyWay

@mwydmuch, I sent the code to you, thank you!

@Ohsaworks

I faced the same problem when using multi-threading.
Was the issue solved? I would like to know more about it.
