Roboschool Windows Crashes Upon Exit #8

Open
Akz47 opened this issue Jul 13, 2018 · 20 comments

@Akz47

Akz47 commented Jul 13, 2018

When I exit the simulations or press Ctrl-C at the command prompt, Windows always displays a crash alert and tries to initiate a Windows Error Reporting.

I checked the windows error log, and it appears to be associated with QT5 module:

Faulting application name: python.exe, version: 3.6.1150.1013, time stamp: 0x5914acba
Faulting module name: Qt5Gui.dll, version: 5.10.0.0, time stamp: 0x5a2ae6f3
Exception code: 0xc0000005
Fault offset: 0x0000000000078626
Faulting process id: 0x2328
Faulting application start time: 0x01d41a82f4c8335f
Faulting application path: K:\Anaconda3\python.exe
Faulting module path: L:\WINDOWS\SYSTEM32\Qt5Gui.dll
Report Id: f6d73881-e892-47c0-9a5e-e2417c547b15
Faulting package full name:
Faulting package-relative application ID:

I am using the following versions:

  • Python 3.6.1
  • Anaconda 4.4.0 (but I'm not running the scripts within an environment)
  • Tensorflow 1.8.0
  • OpenAI Gym 0.10.5
  • Baselines 0.1.5
  • Windows 10 Pro (build 1709)

I don't think it affects the operation of the simulation, but it's weird that it crashes each time.

Thanks!

@Akz47 Akz47 changed the title Roboschool Crashes Upon Exit Roboschool Windows Crashes Upon Exit Jul 13, 2018
@TheCrazyT
Owner

TheCrazyT commented Jul 13, 2018

OK, I must admit it has nothing to do with Anaconda or the Python version.
For some reason I do not get the crash window anymore, but I can clearly see the same fault offsets in the Windows event history (maybe I once disabled the window with some setting, I'm not sure).
I will try to investigate it.
All I know for now is that the cpp_household library seems to execute the
_ZN14QOpenGLContext14currentContextEv function (which itself calls _ZNK14QOpenGLContext14extraFunctionsEv), which seems to access an invalid offset.
My guess is a null pointer because of freed resources or something like that.

Edit:
Oh well, I'm quite sure it crashes at locations like this one:
https://github.com/TheCrazyT/roboschool/blob/master/roboschool/cpp-household/render-simple.cpp#L340

There is probably more than one line that uses QOpenGLContext::currentContext()->extraFunctions(); I guess I will just add a null-pointer check at those locations.

@Akz47
Author

Akz47 commented Jul 14, 2018

Thank you for promptly locating the code segments that caused the null pointer errors. What are the ExtraFunctions for? Are they not needed for rendering the simulations?

Meanwhile, I've entirely disabled Windows Error Reporting and the crash dialog.

Below are the steps, in case anyone wishes to do so too:

  • Launch Local Group Policy Editor
  • Go to Administrative Templates > Windows Components > Windows Error Reporting
  • Set "Disable Windows Error Reporting" to Enabled
  • Set "Prevent display of the user interface for critical errors" to Enabled

So I will be oblivious to any crashes. :)

Please let me know if and when an updated version is available.

@TheCrazyT
Owner

What are the ExtraFunctions for? Are they not needed for rendering the simulations?

They are needed, but by the time some destructors are called, the QOpenGLContext is already gone.
That's a problem because QOpenGLContext::currentContext() no longer returns an object at that point.

TheCrazyT added a commit that referenced this issue Jul 14, 2018
TheCrazyT added a commit that referenced this issue Jul 14, 2018
TheCrazyT added a commit that referenced this issue Jul 14, 2018
TheCrazyT added a commit that referenced this issue Jul 14, 2018
@TheCrazyT
Owner

I think I fixed it, you can download it here: https://dl.bintray.com/thecrazyt/roboschool/0.5/
(basically you only need to download cpp_household.pyd and replace it)

@Akz47
Author

Akz47 commented Jul 15, 2018

Thank you for your fantastically speedy fix. It works perfectly now, without any crash!

I've re-enabled Windows Error Reporting and double checked in the Event Log to confirm that no application errors are reported.

On a separate note, I'm having a little trouble executing the multiplayer samples demo_pong.py and demo_race1.py, which make use of os.mkfifo, a function that is not available on Windows:

Traceback (most recent call last):
  File "demo_pong.py", line 23, in <module>
    gameserver = roboschool.multiplayer.SharedMemoryServer(game, "pongdemo", want_test_window=True)
  File "roboschool\multiplayer.py", line 257, in __init__
    player_n=n)
  File "roboschool\multiplayer.py", line 147, in __init__
    os.mkfifo(self.sh_pipe_actready_filename)
AttributeError: module 'os' has no attribute 'mkfifo'

I tried searching around for a Windows version of this multiplayer.py but couldn't seem to find any. I saw some general (non-Roboschool) recommendations to replace os.mkfifo with os.pipe / pywin32 / ctypes, but I'm not too sure how exactly to go about doing that.
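
For anyone hitting the same traceback, a minimal sketch of guarding the call might look like this. It only detects whether the platform offers os.mkfifo; it is not a Windows replacement for the FIFO. The helper name create_fifo is hypothetical and not part of Roboschool.

```python
import os
import stat
import tempfile


def create_fifo(path):
    """Create a POSIX FIFO at `path` when the platform supports it.

    Hypothetical helper, not part of Roboschool. Returns True if the FIFO
    was created and False if this Python build has no os.mkfifo (Windows).
    """
    if not hasattr(os, "mkfifo"):
        # Windows builds of Python do not define os.mkfifo at all, which is
        # what produces the AttributeError in the traceback above.
        return False
    os.mkfifo(path)
    return True


if __name__ == "__main__":
    fifo_path = os.path.join(tempfile.mkdtemp(), "demo_fifo")
    if create_fifo(fifo_path):
        print("created FIFO:", stat.S_ISFIFO(os.stat(fifo_path).st_mode))
    else:
        print("os.mkfifo is unavailable; a named-pipe fallback is needed")
```

On Windows the False branch is where a named-pipe based substitute would have to be plugged in.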

Line 147 in multiplayer.py is:

os.mkfifo(self.sh_pipe_actready_filename)

Could you please shed some light on how I can fix this for Windows execution?

Thank you once again for your kind assistance.

@TheCrazyT
Owner

I tried os.pipe and failed ...
After that I tried win32pipe to use named pipes ... and failed again ...
I guess I can't fix it that easily; it would take more time.

@Akz47
Author

Akz47 commented Aug 2, 2018

No worries, thank you very much for trying. I'll experiment with the other examples that do not require the pipes first.

@TheCrazyT
Owner

TheCrazyT commented Aug 4, 2018

Alright, it seems to work now with the commit:
ecf5279
Although I'm trying to make the project work on both operating systems, I have probably broken the ability to use the same project on Linux.
I do not have enough time to test on both machines, and since a new Roboschool version is planned, I guess it is not worth the effort to make it work on both systems.

Well, my solution is working, but it can probably be improved.
I'm not sure which demos are still failing; I tested my solution with demo_pong.py.

Edit:
I'm not sure if it is an error, but the animation seems to stop at frame 999.
Currently I can't figure out whether this happens intentionally or because of a bug.
The strange thing is that I can't find any number in the source that limits it to that frame.
All I know is that you get an error about a closed pipe, which is weird because the source currently never closes the pipe (except when the "server"/Python process stops).

@Akz47
Author

Akz47 commented Aug 5, 2018

Thank you for updating the Roboschool to address the pipe issue.

I've updated my Roboschool copy with your 3 new files: winfifo.py, multiplayer.py and demo_pong.py, and also configured the temp directory paths.

When I run demo_pong.py, it exits with the following error without showing any animations:

Waiting tmp/multiplayer_pongdemo_player00
Waiting tmp/multiplayer_pongdemo_player01
Player 0 connected, wants to operate RoboschoolPong-v1 in this scene
Player 1 connected, wants to operate RoboschoolPong-v1 in this scene
Traceback (most recent call last):
  File "demo_pong.py", line 26, in <module>
    gameserver.serve_forever()
  File "roboschool\multiplayer.py", line 285, in serve_forever
    p.read_and_apply_action()
  File "roboschool\multiplayer.py", line 192, in read_and_apply_action
    check = self.sh_pipe_actready.readline()[:-1]
  File "roboschool\winfifo.py", line 62, in readline
    res = str(super().readline(),"UTF-8")
  File "roboschool\winfifo.py", line 54, in read
    result,data = win32file.ReadFile(self.handle,size,self.overlapped)
pywintypes.error: (109, 'ReadFile', 'The pipe has been ended.')

Is this the same error that you are seeing?

@TheCrazyT
Owner

TheCrazyT commented Aug 5, 2018

Yes, it is the same error that I can't figure out.
But for some reason that error does not happen during the first 999 frames, so I do see a pong animation.
At first I thought it could be Python's internal garbage collector, but since the pipe variables are in the global namespace ("PIPE_HANDLES"), this should not be the case.

Minimizing the window also stops the animation for some reason (I can't remember which error occurs when you do that).

@Akz47
Author

Akz47 commented Aug 6, 2018

I found this post about os.path.exists() interfering with the pipes and generating the same error.

Could this issue be related?

https://stackoverflow.com/questions/51255352/why-does-os-path-exists-stop-windows-named-pipes-from-connecting

@TheCrazyT
Owner

TheCrazyT commented Aug 6, 2018

No, I figured out why it failed for me:

from gym.envs.registration import register

register(
    id='RoboschoolPong-v1',
    entry_point='roboschool:RoboschoolPong',
    max_episode_steps=1000,  # the episode is cut off after 1000 steps
    tags={ "pg_complexity": 20*1000000 },
)

That code is inside the __init__.py of the roboschool folder.
max_episode_steps=1000 explains why it stops at frame 999 for me with that error.
The client Python script finishes executing once its episode is over.
As a result the pipe is lost (which is OK because the subprocess stopped) and the server throws that broken-pipe error.
This could normally be silently ignored, although I still don't get why the serve_forever function is written to handle more than one episode when the play function in demo_pong.py finishes after one episode.
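
One way to treat the client's exit as a normal end of the session rather than an error can be sketched with an anonymous os.pipe, used here instead of the Windows named pipes in winfifo.py. The function names read_actions and demo are hypothetical; this is an illustration of the idea, not the actual server code.

```python
import os


def read_actions(read_fd):
    """Collect newline-terminated messages until the writer closes the pipe.

    Hypothetical helper for illustration. An empty string from readline()
    means end-of-file: the writing side is gone, so we stop quietly.
    """
    actions = []
    with os.fdopen(read_fd, "r") as reader:
        while True:
            line = reader.readline()
            if line == "":
                break  # writer exited; treat it as a normal end of the episode
            actions.append(line.rstrip("\n"))
    return actions


def demo():
    read_fd, write_fd = os.pipe()
    # The "client" writes three actions and then closes its end, like a
    # subprocess that finishes after one episode.
    with os.fdopen(write_fd, "w") as writer:
        for step in range(3):
            writer.write("action %d\n" % step)
    return read_actions(read_fd)


if __name__ == "__main__":
    print(demo())
```

With pywin32 the same situation surfaces as the pywintypes.error with code 109 shown earlier in this thread, which a server could catch at the corresponding read call.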

Edit:
Now that I think about it, I guess they just forgot a while True: above the call to the play function.
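
A toy model of both observations, assuming the time limit simply ends an episode once max_episode_steps steps have run. It does not use gym; play and run_client are hypothetical stand-ins for the play function in demo_pong.py and a loop around it, and the loop is bounded so the sketch terminates.

```python
def play(max_episode_steps):
    """Run one toy episode and return the index of the last frame shown."""
    frame = 0
    done = False
    while not done:
        last_frame = frame
        frame += 1
        # A time-limit wrapper ends the episode once the step budget is used.
        done = frame >= max_episode_steps
    return last_frame


def run_client(episodes, max_episode_steps=1000):
    """Call play() repeatedly, as a `while True:` around it would."""
    last_frames = []
    for _ in range(episodes):  # bounded stand-in for `while True:`
        last_frames.append(play(max_episode_steps))
    return last_frames


if __name__ == "__main__":
    print(play(1000))      # index of the last frame of a 1000-step episode
    print(run_client(3))   # the client stays alive across several episodes
```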

TheCrazyT added a commit that referenced this issue Aug 6, 2018
Not shure if the animation should be endless, but an error is wrong and the server part was written to do more than 1 episode.
#8
@TheCrazyT
Owner

TheCrazyT commented Aug 6, 2018

Oh, and I almost forgot:

also configured the temp directory paths.

This is not necessary because the paths are only virtual; on Windows I'm using named pipes and no real files.

@Akz47
Author

Akz47 commented Aug 7, 2018

Thanks for the update. I updated demo_pong.py with the "while True" line, and changed the max_episode_steps value in __init__.py to 5000.

Below is what I got:

Waiting tmp/multiplayer_pongdemo_player00
Waiting tmp/multiplayer_pongdemo_player01
Player 0 connected, wants to operate RoboschoolPong-v1 in this scene
Player 1 connected, wants to operate RoboschoolPong-v1 in this scene
Traceback (most recent call last):
  File "demo_pong.py", line 26, in <module>
    gameserver.serve_forever()
  File "roboschool\multiplayer.py", line 285, in serve_forever
    p.read_and_apply_action()
  File "roboschool\multiplayer.py", line 192, in read_and_apply_action
    check = self.sh_pipe_actready.readline()[:-1]
  File "roboschool\winfifo.py", line 62, in readline
    res = str(super().readline(),"UTF-8")
  File "roboschool\winfifo.py", line 57, in read
    raise Exception("ret_code: %d" % ret_code);
Exception: ret_code: 258

The script showed the error, then returned to the command line, but continued to output results like this:

40:-38 50:-46 52:-50 67:-62 53:-51 48:-44 58:-56 46:-43 58:-54 51:-47 ...

It seemed to continue indefinitely at about 1 result every 2-3 seconds for hours. Are these the expected results? However, no visual output / window is displayed.

I also tried enabling "video=True" in demo_pong, but the script will then crash with the earlier "pywintypes.error: (109, 'ReadFile', 'The pipe has been ended.')" error.

P.S. The temp directory I was referring to earlier was actually configured in multiplayer.py, which generates actual files in the system.

@TheCrazyT
Owner

Alright, I know what you mean. The paths caused no trouble for me (maybe because I have MSYS and Cygwin installed?).
I see that you mean the multiplayer_pongdemo_player00_* and multiplayer_pongdemo_player01_* files.
Sadly, MULTIPLAYER_FILES_DIR was used for the pipe paths as well, and the ":" causes trouble
(because "\\.\pipe\roboschoolC:\tmp" is not a valid pipe path, for example).
I modified winfifo to replace that character ...
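
The replacement described here can be sketched as a small pure function. The prefix is modelled on the example path in this comment; the function name to_pipe_name is hypothetical, and winfifo.py may build the name differently.

```python
PIPE_PREFIX = r"\\.\pipe\roboschool"  # prefix modelled on the path quoted in this thread


def to_pipe_name(file_name):
    """Map a FIFO file path to a named-pipe path without ':' in it.

    Hypothetical helper; only the ':' replacement is taken from the thread.
    """
    return PIPE_PREFIX + file_name.replace(":", "_")


if __name__ == "__main__":
    print(to_pipe_name("C:/tmp/multiplayer_pongdemo_player00_actready"))
    print(to_pipe_name("tmp/multiplayer_pongdemo_player00_actready"))
```

A relative directory such as "tmp" passes through unchanged, while a drive-letter path loses its colon.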

@Akz47
Author

Akz47 commented Aug 15, 2018

Thank you for your reply. Actually I directly edited the MULTIPLAYER_FILES_DIR variable too, setting it just to "tmp", a relative path within my execution directory (which is agent_zoo). I see the "multiplayer_pongdemo_player00_*" files inside, so it seems to write correctly.

However, there is no video screen generated when I run this demo_pong. For others like RoboschoolWalker etc, an animation window is displayed.

I only keep seeing output results like "0:-38 50:-46 52:-50 67:-62 53:-51 48:-44 58:-56 46:-43 58:-54 51:-47 ...", which seem to continue indefinitely.

Is there something preventing the animation window from launching or rendering?

@TheCrazyT
Owner

Are you using the current version? (winfifo.py should contain the line fileName = fileName.replace(":","_"))
Do you get any stack trace?
The numbers in the output are normally the scores of the left and the right "pong".
What is strange is that it shows large or negative values for you.
Currently I have no clue why the window is not shown; it's hard to debug without having the same problem.
Maybe you could change the FIFO_DEBUG constant to "true" (inside roboschool/winfifo.py), post the application's output on http://pastebin.com/ and link it here.
This could help me find the problem.

@Akz47
Author

Akz47 commented Aug 15, 2018

Thanks for your pointers. Yes, I'm already using the latest winfifo.py.

Once I run it, I get the following error, but the output numbers continue to be generated in the background:

Traceback (most recent call last):
  File "demo_pong.py", line 27, in <module>
    gameserver.serve_forever()
  File "roboschool\roboschool\multiplayer.py", line 286, in serve_forever
    p.read_and_apply_action()
  File "roboschool\roboschool\multiplayer.py", line 193, in read_and_apply_action
    check = self.sh_pipe_actready.readline()[:-1]
  File "roboschool\roboschool\winfifo.py", line 62, in readline
    res = str(super().readline(),"UTF-8")
  File "roboschool\roboschool\winfifo.py", line 57, in read
    raise Exception("ret_code: %d" % ret_code);
Exception: ret_code: 258

What does this return code 258 mean?

Below is the debug information after enabling FIFO_DEBUG:
https://pastebin.com/mQZgXwyB

Could the animation problem be a separate issue unrelated to the winfifo, or is the rendering disabled somewhere? If I run RoboschoolPong_v0_2017may1.py, the animation shows properly.

@TheCrazyT
Owner

TheCrazyT commented Aug 15, 2018

Code 258 means that a timeout occurred.
I set the timeout to 10 seconds, which should be more than enough for the subprocesses to respond.
At least at the beginning, the communication between the subprocesses and the main process (the one that calls gameserver.serve_forever()) seems to work.
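
For reference, the two numeric codes that appear in this thread correspond to standard Win32 constants. The lookup below is a small sketch for decoding them in log output; the describe function and the table are illustrative and not part of winfifo.py.

```python
# Win32 codes seen in this thread and their standard names.
WIN32_CODES = {
    109: "ERROR_BROKEN_PIPE (the pipe has been ended)",
    258: "WAIT_TIMEOUT (the wait timed out)",
}


def describe(code):
    """Return a readable label for a Win32 code, or a generic fallback."""
    return WIN32_CODES.get(code, "unknown code %d" % code)


if __name__ == "__main__":
    for code in (109, 258, 5):
        print(code, "->", describe(code))
```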

But for some reason the subprocesses do not seem to write a second time after sending their model information ("RoboschoolPong-v1").
To explain the log a little, the first number on each line is a process ID:
12316 is the process ID of the main process.
11296 is the process ID of one of the "pongs".
10248 is the process ID of the other "pong".
It crashes when the main process waits for a response from one of the subprocesses
(for some reason one of the subprocesses writes to "multiplayer_pongdemo_player00_actready" only once).

@Akz47
Author

Akz47 commented Aug 20, 2018

Thank you for your detailed analysis.

Since there is a timeout and a crash, it seems weird that the games are still being played. Results like "114:-110 112:-107 131:-123 ......" continue to be generated even after the return code 258 is displayed.

Does that mean that only a specific subprocess timed out or crashed, without affecting the main loop?

Do any of these error or log messages help diagnose the missing animation rendering?

@TheCrazyT TheCrazyT added the bug label Mar 9, 2019