Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait() function could throw TimeoutExpired exception when it shouldn't #1224

Closed
xiaolingbao opened this issue Feb 20, 2018 · 8 comments
Closed

Comments

@xiaolingbao
Copy link
Contributor

This issue is about function wait()

def wait(self, timeout=None):
.

My test scenario is listed below, although it doesn't repro 100%, it happens quite often. I didn't measure how frequently it happens though.

  1. Have a process object, proc, call proc.kill().
  2. Call proc.wait(30)
  3. proc.wait() throws TimeoutExpired almost instantly.

Speculation:

  1. cext.proc_wait() returns quickly because process exited (killed) and returns a value not equal to WAIT_TIMEOUT.
  2. Somehow, the process still exists, and thus pid_exists() returns True. (Zombie status? Other processes, threads still have references to the process handle?)
  3. Since timeout value is not None, TimeoutExpired throws.

This behavior is different from the previous version. In the previous version, wait() will not throw TimeoutExpired, but just return the value returned from proc_wait().

@giampaolo
Copy link
Owner

Mmmm, can you paste the traceback?

@xiaolingbao
Copy link
Contributor Author

xiaolingbao commented Feb 20, 2018

This is our code snippet that calls psutil:

       # timeout = 30
 377   logging.info('Terminating process id: %r', process_id)
 378   try:
 379     proc = psutil.Process(process_id)
 380     proc.terminate()
 381     proc.wait(timeout)
 382     logging.info('Process %d was terminated successfully', process_id)
 383     return True
 384   except psutil.TimeoutExpired as e:
 385     logging.exception(e)
 386     logging.error('Failed to terminate process %d in %d seconds',
 387                   process_id, timeout)
 388     return False
 389   except psutil.AccessDenied:
 390     logging.error(
 391         'Unable to terminate process %d due to insufficient privileges',
 392         process_id)
 393     return False
 394   except psutil.NoSuchProcess:
 395     logging.info('Process %d was not running', process_id)
 396     return True

This is the log got:

[INFO 14:31:30.256 5056:516 test_lib.py:377]Terminating process id: 5472
[ERROR 14:31:30.367 5056:516 test_lib.py:385]psutil.TimeoutExpired timeout after 30 seconds (pid=5472)
Traceback (most recent call last):
  File "c:\TestAutomation\lib\test_lib.py", line 381, in _TerminateProcess
    proc.wait(timeout)
  File "c:\TestAutomation\third_party\py\psutil\__init__.py", line 1264, in wait
    return self._proc.wait(timeout)
  File "c:\TestAutomation\third_party\py\psutil\_pswindows.py", line 635, in wrapper
    return fun(self, *args, **kwargs)
  File "c:\TestAutomation\third_party\py\psutil\_pswindows.py", line 805, in wait
    raise TimeoutExpired(timeout, self.pid, self._name)
TimeoutExpired: psutil.TimeoutExpired timeout after 30 seconds (pid=5472)

Please see the log timestamp, they happened roughly within .1 second. This is on Windows platform.

I tried to terminate notepad.exe and wait, and failed to repro. So it maybe it is harder to repro when terminating a light-weight process. This is just speculation though.

@giampaolo
Copy link
Owner

I briefly investigated this. It may be due to WaitForSingleObject returning WAIT_ABANDONED or WAIT_OBJECT_0:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms687032(v=vs.85).aspx

@giampaolo
Copy link
Owner

Bumping up priority.

My test scenario is listed below, although it doesn't repro 100%, it happens quite often.

I know it doesn't happen all the time but... any chance you could provide a test script which reproduce it sometimes?

@xiaolingbao
Copy link
Contributor Author

I've created a simple script that could repro the issue occasionally. Originally I said the issue happens quite often. This assertion seems to be corrected. We ran tests in parallel on lots of machines, that increase the repro rate a lot. And the process we kill may also play a factor. Please run the script multiple times if needed.

psutil_test.zip

@giampaolo
Copy link
Owner

OK, I think I fixed this in #1253 but unfortunately I cannot reproduce the issue. Can you try? If you don't have a C compiler setup just tell me what Python version you have and I can attach a wheel file for you to try.

@giampaolo
Copy link
Owner

@giampaolo
Copy link
Owner

I'm assuming this is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants