threading.Event().wait() not interruptable with Ctrl-C on Windows #80116

Open
chrisjbillington mannequin opened this issue Feb 8, 2019 · 8 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes OS-windows type-bug An unexpected behavior, bug, or error

Comments

@chrisjbillington
Mannequin

chrisjbillington mannequin commented Feb 8, 2019

BPO 35935
Nosy @pfmoore, @tjguk, @zware, @eryksun, @zooba, @chrisjbillington

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-02-08.01:12:57.458>
labels = ['3.10', 'type-bug', '3.8', '3.9', 'OS-windows']
title = 'threading.Event().wait() not interruptable with Ctrl-C on Windows'
updated_at = <Date 2021-03-03.17:50:09.938>
user = 'https://github.com/chrisjbillington'

bugs.python.org fields:

activity = <Date 2021-03-03.17:50:09.938>
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Windows']
creation = <Date 2019-02-08.01:12:57.458>
creator = 'Chris Billington'
dependencies = []
files = []
hgrepos = []
issue_num = 35935
keywords = []
message_count = 7.0
messages = ['335049', '335050', '335056', '335086', '387997', '388028', '388037']
nosy_count = 7.0
nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'Chris Billington', 'alegrigoriev']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue35935'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

@chrisjbillington
Mannequin Author

chrisjbillington mannequin commented Feb 8, 2019

The following short program:

import threading
event = threading.Event()
event.wait()

cannot be interrupted with Ctrl-C on Python 2.7.15 or 3.7.1 on Windows 10 (using the Anaconda Python distribution).

However, if the wait is given a timeout:

import threading
event = threading.Event()
while True:
    if event.wait(10000):
        break

then the wait is interruptible on Python 2.7.15, but is still uninterruptible on Python 3.7.1.
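
A possible workaround is to poll with a short timeout. The sketch below assumes that each timed wait, although not interruptible while it blocks, returns control to the interpreter when it expires, so a pending SIGINT can be handled between waits. The helper name `wait_interruptibly` and the 0.1 second interval are illustrative and not part of the `threading` API.

```python
import threading


def wait_interruptibly(event, poll_interval=0.1):
    """Block until `event` is set, waking up every `poll_interval` seconds.

    Hypothetical helper: each short wait returns to the interpreter, which
    gives Python a chance to run a pending SIGINT handler between waits.
    """
    while not event.wait(poll_interval):
        pass
    return True


if __name__ == "__main__":
    event = threading.Event()
    # Set the event from another thread after one second so the demo ends.
    threading.Timer(1.0, event.set).start()
    wait_interruptibly(event)
    print("event set")
```

With this approach the delay before Ctrl-C takes effect should be bounded by roughly `poll_interval`, at the cost of periodic wakeups.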

@chrisjbillington chrisjbillington mannequin added 3.7 (EOL) end of life OS-windows type-bug An unexpected behavior, bug, or error labels Feb 8, 2019
@chrisjbillington
Mannequin Author

chrisjbillington mannequin commented Feb 8, 2019

If I add:

import signal
signal.signal(signal.SIGINT, signal.SIG_DFL)

before the wait() call, then the call is interruptible on both Python versions without needing to add a timeout.

@eryksun
Contributor

eryksun commented Feb 8, 2019

Python's C signal handler sets a flag and returns, and the signal is eventually handled in the main thread. In Windows, this means the Python SIGINT handler won't be called so long as the main thread is blocked. (In Unix the signal is delivered on the main thread and interrupts most blocking calls.)

In Python 3, our C signal handler also signals a SIGINT kernel event object. This gets used in functions such as time.sleep(). However, threading wait and join methods do not support this event. In principle they could, so long as the underlying implementation continues to use kernel semaphore objects, but that may change. There's been pressure to adopt native condition variables instead of using semaphores.

When you enable the default handler, that's actually the default console control-event handler. It simply exits via ExitProcess(STATUS_CONTROL_C_EXIT). This works because the console control event is delivered by creating a new thread that starts at a private CtrlRoutine function in kernelbase.dll, so it doesn't matter that the main thread may be blocked. By default SIGBREAK also executes the default handler, so Ctrl+Break almost always works to kill a console process. Shells such as cmd.exe usually ignore it, because it would be annoying if Ctrl+Break also killed the shell and destroyed the console window.

Note also that Python's signal architecture cannot support CTRL_CLOSE_EVENT, even though it's also mapped to SIGBREAK. The problem is that our C handler simply sets a flag and returns. For the close event, the session server waits on the control thread for up to 5 seconds and then terminates the process. Thus the C signal handler returning immediately means our process will be killed long before our Python handler gets called.

We may need to actually handle the event, such as ensuring that atexit functions are called. Currently the only way to handle closing the console window and cases where the main thread is blocked is to install our own console control handler using ctypes or PyWin32. Usually we do this to ensure a clean, controlled shutdown. Here's what this looks like with ctypes:

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    CTRL_C_EVENT = 0
    CTRL_BREAK_EVENT = 1
    CTRL_CLOSE_EVENT = 2

    HANDLER_ROUTINE = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.DWORD)
    kernel32.SetConsoleCtrlHandler.argtypes = (
        HANDLER_ROUTINE,
        wintypes.BOOL)

    @HANDLER_ROUTINE
    def handler(ctrl):
        if ctrl == CTRL_C_EVENT:
            handled = do_ctrl_c()
        elif ctrl == CTRL_BREAK_EVENT:
            handled = do_ctrl_break()
        elif ctrl == CTRL_CLOSE_EVENT:
            handled = do_ctrl_close()
        else:
            handled = False
        # If not handled, call the next handler.
        return handled 

    if not kernel32.SetConsoleCtrlHandler(handler, True):
        raise ctypes.WinError(ctypes.get_last_error())

The do_ctrl_* functions could simply be sys.exit(1), which will ensure that atexit handlers get called.
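
As one possible shape for the do_ctrl_* functions, the sketch below has them set a `threading.Event` that the main thread polls with a short timeout, rather than exiting from the control thread. All names other than the three do_ctrl_* functions (`shutdown_requested`, `cleanup_done`, `run_main_loop`) are hypothetical, and the 4 second wait in `do_ctrl_close` is an assumption chosen to stay under the roughly 5 second limit described above.

```python
import threading

# Hypothetical events shared between the control-handler thread and the
# main thread.
shutdown_requested = threading.Event()
cleanup_done = threading.Event()


def do_ctrl_c():
    # Ask the main thread to stop, and report the event as handled.
    shutdown_requested.set()
    return True


def do_ctrl_break():
    # Treated the same as Ctrl+C in this sketch.
    return do_ctrl_c()


def do_ctrl_close():
    # The process is terminated shortly after this handler returns (or
    # after the grace period), so wait here for the main thread's cleanup.
    shutdown_requested.set()
    cleanup_done.wait(4.0)
    return True


def run_main_loop(poll_interval=0.1):
    # Main thread: wake up periodically to check for a shutdown request.
    try:
        while not shutdown_requested.wait(poll_interval):
            pass
    finally:
        # Real cleanup would go here, before signalling completion.
        cleanup_done.set()
```

Because the handler runs on its own thread, setting an event there works even when the main thread would otherwise never reach Python's SIGINT handler.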

@zooba
Member

zooba commented Feb 8, 2019

I'm not sure it's quite as simple as calling sys.exit, but it would be a great project to bring universal cancellation support to all (regularly) blocking functions. Asyncio has suffered from this as well.

Part of the problem is that POSIX APIs often don't support cancellation, and so things have been designed in ways that prevent use of Windows's cancellation support (via APCs or kernel events). Given that we would have to emulate a lot of things on all platforms to make it consistent, this is certainly a PEP and long-term project. (And probably a lot of arguments with people who don't like new abstractions :( )

But on this particular issue, making the unconditional wait be interruptable by signals shouldn't be impossible. It's been done elsewhere, so probably just this one got missed.

@eryksun
Contributor

eryksun commented Mar 3, 2021

> But on this particular issue, making the unconditional wait be
> interruptable by signals shouldn't be impossible.

PyThread_acquire_lock_timed() in Python/thread_nt.h currently ignores intr_flag. The current implementation calls EnterNonRecursiveMutex(), which in turn calls PyCOND_WAIT() / PyCOND_TIMEDWAIT() in Python/condvar.h. EnterNonRecursiveMutex() needs to support intr_flag and support a return value that indicates the wait was interrupted. PyThread_acquire_lock_timed() needs to handle this value by returning PY_LOCK_INTR.

When using emulated condition variables, a lock combines a semaphore and a critical section. Waiting on the semaphore can be integrated with the SIGINT event via WaitForMultipleObjects().

Here's a replacement for WaitForSingleObject() that integrates the SIGINT event, and supports long waits passed as a PY_TIMEOUT_T in microseconds (just for the sake of discussion; it's not rigorously tested code):

    unsigned long
    _Py_WaitForSingleObject(void *handle, PY_TIMEOUT_T microseconds,
                            int intr_flag)
    {
        DWORD result;
        DWORD handle_count;
        HANDLE handle_array[2];
        HANDLE sigint_event = NULL;
        LONGLONG timeout = -1;
        ULONGLONG deadline = 0;

        /* Store timeout in system time units of 100 ns. */
        if (microseconds >= 0) {
            QueryUnbiasedInterruptTime(&deadline);
            timeout = microseconds * 10;
            deadline += timeout;
        }

        handle_count = 1;
        handle_array[0] = (HANDLE)handle;
        if (intr_flag) {
            sigint_event = _PyOS_SigintEvent();
            if (sigint_event) {
                handle_array[handle_count++] = sigint_event;
                ResetEvent(sigint_event);
            }
        }

        do {
            ULONGLONG now;
            DWORD milliseconds;

            if (timeout < 0) {
                milliseconds = INFINITE;
            } else if (timeout < (LONGLONG)INFINITE * 10000) {
                /* Widen before multiplying; INFINITE * 10000 would
                   overflow 32-bit arithmetic. */
                milliseconds = (DWORD)(timeout / 10000);
            } else {
                /* Too long for a single call: wait in chunks. */
                milliseconds = INFINITE - 1;
            }

            result = WaitForMultipleObjectsEx(
                        handle_count, handle_array, FALSE,
                        milliseconds, FALSE);

            if (sigint_event && result == WAIT_OBJECT_0 + 1) {
                /* Pretend that this was an alertable wait that
                   was interrupted by a user-mode APC queued to
                   the main thread by the C signal handler. It's
                   not implemented that way, but it could be. */
                result = STATUS_USER_APC;
            }

            if (result != WAIT_TIMEOUT) {
                break;
            }

            /* Recompute the time remaining until the deadline. */
            QueryUnbiasedInterruptTime(&now);
            timeout = deadline - now;
        } while (timeout >= 0);

        return result;
    }

If the wait returns STATUS_USER_APC, then the caller should call PyErr_CheckSignals(). intr_flag would presumably only be true when called from a thread that can handle signals, i.e. when _PyOS_IsMainThread() is true.
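
The timeout handling in the sample above can be checked with a little arithmetic. The Python sketch below mirrors only the unit conversions (microseconds to 100 ns units, then to a millisecond value for one wait call). The function names are hypothetical, and nothing here calls the Windows API.

```python
INFINITE = 0xFFFFFFFF  # Windows value meaning "wait forever"


def to_100ns_units(microseconds):
    # 1 microsecond = 10 units of 100 ns.
    return microseconds * 10


def chunk_milliseconds(timeout_100ns):
    """Return the millisecond timeout to pass to a single wait call."""
    if timeout_100ns < 0:
        return INFINITE  # negative timeout: wait forever
    if timeout_100ns < INFINITE * 10000:
        return timeout_100ns // 10000  # 10,000 units of 100 ns per ms
    return INFINITE - 1  # too long for one call: wait in chunks


# A 2.5 second timeout expressed in microseconds.
units = to_100ns_units(2_500_000)
print(units)                      # 25000000
print(chunk_milliseconds(units))  # 2500
```

Only waits longer than about 49.7 days (2**32 milliseconds) reach the last branch, which is why the C loop recomputes the remaining time after each WAIT_TIMEOUT.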

That said, if actual Windows condition variables are used (an alternate implementation in Python/condvar.h), then waiting is implemented via SleepConditionVariableSRW(). There's no way to integrate the SIGINT event with this wait, nor any documented way to cancel the wait from the console control thread. If this implementation is adopted, then maybe the few cases that require locks that support an interruptible wait can be implemented as a separate thread API.

@eryksun eryksun added 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes and removed 3.7 (EOL) end of life labels Mar 3, 2021
@alegrigoriev
Mannequin

alegrigoriev mannequin commented Mar 3, 2021

@eryksun:

Windows calls the console event handler in a separate thread. The console event handler receives CTRL_C_EVENT, CTRL_BREAK_EVENT, and the console close, logoff, and system shutdown events.

Originally, Windows devised an APC mechanism to simulate asynchronous delivery of POSIX signals to threads. Those APCs are invoked during alertable wait functions. Delivery of an APC also aborts the wait with the WAIT_IO_COMPLETION return code.

An APC can be queued by QueueUserAPC function.

An APC queue can be processed at any time by calling an alertable wait function with zero timeout, for example SleepEx(0, TRUE).

If you need an APC to break a wait for asynchronous input (such as console or serial port input), use overlapped I/O with the GetOverlappedResultEx function. To cancel the I/O request, call the CancelIo function on the thread that issued the request. Note that you still need to wait for the cancelled request to complete the cancellation with GetOverlappedResult.

@eryksun
Contributor

eryksun commented Mar 3, 2021

Alexander, I wrote the above sample function to be slotted directly into the existing design based on the SIGINT event. I wasn't looking to rewrite everything using user-mode APCs and alertable waits. A change like that could have ramifications for applications that already use alertable waits, depending on how resiliently they're designed.

> Originally, Windows devised an APC mechanism to simulate asynchronous
> delivery of POSIX signals to threads.

IIRC, Interix (i.e. SUA -- replaced nowadays by WSL) used Asynchronous Procedure Calls (APCs) to implement Unix signals. But APCs certainly weren't devised solely for the purpose of providing signals in the POSIX subsystem. They're an evolution of Asynchronous System Traps (ASTs) in DEC VMS. (The lead designer of NT and most of the development team originally designed and implemented VMS at DEC. They began working at Microsoft to design the NT system and OS/2, Win32, and POSIX subsystems starting in late 1988.) Kernel-mode and user-mode APCs are fundamental and critical to NT (e.g. thread termination uses an APC), and particularly the I/O system, which uses a special kernel-mode APC for I/O request completion. (An I/O request is often serviced in an arbitrary thread context. Buffered I/O completion has to be queued back to the calling thread in order to copy from a system buffer to the user buffer.)

> Those APCs are invoked during alertable wait functions. Delivery
> of an APC also aborts the wait with WAIT_IO_COMPLETION return code.

WAIT_IO_COMPLETION is the same as STATUS_USER_APC, because I/O completion routines are queued as user-mode APCs (e.g. by ReadFileEx). Using the name "WAIT_IO_COMPLETION" clarifies the intent in this case. In general, I prefer "STATUS_USER_APC".

> An APC queue can be processed at any time by calling an alertable
> wait function with zero timeout, for example SleepEx(0, TRUE).

The user-mode APC queue can also be pumped by calling the NtTestAlert() system function. For example:

    import ctypes
    ntdll = ctypes.WinDLL('ntdll')
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    @ctypes.WINFUNCTYPE(None, ctypes.c_void_p)
    def apc(p):
        print('spam APC')

    hthread = ctypes.c_void_p(-2)
    kernel32.QueueUserAPC(apc, hthread, None)
    >>> ntdll.NtTestAlert(hthread)
    spam APC
    0

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@vthemelis

Hello,

Is anyone still looking at this?
