-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-128130: Fix unhandled keyboard interrupt data race #129975
gh-128130: Fix unhandled keyboard interrupt data race #129975
Conversation
Use an atomic operation when setting `_PyRuntime.signals.unhandled_keyboard_interrupt`. We now only clear the variable at the start of `_PyRun_Main`, which is the same function where we check it. This avoids race conditions where previously another thread might call `run_eval_code_obj()` and erroneously clear the unhandled keyboard interrupt.
@@ -1120,11 +1123,9 @@ _PyErr_Display(PyObject *file, PyObject *unused, PyObject *value, PyObject *tb) | |||
Py_XDECREF(print_exception_fn); | |||
if (result) { | |||
Py_DECREF(result); | |||
_PyRuntime.signals.unhandled_keyboard_interrupt = unhandled_keyboard_interrupt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this being removed from the _PyErr_Display code paths? It was added by @pablogsal 's #110702 work & @iritkatriel 's review.
While we're sadly lacking comments in the code as to why... I believe this was for the scenario where we're heading back into Python code within the traceback printing and could thus receive a KeyboardInterrupt during that which we either wind up swallowing (bad - the "multiple Ctrl-C required to trigger the handler raising from a point the KeyboardInterrupt exception can propagate upwards to interrupt actual execution" problem that Python has had at times in the past? OR that none of the python code in the traceback printing path can swallow the KeyboardInterrupt state and prevent an ultimate exit_sigint() from terminating the process? (did I just describe the same thing twice?)
I'd be conservative and leave these here in _PyErr_Display
, just using its now required atomic read and stores. It shouldn't be a speed-path, we're already rendering big strings and sending them out some file handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm generally sympathetic to being conservative when making changes, the unhandled_keyboard_interrupt
logic can't be fixed by using atomic reads and writes. The saving and restoration done here could still end up erroneously clearing it in a different thread (even in the GIL-enabled build).
It also is not useful here anymore. We previously cleared unhandled_keyboard_interrupt
in run_eval_code_obj
, which is called from many places, including _PyErr_Display
potentially. Now we are clearing it in Py_RunMain
, which is the same place we read it.
The most important thing I'd like to convince you of is that we should be handling KeyboardInterrupt
the same way as SystemError
. They are both exceptions that when unhandled should lead to Python exiting with specific error code and/or behavior.
I believe this was for the scenario where we're heading back into Python code within the traceback printing and could thus receive a KeyboardInterrupt during that which we either wind up swallowing...
That's true before and after this PR -- the code did not handle that case. See this demo, for example: https://gist.github.com/colesbury/6871cc3d412c5d4fc51c7fd70e33e8a2
I think we should better handle that case, but probably not as part of this PR. This PR largely keeps the same behavior and handles the same cases as before, but avoids the race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! This code is one of those places that I go "eew" every time I dive in on it; challenging to reason about and hard to craft reliable specific regression tests for.
I like the after state of this change better. Definitely cleaner to keep the clearing and use in one place.
Use an atomic operation when setting
_PyRuntime.signals.unhandled_keyboard_interrupt
. We now only clear the variable at the start of_PyRun_Main
, which is the same function where we check it.This avoids race conditions where previously another thread might call
run_eval_code_obj()
and erroneously clear the unhandled keyboard interrupt._PyRuntime.signals.unhandled_keyboard_interrupt
when callingeval()
concurrently in free-threading mode #128130