Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-128130: Fix unhandled keyboard interrupt data race #129975

Merged

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Feb 10, 2025

Use an atomic operation when setting
_PyRuntime.signals.unhandled_keyboard_interrupt. We now only clear the variable at the start of _PyRun_Main, which is the same function where we check it.

This avoids race conditions where previously another thread might call run_eval_code_obj() and erroneously clear the unhandled keyboard interrupt.

Use an atomic operation when setting
`_PyRuntime.signals.unhandled_keyboard_interrupt`. We now only clear the
variable at the start of `_PyRun_Main`, which is the same function where
we check it.

This avoids race conditions where previously another thread might call
`run_eval_code_obj()` and erroneously clear the unhandled keyboard
interrupt.
@@ -1120,11 +1123,9 @@ _PyErr_Display(PyObject *file, PyObject *unused, PyObject *value, PyObject *tb)
Py_XDECREF(print_exception_fn);
if (result) {
Py_DECREF(result);
_PyRuntime.signals.unhandled_keyboard_interrupt = unhandled_keyboard_interrupt;
Copy link
Member

@gpshead gpshead Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this being removed from the _PyErr_Display code paths? It was added by @pablogsal 's #110702 work & @iritkatriel 's review.

While we're sadly lacking comments in the code as to why... I believe this was for the scenario where we're heading back into Python code within the traceback printing and could thus receive a KeyboardInterrupt during that which we either wind up swallowing (bad - the "multiple Ctrl-C required to trigger the handler raising from a point the KeyboardInterrupt exception can propagate upwards to interrupt actual execution" problem that Python has had at times in the past? OR that none of the python code in the traceback printing path can swallow the KeyboardInterrupt state and prevent an ultimate exit_sigint() from terminating the process? (did I just describe the same thing twice?)

I'd be conservative and leave these here in _PyErr_Display, just using its now required atomic read and stores. It shouldn't be a speed-path, we're already rendering big strings and sending them out some file handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm generally sympathetic to being conservative when making changes, the unhandled_keyboard_interrupt logic can't be fixed by using atomic reads and writes. The saving and restoration done here could still end up erroneously clearing it in a different thread (even in the GIL-enabled build).

It also is not useful here anymore. We previously cleared unhandled_keyboard_interrupt in run_eval_code_obj, which is called from many places, including _PyErr_Display potentially. Now we are clearing it in Py_RunMain, which is the same place we read it.

The most important thing I'd like to convince you of is that we should be handling KeyboardInterrupt the same way as SystemError. They are both exceptions that when unhandled should lead to Python exiting with specific error code and/or behavior.

I believe this was for the scenario where we're heading back into Python code within the traceback printing and could thus receive a KeyboardInterrupt during that which we either wind up swallowing...

That's true before and after this PR -- the code did not handle that case. See this demo, for example: https://gist.github.com/colesbury/6871cc3d412c5d4fc51c7fd70e33e8a2

I think we should better handle that case, but probably not as part of this PR. This PR largely keeps the same behavior and handles the same cases as before, but avoids the race conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! This code is one of those places that I go "eew" every time I dive in on it; challenging to reason about and hard to craft reliable specific regression tests for.

I like the after state of this change better. Definitely cleaner to keep the clearing and use in one place.

@colesbury colesbury merged commit 451f291 into python:main Feb 13, 2025
49 checks passed
@colesbury colesbury deleted the gh-128130-unhandled_keyboard_interrupt branch February 13, 2025 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants