Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix possible race condition in get_async_backend, Fixes 425. #714

Merged
merged 12 commits into from
May 10, 2024

Conversation

DavidJiricek
Copy link
Contributor

@DavidJiricek DavidJiricek commented Apr 9, 2024

Related issue is #425

@agronholm
Copy link
Owner

My only concern with this is performance – this function is called a lot. I'd like to see some benchmarks against the previous version to ensure that there is no notable performance degradation.

@DavidJiricek
Copy link
Contributor Author

I ran a benchmark and was surprised by the performance drop. It was noticeable.

So I made a new, more performant version - it is not as bulletproof as before, but it works and still fixes the problem.

It may be insecure if a following occurs:
The loaded module already has the `backend_class' attribute, but is not fully initialized.

In this case, get_async_backend() will return an incomplete object.

But in the current version of the code, the backend_class attribute is assigned at the very end of the module initialization. So the above situation should not occur. And it is still much safer than the current version of get_async_backend().

@agronholm
Copy link
Owner

Looks much better. Please add a changelog note (credit yourself and add the appropriate issue link) and I'll merge.

@gschaffner
Copy link
Collaborator

gschaffner commented Apr 17, 2024

It may be insecure if a following occurs: The loaded module already has the backend_class attribute, but is not fully initialized.

In this case, get_async_backend() will return an incomplete object.

i believe that there are ways to do this that will incur neither the problem you mention in the quote nor major performance degradation. e.g. i think a technique like in master...9d2d95c should be good on both of these fronts1.

Footnotes

  1. i wrote this patchset a while back, but i got busy and thus never got around to submitting it :)

    some notes on it that i had drafted at the time:

    this should fix Thread race condition in _eventloop.get_asynclib() (with fix) #425 without degrading get_async_backend's performance.

    regarding the problem in the current code:

    The module will exist in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best. --- https://docs.python.org/3/reference/import.html#loading

    regarding this proposed fix, which relies on Python's import locking: https://docs.python.org/3/whatsnew/3.3.html#a-finer-grained-import-lock, https://github.com/python/cpython/blob/v3.12.1/Lib/importlib/_bootstrap.py#L1357

    i did write a regression test for this but in order to force the race condition to occur it relies on (a) get_async_backend using importlib and (b) monkeypatching importlib privates. it also has a fixed runtime since the monkeypatched thread needs to hold the module mid-initialization for long enough that the main thread can check if get_async_backend can bypass the import lock.

    IMO that test should probably not be merged (what do you think about it?), but even if not merged it does serve as a test now (just not as a regression test for later).

@agronholm
Copy link
Owner

It may be insecure if a following occurs: The loaded module already has the backend_class attribute, but is not fully initialized.
In this case, get_async_backend() will return an incomplete object.

i believe that there are ways to do this that will incur neither the problem you mention in the quote nor major performance degradation. e.g. i think a technique like in master...9d2d95c should be good on both of these fronts1.

Footnotes

  1. i wrote this patchset a while back, but i got busy and thus never got around to submitting it :)

    some notes on it that i had drafted at the time:
    this should fix Thread race condition in _eventloop.get_asynclib() (with fix) #425 without degrading get_async_backend's performance.
    regarding the problem in the current code:

    The module will exist in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best. --- https://docs.python.org/3/reference/import.html#loading

    regarding this proposed fix, which relies on Python's import locking: https://docs.python.org/3/whatsnew/3.3.html#a-finer-grained-import-lock, https://github.com/python/cpython/blob/v3.12.1/Lib/importlib/_bootstrap.py#L1357
    i did write a regression test for this but in order to force the race condition to occur it relies on (a) get_async_backend using importlib and (b) monkeypatching importlib privates. it also has a fixed runtime since the monkeypatched thread needs to hold the module mid-initialization for long enough that the main thread can check if get_async_backend can bypass the import lock.
    IMO that test should probably not be merged (what do you think about it?), but even if not merged it does serve as a test now (just not as a regression test for later).

Can you elaborate on how this is different from what anyio does right now?

@agronholm
Copy link
Owner

@DavidJiricek would you mind adding a changelog entry?

@davidbrochart
Copy link
Contributor

Just got hit by this race condition, thanks for the fix @DavidJiricek !

@gschaffner
Copy link
Collaborator

gschaffner commented Apr 23, 2024

Can you elaborate on how this is different from what anyio does right now?

get_async_backend wants to use a cache for performance, so that it has a fast-path that avoids the expensive importlib call. right now, the cache it uses for this is sys.modules. the problem with this is that modules are placed into sys.modules before they have been initialized [ref]. thus when get_async_backend hits the cache (return sys.modules[...].backend_class) it can attempt to return an attribute of a module that is not initialized. (this happens when another thread is currently initializing the backend module.)

the import statement (and importlib.import_module) avoid returning partially-initialized modules by using an import lock [ref]. by accessing the module from sys.modules directly, AnyIO is currently bypassing said lock.

the way that master...9d2d95c is different is: the cache it uses is not sys.modules. it has AnyIO maintain its own cache instead of trying to piggyback on sys.modules. this cache is specifically for memoizing get_async_backend, i.e. we only place initialized items into it. so our cache has the invariant we want (it can only contain initialized objects), unlike sys.modules.

@agronholm
Copy link
Owner

Can you elaborate on how this is different from what anyio does right now?

get_async_backend wants to use a cache for performance, so that it has a fast-path that avoids the expensive importlib call. right now, the cache it uses for this is sys.modules. the problem with this is that modules are placed into sys.modules before they have been initialized [ref]. thus when get_async_backend hits the cache (return sys.modules[...].backend_class) it can attempt to return an attribute of a module that is not initialized. (this happens when another thread is currently initializing the backend module.)

the import statement (and importlib.import_module) avoid returning partially-initialized modules by using an import lock [ref]. by accessing the module from sys.modules directly, AnyIO is currently bypassing said lock.

the way that master...9d2d95c is different is: the cache it uses is not sys.modules. it has AnyIO maintain its own cache instead of trying to piggyback on sys.modules. this cache is specifically for memoizing get_async_backend, i.e. we only place initialized items into it. so our cache has the invariant we want (it can only contain initialized objects), unlike sys.modules.

Yes, I can see that this is indeed a preferable solution. Why the WeakValueDictionary though?

@agronholm
Copy link
Owner

Following the suggestion of @gschaffner, I think you should modify the solution to store the backend class in a dict on a successful import. It can be a normal dict though; the weak-value dictionary seems unnecessary.

@gschaffner
Copy link
Collaborator

Yes, I can see that this is indeed a preferable solution. Why the WeakValueDictionary though?

It can be a normal dict though; the weak-value dictionary seems unnecessary.

yeah, agreed. I think my thought process was that weakrefs would let the module get GC'd if it's no longer in use and someone forcibly unloads it (del sys.modules[...]). but probably that's not a case we should worry about supporting.

@agronholm agronholm requested a review from gschaffner May 9, 2024 12:25
Copy link
Collaborator

@gschaffner gschaffner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple minor comments, at your discretion. otherwise LGTM!

src/anyio/_core/_eventloop.py Outdated Show resolved Hide resolved
src/anyio/_core/_eventloop.py Outdated Show resolved Hide resolved
docs/versionhistory.rst Outdated Show resolved Hide resolved
@agronholm agronholm merged commit 8e8b7f5 into agronholm:master May 10, 2024
16 checks passed
@agronholm
Copy link
Owner

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants