
Race condition in case of multiple event loops #611

Open · bcmills opened this issue Sep 6, 2024 · 5 comments
bcmills commented Sep 6, 2024

It appears that the alru_cache decorator is using an unsynchronized OrderedDict instance for its cache:

self.__cache: OrderedDict[Hashable, _CacheItem[_R]] = OrderedDict()

If I am not mistaken, that will lead to a data race if the decorated function is called simultaneously from different asyncio event loops running on different threads — which seems to be a realistic (if rare) situation (compare python/cpython#93462 (comment)).
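For concreteness, here is a minimal sketch of that scenario (the function names are hypothetical): two threads each run their own event loop via `asyncio.run()` and call the same decorated coroutine, so both loops read and write the shared `OrderedDict` with no synchronization. Depending on timing, this can corrupt the cache or surface as futures attached to the wrong loop.

```python
import asyncio
import threading

from async_lru import alru_cache


@alru_cache(maxsize=128)
async def fetch(key: int) -> int:
    # Stand-in for real async work; the cache entry for `key` is
    # shared across every loop that calls this function.
    await asyncio.sleep(0.01)
    return key * 2


async def main() -> None:
    await asyncio.gather(*(fetch(i % 10) for i in range(100)))


def run_loop() -> None:
    # asyncio.run() gives each thread its own event loop, but the
    # decorator's OrderedDict is shared between all of them.
    asyncio.run(main())


threads = [threading.Thread(target=run_loop) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```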

If I understand correctly, other asyncio wrappers generally bind the event loop during initialization and fail explicitly if called from a different event loop. Unfortunately, since the alru_cache decorator generally runs at module init time there is no such loop on the thread.
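For comparison, a rough sketch of that bind-and-fail pattern (a hypothetical helper, not async-lru's actual code): record the running loop on first use, then raise if a later call arrives from a different loop.

```python
import asyncio
from typing import Optional


class LoopBoundWrapper:
    """Hypothetical sketch of the bind-and-fail pattern."""

    def __init__(self) -> None:
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def _check_loop(self) -> None:
        loop = asyncio.get_running_loop()
        if self._loop is None:
            # Bind to whichever loop calls first.
            self._loop = loop
        elif loop is not self._loop:
            # Fail explicitly rather than racing on shared state.
            raise RuntimeError(
                f"{type(self).__name__} is bound to a different event loop"
            )
```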

I don't have a clear idea of how this problem could be fixed; I just wanted to point it out as a (significant and subtle) risk for users of this module.

bcmills (author) commented Sep 6, 2024

Note that even if the individual dict operations were protected by the GIL, there are read-modify-write sequences in the code that IIUC are not:

```python
# Unsynchronized read of the shared cache...
cache_item = self.__cache.get(key)
if cache_item is not None:
    self._cache_hit(key)
    if not cache_item.fut.done():
        return await asyncio.shield(cache_item.fut)
    return cache_item.fut.result()
fut = loop.create_future()
coro = self.__wrapped__(*fn_args, **fn_kwargs)
task: asyncio.Task[_R] = loop.create_task(coro)
self.__tasks.add(task)
task.add_done_callback(partial(self._task_done_callback, fut, key))
# ...followed by an unsynchronized write: another loop can interleave
# anywhere between the .get() above and this assignment.
self.__cache[key] = _CacheItem(fut, None)
```

Dreamsorcerer (member) commented
I'm not sure multiple loops are something we really want to care about; it seems like a bad idea overall. But we could maybe partition the cache per loop or something, e.g. `{loop_id: {...}}`, along the lines of the sketch below.
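Something like this rough sketch (untested), where the outer mapping is keyed by the running loop so each loop only ever touches its own `OrderedDict`:

```python
import asyncio
from collections import OrderedDict
from typing import Any, Hashable


class PerLoopCache:
    """Untested sketch: one OrderedDict per event loop."""

    def __init__(self) -> None:
        self._caches: dict[
            asyncio.AbstractEventLoop, OrderedDict[Hashable, Any]
        ] = {}

    def current(self) -> OrderedDict[Hashable, Any]:
        # Each loop gets, and only ever sees, its own cache. Note
        # that keying by the loop object keeps a closed loop alive
        # until its entry is evicted somehow.
        loop = asyncio.get_running_loop()
        return self._caches.setdefault(loop, OrderedDict())
```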

bcmills (author) commented Sep 7, 2024

Having multiple loops seems more or less inevitable if someone is trying to incrementally adopt async in a large existing codebase.

I would be reluctant to use a per-loop cache: it would depress the hit rate, and I'm not sure how you'd safely evict a loop from the cache to prevent a memory leak if it terminates.

I suppose you could use a threading.Lock instead? If there's only a single event loop, a non-blocking acquire will always succeed, and if you do end up needing to block, I think you could run the acquire in a separate thread (using loop.run_in_executor on the current loop). Roughly like the sketch below.
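An untested sketch of that idea (hypothetical names, not async-lru code): try the non-blocking acquire first, and only fall back to a worker thread when another loop actually holds the lock.

```python
import asyncio
import threading
from contextlib import asynccontextmanager

_cache_lock = threading.Lock()


@asynccontextmanager
async def cache_lock():
    # Fast path: with a single event loop the lock is never
    # contended, so the non-blocking acquire always succeeds.
    if not _cache_lock.acquire(blocking=False):
        # Slow path: another loop holds the lock. Blocking here
        # would stall this loop's thread, so block in the default
        # executor instead and let other tasks keep running.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, _cache_lock.acquire)
    try:
        yield  # read-modify-write the shared OrderedDict here
    finally:
        _cache_lock.release()
```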

tabrezm commented Oct 29, 2024

I believe we're running into this problem or something related after having recently adopted async-lru.

We use a ProcessPoolExecutor to spin up processes, each of which has its own event loop. The event loop gets created and destroyed multiple times throughout the lifecycle of the process. We're seeing an issue where the cache result is sometimes empty for existing processes:

  File "/home/[redacted]/.local/lib/python3.12/site-packages/async_lru/__init__.py", line 208, in __call__
    return cache_item.fut.result()
           ^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/[redacted]/main.py", line 614, in <module>
    success, failed = future.result()
                      ^^^^^^^^^^^^^^^
  File "/opt/python3.12/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
asyncio.exceptions.CancelledError
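A hypothetical reduction of the lifecycle described above (not our actual code): each worker call spins up and tears down a loop with `asyncio.run()`, while the module-level cache, and any futures stored in it, persists for the life of the process. It illustrates how a cached future can outlive the loop it was created on, though actually reproducing the CancelledError likely requires a call that is still in flight when the loop shuts down.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

from async_lru import alru_cache


@alru_cache(maxsize=32)
async def lookup(key: int) -> int:
    # Stand-in for real async work.
    await asyncio.sleep(0)
    return key


def worker(key: int) -> int:
    # asyncio.run() creates and closes a fresh loop on every call,
    # but the module-level cache survives across calls.
    return asyncio.run(lookup(key))


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Repeated keys can land on a process whose cache entry was
        # created under a previous, now-closed loop.
        print(list(pool.map(worker, [1, 1, 2, 2])))
```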

Dreamsorcerer (member) commented

Well, feel free to try the solution I mentioned and open a PR.
