Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sketch out a way to support multithreading #23

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Mr0grog
Copy link
Member

@Mr0grog Mr0grog commented Feb 7, 2020

This is less an actual PR than a sketch of one possible way to do things. It’s not necessarily meant to merge (or at least not exactly as-is). Trying to figure out how to take what I did in edgi-govdata-archiving/web-monitoring-processing#551 and integrate it more smoothly here.

This implementation is kinda wonky, but is the best way I've come up with to support sessions/clients across multiple threads and pooling connections across multiple threads. It's based on the kind of hacky implementation in edgi-govdata-archiving/web-monitoring-processing#551.

The basic idea here includes two-and-a-half pieces, and works around the fact that urllib3 is thread-safe, while requests is not:

  1. WaybackSession is renamed to UnsafeWaybackSession (denoting it should only be used on a single thread) and a new WaybackSession class just acts as a proxy to multiple UnsafeWaybackSessions, one per thread.

  2. A special subclass of requests's HTTPAdapter that takes an instance of urllib3's PoolManager to wrap. HTTPAdapter itself is really just a wrapper around PoolManager, but it always creates a new one. This version just wraps whatever one is given to it.

  3. UnsafeWaybackSession now takes a PoolManager as an argument, which, if provided, is passed to its HTTPAdapter. WaybackSession creates one PoolManager which it sets on all the actual UnsafeWaybackSession objects it creates and proxies access to. That way a single pool of requests is shared across many threads.

This is super wonky! It definitely makes me feel like we might just be better off dropping requests and just using urllib3 directly (especially given #2 -- which means requests wouldn't be part of our public interface in any way). But this is a smaller change that probably carries less short-term risk.

This implementation is kinda wonky, but is the best way I've come up with to support sessions/clients across multiple threads and pooling connections across multiple threads. It's based on the kind of hacky implementation in edgi-govdata-archiving/web-monitoring-processing#551.

The basic idea here includes two pieces, and works around the fact that urllib3 is thread-safe, while requests is not:

1. `WaybackSession` is renamed to `UnsafeWaybackSession` (denoting it should only be used on a single thread) and a new `WaybackSession` class just acts as a proxy to multiple UnsafeWaybackSessions, one per thread.

2. A special subclass of requests's `HTTPAdapter` that takes an instance of urllib3's `PoolManager` to wrap. `HTTPAdapter` itself is really just a wrapper around `PoolManager`, but it always creates a new one. This version just wraps whatever one is given to it. `UnsafeWaybackSession` now takes a `PoolManager` as an argument, which, if provided, is passed to its `HTTPAdapter`. `WaybackSession` creates one `PoolManager` which it sets on all the actual `UnsafeWaybackSession` objects it creates and proxies access to. That way a single pool of requests is shared across many threads.

This is super wonky! It definitely makes me feel like we might just be better off dropping requests and just using urllib3 directly (especially given #2 -- which means requests wouldn't be part of our public interface in any way). But this is a smaller change that *probably* carries less short-term risk.
Comment on lines +457 to +459
def request(self, *args, **kwargs):
concrete_session = self.get_unsafe_session()
return concrete_session.request(*args, **kwargs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost certainly there’s a better way to handle this (e.g. with magic methods, or maybe wrapt).

@Mr0grog
Copy link
Member Author

Mr0grog commented Mar 16, 2020

Ok, given the lack of feedback, I am probably going to move forward with testing, cleaning up, and merging this approach this week. I don’t think it’s a good approach, though, so I’m also going to start moving forward with #2 and switching to using urllib3 directly instead of requests as a next step. (Httpx claims to be headed for an API-stable release by April, so I want to look into that, too.)

@Mr0grog Mr0grog changed the base branch from master to main July 31, 2020 18:38
@Mr0grog
Copy link
Member Author

Mr0grog commented Sep 23, 2020

This has been sitting for a long time! A general update: the more I stare at this, the less comfortable I feel with it. It introduces a super-wonky abstraction to get around Requests’s lack of thread-safety and, while that makes multi-threaded use much easier, it probably introduces some maddening corner cases.

I’m pretty sure there’s no emergency here (nobody’s complained, and EDGI [a.k.a me] is the only user who desperately needs thread safety, but they already have a workaround, even if it’s ugly), so it’s probably better to take the time to put together a nicer solution. I think that means implementing #2 so that Requests is no longer part of our API, and then swapping Requests out for either urllib3 or Httpx, both of which should be thread-safe. (I’m less clear on Httpx’s sync mode being thread safe, so it needs more investigation. It’s also still in beta.)

I’m leaving this open in case the above doesn’t work out or we run into a real need to ship multithreading support sooner, but I don’t currently expect to finish and merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

1 participant