Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async code in aiobotocore calls synchronous functions of boto3 #940

Closed
zferentz opened this issue May 16, 2022 · 16 comments
Closed

Async code in aiobotocore calls synchronous functions of boto3 #940

zferentz opened this issue May 16, 2022 · 16 comments

Comments

@zferentz
Copy link

zferentz commented May 16, 2022

Describe the bug
While running some performance test (simple loop, code attached below) we believe that we found a problem where creating a session/resource in aioboto3 is calling an async function in aiobotocore which is calling a blocking function in boto3.

Code
The following code demonstrates the problem - the more sessions/resources we create - the more time it takes which shows that it's not real async.

import asyncio
import datetime
import aioboto3

async def create_session_and_resource():
    session = aioboto3.Session()
    async with session.resource('dynamodb', region_name='us-east-1',
                                endpoint_url='http://fakehost:8123'
                                ) as dynamo_resource:
        pass

async def main(N):
    tasks = [create_session_and_resource() for _ in range(N)]
    t1 = datetime.datetime.now()
    await asyncio.gather(*tasks)
    t2 = datetime.datetime.now()
    print(f"N={N} total duration= {(t2 - t1).total_seconds()}")


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(2))
    loop.run_until_complete(main(10))
    loop.run_until_complete(main(50))

pip results

aioboto3                         8.0.5
aiobotocore                    1.0.4
boto3                              1.12.32
botocore                         1.15.32

Environment:

  • Python Version: 3.6.15 (similar behavior with Python3.8)
  • OS name and version: Ubuntu 20.04.1

Additional info/context
We opened an issue with aioboto3 (terricain/aioboto3#254) but the assignee was right to ask us to bring it up here.

We believe that the block is happening because aiobotocore calls synchronous functions of boto3.

As an example, think of the coroutine named _create_client (implemented in aiobotocoro/session) which calls self._get_internal_component('endpoint_resolver') .

def _get_internal_component(self, name):
        # While this method may be called by botocore classes outside of the
        # Session, this method should **never** be used by a class that lives
        # outside of botocore.
        return self._internal_components.get_component(name)

which eventually hits the get_component method in botocore.session which
executes the deffered (sync) function as can be seen here:

class ComponentLocator(object):
    """Service locator for session components."""
    def __init__(self):
        self._components = {}
        self._deferred = {}

    def get_component(self, name):
        if name in self._deferred:
            factory = self._deferred[name]
            self._components[name] = factory()
            # Only delete the component from the deferred dict after
            # successfully creating the object from the factory as well as
            # injecting the instantiated value into the _components dict.
            del self._deferred[name]
        try:
            return self._components[name]
        except KeyError:
            raise ValueError("Unknown component: %s" % name)

We wonder if it's possible to assure the asynchronicity of both the Session and the resource creation to eliminate the blocking.
Our backend code creates many sessions/resources and we're trying to optimize its performance.

Thank you in advance !!

@dacevedo12
Copy link

Our backend code creates many sessions/resources

They do recommend keeping a single persistent session for long-running processes like servers. We did this a month ago and saw slightly better performance. Just my 2 cents

@thehesiod
Copy link
Collaborator

I'm not following, you want to make the factory() method async? it does very little work, doesn't make sense to make async. And yes as @dacevedo12 mentions, you should not create many sessions/clients, they should be long lived.

@zferentz
Copy link
Author

@dacevedo12 - thanks for the feedback !!! i totally agree with you and we're working to update our system to use single Session object, I have a feeling that the problem is not with the Session but rather with creating the resource so i'm not sure that having long-lived session will solve the problem.

@thehesiod - that exactly what i initially thought, however it bothers me a bit that a nonbocking (async) function actually calls a blocking function and it blocks the entire ioloop. I'm just not sure if this is indeed the intended/designed behavior . I also admit that i tried to fix it but the code is complex , it's even hard for me to understand what's going on...

@thehesiod
Copy link
Collaborator

@zferentz that's totally normal, you're not going to have all your code async, only that which makes blocking calls, or eventually makes blocking calls.

@zferentz
Copy link
Author

@thehesiod that's my point - i think that the creating the session (or maybe it's the resource) makes a blocking call to boto3 which leads to a blocking I/O call (which shouldn't happen in an async call).
I agree that using a single session/resource for the entire service will probably workaround the problem .

Either way, appreciate your effort and feedback !!

@thehesiod
Copy link
Collaborator

@zferentz I don't see any blocking io calls, can you point that out?

@zferentz
Copy link
Author

zferentz commented May 17, 2022

For example, i see that the call async with session.resource(...) calls self._loader.load_service_model(...) which calls botocore's list_available_services which eventually calls os.listdir twice and then checks if file exists using os.path.isfile .

The top level call is async however all the other calls are non async and since we call os.path - we are basically calling a blocking function that access the filesystem which is a blocking call.

Now, since the list_available_services is enumerating all the supported services, we end up with hundred calls to os.listdir and os.path.isfile .
Of course filesystem cache can help here but it's still an async call that calls multiple I/O blocking calls.

@thehesiod
Copy link
Collaborator

@zferentz ya but there are no python built-in async functions for file io, we'd have to swap to using aiofiles or manually via threading which is non ideal. There really needs to be built-in python async file APIs for that to work as intended, otherwise you're spinning up a bunch of heavy-weight, GIL'd python threads. Further most people should be on cached SSDs which should make this rather trivial. If you're blocked on listdir, should inspect why the directory its listing has so many files.

@zferentz
Copy link
Author

@thehesiod Thank you for the immediate feedback. First, I'm happy that i was not completely wrong and we do have an async code that calls blocking I/O functions. I wasnt 100% sure and i am happy that you see it as well.

Now, indeed Python doesnt provide a builtin mechanism for async filesystem operations but i wonder if we can optimize it by caching this data (as we can assume that the list of libraries/modules wont change throughout the lifetime of the application).

Either way,i do realize that there's no easy solution here.

@thehesiod
Copy link
Collaborator

@zferentz it is cached, at the session/client level :), and people really shouldn't be re-creating sessions/clients often as there's no need. In terms of async code calling blocking io yes and no. file IO really should not be a problem on modern systems unless your filesystem is on a network, in which case probably caching should be controlled at the driver level, or, the code is doing something dumb (like I'd imagine that listfiles probably should be a glob with a filepattern instead, and if there are a bagillion files in the dir something is probably badly designed for that folder). Also caching is out of scope for aiobotocore as it's just async'ifying botocore. Caching belongs in the botocore project. Also globals=bad in general. Are you bringing this up due to a real world problem or theoretical? Without a real world scenario I can't give a good recommendation.

@zferentz
Copy link
Author

@thehesiod Thanks again for your quick reply !!
I do agree with everything you've mentioned but i think that caching as a simple dict class-member will speed up the performance for many use cases.

For example, in my example - one of our older libraries is creating many resources/sessions and for each one the loader more than 200 blocking system calls to the filesystem (which is indeed local, standard python ). The net result (on my system) is that the entire ioloop is blocked and i can see how other async operations (standard aiothttp calls) are waiting for the session-creation.

I guess if no one else raised this topic then it means that everyone indeed using single session for the entire app (which i hope to do in the future) or just people are not aware of this issue.

Either way, if you believe that there's no easy way to fix it - feel free to close this ticket . thanks a lot.

@dacevedo12
Copy link

dacevedo12 commented May 19, 2022

@zferentz tbh I'm glad you reported this as we were also seeing some performance shenanigans but weren't really sure whether event loop skews due to blocking calls played a part in the problem.

For instance, I sometimes see aiobotocore+aiohttp requests to dynamodb taking seconds (for context, that's outrageously slow for non-scan dynamodb operations), whereas cloudwatch always reports times under 100ms. We're currently keeping a global long-lived session but still instancing the resource on every operation

@thehesiod
Copy link
Collaborator

FYI to track slow sync calls you can use something like this: https://gist.github.com/thehesiod/0d2beed27c800e4f7286be6c3cd17a09 That's what we have in production to ensure sync calls aren't blocking the main thread for too long

@thehesiod
Copy link
Collaborator

@zferentz I think it would be great to speed up botocore, actually I proposed some changes to how they parse timestamp years ago that sped up our scenario 20% and they have yet to merge my change, so unfortunately getting them to change things can be glacial. However I suggest the following, given list_available_services is doing a file listing, and one wouldn't reasonably expect the service list to change during the lifetime of the app I think it would be one of the few instances where it would be worthwhile to cache in the service list in a global. I would make a monkeypatch to replace that method with one that caches globally, then write a small python test that shows the speed difference by running both versions for like 100 times or so. The open a botocore bug to propose your solution. This is really a botocore issue and not an aiobotocore issue IMO. Let me know if you disagree.

@thehesiod
Copy link
Collaborator

and in either case can use your monkeypatch to speed up your use case :). Actually if this really does take a significant amount of time I'd like to use this monkeypatch at our company as well, or perhaps we could ship an optional set of speedups in aiobotocore. Please do report your test and findings here!

@dacevedo12 dacevedo12 mentioned this issue Jun 1, 2022
6 tasks
@thehesiod
Copy link
Collaborator

ok going to close as this isn't really an issue with aiobotocore as 1) the performance issue is really an underlying botocore issue and 2) unfortunately we don't have batteries included async file support yet :(

feel free to re-open if I missed something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants