Share data loader across asyncio boto sessions
By default, a botocore session creates and caches an instance of JSONDecoder, which
consumes a lot of memory. This issue was reported in boto/botocore#3078.
In the context of triggers that use boto sessions, this can result in excessive
memory usage and, as a result, reduced capacity on the triggerer. We can reduce
the memory footprint by sharing one loader instance across the sessions.
dstandish committed Jul 8, 2024
1 parent dc08893 commit c510b79
Showing 1 changed file with 16 additions and 1 deletion.
17 changes: 16 additions & 1 deletion airflow/providers/amazon/aws/hooks/base_aws.py
@@ -70,6 +70,19 @@

from airflow.models.connection import Connection # Avoid circular imports.

+_loader = botocore.loaders.Loader()
+"""
+botocore data loader to be used with async sessions
+By default, a botocore session creates and caches an instance of JSONDecoder which
+consumes a lot of memory. This issue was reported here https://github.com/boto/botocore/issues/3078.
+In the context of triggers which use boto sessions, this can result in excessive
+memory usage and as a result reduced capacity on the triggerer. We can reduce
+memory footprint by sharing the loader instance across the sessions.
+:meta private:
+"""


class BaseSessionFactory(LoggingMixin):
"""
@@ -155,7 +168,9 @@ def _apply_session_kwargs(self, session):
     def get_async_session(self):
         from aiobotocore.session import get_session as async_get_session

-        return async_get_session()
+        session = async_get_session()
+        session.register_component("data_loader", _loader)
+        return session

     def create_session(
         self, deferrable: bool = False
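
The effect of this change can be illustrated with a small standard-library model. This is only a sketch: `ToyLoader`, `ToySession`, `make_sessions`, and `count_distinct_loaders` are hypothetical stand-ins invented for illustration, not botocore or aiobotocore classes. It assumes, as the commit message describes, that a session lazily builds its own loader unless one has been registered under the name `"data_loader"`.

```python
"""Toy model of per-session vs. shared data loaders (not botocore's real classes)."""


class ToyLoader:
    """Hypothetical stand-in for a data loader that holds its own caches."""

    def __init__(self):
        # Stands in for the per-loader state that costs memory.
        self._cache = {}


class ToySession:
    """Hypothetical stand-in for a session with a lazy component registry."""

    def __init__(self):
        self._components = {}

    def register_component(self, name, component):
        # An explicitly registered component replaces the lazy default.
        self._components[name] = component

    def get_component(self, name):
        # Without prior registration, each session builds its own loader.
        if name not in self._components:
            self._components[name] = ToyLoader()
        return self._components[name]


def make_sessions(count, shared_loader=None):
    """Create sessions, optionally registering one loader on all of them."""
    sessions = []
    for _ in range(count):
        session = ToySession()
        if shared_loader is not None:
            session.register_component("data_loader", shared_loader)
        sessions.append(session)
    return sessions


def count_distinct_loaders(sessions):
    """Return how many distinct loader objects the sessions resolve to."""
    return len({id(s.get_component("data_loader")) for s in sessions})
```

In this model, `count_distinct_loaders(make_sessions(5))` gives 5, while `count_distinct_loaders(make_sessions(5, shared_loader=ToyLoader()))` gives 1. That is the same trade the diff makes with the module-level `_loader`: one long-lived loader instead of one per async session.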