Excessive memory usage on multithreading #1670
I forgot to mention that I added cyclic garbage collection to the 5-second loop that displays the memory. If this is removed, the memory increases even more (and it doesn't seem to stop), which means someone is also leaking circular references. Now I noticed something even worse: if I create a new session inside the loop, the memory usage is even higher, even with the garbage collection in place. The program I linked is simple enough, and yet the memory issues are so visible that I wonder whether no one saw this before, or whether it is related to some recent boto version. boto3: 1.7.71. Program output: https://pastebin.com/Nm4dWPKJ
Some more investigation (sorry for so much noise):
I was initially only testing Python 2.7.15, but now that I have also run the program on Python 3.7.0, the memory usage is about half (500 MB) with or without cyclic garbage collection, which is great. On Python 3, the leak still happens if I create the session within the for loop on every thread; the memory just increases more slowly this time. I decided to test older boto versions (from boto3 1.0 to 1.7) with Python 2.7 and they all show the leaking pattern when the session is created inside a loop, BUT on boto3 1.5 and lower memory usage is 100 MB lower, and on boto3 1.2 and lower the memory takes 2 minutes to reach that value instead of 20 seconds. I noticed that if I explicitly do … I cannot do this in my code since I need to reuse resources, and I'm starting to run out of options…
At first I thought this might be related to boto/botocore#1248, which is the only confirmed leak I know of. However, looking into this, it seems to me that it is related to the client/resource object. That being said, this isn't a memory leak: the reason you're seeing the ramp-up in memory is that each time you create a session/client we have to go to disk to load the JSON models representing the service, etc. There's so much contention on a single file that it takes ~20-30 seconds to even instantiate all 100 sessions/clients. Considering each session has its own cache, I'm actually not all that surprised by these memory usage numbers. I would suggest doing something like this:

```python
def run_pool(size):
    ts = []
    session = boto3.Session()
    for x in range(size):
        s3 = session.resource('s3')
        t = Boto3Thread(s3)
        t.start()
        ts.append(t)
    return ts
```

This way you only instantiate one session and can actually leverage the caching that the session provides to instantiate all 100 resource objects to give to each thread.
@joguSD Sorry, but that doesn't explain why memory is not being released even with cyclic garbage collection in place. And the strangest part is that if I do … The code you provided is very similar to what I ended up using in my application, but besides the 100 threads with the same lifespan as the program, I occasionally run other things in parallel on other threads, and those add to a total memory that is never freed! After a few days, my 8 GB server is out of memory again. This is, at the very least, poor memory management on boto3's part. The only solution I am seeing is to revert two weeks of porting work and go back to boto 2.49.
We also started experiencing this. Here's a quick output of py-spy. Note the excessive thread count (and corresponding memory usage). This code was stable for years and dozens of boto3/botocore versions; there's clearly something buggy in the transition to urllib3. The app code here is using the s3 transfer service to upload a few files. It's also worth noting that this app isn't using threads where this triggers; the thread usage comes entirely from the s3transfer library. Python 2.7.15, version freeze below.
@kapilt Considering the original issue was raised before the …
@joguSD That's fair; re-reading, it's not entirely clear it's the same ilk. I'll file a separate issue after some more analysis, a differential against the urllib3 change, and checking s3transfer parameters to not use threads. FWIW, we do create a bunch of sessions as well, but all are out of scope here and free to be GC'd.
@joguSD Same problem here! Using boto3 to upload about 30,000 little files …
Confirmed: just a simple creation of a …
FWIW, at least for my app, switching s3transfer to not use threads resolved a lot of issues with respect to memory.
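For anyone looking for the knob: a minimal sketch of disabling s3transfer's worker threads via `TransferConfig` (the bucket, key, and file names are placeholders, not from this thread):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')
# use_threads=False makes s3transfer run in the calling thread,
# avoiding the per-transfer worker thread pool noted above.
config = TransferConfig(use_threads=False)
s3.upload_file('/tmp/report.bin', 'my-bucket', 'reports/report.bin', Config=config)
```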
Hi, we also hit the same problem. The memory keeps increasing and doesn't get released. I tried patching some of the AWS code (including the caching decorators, so that they don't cache), manually clearing the loader cache, and adjusting the model loaders not to load the documentation. I also noticed that the session has a register function but unregister is never called, so I kept track of the registered objects and called that too (not sure if that makes a difference). That seemed to bring the memory down, at the expense of caching, but I didn't notice any speed difference. Any feedback or ideas from the AWS team about this?
I too am experiencing this issue with the S3 boto client. Reading bucket objects keeps the memory usage pretty stable, but writing them with …
We have noticed this problem too. We are using this in the backend of a Flask web application. By nature, the web application is multithreaded, so we cannot instantiate just one session globally in the app.
@antonbarua I suppose something like that might be possible, but it might not be all that practical. Stripping the model down isn't as simple as just keeping the operations you want to use: you'd have to figure out which shapes are needed and which are orphaned, and then remove them. The documentation is there for tools built on top of botocore, like the AWS CLI, but from the pure SDK perspective I can see why you wouldn't want it. If you were really inclined, you could do a tree-shake of sorts on the model, stripping it down to what you need and placing it in … (see the sketch below).
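A hypothetical sketch of such a tree-shake, assuming the standard service-2.json layout (operations reference shapes by name; structures have `members`, lists have `member`, maps have `key`/`value`). The operation list and file handling are illustrative, not an official tool:

```python
import json

def shake_model(model, keep_ops):
    """Keep only `keep_ops` operations and the shapes reachable from them."""
    model['operations'] = {
        name: op for name, op in model['operations'].items() if name in keep_ops
    }
    reachable = set()

    def visit(shape_name):
        if shape_name in reachable:
            return
        reachable.add(shape_name)
        shape = model['shapes'][shape_name]
        for member in shape.get('members', {}).values():
            visit(member['shape'])
        for ref in ('member', 'key', 'value'):
            if ref in shape:
                visit(shape[ref]['shape'])

    # Seed the walk from the inputs, outputs, and errors of the kept operations.
    for op in model['operations'].values():
        for ref in ('input', 'output'):
            if ref in op:
                visit(op[ref]['shape'])
        for err in op.get('errors', []):
            visit(err['shape'])

    model['shapes'] = {k: v for k, v in model['shapes'].items() if k in reachable}
    return model

with open('service-2.json') as fp:
    slim = shake_model(json.load(fp), keep_ops={'GetObject', 'PutObject'})
```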
Hi @joguSD, if trimming the model to keep only the desired operations isn't practical, what do you think about having an option that disables the cache and calls unregister, as described in #1670 (comment)?
Is there any workaround while the fix is on the way? :(
Hi @Gloix, initializing the S3 client in a static (class-level) variable can fix the memory leak situation:

```python
import threading
import boto3
import os
import base64
import time
import random
import psutil

BUCKET = ''  # <--- YOUR BUCKET NAME HERE
MIN_WAIT = 1
MAX_WAIT = 20


class Boto3Thread(threading.Thread):
    daemon = True
    is_running = True
    # One client shared by all threads, created once at class definition time
    __s3_client = boto3.client('s3', region_name='us-east-1')

    def run(self):
        path = 'test_boto/'
        while self.is_running:
            file_name = path + 'file_' + str(random.randrange(100000))
            content = base64.b64encode(os.urandom(100000)).decode()
            self.__s3_client.put_object(
                Bucket=BUCKET,
                Key=file_name,
                Body=content,
                ContentType='text/plain'
            )
            if not self.is_running:
                # Avoid a useless sleep cycle
                break
            sleep_duration = random.randrange(MIN_WAIT, MAX_WAIT)
            # print('{} will sleep for {} seconds'.format(self.name, sleep_duration))
            time.sleep(sleep_duration)


def check_memory():
    import gc
    gc.collect()
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024. / 1024.


def run_pool(size):
    ts = []
    for x in range(size):
        t = Boto3Thread()
        t.start()
        ts.append(t)
    return ts


def stop_pool(ts):
    for t in ts:
        t.is_running = False
    for t in ts:
        t.join()


def main():
    ts = run_pool(100)
    try:
        while True:
            print('Process Memory: {:.1f} MB'.format(check_memory()))
            time.sleep(5)
    except KeyboardInterrupt:
        pass
    finally:
        print('Wait for all threads to finish. Should take about {} seconds!'.format(MAX_WAIT))
        stop_pool(ts)


main()
```
Sorry I'm late to the party, but @joguSD may I ask about the suggestion you made (quoted below)?
I'm asking because, according to https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html#multithreading-multiprocessing, it is not recommended for multiple threads to share a session. So, if I have ten threads making separate S3 requests, should they share a session or not?
@yjhouzz The documentation you linked to states (emphasis mine):
The resource in the code snippet is not shared, just the session is.
@irgeek read further:
Read issue boto/botocore#1246 for more info.
I've started to use boto3 in a Flask application and got the 'cannot allocate memory' error. Is there any update on this issue, and are there any best practices for using boto3 with Flask?
Any solution to this? My code is very simple but I have memory leaks. I am trying to download a 50 GB file.
I was having the same issue (Flask + boto3 + AWS Elastic Beanstalk) and it crashed the server multiple times due to out-of-memory exceptions. I tried … Eventually I figured out that I have to run the function (that uses boto3) separately in a different process (a separate Python script), so that when the sub-process terminates it also frees the memory.
The method is not elegant and it's just a workaround, but it works.
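A minimal sketch of that process-isolation workaround (the function and names are illustrative, not from the comment above):

```python
import multiprocessing

def _upload_job(bucket, key, path):
    # boto3 is imported in the child so all of its caches and sessions
    # die with the process.
    import boto3
    boto3.client('s3').upload_file(path, bucket, key)

def upload_isolated(bucket, key, path):
    p = multiprocessing.Process(target=_upload_job, args=(bucket, key, path))
    p.start()
    p.join()
    # When the child exits, its memory is returned to the OS.
```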
I observe the same issue in a slightly different context: when downloading larger files (10 GB+) in Docker containers with a hard limit on memory, with a single boto3 session and no multithreaded invocation of … In some cases I can also observe the same error as mentioned in #1670 (comment):
It seems that disabling threading in … From what I have observed so far, the most reliable mitigation for me was to reduce the multipart chunk size (…). A sketch of both mitigations is below.
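For reference, a sketch of those two mitigations via `TransferConfig` (the chunk size value and object names are illustrative assumptions):

```python
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    use_threads=False,                    # no s3transfer worker threads
    multipart_chunksize=4 * 1024 * 1024,  # 4 MB chunks, below the 8 MB default
)
boto3.client('s3').download_file(
    'my-bucket', 'big-file.bin', '/tmp/big-file.bin', Config=config
)
```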
Has anyone found a workaround for an application like Flask where one session cannot be instantiated globally?
@cschloer Cache and reuse sessions. A thread-local cache is fine.
@cschloer @longbowrocks Below is the code I use (slightly edited) to replace the `boto3.client` and `boto3.resource` helpers. There are limitations to this and I offer no guarantees. Use with caution.

```python
import json
import hashlib
import time
import threading

import boto3.session

DEFAULT_REGION = 'us-east-1'
KEY = None
SECRET = None


class AWSConnection(object):
    def __init__(self, function, name, **kw):
        assert function in ('resource', 'client')
        self._function = function
        self._name = name
        self._params = kw
        if not self._params:
            self._identifier = self._name
        else:
            self._identifier = self._name + hash_dict(self._params)

    def get_connection(self):
        thread = threading.current_thread()
        if not hasattr(thread, '_aws_metadata_'):
            # One session (plus resource/client caches) per thread
            thread._aws_metadata_ = {
                'age': time.time(),
                'session': boto3.session.Session(),
                'resource': {},
                'client': {}
            }
        try:
            connection = thread._aws_metadata_[self._function][self._identifier]
        except KeyError:
            connection = create_connection_object(
                self._function, self._name,
                session=thread._aws_metadata_['session'], **self._params
            )
            thread._aws_metadata_[self._function][self._identifier] = connection
        return connection

    def __repr__(self):
        return 'AWS {0._function} <{0._name}> {0._params}'.format(self)

    def __getattr__(self, item):
        # Delegate attribute access to the per-thread client/resource
        connection = self.get_connection()
        return getattr(connection, item)


def create_connection_object(function, name, session=None, region=None, **kw):
    assert function in ('resource', 'client')
    if session is None:
        session = boto3.session.Session()
    if region is None:
        region = DEFAULT_REGION
    key, secret = KEY, SECRET
    # Do not set these variables unless they were configured on the parameters file.
    # If they are not present, boto3 will try to load them by other means.
    if key and secret:
        kw['aws_access_key_id'] = key
        kw['aws_secret_access_key'] = secret
    return getattr(session, function)(name, region_name=region, **kw)


def hash_dict(dictionary):
    """ This function hashes a dictionary based on its JSON encoding, so changes
    in list order do matter and will affect the result.
    Also, this is a hex output, so it is not size-optimized.
    """
    json_string = json.dumps(dictionary, sort_keys=True, indent=None)
    return hashlib.sha1(json_string.encode('utf-8')).hexdigest()


def resource(name, **kw):
    return AWSConnection('resource', name, **kw)


def client(name, **kw):
    return AWSConnection('client', name, **kw)
```
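A hypothetical usage example of those helpers (bucket name is a placeholder): the module-level objects can be shared freely because each thread resolves them to its own cached session and client on first use.

```python
s3 = client('s3')          # safe to create at import time and share
ddb = resource('dynamodb')

def worker():
    # Each thread transparently gets its own boto3 client here.
    s3.put_object(Bucket='my-bucket', Key='k', Body=b'data')
```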
Really appreciate the (very) quick and thorough response, @jbvsmo. Your solution mostly worked for me. I combined it with simply reducing the number of processes in my uWSGI config: I think I was expecting too much from my tiny (1 GB memory) server, so I reduced the number of processes from 10 to 5.
Creating a new session each time S3Storage is instantiated creates a memory leak. It seems that S3Storage can get created many times (I'm seeing it created again and again as my app runs) and a boto3 Session takes loads of memory (see boto/boto3#1670), so my app eventually runs out of memory. This should fix the issue while still avoiding the use of the same session across different threads.
This is totally crazy. The S3 client session drains all our memory resources.
We ran into this problem today. The memory leak was crashing our servers.
Same here. Using S3 in a FastAPI service drains all memory, eventually crashing the service.
Is there any news on this issue? We ran into the same problem, which results in crashing servers. Without the boto3.Session our servers consume < 200 MB, while with boto3.Session it accumulates to over 16 GB until the servers crash. I'll try this workaround for now: …
Is there any pressing reason each … Creating clients/sessions is brutal on memory, and also incredibly slow (a full second). If the JSON blob were simply loaded once, and all clients/sessions in every thread referred to it, that seems like it would solve our problem. Are these blobs being mutated by sessions/clients in some way? They seem to be AWS service maps, so I'm assuming not.
Edit: I performed a hacky test by wrapping … This cut … No doubt my hack introduced race conditions between threads (please, nobody replicate and deploy this!), but it serves as a proof of concept that this would be a valuable improvement to some.
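In the same spirit, a hedged proof-of-concept of memoizing botocore's model loading so every session in the process shares one parsed copy per JSON file. This patches the internal `botocore.loaders.JSONFileLoader`, which is an assumption about botocore internals and may not match every version; the shared dicts become mutable shared state, so do not deploy this as-is:

```python
import functools

import botocore.loaders

_shared_loader = botocore.loaders.JSONFileLoader()
# Cache parsed JSON by file path so repeated sessions reuse one copy.
_cached_load = functools.lru_cache(maxsize=None)(_shared_loader.load_file)

def _patched_load_file(self, file_path):
    return _cached_load(file_path)

botocore.loaders.JSONFileLoader.load_file = _patched_load_file
```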
Yes, it would be much better if the data loaded from these JSON files were either shared across all sessions and clients in a thread-safe way, or possibly even just replaced by Python modules containing the same data (importlib gives thread safety for free).

Stripping documentation to save memory

In the meantime, I have obtained a modest saving (up to ~2 MB of memory per client, depending on the AWS service) by recursively blanking the values of the documentation keys. For example, this reduces the S3 service definition (…). Here's the basic idea: use glob to find all the service-2.json files, and then for each one:

```python
import glob
import json
import os

import botocore

def blank_values_by_key_recurse(obj, keys_to_blank: list[str]):
    if isinstance(obj, dict):
        for key in list(obj.keys()):
            if key in keys_to_blank:
                obj[key] = ''
            else:
                blank_values_by_key_recurse(obj[key], keys_to_blank)
    elif isinstance(obj, list):
        for item in obj:
            blank_values_by_key_recurse(item, keys_to_blank)
    # else: do nothing, it's a leaf value

pattern = os.path.join(os.path.dirname(botocore.__file__), 'data', '*', '*', 'service-2.json')
for path in glob.glob(pattern):
    with open(path, encoding='utf8') as fp:
        dict_obj = json.load(fp)
    blank_values_by_key_recurse(dict_obj, ['documentation'])
    with open(path, 'w', encoding='utf8') as fp:
        # ensure_ascii=False gives a closer match to what botocore ships:
        # unicode characters present rather than \uNNNN escapes
        json.dump(dict_obj, fp, indent=None, separators=(',', ':'), ensure_ascii=False)
```

Stripping out unused endpoints/partitions

I also tried a more drastic step of stripping down … It did seem to be fine, however, to remove the partitions I wasn't interested in (everything apart from the standard "aws"), which reduces endpoints.json by ~15%. That is still worthwhile, given that it is loaded in every session (i.e. once per thread, even in the best case where you cache sessions per thread and reuse them).

Of course, the best route would be for the boto team to engage on this issue and consider proper fixes like those proposed at the start of this comment.

EDIT: I've filed boto/botocore#3078 with specifics and proposed improvements.
I have been trying to debug a "memory leak" in my newly upgraded boto3 application. I am moving from the original boto 2.49.
My application starts a pool of 100 threads, and every request is queued and redirected to one of these threads; the usual memory for the lifetime of the application was about 1 GB, with peaks of 1.5 GB depending on the operation.
After the upgrade I added one boto3.Session per thread, and I access multiple resources and clients from this session, which are reused throughout the code. In the previous code I would have one boto connection of each kind per thread (I use several services, like S3, DynamoDB, SES, SQS, MTurk, SimpleDB), so it is pretty much the same thing. Except that each boto3.Session alone increases memory usage immensely, and now my application runs on 3 GB of memory instead.
How do I know it is the boto3 Session, you ask? I created two demo experiments with the same 100 threads, and the only difference between them is that one uses boto3 and the other doesn't.
Program 1: https://pastebin.com/Urkh3TDU
Program 2: https://pastebin.com/eDWPcS8C (Same thing with 5 lines regarding boto commented out)
Output of program 1 (each print happens 5 seconds after the last one):
Now with plain multiple threads and no AWS access.
Output of program 2 (each print happens 5 seconds after the last one):
The boto3 session object alone retains 10 MB per thread, for a total of about 1 GB. This is not acceptable for an object that should not be doing much more than sending requests to the AWS servers. It means that the Session is keeping lots of unwanted information.
You could be wondering whether it is the resource that is keeping memory alive. If you move the resource creation inside the for loop, the program will also hit the 1 GB in exactly the same 15 to 20 seconds of existence.
In the beginning I tried garbage-collecting cyclic references, but it was futile; the decrease in memory was only a couple of megabytes.
I've seen people complaining on the botocore project about something similar (maybe not the same!), so it might be a shared issue.
boto/botocore#805