Clarify multithreading documentation #1246

dfee · 2017-07-18T21:54:55Z

The documentation for boto3 states that:

It is recommended to create a resource instance for each thread / process in a multithreaded or multiprocess application rather than sharing a single instance among the threads / processes. (emphasis mine)

The documentation then goes on to show a code example where a session is created per thread, not a resource.

Reading through previous github issues, I see a note that we should create a separate session per thread. The comment that immediately follows says "resource", however.

So, do we need one session per thread, or are sessions thread safe, but not resources? Is there a 1:1 mapping between the thread safety of a resource and a client?

As a bonus question, how expensive / wasteful is it to create new clients (or sessions, as above) on-demand per executor thread...?

JordonPhillips · 2017-07-19T17:05:45Z

Thanks for the feedback! I agree that it's a little ambiguous; we'll make sure to get that updated.

dfee · 2017-07-25T23:09:50Z

In the meantime (before you update the documentation), can you provide some insight to those questions? Thanks :)

joethompsonduo · 2018-03-07T20:04:50Z

An update is still needed.

shadycuz · 2018-03-28T13:00:57Z

This ^

joethompsonduo · 2018-03-28T13:11:36Z

Once again, echoing the sentiment that some sort of response is needed. Can you acknowledge that the maintainers are at least receiving these communications?

jweinst1 · 2018-05-07T23:54:24Z

Question: for firing large numbers of S3 restore_object requests across multiple threads, what is the safest approach to take?

tuukkamustonen · 2018-06-05T07:15:40Z

Also wondering about this. I've been carelessly sharing session and client instances across threads for years without (perceived) problems. Not saying it's the way to go, but I wonder in which cases it might break?

Even the docs state:

It is recommended to create ...

I read that as "you may do just fine without... but we don't guarantee it", which is pretty vague.

rahulr3 · 2018-06-23T07:17:03Z

Can someone comment? I'm working on a multithreaded application and each thread is creating its own session/resource. Are sessions/resources thread safe? How expensive is it to create sessions, resources, and clients at scale?

jsyrjala · 2018-09-20T14:24:48Z

@JordonPhillips It would be nice to have some documentation about the thread safety matters.

rkiyanchuk · 2018-09-28T01:00:12Z

@kyleknap @jamesls

Could someone please clarify: is boto thread-safe per resource, client, or session?

It's been more than a year with no response :(

lhufnagel · 2018-10-08T06:51:22Z

Am I right in assuming that with the switch to urllib3 (which promises thread-safety) in botocore 1.11.0 we should be good?

NickWoodhams · 2018-11-27T05:41:54Z

I would also like some insight on this. I have been using put_object in celery workers and getting intermittent errors and I cannot figure out if this is due to the number of concurrent workers and limitations from AWS as to the number of clients, or an issue of threading within Boto.

Could someone please provide some insight? I would be very grateful.

ori-n · 2018-12-04T07:05:42Z

Could someone please share a code example for working with a session per thread? Thanks

hajapy · 2019-08-07T00:28:13Z

I would echo that this remains an issue. Clearer documentation would be helpful, as would general thread safety in the construction of clients/resources/sessions.

It seems that the boto3 library is not threadsafe. The solution discussed in GitHub issues such as boto/botocore#1246 and boto/boto3#1592, and in the documentation at https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html?highlight=multithreading#multithreading-multiprocessing, suggests a very simple change that seems to make things work.

nebi-frame · 2020-02-13T14:45:51Z

I have no idea why no one from AWS gives a clear explanation for all these issues raised.

LyleScott · 2020-02-22T02:42:20Z

Instead of

import boto3
s3 = boto3.resource("s3")
s3.Object(bucket_name, filename).put(Body=s.getvalue())

I used

import boto3.session
sess = boto3.session.Session()
c = sess.client("s3")
c.put_object(Bucket=bucket_name, Key=filename, Body=s.getvalue())

and then multithreading worked fine. The boilerplate for that is something like:

# I'm using a partial here, which is the slightly more complicated case.
import boto3.session
from functools import partial
from io import StringIO
from multiprocessing.pool import ThreadPool

# pool.map passes each table name as the second positional argument.
def generate_db_rows(db_client, table_name) -> None:
    s = StringIO()
    s.write("some content...")

    # The line that actually matters: the session is created inside the worker thread.
    sess = boto3.session.Session()

    client = sess.client("s3")
    client.put_object(Bucket='something', Key='thread_unique_filename.csv', Body=s.getvalue())

table_names = ("foo", "bar", "baz")
# db_client is the caller's existing database handle (not shown here).
func = partial(generate_db_rows, db_client)
with ThreadPool(processes=10) as pool:
    pool.map(func, table_names)
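
Note that the function above builds a new session and client on every call, so a pool thread that handles several tasks creates several of them. One way to keep exactly one client per worker thread is to cache it in `threading.local`. This is only a minimal sketch: the helper names `per_thread`, `make_s3_client`, `upload`, and `upload_all` are made up for illustration, and the bucket and body values are placeholders.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # each thread sees its own attributes on this object


def per_thread(name, factory):
    """Return the calling thread's cached object, building it on first use."""
    try:
        return getattr(_local, name)
    except AttributeError:
        obj = factory()
        setattr(_local, name, obj)
        return obj


def make_s3_client():
    # Imported lazily so the caching helper itself does not depend on boto3.
    import boto3.session

    # A fresh session for this thread, then a client from that session.
    return boto3.session.Session().client("s3")


def upload(key):
    client = per_thread("s3", make_s3_client)
    client.put_object(Bucket="something", Key=key, Body=b"some content...")


def upload_all(keys, workers=10):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so exceptions raised in workers surface here.
        list(pool.map(upload, keys))
```

With this shape, a pool of 10 threads creates at most 10 sessions and clients, no matter how many keys are uploaded.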

rohan-mo · 2020-03-04T08:39:30Z

@JordonPhillips I've just seen this documentation and it is still vague as was originally pointed out by the issue author.
Could you please help clarify the documentation here?

Luis-Palacios · 2020-05-08T18:59:53Z

I'm joining the party here. I need to query different regions depending on certain parameters of each request, and I just realized that initializing a boto3 resource is not exactly cheap. So I'm planning to initialize a resource for each possible region at app start, as a singleton, and then use the corresponding resource on each request.

I started thinking about thread safety, went to the docs, and was not able to work out whether this is going to be thread-safe.

I would highly appreciate clarification too.
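
For what it's worth, here is a rough sketch of the "one instance per region, built at startup" idea. It assumes the summary given later in this thread holds (clients can be shared between threads, resources cannot), so it caches clients rather than resources. `build_regional_clients` and `make_client_for` are hypothetical names, and the service name is a placeholder.

```python
from types import MappingProxyType


def build_regional_clients(regions, make_client):
    """Create one client per region up front, before any worker thread starts."""
    clients = {region: make_client(region) for region in regions}
    # A read-only view, so request handlers cannot add or replace entries later.
    return MappingProxyType(clients)


def make_client_for(region, service_name="s3"):
    # Imported lazily so the helper above has no hard dependency on boto3.
    import boto3.session

    return boto3.session.Session(region_name=region).client(service_name)
```

At app start this would be called once, for example `CLIENTS = build_regional_clients(["us-east-1", "eu-west-1"], make_client_for)`, and each request would then only read `CLIENTS[region]`.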

jsmodic · 2020-05-16T02:58:01Z

I've fought with this issue a few times due to adverse effects of role based authentication and multiple botocore sessions.

From reading the code alone, it's clear there are thread safety problems. This initialization pattern combined with how it lazy initializes things and then how it creates clients is definitely not thread safe, and that's just something that immediately stands out.

However, if you look at other parts, it's obvious there is careful consideration to thread safety.

So it really looks like it is supposed to be thread safe but isn't in practice.
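
If clients have to be created from a session that several threads can reach, one defensive option is to serialize the creation step behind a lock. This is a sketch only: `create_client_locked` is a hypothetical helper, and nothing in this thread confirms that a lock covers every race described above.

```python
import threading

_create_lock = threading.Lock()


def create_client_locked(session, service_name, **kwargs):
    """Create a client while holding a process-wide lock.

    `session` is expected to be a boto3 session; only its .client() method is used.
    """
    with _create_lock:
        return session.client(service_name, **kwargs)
```

Only creation is serialized here. Calls made on the returned client afterwards run without the lock.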

dash-samuel · 2020-09-22T10:31:31Z

Hi everyone, in my case I can successfully create multiple threads that share the same session, and I am able to download from an S3 bucket, for example, without problems.

This is how I do it:

import concurrent.futures
import boto3
import json

# setup client and session
sess = boto3.session.Session()
client = sess.client("s3")

files = ["path-to-file.json", "path-to-file2.json"] 

def download_from_s3(file_path):
    obj = client.get_object(Bucket="<your-bucket>", Key=file_path)
    resp = json.loads(obj["Body"].read())
    return resp

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_from_s3, files)

Creating a session for each thread in my case results in a big slow down, whereas with this approach I am seeing an up to 7x improvement in performance compared to synchronous downloads.

tomaszhlawiczka · 2021-01-13T09:24:50Z

The doc https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html#multithreading-and-multiprocessing states:

It is recommended to create a resource instance for each thread / process in a multithreaded or multiprocess application rather than sharing a single instance among the threads / processes.

Then how can I benefit from reused connections with max_pool_connections → https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html#botocore-config ?

jmehnle · 2021-03-05T18:44:07Z

@dash-samuel wrote:

Hi everyone in my case I can successfully create multiple threads that share the same session, and am able to download from an S3 bucket for example without problems.

Your threads are sharing the client, which is officially thread-safe. This explains why your example works, but this doesn't mean you can generally share a session across threads. Directly operating (such as creating clients) on a shared session from multiple threads is not thread-safe per the boto3 Session docs.
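
Put as code, the pattern is: create the client once, before any worker thread starts, and let the threads only call operations on it. The sketch below also touches on the `max_pool_connections` question from the previous comment by sizing the pool to the number of workers (botocore's default is 10, as far as I know). `make_shared_client` and `fetch_all` are illustrative names, not part of boto3.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shared_client(max_workers):
    # Lazy imports keep fetch_all usable without boto3 installed.
    import boto3.session
    from botocore.config import Config

    # Size the connection pool to the number of worker threads.
    config = Config(max_pool_connections=max_workers)
    return boto3.session.Session().client("s3", config=config)


def fetch_all(client, bucket, keys, max_workers=8):
    """Read every key using one client shared by all worker threads."""

    def fetch(key):
        return client.get_object(Bucket=bucket, Key=key)["Body"].read()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `keys`.
        return list(pool.map(fetch, keys))
```

Usage would be something like `fetch_all(make_shared_client(8), "<your-bucket>", files, max_workers=8)`, called from the main thread.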

ryansonshine · 2021-05-05T20:04:18Z

To summarize, there are two cases, one being multithreading and the other being multiprocessing.

Session: Unsafe in all cases due to shared metadata/urllib3.

Resource: Unsafe in all cases due to its direct interaction with a Botocore session.

Client: (Assuming the client is not used to interact with the underlying Botocore Session) Safe in a threaded environment, unsafe in a multiprocess environment. This is due to forking issues with urllib3’s connection pool and leaves botocore unable to guarantee http messages are read in the right order if the pool isn’t created under the same PID. (psf/requests#4323)

I've opened up a PR boto/boto3#2848 and am requesting feedback if this makes it more clear.
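
For the multiprocess case, one way to follow the summary above is to create the client inside each worker process instead of passing one in from the parent. A minimal sketch using a `multiprocessing.Pool` initializer, assuming hypothetical helper names (`init_worker`, `make_s3_client`, `head`, `main`) and a pool size picked arbitrarily:

```python
import multiprocessing

_client = None  # set separately inside each worker process


def init_worker(make_client):
    """Pool initializer: runs once in every worker process after it starts."""
    global _client
    _client = make_client()


def make_s3_client():
    # Imported lazily so the rest of the module does not depend on boto3.
    import boto3.session

    return boto3.session.Session().client("s3")


def head(args):
    bucket, key = args
    # Uses the client that was created in this worker process.
    return _client.head_object(Bucket=bucket, Key=key)["ContentLength"]


def main(pairs):
    with multiprocessing.Pool(
        4, initializer=init_worker, initargs=(make_s3_client,)
    ) as pool:
        return pool.map(head, pairs)
```

Because the client is built after the worker exists, its connection pool is never shared across a fork.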

ryansonshine · 2021-05-12T14:31:56Z

@nebi-frame @jsmodic @mattsb42-aws Do you think the PR boto/boto3#2848 adds clarity to this?

nateprewitt · 2021-05-19T17:46:52Z

We've merged boto/boto3#2848 today, adding detailed information on multi-threading requirements for each of the main Boto3 primitives (Clients, Resources and Sessions). The updated documentation should be in the next release. I'll leave this open until the end of the week for any further feedback and we'll plan to resolve afterwards. Thanks everyone!

github-actions · 2021-05-21T18:54:45Z

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

blakete · 2023-10-25T19:29:41Z

To summarize, there are two cases, one being multithreading and the other being multiprocessing.

Session: Unsafe in all cases due to shared metadata/urllib3.

Resource: Unsafe in all cases due to its direct interaction with a Botocore session.

Client: (Assuming the client is not used to interact with the underlying Botocore Session) Safe in a threaded environment, unsafe in a multiprocess environment. This is due to forking issues with urllib3’s connection pool and leaves botocore unable to guarantee http messages are read in the right order if the pool isn’t created under the same PID. (psf/requests#4323)

I've opened up a PR boto/boto3#2848 and am requesting feedback if this makes it more clear.

@ryansonshine is this comment still up to date? My main takeaway is that clients are thread safe while sessions and resources are not.

ryansonshine · 2023-10-25T19:59:40Z

@ryansonshine is comment still up to date? My main take away is that clients are thread safe while sessions and resources are not.

Hi @blakete, the information merged in PR boto/boto3#2848 is up to date.

anteph · 2024-02-08T15:04:40Z

Hi @ryansonshine, sorry to revive this old thread :)

I was reading about the general problem with sharing a boto3 client between multiple processes. It is my understanding that it is related to the urllib3 connection pool that boto uses under the hood, which is problematic if shared among processes.

I was going through one of the linked issues (psf/requests#4323) and this caught my attention:

It can’t happen merely by creating a Session before forking, it has to actually be used before the fork.

I'm no expert on the lower-level details of what a fork does, but I believe that on Unix it uses a copy-on-write approach, meaning that it copies the parent process memory only when it actually needs to modify it. However, this doesn't work for either sockets or open files (meaning that they are not fork safe).

So, my expectation is that it would be ok if the connection pool was initialized in the parent process and then used by the child processes because they would eventually get a copy, as long as the connection pool was never used in the parent process (thus, no socket created prior to fork).

Translating this a layer up to Boto, my expectation would be that it is safe to initialize a boto3 client (creating the object) in the parent process and have the child processes using that instance (eventually it will get copied), as long as there was never any operation being performed prior to the fork.

The reason I'm asking this is because of the use case Celery + Boto with Celery using fork to spawn workers.
I was basically planning to initialize the boto3 client prior to the fork and use it for operations only afterwards.
The reason for doing this is that Celery imports app modules in the master process and only then forks: celery/celery#6036. This means that any global variables we declare are still evaluated in the master process.

Do you think this would be safe or am I making incorrect assumptions? I'm also not sure if the initialization of the boto3 client itself is somehow doing some network call that could be already filling the conn pool, so this may eventually be dangerous anyway?

I can always work this around with a lazy initialization of the object, but before doing so wanted to be sure it is really necessary.

Thanks :)

JordonPhillips added documentation This is a problem with documentation. enhancement This issue requests an improvement to a current feature. labels Jul 19, 2017

joguSD added the response-requested Waiting on additional info and feedback. label Aug 4, 2017

rkiyanchuk mentioned this issue Oct 3, 2018

Exceptions coming from boto3/botocore when running boto3.client('sts') too many times simultaneously boto/boto3#1592

Open

komuw mentioned this issue Mar 21, 2019

maybe use one botocore client per thread komuw/wijisqs#28

Closed

hajapy mentioned this issue Jul 25, 2019

S3 channels failing with new multi-threaded package metadata retrieval conda/conda#8993

Closed

birdsarah mentioned this issue Sep 7, 2019

Intermittent 'PermissionError: Access Denied' when trying to read S3 file from AWS Lambda fsspec/s3fs#218

Closed

longbowrocks mentioned this issue Oct 16, 2019

Excessive memory usage on multithreading boto/boto3#1670

Open

derpferd mentioned this issue Nov 3, 2019

Added credentials option for iter_bucket. piskvorky/smart_open#372

Merged

swetashre removed the response-requested Waiting on additional info and feedback. label Mar 16, 2020

jameskrach mentioned this issue Sep 29, 2020

SNOW-216307 Make PUT statements threadsafe snowflakedb/snowflake-connector-python#437

Closed

jwhitlock mentioned this issue Jan 14, 2021

Datamap script: Update logging and switch to low-level S3 client mozilla/ichnaea#1480

Merged

ryansonshine added a commit to ryansonshine/boto3 that referenced this issue May 5, 2021

Add clarification on multithreading and multiprocessing for resources

1c0ae8d

Addresses boto/botocore#1246

ryansonshine mentioned this issue May 5, 2021

Add clarification on multithreading and multiprocessing for resources boto/boto3#2848

Merged

nateprewitt pushed a commit to boto/boto3 that referenced this issue May 19, 2021

Add clarification on multithreading and multiprocessing for resources (…

12f5016

…#2848) Addresses boto/botocore#1246

nateprewitt closed this as completed May 21, 2021

hajapy mentioned this issue Jun 29, 2021

Don't create an unused s3 client at import time. conda/conda#10516

Merged

abend-arg mentioned this issue Aug 3, 2021

Memory leak kislyuk/watchtower#34

Closed

shantanutrip mentioned this issue Sep 1, 2021

[build] Add safety report to docker image aws/deep-learning-containers#1186

Merged

14 tasks

iainelder mentioned this issue Jan 15, 2022

How to reduce memory usage? connelldave/botocove#20

Closed

jessedobbelaere mentioned this issue Nov 21, 2022

Multi-threading issues on information_schema queries dbt-labs/dbt-athena#43

Closed

tibbe mentioned this issue Mar 30, 2023

Clarify multi-threading and client creation documentation #2898

Open

akx mentioned this issue Jul 18, 2023

Avoid a KeyError when a ComponentLocator is being called concurrently #2985

Merged

Azmisov mentioned this issue Feb 11, 2024

Is Session/Client thread or asyncio safe? aio-libs/aiobotocore#1088

Closed
