S3 client no reuse of HTTPS connections #1128
Not able to repro this. We rely on requests/urllib3 to do the connection pooling for us. Given the short duration in your logs, the only thing I can think of that would make it create a new connection is these requests happening in separate threads. If I try the script below, I only get a single connection:
I get:
If you try my script (replacing with your own bucket name) do you get similar output? |
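One way to compare runs without reading the whole debug log is to count how many times the pool announces a new connection. This is only a sketch: `NewConnectionCounter` and `attach_counter` are hypothetical helpers, and it assumes the connection pool logs a message beginning with "Starting new HTTP" for each connection it opens, which may differ between urllib3/botocore versions.

```python
import logging


class NewConnectionCounter(logging.Handler):
    """Count log records that announce a freshly opened connection."""

    # Assumption: the pool's log message for a new connection starts with this text.
    MARKER = "Starting new HTTP"

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.count = 0

    def emit(self, record):
        # getMessage() applies the %-style arguments before we inspect the text.
        if record.getMessage().startswith(self.MARKER):
            self.count += 1


def attach_counter(logger_name=""):
    """Attach a counter to the named logger (root by default) and return it."""
    counter = NewConnectionCounter()
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(counter)
    return counter
```

Call `counter = attach_counter()` before the loop and print `counter.count` afterwards; a pooled client should report far fewer new connections than requests.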
Hmm... I can't reproduce it myself either.

from botocore.exceptions import ClientError
import boto3
import boto3.session

# Log everything, including connection pool messages.
boto3.set_stream_logger('')

session = boto3.session.Session()
s3 = session.client('s3')
B = 'symbols-public-dev'
K = 'test_boto3_issue_1128.txt'

for i in range(10):
    try:
        s3.head_object(
            Bucket=B,
            Key=K,
        )
        print("EXISTED ALREADY")
    except ClientError as exc:
        if exc.response['Error']['Code'] == '404':
            print("DID NOT EXIST")
            # Upload only when the key is missing.
            s3.put_object(Bucket=B, Key=K, Body=b'asdfasdf')
        else:
            raise

and the output
I need to dig deeper to see what in my environment is causing the connection to be recreated. |
A ha! I figured it out. At least how to reproduce it.

import random

from botocore.exceptions import ClientError
import boto3
import boto3.session

boto3.set_stream_logger('')

session = boto3.session.Session()
s3 = session.client('s3')
B = 'symbols-public-dev'
K_base = 'test_boto3_issue_1128_{}.txt'

for i in range(10):
    # A fresh random key per iteration, so head_object always gets a 404.
    K = K_base.format(random.random())
    try:
        s3.head_object(
            Bucket=B,
            Key=K,
        )
        print("EXISTED ALREADY")
    except ClientError as exc:
        if exc.response['Error']['Code'] == '404':
            print("DID NOT EXIST")
            s3.put_object(Bucket=B, Key=K, Body=b'asdfasdf')
        else:
            raise

Now the filename (a.k.a. the key name) is different every time, so each time it checks whether the file is there, it concludes that it needs to do the
If you do that you get
I suspect that because there's an exception frame, it resets the connection. Do you know a different way to ask if the file exists without having to rely on |
For the record, my code above is a bit nuts because it does the |
Sure enough, if you do this logic instead, it works!

for i in range(10):
    K = K_base.format(random.random())
    response = s3.list_objects(
        Bucket=B,
        Prefix=K,
    )
    # 'Contents' is missing from the response when no key matches the prefix.
    existing_keys = [x['Key'] for x in response.get('Contents', [])]
    if K in existing_keys:
        print("EXISTED ALREADY")
    else:
        print("DID NOT EXIST")
        s3.put_object(Bucket=B, Key=K, Body=b'asdfasdf')

So the bug here is that raising an exception resets the connection. Now I wonder whether it's ANY exception or just |
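The existence check from the loop above can be pulled into a small helper so the 404 path is never hit. This is a minimal sketch, not an official boto3 recipe: `key_exists` is a hypothetical name, and it assumes `client` is an S3 client exposing `list_objects(Bucket=..., Prefix=...)` as used in this thread.

```python
def key_exists(client, bucket, key):
    """Return True if `key` exists in `bucket`, without provoking a 404 response."""
    response = client.list_objects(Bucket=bucket, Prefix=key)
    # 'Contents' is absent when nothing matches the prefix, so default to an empty list.
    # Compare exactly: a prefix listing also returns longer keys such as 'a.txt.bak'.
    return any(obj["Key"] == key for obj in response.get("Contents", []))
```

Usage would look like `if not key_exists(s3, B, K): s3.put_object(Bucket=B, Key=K, Body=b'asdfasdf')`. The trade-off is that a list request is used in place of a head request for every check.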
I injected an error...

try:
    1/0
except Exception:
    pass

in the loop and it didn't reset the connection. |
For the record, the problem with starting a new connection only happens if you do a |
Possibly related to boto/botocore#1248? |
@peterbe @joguSD it is the same issue as boto/botocore#1248. @peterbe the reason you see new connections after an error is that S3 error responses set Transfer-Encoding to chunked, which triggers this bug in urllib3: urllib3/urllib3#1234. You can turn on logging/wire logs and see the problem first hand:
In the first case, the key exists and there is no "Transfer-Encoding" header. In the second case, the key does not exist, the "Transfer-Encoding: chunked" header is set and the connection is not released (the last part I omitted). One of these bugs should be set as the dup of the other. |
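If reading wire logs is too noisy, the same header can be checked on the parsed response. This is a sketch under an assumption: that the response dict (or `exc.response` on a caught `ClientError`) carries the HTTP headers under `ResponseMetadata` / `HTTPHeaders`, which may not hold for every botocore version. `is_chunked` is a hypothetical helper name.

```python
def is_chunked(response):
    """Return True if the response metadata reports 'Transfer-Encoding: chunked'."""
    # Assumption: headers live under ResponseMetadata -> HTTPHeaders.
    headers = response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    # Normalize header names so the lookup is case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    return "chunked" in normalized.get("transfer-encoding", "").lower()
```

In the reproduction loop, printing `is_chunked(exc.response)` inside the `except ClientError` block and `is_chunked(response)` for a successful call should show whether the two cases differ as described.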
Duplicate of boto/botocore#1248 |
I have a function that looks something like this:
(Note! I didn't copy-n-paste my real code. Tried to "dumb" it down to the bare essentials for the sake of this issue)
So, as you can see, it opens a .zip file and, for each file within it, does a HeadObject followed by a PutObject (if the file needs to be uploaded again). The code works. For a .zip file with about 40 files, the whole thing finishes in a couple of seconds.
However, I get this logging output:
(Sorry for the formatting. It comes from my Celery task worker)
The point is, even though I reuse the same session (all within under 60 seconds) it seems to make a new connection each time.
Either this is user error or a bug in that it's unable to re-use an established HTTP connection.
boto3==1.4.4
botocore==1.5.66
s3transfer==0.1.10