
Very slow boto3.client.put_object #409

Closed
alsonkemp opened this issue Dec 15, 2015 · 14 comments
Labels
closed-for-staleness guidance Question that needs advice or information. s3

Comments

@alsonkemp

We're seeing an extremely puzzling issue: one of two machines, which run identical code and are nearly identical in configuration, exhibits wildly slower boto3.client('s3').put_object performance than the other (note: we only instantiate the client once per thread/process). Using boto3 and running multiple processes, Machine #2 transfers data at around 1.5 Gbps while Machine #1 transfers data at around 0.015 Gbps.

The machine configurations differ slightly (mostly in their sets of network monitoring tools), which is suspicious, but we've confirmed that uploading with the awscli tool runs at roughly 1 Gbps on either machine. So Machine #1's and #2's network setups are fine.

Checking on raw boto3, we started up a fresh Python REPL and did a minimal test of boto3.client.put_object and saw the same very low performance on Machine #1.

We switched our upload script on Machine #2 from using boto3 to subprocess-calling awscli and Machine #2's performance headed towards Machine #1's (after accounting for the shelling-out-to-a-fresh-interpreter's effect on Amdahl's Law).

So we've ruled out all of the cases we can think of to explain the slowness of boto3.client.put_object on Machine #1 and are left with only boto3.client.put_object as the culprit. An additional strange characteristic of the slowness is that, using 'bmon', we're able to watch traffic on the interface slowly ramp up [exponentially?] until the file is completely uploaded (which can take up to a minute). Additionally, CPU sys % sits around 10% on Machine #1, which is similar to Machine #2 and indicates significant network activity (even though traffic is low).

Our usage of boto3 is basically (where data can be a 100MB MP4):

s3_client = boto3.client('s3')

def upload(key, data):
    s3_client.put_object(Bucket=BUCKET_NAME,
                         StorageClass='REDUCED_REDUNDANCY',
                         Key=key,
                         Body=data,
                         Metadata={'source': args.source})

We've run out of ideas for diagnostics. Do you have any pointers for us or any ideas as to the failure mode we're seeing?

@rayluo
Contributor

rayluo commented Dec 17, 2015

Just want to make sure I understand correctly. In this paragraph of yours:

We switched our upload script on Machine #2 from using boto3 to subprocess-calling awscli and Machine #2's performance headed towards Machine #1's (after accounting for the shelling-out-to-a-fresh-interpreter's effect on Amdahl's Law).

Did you mean subprocess-calling awscli on the faster machine (#2) slows it down? Or did you want to say subprocess-calling awscli on the slower machine (#1) speeds it up?

@alsonkemp
Author

Ray,
Sorry for the confusion. I meant the latter. Shelling to awscli on
Machine #2 gets us 100x throughput as compared to using boto3 on Machine #2.

@rayluo
Contributor

rayluo commented Dec 17, 2015

So I guess you mean "Shelling to awscli on Machine ~~TWO~~ ONE gets us 100x throughput". I've put my understanding in the following table (you may want to view this on the GitHub page rather than in your email client to see the table properly). Please correct me if any of the data is wrong, and we will look into this.

| Test Method | Machine 1 | Machine 2 |
| --- | --- | --- |
| spec (for us to reproduce the issue) | ? | ? |
| raw boto3 | 0.015 Gbps | 1.5 Gbps |
| aws-cli | 1 Gbps | 1 Gbps |
| subprocess-calling awscli | 1.5 Gbps? | n/a? |

@jamesls
Member

jamesls commented Dec 17, 2015

@alsonkemp Just want to confirm: For the Body=data argument for put_object, the data arg is just a normal opened file object, something like data = open(filename) right? Also curious if you've had a chance to try out s3_client.upload_file?

@rayluo
Contributor

rayluo commented Dec 17, 2015

Agree with @jamesls. And this is the documentation for s3_client.upload_file(). It accepts a filename, automatically splits a big file into multiple chunks (default chunk size 8 MB, default concurrency 10), and streams each chunk through the aforementioned low-level APIs. This will generally give you much better throughput than a single-threaded put_object(). Please let us know whether it makes a difference.
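To make the suggestion concrete, here is a minimal sketch of a managed upload. The helper names (`part_count`, `upload_with_transfer`) and the bucket/key values are hypothetical; `upload_file` with a `TransferConfig` is the boto3 API rayluo is pointing at, and the 8 MB / 10-way defaults match the numbers above.

```python
import math

# Multipart sizing used by boto3's managed transfers: a file above the
# threshold is split into chunks that are uploaded concurrently.
def part_count(file_size, chunksize=8 * 1024 * 1024):
    """Number of multipart chunks for a file of the given size."""
    return max(1, math.ceil(file_size / chunksize))

def upload_with_transfer(filename, bucket, key):
    """Hypothetical helper: managed, concurrent upload of one file."""
    # boto3 is imported lazily so the sizing helper above stays dependency-free.
    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=8 * 1024 * 1024,  # switch to multipart above 8 MB
        max_concurrency=10,                   # parallel part uploads
    )
    s3 = boto3.client('s3')
    # upload_file streams the file from disk; it never loads it all into memory.
    s3.upload_file(filename, bucket, key, Config=config)

# A 100 MB file with the default 8 MB chunk size splits into 13 parts.
print(part_count(100 * 1024 * 1024))  # → 13
```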

@alsonkemp
Author

@rayluo Argh. I thought that I might get #1 and #2 backwards. Yes, shelling to awscli on slow-machine yields results that are nearly as fast as boto3 on fast-machine. But the machines are otherwise identical. boto3 is fast on fast-machine and slow on slow-machine.

Adding a bit to your table:

| Test Method | Slow Machine | Fast Machine |
| --- | --- | --- |
| spec (for us to reproduce the issue) | see below | see below |
| raw boto3 | 0.015 Gbps | 1.5 Gbps |
| aws-cli | 1 Gbps | 1 Gbps |
| subprocess-calling awscli | 1 Gbps | 1 Gbps |

Spec: unfortunately, the spec is, basically, configure two identical Dell R720xd machines with Debian Jessie, install boto3, open a Python shell in each, import boto3, use put_object to send a 100+MB file to S3. If you have our luck, you'll wind up with a slow-machine and a fast-machine...

@jamesls the Body argument was being passed the file contents, not a file object. The docs (http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.put_object) say that Body is of type b'bytes'. I'm not sure how this would affect one machine and not the other, but we'll get back with results from the change.
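For reference, put_object accepts either bytes or a seekable file-like object as Body, so the question above can be tested with a one-line change. A minimal sketch (the helper name `upload_streaming` is hypothetical):

```python
import io

def upload_streaming(s3_client, bucket, key, path):
    """Hypothetical helper: put_object with a streaming Body."""
    # Passing the open binary file lets botocore read it in chunks,
    # instead of first materializing the whole file (e.g. 100 MB) as bytes.
    with open(path, 'rb') as f:  # binary mode matters: Body must be bytes-like
        s3_client.put_object(Bucket=bucket, Key=key, Body=f)

# Body accepts any seekable binary stream, not just bytes:
fake = io.BytesIO(b'\x00' * 16)
print(fake.seekable(), len(fake.getvalue()))  # → True 16
```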

@kapilt

kapilt commented Jan 29, 2016

Different env vars, perhaps? I.e., is the slow machine using a proxy? Compare library and interpreter versions on both; e.g., falling back to the pure-Python elementtree instead of the native one would cause a 10x slowdown. Otherwise, profile it. On the network side, perhaps one machine is using an S3 VPC endpoint and the other is going through a proxy.
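The environment-variable part of this checklist is easy to diff between the two machines. A small stdlib-only sketch (the helper names are hypothetical; the proxy variables listed are the ones botocore's HTTP layer honors):

```python
import os
import sys

def proxy_settings():
    """Collect proxy-related environment variables that the HTTP stack honors."""
    names = ('HTTP_PROXY', 'HTTPS_PROXY', 'NO_PROXY',
             'http_proxy', 'https_proxy', 'no_proxy')
    return {n: os.environ[n] for n in names if n in os.environ}

def interpreter_fingerprint():
    """Version info worth diffing between the fast and slow machine."""
    return {'python': sys.version.split()[0], 'platform': sys.platform}

print(proxy_settings())          # empty dict if no proxy is configured
print(interpreter_fingerprint())
```

Running this on both machines and diffing the output would immediately surface a proxy or interpreter-version mismatch.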

@koiker

koiker commented Apr 29, 2016

Did you try the function boto3.s3.transfer instead of put_object?

@mheilman

mheilman commented Jun 6, 2016

I might be running into this, or something related. The AWS CLI seems to provide faster download speeds than boto3 (I'm not sure about uploads, but I assume it's the same). On a d2.8xlarge instance, which has 10 Gbps networking and fast storage, I was able to download a large file of randomly generated data (8 GB) at about 150 MB/s using the CLI, but only about 35-40 MB/s using boto3. Here's the boto3 code I used (the particular transfer config settings didn't seem to have all that much effect either with the CLI or boto3, and I believe I also tried the defaults with similar results):

import boto3
import boto3.s3.transfer  # the submodule is not pulled in by `import boto3` alone
import logging

logging.basicConfig(level='DEBUG')
logging.getLogger('botocore').setLevel('INFO')
client = boto3.client('s3')

config = boto3.s3.transfer.TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    max_concurrency=10,
    num_download_attempts=10,
    multipart_chunksize=16 * 1024 * 1024,
    max_io_queue=10000
)
config = boto3.s3.transfer.TransferConfig()  # also tried the defaults; similar results
transfer = boto3.s3.transfer.MultipartDownloader(client, config, boto3.s3.transfer.OSUtils())
transfer.download_file('my-bucket-name-here', 'path/to/key/here/foo.npy', 'foo.npy', 8000000000, {})

IIRC, I also tried the simpler download_file and got the same 35-40 MB/s.
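For comparison, in current boto3 versions the simpler managed call accepts the same TransferConfig directly, without wiring up MultipartDownloader, OSUtils, or the object size by hand. A sketch under that assumption (`download_with_config` and `mb_per_s` are hypothetical helper names):

```python
def mb_per_s(nbytes, seconds):
    """Throughput in MB/s, for comparing CLI vs boto3 runs."""
    return nbytes / seconds / (1024 * 1024)

def download_with_config(bucket, key, filename):
    """Hypothetical helper: managed download via client.download_file."""
    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,
        multipart_chunksize=16 * 1024 * 1024,
        max_concurrency=10,
    )
    s3 = boto3.client('s3')
    # download_file wraps the same multipart machinery as MultipartDownloader,
    # without needing the object size passed in explicitly.
    s3.download_file(bucket, key, filename, Config=config)

# The CLI figure above: ~8 GiB in ~55 s is roughly 149 MB/s.
print(round(mb_per_s(8 * 1024**3, 55)))  # → 149
```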

Am I missing something obvious in the configuration or something?

Thanks.

@keven425

keven425 commented Jul 8, 2018

I'm experiencing a similar issue. When I call download_file() on a boto3 S3 bucket, the speed is < 1% of the speed of the awscli call. The strange thing is, when I use tcp_check to monitor the network traffic, the speed ramps up gradually to something reasonable.

This is an Ubuntu machine not hosted on AWS.

@keven425

keven425 commented Jul 9, 2018

@alsonkemp did you ever figure out what the cause was?

@alsonkemp
Author

@keven425 To be honest, I have little recollection of the issue (it's been nearly three years). The issue was in a cluster of 8 nearly identical Debian Jessie Dell R720xd machines which were uploading 10 minute videos to AWS S3 (of mice & rats; see https://vium.com). The machines were directly connected, via a router, to AWS Direct Connect over a 10gbps optical fiber.

@joeflack4

It's taking about 1-2 minutes to transfer a 15 MB file using put_object. I've been using this for weeks and normally it takes just a few seconds.

I suppose the only difference is that normally I run this at work on macOS, and today I'm running it from home. But I'm having no other noticeable issues with my internet today (or ever).

joeflack4 added a commit to joeflack4/pma-api that referenced this issue Mar 25, 2019
…model to independent class. Fixed things that broke as a result. WIP.

- Restore remote database: Feature now successfully implemented for Heroku deployments.
- Addressed issue of slow occasional cloud backup by printing warning about possible upload time and printing elapsed seconds. Related GitHub issue: boto/boto3#409
- Fixed bug where all flash messages were showing under 'danger' class
- Added object 'id' attr to list_datasets and dataset table in admin portal.
- Prevented redundant 'You have signed in' message from appearing.
@swetashre swetashre added the guidance Question that needs advice or information. label Feb 28, 2020
@swetashre
Contributor

Following up on this old issue. As it has been more than a year since the last comment, is anyone still having this problem with the latest version of Boto3? If yes, please open a new issue and I would be happy to help.

@swetashre swetashre added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 2, 2020
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Sep 6, 2020
@github-actions github-actions bot closed this as completed Sep 6, 2020