Very slow boto3.client.put_object #409
Comments
Just want to make sure I understand correctly. In this paragraph of yours:
Did you mean subprocess-calling awscli on the faster machine (#2) slows it down? Or did you want to say subprocess-calling awscli on the slower machine (#1) speeds it up? |
So I guess you mean "Shelling to awscli on Machine
|
@alsonkemp Just want to confirm: For the |
Agree with @jamesls. And this is the documentation for s3_client.upload_file(). It accepts a filename and automatically splits a big file into multiple chunks, with a default chunk size of 8 MB and a default concurrency of 10; each chunk is streamed through the aforementioned low-level APIs. This will generally give you much better throughput than a single-threaded put_object(). Please let us know whether it makes a difference. |
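A minimal sketch of that suggestion, assuming the defaults described above (8 MB chunks, concurrency of 10). The helper names `plan_parts` and `upload_large_file` are hypothetical and the path, bucket, and key are placeholders; only `upload_file` and `TransferConfig` come from boto3.

```python
import math

MB = 1024 * 1024


def plan_parts(file_size, chunk_size=8 * MB):
    """Illustrative arithmetic only: how many chunks a file of
    file_size bytes is split into at the given chunk size."""
    if file_size <= 0:
        return 0
    return math.ceil(file_size / chunk_size)


def upload_large_file(path, bucket, key, chunk_size=8 * MB, concurrency=10):
    """Upload path with the managed transfer API instead of put_object."""
    # Imported lazily so plan_parts can be used without boto3 installed.
    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=chunk_size,   # switch to multipart above this size
        multipart_chunksize=chunk_size,   # size of each uploaded part
        max_concurrency=concurrency,      # number of worker threads
    )
    boto3.client("s3").upload_file(path, bucket, key, Config=config)
```

For the 100 MB file mentioned in this thread, `plan_parts(100 * MB)` gives 13 parts, which the transfer manager can upload in parallel rather than as one long request.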
@rayluo Argh. I thought that I might get #1 and #2 backwards. Yes, shelling to awscli on slow-machine yields results that are nearly as fast as boto3 on fast-machine. But the machines are otherwise identical. boto3 is fast on fast-machine and slow on slow-machine. Adding a bit to your table:
Spec: unfortunately, the spec is, basically, configure two identical Dell R720xd machines with Debian Jessie, install boto3, open a Python shell in each, import boto3, use put_object to send a 100+MB file to S3. If you have our luck, you'll wind up with a slow-machine and a fast-machine... @jamesls the body argument was being passed the file contents, not a file pointer. The docs (http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.put_object) say that Body is of type b'bytes'. I'm not sure how this would affect one machine and not the other, but we'll get back with results from the change. |
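For reference, `put_object` accepts either bytes or a seekable file-like object as `Body`. A minimal sketch of the file-handle variant follows; the function name `put_object_from_file` is hypothetical, and `client` is assumed to be an existing `boto3.client("s3")` instance.

```python
def put_object_from_file(client, path, bucket, key):
    """Upload path with put_object, passing an open file handle as Body
    instead of reading the whole file into memory first."""
    with open(path, "rb") as fh:
        return client.put_object(Bucket=bucket, Key=key, Body=fh)


# Hypothetical usage (bucket and key are placeholders):
#   import boto3
#   client = boto3.client("s3")
#   put_object_from_file(client, "video.mp4", "my-bucket", "videos/video.mp4")
```

Whether this changes the behaviour on the slow machine is an open question; it only removes the in-memory copy of the file from the picture.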
Different env vars, perhaps, e.g. the slow machine using a proxy? Compare library and interpreter versions on both; e.g. falling back to the interpreted ElementTree would be about 10x slower than the native one. Otherwise, profile it. On the network side, perhaps one machine is using a native S3 VPC endpoint and the other is using a proxy. |
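One way to make that comparison concrete is to run the same small report on both machines and diff the output. This is only a sketch: `environment_report` is a hypothetical helper, and the list of proxy variables and modules checked is an assumption about what is relevant here.

```python
import importlib
import os
import platform
import sys


def environment_report():
    """Collect interpreter, proxy, and library details for comparison."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }

    # Proxy-related environment variables (both spellings are commonly honoured).
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
                 "http_proxy", "https_proxy", "no_proxy"):
        report[name] = os.environ.get(name)

    # Versions of the libraries involved; None if a module is not installed.
    for module_name in ("boto3", "botocore", "urllib3"):
        try:
            module = importlib.import_module(module_name)
            report[module_name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[module_name] = None

    # Is the C accelerator for ElementTree available on this interpreter?
    try:
        import _elementtree  # noqa: F401
        report["native_elementtree"] = True
    except ImportError:
        report["native_elementtree"] = False

    return report


if __name__ == "__main__":
    for field, value in sorted(environment_report().items()):
        print(f"{field}: {value}")
```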
Did you try the function boto3.s3.transfer instead of put_object? |
I might be running into this, or something related. The AWS CLI seems to provide faster download speeds than boto3 (I'm not sure about uploads, but I assume it's the same). On a d2.8xlarge instance, which has 10gbps networking and fast storage, I was able to download a large file of randomly generated data (8 GB) at about 150 MB/s using the CLI, but only about 35-40 MB/s using boto3. Here's the boto3 code I used (the particular transfer config settings didn't seem to have all that much effect either with the CLI or boto3, and I believe I also tried the defaults with similar results):
IIRC, I also tried the simpler download_file and got the same 35-40 MB/s. Am I missing something obvious in the configuration or something? Thanks. |
I'm experiencing a similar issue. When I'm calling This is an Ubuntu machine not hosted on AWS. |
@alsonkemp did you ever figure out what was the cause? |
@keven425 To be honest, I have little recollection of the issue (it's been nearly three years). The issue was in a cluster of 8 nearly identical Debian Jessie Dell R720xd machines which were uploading 10 minute videos to AWS S3 (of mice & rats; see https://vium.com). The machines were directly connected, via a router, to AWS Direct Connect over a 10gbps optical fiber. |
Taking about 1-2 minutes to transfer a 15mb file using I suppose the only difference is that normally I'm running this at work on MacOS, and today I'm running it from home. But I'm having no other noticeable issues with my internet today (or ever). |
Following up on this old issue. As it has been more than a year since the last comment, is anyone still having problems with the latest version of boto3? If so, please open a new issue and I would be happy to help. |
We're seeing an extremely puzzling issue: one of two machines, which are running identical code and nearly identical in configuration, exhibits wildly slower boto3.client('s3').put_object performance than the other machine (note: we only instantiate the client once per thread/process). Using boto3 and running multiple processes, Machine #2 transfers data at around 1.5Gbps while Machine #1 transfers data at around 0.015Gbps.
The machine configurations are slightly different (mostly they have differing sets of network monitoring tools), so that's suspicious, but we've confirmed that uploading using the awscli tool runs at roughly 1Gbps on either machine. So Machine #1 and #2's network setups are fine.
Checking on raw boto3, we started up a fresh Python REPL and did a minimal test of boto3.client.put_object and saw the same very low performance on Machine #1.
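A minimal test of this kind can be sketched as a small timing harness. `measure_gbps` is a hypothetical helper, not part of boto3; it times one call of whatever upload function it is given and converts the payload size into gigabits per second.

```python
import time


def measure_gbps(upload_fn, payload):
    """Call upload_fn(payload) once and return (seconds, gigabits per second)."""
    start = time.perf_counter()
    upload_fn(payload)
    elapsed = time.perf_counter() - start
    bits = len(payload) * 8
    gbps = bits / elapsed / 1e9 if elapsed > 0 else float("inf")
    return elapsed, gbps


# Hypothetical usage in a REPL (bucket and key are placeholders):
#   import boto3
#   client = boto3.client("s3")
#   data = open("video.mp4", "rb").read()
#   measure_gbps(
#       lambda body: client.put_object(Bucket="my-bucket", Key="test.mp4", Body=body),
#       data,
#   )
```

Running the same call on both machines with the same payload gives directly comparable numbers.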
We switched our upload script on Machine #2 from using boto3 to subprocess-calling awscli and Machine #2's performance headed towards Machine #1's (after accounting for the shelling-out-to-a-fresh-interpreter's effect on Amdahl's Law).
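Shelling out to awscli can be sketched roughly as below. The helper names are hypothetical and the exact script used here is not shown in this thread; the sketch assumes the `aws` executable is on the PATH and uses the standard `aws s3 cp` command.

```python
import subprocess


def build_s3_cp_command(path, bucket, key):
    """Build the argument list for `aws s3 cp <path> s3://<bucket>/<key>`."""
    return ["aws", "s3", "cp", path, f"s3://{bucket}/{key}"]


def upload_via_awscli(path, bucket, key):
    """Upload path by running awscli in a child process.

    Raises subprocess.CalledProcessError if the CLI exits non-zero.
    """
    subprocess.run(build_s3_cp_command(path, bucket, key), check=True)
```

Each call starts a fresh interpreter for the CLI, which is the per-upload overhead referred to above.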
So we've ruled out all of the cases we can think of to explain the slowness of boto3.client.put_object on Machine #1 and are left with only boto3.client.put_object as the culprit. An additional strange characteristic of the slowness is that, using 'bmon', we're able to watch traffic on the interface slowly ramp up [exponentially?] until the file is completely uploaded (which can take up to a minute). Additionally, CPU sys % sits around 10% on Machine #1, which is similar to Machine #2 and indicates significant network activity (even though traffic is low).
Our usage of boto3 is basically (where data can be a 100MB MP4):
We've run out of ideas for diagnostics. Do you have any pointers for us or any ideas as to the failure mode we're seeing?