buffer size for multipart s3 downloads #691

Closed
mheilman opened this issue Jun 22, 2016 · 11 comments
Labels
response-requested Waiting on additional information or feedback.

Comments

@mheilman

I noticed recently that for a large download, the awscli (aws s3 cp s3://...) was faster than using boto3.s3.transfer.MultipartDownloader.

After running a few tests of downloading an 8GB file, it looks like the size of the I/O buffer here may have something to do with it. I don't understand why, but making that buffer size larger (e.g., 256KB or 1024KB instead of the current 16KB) seems to improve download speeds consistently for me.

Perhaps that buffer size should be increased, or maybe just made configurable? I don't understand the pros and cons other than that making it larger seems to help for my use case.

Times for downloading an 8GB file from S3 to a g2.2xlarge instance (I just changed the number in the line of code mentioned above):

  • 100 seconds with 1024KB buffer
  • 106 seconds with 256KB buffer
  • 118 seconds with 16KB buffer (current boto3 code)
  • 256 seconds with 4KB buffer

Code for testing:

import time
import logging

import boto3

t0 = time.time()

# Debug logging for the transfer internals; keep botocore itself at INFO to cut noise.
logging.basicConfig(level='DEBUG')
logging.getLogger('botocore').setLevel('INFO')
client = boto3.client('s3')

config = boto3.s3.transfer.TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    max_concurrency=10,
    num_download_attempts=10,
    multipart_chunksize=16 * 1024 * 1024,
    max_io_queue=10000
)

# config = boto3.s3.transfer.TransferConfig()  # swap in to benchmark the defaults instead

transfer = boto3.s3.transfer.MultipartDownloader(client, config, boto3.s3.transfer.OSUtils())
# Arguments: bucket, key, local filename, object size in bytes, extra args.
transfer.download_file('bucket-name', 'path/to/big/file/foo.npy', 'foo2.npy', 8000000000, {})
print("TIME: {} SECONDS".format(time.time() - t0))

I previously mentioned this here.

@mheilman
Author

mheilman commented Jun 22, 2016

In the benchmarks above, I downloaded the file to an EBS volume, which is perhaps less than ideal since that depends on the network connection too, if I understand correctly. However, I've seen similar differences in performance between boto3 and awscli on local storage on a d2.8xlarge instance. IIRC, the difference was even more pronounced in that case, perhaps because of the 10 Gbps networking of the d2.8xlarge.

@kyleknap
Contributor

This is definitely something you may see if the configurations are not appropriate for the manager. I would really recommend reading this thread and the comment on a similar implementation explaining why this is the case: boto/s3transfer#13 (comment). Based on that discussion, we may need to update the defaults in boto3.

If you do not want to mess with the source code, I would recommend setting a multipart_chunksize such that the following is true:

multipart_chunksize * max_concurrency < 16 KB (the default io chunksize) * max_io_queue

When you bumped the io chunksize to 1024KB, did that make the performance more comparable to the CLI? That is the io chunksize the CLI uses.
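
For concreteness, a quick sanity check of that inequality against a given set of transfer settings might look like the sketch below (the helper name is made up for illustration, and the 16 KB io chunksize is just the default mentioned above):

# Sketch: check the rule of thumb above for a given set of transfer settings.
KB = 1024
MB = 1024 * KB

def io_queue_can_keep_up(multipart_chunksize, max_concurrency,
                         io_chunksize, max_io_queue):
    """True if the IO queue can buffer everything the downloaders can
    have in flight at once (multipart_chunksize * max_concurrency)."""
    return multipart_chunksize * max_concurrency < io_chunksize * max_io_queue

# Settings from the test script at the top of this thread:
print(io_queue_can_keep_up(multipart_chunksize=16 * MB, max_concurrency=10,
                           io_chunksize=16 * KB, max_io_queue=10000))
# 160 MB in flight vs. roughly 156 MB of queue capacity -> False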

@kyleknap kyleknap added the response-requested Waiting on additional information or feedback. label Jun 22, 2016
@mheilman
Author

mheilman commented Jun 23, 2016

Ah, I forgot to mention above how long awscli takes. awscli takes about 90 seconds on the same machine for the same file, so it's still a bit faster than boto3 with the io chunksize bumped to 1024KB, but not by much.

I'll try fiddling around with the multipart_chunksize and/or max_io_queue next...

@mheilman
Author

mheilman commented Jun 23, 2016

Another data point: it took 113 seconds to download the 8GB file with the following settings, where I just bumped up the IO queue size to be way larger than necessary to satisfy the inequality above.

  • max_concurrency=10
  • max_io_queue=1000000000
  • multipart_chunksize=16MB
  • io chunksize=16KB (boto3 default)

@mheilman
Author

I also just tried the following settings, which are the same as the ones I used for the tests at the top of this thread except with a smaller multipart_chunksize, and it took 118 seconds.

  • max_concurrency=10
  • max_io_queue=10000
  • multipart_chunksize=1MB
  • io chunksize=16KB (boto3 default)

@gisjedi

gisjedi commented Jun 23, 2016

I'm encountering an identical problem: I was getting nearly 3 times the performance using the AWS CLI as opposed to boto3. The AWS CLI (aws-cli/1.10.33 botocore/1.4.23) is using the out-of-the-box defaults. I'm using boto3 1.3.1 with all default settings for my TransferConfig. I played with the max_io_queue setting as @mheilman did, with little effect: a 5GiB file downloads in roughly 44 seconds.

Tested as follows:

  • aws-cli - default settings: 15s
  • boto3 - default TransferConfig: 44s
  • boto3 - default TransferConfig and boto3 source file s3/transfer.py buffer_size variable set to 1024 * 256: 16s

I tried all the settings suggested above, focusing on max_io_queue. Even setting it into the tens of millions made no appreciable difference... maybe a second or two. Changing buffer_size in the boto3 source seemed to be the only change that actually made the results consistent with the AWS CLI. I tried buffer sizes from 16KiB all the way up to 64MiB, but settled on 256KiB, as performance deteriorated on both sides of that value.

All my testing was done on an m4.10xlarge instance running the Amazon Linux AMI.
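
For context on what that buffer controls: the downloader reads each part from the S3 streaming response in fixed-size chunks before handing them to the IO queue. A minimal sketch of that kind of read loop, using a plain single-stream get_object call rather than boto3's internal multipart downloader (bucket, key, and filenames are placeholders):

import boto3

# Sketch: read an S3 object in fixed-size chunks, analogous to the
# buffer_size / io chunksize knob discussed in this thread.
client = boto3.client('s3')
chunk_size = 256 * 1024  # 256 KiB, the value settled on above

response = client.get_object(Bucket='bucket-name', Key='path/to/big/file/foo.npy')
body = response['Body']  # botocore StreamingBody

with open('foo2.npy', 'wb') as f:
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        f.write(chunk)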

@mheilman
Author

I was experiencing nearly 3 times the performance using the AWS CLI as opposed to boto.

@gisjedi, thanks for adding your observations. It's good to know I'm not the only one seeing this. While the differences I've posted above were smaller, I've also seen a similar 3x speed difference between boto3 and awscli on a d2.8xlarge instance with 10 Gbps networking (the g2.2xlarge instance I used for the tests above has maybe 1 Gbps).

@mheilman
Author

fwiw, I tried commenting out the lines that queue up IO chunks here (and putting a pass there), and downloading the 8GB file took about the same amount of time as awscli (86 seconds).

@kyleknap
Contributor

Hmm, it sounds like the theory that the slowness has to do with the io queue is correct.

@jamesls
Member

jamesls commented Jul 28, 2016

Also relevant: #737

@kyleknap
Contributor

kyleknap commented Aug 3, 2016

With the release of boto3 1.4.0, you now have the option to configure both io_chunksize and max_io_queue, so for environments where the network speed is much faster than the io speed you can configure the transfer so that io stops being the bottleneck: https://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.TransferConfig

It is important to note that the new defaults should be suitable. The io_chunksize is now 256KB, which seems to be a good default value based on my own testing and testing from others such as @gisjedi. For me, with the current default configuration, boto3 achieves the same download speed as the CLI for large downloads on larger instances.

Closing out the issue, as the defaults should now result in better performance, and the io-related configuration parameters are now exposed so they can be tweaked to make downloads faster if the results with the defaults are still not as desired.
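
A minimal usage sketch of those exposed settings on boto3 >= 1.4.0 (bucket, key, and filenames are placeholders; the values simply mirror the ones discussed in this thread rather than being recommendations):

import boto3
from boto3.s3.transfer import TransferConfig

# Sketch for boto3 >= 1.4.0: tune the io-related settings directly
# instead of editing the library source.
config = TransferConfig(
    multipart_chunksize=16 * 1024 * 1024,  # 16 MB parts
    max_concurrency=10,
    io_chunksize=256 * 1024,               # 256 KB io chunks, per the discussion above
    max_io_queue=100,                      # IO queue depth; raise if io is still the bottleneck
)

s3 = boto3.client('s3')
s3.download_file('bucket-name', 'path/to/big/file/foo.npy', 'foo2.npy', Config=config)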
