
Improve S3 Performance with More Variability in Cookbook Object Naming #130

Open
sean-horn opened this issue Mar 20, 2015 · 1 comment
Labels
Status: To be prioritized Indicates that product needs to prioritize this issue. Triage: Confirmed Indicates an issue has been confirmed as described. Triage: Try Reproducing Indicates that this issue needs to be reproduced. Type: Bug Does not work as expected.

Comments

@sean-horn
Contributor

HelpSpot 18569

Example of the current URL format of the cookbook file objects we store on S3

https://s3-external-1.amazonaws.com:443/opscode-platform-production-data/organization-3eca786473a44f68a91c1279ce6c845b/checksum-5ea9157974ce97b987a4e6219fdfc69a?AWSAccessKeyId=AKIAIQKPG2CTSTRVDO4Q&Expires=1405545736&Signature=na08SG8QrzaA9LBSR%2BIUN3/eENI%3D

The portion of the key we are interested in is "opscode-platform-production-data/organization-3eca786473a44f68a91c1279ce6c845b"

We would get much better performance out of the gate if the bucket/org key looked like this instead, with a reversed key suffix for the organization

"opscode-platform-production-data/b548c6ec9721c19a86f44a374687ace3-noitazinagro".

Our customer recommends trying the "not reversed" version if the first get() fails, during the transition to the new storage system while background S3 workers move things around.
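Not part of the original report, but a minimal sketch of that read path may help: reversed-prefix key first, original key as the fallback. This assumes Python with boto3 (not the product's actual implementation); the helper names are hypothetical, and the bucket and key segments are taken from the example URL above.

```python
# Sketch only, not the product's actual read path. Assumes Python + boto3;
# bucket name and key segments come from the example URL in this issue.
import boto3

s3 = boto3.client("s3")
BUCKET = "opscode-platform-production-data"

def reversed_org_key(org_segment: str, checksum_segment: str) -> str:
    # "organization-3eca786473a44f68a91c1279ce6c845b" reversed becomes
    # "b548c6ec9721c19a86f44a374687ace3-noitazinagro"
    return f"{org_segment[::-1]}/{checksum_segment}"

def get_cookbook_object(org_segment: str, checksum_segment: str):
    """Try the reversed-prefix key first; fall back to the original key
    while the background migration is still moving objects around."""
    new_key = reversed_org_key(org_segment, checksum_segment)
    old_key = f"{org_segment}/{checksum_segment}"
    try:
        return s3.get_object(Bucket=BUCKET, Key=new_key)
    except s3.exceptions.NoSuchKey:
        return s3.get_object(Bucket=BUCKET, Key=old_key)
```

For example, get_cookbook_object("organization-3eca786473a44f68a91c1279ce6c845b", "checksum-5ea9157974ce97b987a4e6219fdfc69a") would look for the reversed key first and only then fall back to the current layout.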


More customer discussion on the issue follows

I came across this message on the AWS support forums: https://forums.aws.amazon.com/thread.jspa?threadID=96847 and I find it surprising that you didn't prefix the cookbook segment keys with more semi-random data to spread them across multiple S3 storage nodes; this is part of S3 101 when dealing with massive numbers of objects.

I don't really understand your point about S3 storage nodes. It is
transparent from an API point of view. S3 uses the first few bytes of the
keys to spread them across storage nodes; it's true that hitting different
storage nodes might result in inconsistencies when the data spreads, but I
suggest you read this document:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
(I have successfully optimized a high-profile image sharing service just by
adding ".reverse!" in the bit of code where they compute the S3 key for
the objects they were storing... their scheme was very similar to yours.)

"Amazon S3 maintains an index of object key names in each AWS region.
Object keys are stored lexicographically across multiple partitions in the
index. That is, Amazon S3 stores key names in alphabetical order. The key
name dictates which partition the key is stored in. Using a sequential
prefix, such as timestamp or an alphabetical sequence, increases the
likelihood that Amazon S3 will target a specific partition for a large
number of your keys, overwhelming the I/O capacity of the partition. If you
introduce some randomness in your key name prefixes, the key names, and
therefore the I/O load, will be distributed across more than one partition."
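To make the quoted guidance concrete, here is a small illustration (not from the issue, no AWS calls) of why reversing the organization segment spreads keys; both GUIDs appear elsewhere in this thread.

```python
# Illustration only: with the current scheme every key shares the long
# "organization-" prefix, so the keys sort next to each other in the S3 key
# index; reversing the segment puts the high-entropy end of the GUID first.
guids = [
    "3eca786473a44f68a91c1279ce6c845b",
    "6213e346bef545b988d155a568d93d3e",
]

current  = [f"organization-{g}" for g in guids]
proposed = [f"organization-{g}"[::-1] for g in guids]

print(current)   # both start with "organization-", i.e. the same index neighborhood
print(proposed)  # start with "b548..." and "e3d3...", so they spread out
```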

My solution to that was simply to store objects for
"organization-6213e346bef545b988d155a568d93d3e"
using "e3d39d865a551d889b545feb643e3126-noitazinagro", and to try the "not
reversed" version if the first get() failed (during the transition to the
new storage system, while background S3 workers were converting every key;
I suppose you have massive numbers to handle there...)

Send me a beer when you get a 10x S3 performance bump (no kidding).
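The background conversion the customer alludes to could look roughly like the following sketch, again assuming Python with boto3; migrate_org is a hypothetical helper, not existing code, and the bucket name is the one from the example URL.

```python
# Rough sketch of the background key migration described above, assuming boto3.
import boto3

s3 = boto3.client("s3")
BUCKET = "opscode-platform-production-data"

def migrate_org(org_segment: str) -> None:
    """Copy every object under the old "organization-<guid>/" prefix to its
    reversed-prefix key, then delete the original."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=f"{org_segment}/"):
        for obj in page.get("Contents", []):
            old_key = obj["Key"]
            prefix, _, rest = old_key.partition("/")
            s3.copy_object(
                Bucket=BUCKET,
                Key=f"{prefix[::-1]}/{rest}",
                CopySource={"Bucket": BUCKET, "Key": old_key},
            )
            s3.delete_object(Bucket=BUCKET, Key=old_key)
```

With readers using the fallback get() shown earlier, this worker can run at any pace without breaking in-flight requests.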

@sean-horn sean-horn added the bug label Mar 20, 2015
@stevendanna stevendanna added this to the accepted-major milestone Jun 17, 2015
@tas50 tas50 added Type: Bug Does not work as expected. and removed bug labels Jan 4, 2019
@PrajaktaPurohit PrajaktaPurohit added Status: To be prioritized Indicates that product needs to prioritize this issue. Triage: Confirmed Indicates an issue has been confirmed as described. labels Jul 17, 2020
@stevendanna stevendanna removed this from the accepted-major milestone Sep 29, 2020
@PrajaktaPurohit PrajaktaPurohit added the Triage: Try Reproducing Indicates that this issue needs to be reproduced. label Nov 20, 2020
@PrajaktaPurohit
Contributor

It would be useful to check whether S3 still behaves in the manner described above.
