Fix error from xtrabackup with s3: fails to upload large files #5212
deepthi wants to merge 1 commit into vitessio:master
Conversation
When the file size is unknown, use a part size that allows us to upload the largest possible file (5 TB). Signed-off-by: deepthi <deepthi@planetscale.com>
Do you know what the implications of setting this to a large size are? Is it pretty innocuous, or is there some change in behavior that could bite us?
From the documentation: So at a part size of ~500 MB, we are exceeding the recommended size for a single operation, though the theoretical limit is 5 GB per part.
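The arithmetic behind that ~500 MB figure can be checked directly: S3 caps a multipart upload at 10,000 parts and 5 TiB total object size, so the smallest part size that can still reach the maximum object size is the total divided by the part count, rounded up. A minimal sketch (the constants are S3's documented limits; the helper name is ours):

```go
package main

import "fmt"

const (
	maxObjectSize = int64(5) << 40 // 5 TiB, S3's maximum object size
	maxPartCount  = 10000          // S3's maximum number of parts per multipart upload
)

// minPartSize returns the smallest part size (rounded up) that lets an
// object of totalSize bytes fit within maxPartCount parts.
func minPartSize(totalSize int64) int64 {
	return (totalSize + maxPartCount - 1) / maxPartCount
}

func main() {
	// 5 TiB / 10,000 parts ≈ 550 MB (524 MiB) per part.
	fmt.Println(minPartSize(maxObjectSize)) // 549755814
}
```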
there's another way to get around this, which is to use
HubSpot handled this for normal backups in #3844
Since we don't know what real-world effect this new value will have, can we make the xtrabackup upload part size a flag that continues to default to whatever it is now? That way we don't risk breaking anyone who isn't already broken.

Another option could be to have the xtrabackup engine pass in an estimated total size. We'd need to make sure we don't rely on the passed-in size being exact, and document that for future implementers of the interface. The xtrabackup engine could perhaps send the total uncompressed size of all files that the built-in backup engine would have uploaded (whatever is returned by
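The estimated-size idea above could be sketched like this. Everything here is a hypothetical illustration, not Vitess code: `partSizeFor` and the 2x padding factor are our assumptions for tolerating an inexact estimate, and the 5 MiB floor is S3's documented minimum part size.

```go
package main

import "fmt"

const (
	s3MinPartSize = 5 * 1024 * 1024 // S3's minimum part size (except for the last part)
	s3MaxParts    = 10000           // S3's maximum parts per multipart upload
)

// partSizeFor picks a part size for a file whose total size is only
// estimated. Because the estimate may be off, we budget for twice the
// estimate; a file that grows beyond even that could still fail to
// upload, which is exactly the caveat that would need documenting.
func partSizeFor(estimatedSize int64) int64 {
	padded := estimatedSize * 2 // safety margin for inexact estimates (assumed factor)
	size := (padded + s3MaxParts - 1) / s3MaxParts
	if size < s3MinPartSize {
		size = s3MinPartSize
	}
	return size
}

func main() {
	// A ~100 GiB estimate yields ~20 MiB parts instead of one huge default.
	fmt.Println(partSizeFor(100 << 30)) // 21474837
}
```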
Agree with @enisoc. FWIW, using xtrabackup_stripes seems to have fixed the problem for us. From my perspective, I think we can close this for now and point people to that flag if they run into this issue.
From my side, I would still like to fix this if we can do it without risking harm to those who don't need it (either by making it a flag, or by trying to auto-detect the upper bound). We do use stripes as well, but we use a fixed number of stripes, so we'll still hit this at some point.

As a third, even more complex option that no one asked for, we could add to the xtrabackup plugin a concept of max file/object size. If a given stripe file reaches that size, we rotate to a new file for that stripe. Each stripe would then be a sequence of files rather than just one file. In this way, we could ensure that the size of any one file/object never exceeds what can be uploaded with efficient multipart upload settings. This is different from just increasing the stripe count, because all stripes have to be read and written concurrently, and we don't want to force ridiculously high concurrency for very large shards.

I'd probably recommend against this third option due to the complexity, but I'm writing it down in the name of brainstorming.
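The rotation idea could look roughly like the sketch below: a writer that rolls over to a new chunk once the current one reaches a cap, so each stripe becomes a sequence of bounded files. All names here (`rotatingWriter`, the chunk naming scheme) are hypothetical brainstorming, not an actual implementation; a real version would stream each chunk to its own upload.

```go
package main

import "fmt"

// rotatingWriter sketches the "sequence of files per stripe" idea:
// once the current chunk reaches maxChunkSize bytes, writes roll over
// to a new chunk for the same stripe.
type rotatingWriter struct {
	maxChunkSize int64
	written      int64    // bytes written to the current chunk
	chunks       []string // names of chunks created so far
	stripe       int
}

func (w *rotatingWriter) Write(p []byte) (int, error) {
	n := 0
	for len(p) > 0 {
		if w.written == 0 || w.written >= w.maxChunkSize {
			// Rotate: start a new chunk for this stripe.
			w.chunks = append(w.chunks, fmt.Sprintf("stripe%d-chunk%d", w.stripe, len(w.chunks)))
			w.written = 0
		}
		room := w.maxChunkSize - w.written
		take := int64(len(p))
		if take > room {
			take = room
		}
		// A real implementation would write p[:take] to the current chunk's upload here.
		w.written += take
		n += int(take)
		p = p[take:]
	}
	return n, nil
}

func main() {
	w := &rotatingWriter{maxChunkSize: 10}
	w.Write(make([]byte, 25)) // 25 bytes span three 10-byte chunks
	fmt.Println(len(w.chunks)) // 3
}
```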
Anyone who is using the builtin backup method won't be affected (with the tiny exception of the manifest file) because we pass in fileSize to
I like this idea. The problem with a command line option is that people have to get it right, and they don't find out they've got it wrong until they have a failure.
One implication is the memory needed, especially when combined with xtrabackup_stripes. More reason to implement reasonable part sizes. |
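A back-of-the-envelope for that memory point: each in-flight part upload buffers one part, multiplied by the upload concurrency and by the number of stripes uploading at once. The numbers below are purely illustrative, not measured Vitess behavior:

```go
package main

import "fmt"

// bufferEstimate gives a rough upper bound on upload buffer memory:
// one part-sized buffer per concurrent upload, per stripe.
func bufferEstimate(partSize, concurrency, stripes int64) int64 {
	return partSize * concurrency * stripes
}

func main() {
	// e.g. 500 MiB parts x concurrency 5 x 8 stripes ≈ 19.5 GiB of buffers.
	fmt.Println(bufferEstimate(500<<20, 5, 8) >> 30) // whole GiB: 19
}
```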
Closing this in favor of #5351 |
Reported via Slack.
We were using DefaultUploadPartSize when the file size is unknown, which limits us to a maximum file size of ~50 GB. Instead, use a part size that will allow us to upload the largest possible file (5 TB).
Signed-off-by: deepthi deepthi@planetscale.com