
Fix error from xtrabackup with s3: fails to upload large files #5212

Closed

deepthi wants to merge 1 commit into vitessio:master from planetscale:ds-s3-partsize

Conversation

@deepthi
Collaborator

@deepthi deepthi commented Sep 23, 2019

Reported by Slack.

    upload id: ngnUHv2iPGYT5vuB11lJ1LLTzazXjRlYt1mVqfJnOG6aY_1aB_hCjNiKUx9H3HIgAHfaC3GWcoWCa6J1twO_uzYJG4w4dN45xUHk8IE.mRaBu_PS03WVWDsh3Mm5aTKq
    caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit: cannot copy output from xtrabackup command: MultipartUpload: upload multipart failed
    upload id: ngnUHv2iPGYT5vuB11lJ1LLTzazXjRlYt1mVqfJnOG6aY_1aB_hCjNiKUx9H3HIgAHfaC3GWcoWCa6J1twO_uzYJG4w4dN45xUHk8IE.mRaBu_PS03WVWDsh3Mm5aTKq
    caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit

We were using DefaultUploadPartSize (5 MB) when the file size is unknown, which, with the 10,000-part limit, caps the uploadable file size at ~50 GB. Instead, use a part size that allows us to upload the largest possible S3 object (5 TB).
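A minimal sketch of the calculation the fix describes. The helper name `calculatePartSize` is hypothetical; the constants mirror aws-sdk-go's `s3manager` defaults and S3's documented limits:

```go
package main

import "fmt"

// Defaults and limits matching aws-sdk-go's s3manager package and S3 itself.
const (
	defaultUploadPartSize = 5 * 1024 * 1024               // 5 MiB SDK default
	maxUploadParts        = 10000                         // S3 multipart part limit
	maxObjectSize         = 5 * 1024 * 1024 * 1024 * 1024 // 5 TiB, the largest S3 object
)

// calculatePartSize is a hypothetical helper: choose a part size large
// enough that fileSize fits within maxUploadParts. When fileSize is
// unknown (0), size the parts for the largest possible S3 object
// instead of falling back to the SDK default.
func calculatePartSize(fileSize int64) int64 {
	if fileSize == 0 {
		fileSize = maxObjectSize
	}
	partSize := fileSize / maxUploadParts
	if fileSize%maxUploadParts != 0 {
		partSize++ // round up so the final part fits within the limit
	}
	if partSize < defaultUploadPartSize {
		partSize = defaultUploadPartSize
	}
	return partSize
}

func main() {
	fmt.Println(calculatePartSize(0))                 // unknown size: ~550 MB parts
	fmt.Println(calculatePartSize(100 * 1024 * 1024)) // small file: 5 MiB default
}
```

With an unknown file size this yields parts of roughly 550 MB, which is where the "~500 MB" figure discussed below comes from.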

Signed-off-by: deepthi <deepthi@planetscale.com>

…wn size, use a part size that allows us to upload the largest possible file (5 TB)

Signed-off-by: deepthi <deepthi@planetscale.com>
@deepthi deepthi requested a review from sougou as a code owner September 23, 2019 18:26
@deepthi deepthi requested review from enisoc and rafael September 23, 2019 18:26
@rafael
Member

rafael commented Sep 23, 2019

Do you know what the implications of setting this to a large size are? Is it pretty innocuous, or is there some change in behavior that could bite us?

@deepthi
Collaborator Author

deepthi commented Sep 23, 2019

From the documentation:
The largest object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability.

So at a part size of ~500 MB, we exceed the recommended size for a single operation, though the theoretical limit is 5 GB per part.
I don't know the answer to the question above, and information on the web about this is nonexistent.

@deepthi
Collaborator Author

deepthi commented Sep 23, 2019

There's another way to get around this: use -xtrabackup_stripes to break the backup up into several files.

@derekperkins
Member

HubSpot handled this for normal backups in #3844.

@enisoc
Member

enisoc commented Sep 24, 2019

Since we don't know what real-world effect this new value will have, can we make the xtrabackup upload part size a flag that continues to default to whatever it is now? That way we don't risk breaking anyone who isn't already broken.

Another option could be to have the xtrabackup engine pass in an estimated total size. We'd need to make sure we don't rely on the passed-in size to be exact, and if so, document that's the case for future implementers of the interface. The xtrabackup engine could perhaps send the total uncompressed size of all files that the built-in backup engine would have uploaded (whatever is returned by findFilesToBackup()), divided by the number of stripes, as an upper bound. That should at least be close enough to get reasonable part sizes.
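The estimated-size option above can be sketched as follows. The helper name `estimatePartSize` is hypothetical; it assumes only that the estimate is an upper bound on the total uncompressed size, as suggested:

```go
package main

import "fmt"

const (
	defaultUploadPartSize = 5 * 1024 * 1024 // 5 MiB, aws-sdk-go default
	maxUploadParts        = 10000           // S3 multipart part limit
)

// estimatePartSize is a hypothetical sketch of the idea above: take an
// upper-bound estimate of the total uncompressed backup size (e.g. the
// sum of the file sizes the builtin engine would upload), split it
// across the stripes, and derive a part size that keeps each stripe
// within maxUploadParts. The estimate need not be exact, only an upper
// bound, so the rounding here always errs toward larger parts.
func estimatePartSize(totalSizeEstimate, stripes int64) int64 {
	if stripes < 1 {
		stripes = 1
	}
	perStripe := (totalSizeEstimate + stripes - 1) / stripes  // ceil division
	partSize := (perStripe + maxUploadParts - 1) / maxUploadParts // ceil division
	if partSize < defaultUploadPartSize {
		partSize = defaultUploadPartSize
	}
	return partSize
}

func main() {
	// A hypothetical 2 TiB dataset split across 4 stripes: parts of
	// ~55 MB instead of the 5 MiB default, well under MaxUploadParts.
	fmt.Println(estimatePartSize(2<<40, 4))
}
```

Because the estimate only has to be an upper bound, overestimating merely makes the parts somewhat larger than strictly necessary, which is harmless.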

@rafael
Member

rafael commented Sep 24, 2019

Agree with @enisoc. FWIW, using xtrabackup_stripes seems to have fixed the problem for us. From my perspective, I think we can close this for now and point people to that flag if they run into this issue.

@enisoc
Member

enisoc commented Sep 24, 2019

From my perspective, I think we can close this for now and point people to that flag if they run into this issue.

From my side, I would still like to fix this if we can do it without risking harm to those who don't need it (either making it a flag, or trying to auto-detect the upper bound). We do use stripes as well, but we use a fixed number of stripes, so we'll still hit this at some point.

As a third, even more complex option that no one asked for, we could add to the xtrabackup engine the concept of a maximum file/object size. If a given stripe file reaches that size, we rotate to a new file for that stripe. Each stripe would then be a sequence of files, rather than just one file. In this way, we could ensure that the size of any one file/object never exceeds what can be uploaded with efficient multipart upload settings. This is different from just increasing the stripe count, because all stripes have to be read and written concurrently, and we don't want to force ridiculously high concurrency for very large shards.

I'd probably recommend against this third option due to the complexity, but just writing it down in the name of brainstorming.

@deepthi
Collaborator Author

deepthi commented Sep 24, 2019

Since we don't know what real-world effect this new value will have, can we make the xtrabackup upload part size a flag that continues to default to whatever it is now? That way we don't risk breaking anyone who isn't already broken.

Anyone who is using the builtin backup method won't be affected (with the tiny exception of the manifest file) because we pass in fileSize to AddFile and compute the part size using that.
The only time there is an impact is if we pass in 0 as fileSize. That is done by xtrabackup (because the file size is unknown) and also when we create the manifest files.

Another option could be to have the xtrabackup engine pass in an estimated total size. We'd need to make sure we don't rely on the passed-in size to be exact, and if so, document that's the case for future implementers of the interface. The xtrabackup engine could perhaps send the total uncompressed size of all files that the built-in backup engine would have uploaded (whatever is returned by findFilesToBackup()), divided by the number of stripes, as an upper bound. That should at least be close enough to get reasonable part sizes.

I like this idea. The problem with a command-line option is that people have to get it right, and they don't find out they've gotten it wrong until they hit a failure.

@deepthi
Collaborator Author

deepthi commented Oct 18, 2019

Do you know what are the implications of setting this to a large size? Is pretty innocuous or is there some change in behavior that could bite us?

One implication is the memory needed, especially when combined with xtrabackup_stripes. More reason to implement reasonable part sizes.
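A rough sketch of why the memory cost grows: the SDK's uploader buffers its in-flight parts in memory, and each stripe uploads independently. The worst-case figure below assumes aws-sdk-go's default upload concurrency of 5 and the ~550 MB part size a 5 TiB cap implies; both numbers are assumptions for illustration:

```go
package main

import "fmt"

const (
	partSize    = 549755814 // ~550 MB: part size sized for a 5 TiB object
	concurrency = 5         // aws-sdk-go s3manager default upload concurrency
)

func main() {
	// Hypothetical worst case: each stripe runs its own uploader, each
	// buffering `concurrency` in-flight parts at once.
	for _, stripes := range []int64{1, 4, 8} {
		buffered := int64(partSize) * concurrency * stripes
		fmt.Printf("%d stripes: ~%.1f GiB of part buffers\n", stripes, float64(buffered)/(1<<30))
	}
}
```

Even a single stripe needs a couple of GiB of buffers at this part size, and eight stripes pushes past 20 GiB, which is why computing a part size from a real size estimate beats simply defaulting to the maximum.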

@deepthi
Collaborator Author

deepthi commented Oct 24, 2019

Closing this in favor of #5351

@deepthi deepthi closed this Oct 24, 2019
@deepthi deepthi deleted the ds-s3-partsize branch October 24, 2019 20:41