Fix s3 backups of tables > 50gb #3844
Conversation
Signed-off-by: Harrison McGonigal <hmcgonigal@hubspot.com>
```go
// Calculate s3 upload part size using the source filesize
partSizeMB := s3manager.DefaultUploadPartSize
if filesize > 0 {
	minimumPartSize := float64(filesize) / float64(s3manager.MaxUploadParts)
```
I'm not sure about dynamically changing the part size without specifying a max limit. The problem with tuning the part size is that each part buffer now has to fit in memory, multiplied by the concurrency. So I think there should be a flag for the max part size and a flag for max upload parts. Using those two along with concurrency, someone can tune the memory footprint of their backups.
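The memory concern above can be sketched as a back-of-the-envelope calculation. This is illustrative only, not vitess code: the `footprintMB` helper is an assumption, and the concurrency default of 5 comes from aws-sdk-go's `DefaultUploadConcurrency`.

```go
package main

import "fmt"

// footprintMB estimates the upload buffer memory for one file: each
// in-flight part holds a full part-sized buffer, so memory is roughly
// the part size multiplied by the upload concurrency.
func footprintMB(partSizeMB, concurrency int64) int64 {
	return partSizeMB * concurrency
}

func main() {
	// With the defaults (5 MiB parts, concurrency 5), buffers stay small.
	fmt.Println(footprintMB(5, 5)) // 25
	// A 12 MiB dynamic part size at concurrency 10 means 120 MiB of buffers.
	fmt.Println(footprintMB(12, 10)) // 120
}
```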
Given a 120gb table, that would be 12mb parts. For that to have any kind of significant impact you would have to set concurrency pretty high.
I didn't think this change would affect anyone, given that tables under 50gb would just be using the same 5mb default part size as before and tables over 50gb currently don't even upload.
I'd definitely like to hear what others have to say about their existing concurrency settings etc.
Just to add, I was also cautious of making a blanket part size flag because then that would create large upload buffers for files that are very small. I really wanted to use a larger part size only for files that needed it and leave the rest to the default size of 5mb.
looks like MaxUploadParts is not changeable, so this is probably the only course of action. 🤞
@bbeaudreault is this ready to go?

This should be ready. We've been using this internally for about a week to allow backups on one of our larger dbs.

travis is failing, but this lgtm otherwise

@sougou is there anything left on this one? Travis failures seem to be unrelated.

I've been backlogged. Hopefully will have time this weekend to catch up. I've restarted the failed test for now.

Thanks!
By default the s3 uploader has a maximum of 10000 parts with a default part size of 5mb, meaning the largest file it can upload successfully is 50gb.
Any table with data greater than 50gb will fail a backup with:
This PR dynamically adjusts the uploader part size based on the filesize of the source file by doing filesize / max parts and then rounding that up to the nearest mb. I tested this successfully on a db with a 120gb table.
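The adjustment described above can be sketched in standalone Go. The SDK's limits are inlined as assumptions (`s3manager.MaxUploadParts` is 10000 and `DefaultUploadPartSize` is 5 MiB in aws-sdk-go v1), and the `partSize` helper is an illustration of the approach, not the PR's exact code:

```go
package main

import (
	"fmt"
	"math"
)

// Values mirrored from aws-sdk-go's s3manager package (assumed v1 constants).
const (
	maxUploadParts        = 10000
	defaultUploadPartSize = 5 * 1024 * 1024 // 5 MiB
	megabyte              = 1024 * 1024
)

// partSize returns the upload part size for a file of the given size:
// the 5 MiB default unless the file would need more than maxUploadParts
// parts, in which case filesize/maxUploadParts rounded up to the nearest MiB.
func partSize(filesize int64) int64 {
	size := int64(defaultUploadPartSize)
	if filesize > 0 {
		minimum := float64(filesize) / float64(maxUploadParts)
		if minimum > float64(size) {
			size = int64(math.Ceil(minimum/megabyte)) * megabyte
		}
	}
	return size
}

func main() {
	fmt.Println(partSize(1 << 30))   // 5242880 (a 1 GiB file keeps the 5 MiB default)
	fmt.Println(partSize(120 << 30)) // 13631488 (a 120 GiB file gets 13 MiB parts)
}
```

Small files keep the default buffer size, so only files that would otherwise exceed the 10000-part limit pay for larger upload buffers.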