Backup to common cloud storage #89
Comments
I want to join in this task~
@francis0407 and I are also interested in this issue.
Another task that can be added to this is to ensure proper streaming to object storage without copying data in memory.
What do you mean "without copying data in memory"?
I guess if streaming is working well then perhaps it isn't necessary to state this. But we should avoid copying wherever possible. The current implementation uses `read_to_end`, which I presume copies the data every time it resizes its vector.
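For reference, a minimal sketch of the `read_to_end` pattern being questioned here (the function name is illustrative, not the actual BR code):

```rust
use std::io::Read;

// Illustrative only; not the actual BR implementation.
fn read_all(mut reader: impl Read) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // read_to_end grows `buf` as data arrives: whenever the Vec's
    // capacity is exhausted it reallocates and moves the bytes read
    // so far, and the entire payload ends up resident in memory.
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}
```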
I think it means uploading a file from disk without buffering all of the file data into memory.
Even though it is an SST file, I think it starts out in memory?
The writer interface on TiKV is currently `fn write(&self, name: &str, reader: &mut dyn Read) -> io::Result<()>;`, so the cloud storage implementation decides how to read the SST file and upload it. For local storage we use Rust's built-in `std::io::copy`, which streams up to 8 KiB at a time.
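To make the shape of that interface concrete, here is a minimal sketch of a local-storage implementation built on `std::io::copy`; the trait and struct names are simplified stand-ins for the actual TiKV types:

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::PathBuf;

// Simplified stand-in for TiKV's external storage trait.
trait ExternalStorage {
    fn write(&self, name: &str, reader: &mut dyn Read) -> io::Result<()>;
}

// Hypothetical local-storage backend rooted at `base`.
struct LocalStorage {
    base: PathBuf,
}

impl ExternalStorage for LocalStorage {
    fn write(&self, name: &str, reader: &mut dyn Read) -> io::Result<()> {
        let mut file = File::create(self.base.join(name))?;
        // std::io::copy streams through a small fixed-size internal
        // buffer (8 KiB), so the whole SST never has to sit in memory.
        io::copy(reader, &mut file)?;
        Ok(())
    }
}
```

Because `io::copy` pulls data through a small fixed buffer, the local path never needs the whole file in memory; the question in this thread is whether the cloud backends can do the same.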
Just checked and the file content is in memory before being uploaded.
@yiwu-arbug when the file is created in memory, can it be created in a streaming fashion?
@gregwebs The SST writer is currently using an in-memory storage to speed up SST generation. This is the first copy. We then write the content into a `Vec<u8>` (the second copy). The buffer is turned into a rate-limited reader and passed into `write()`, where the third copy happens.

So in the current situation we will have to deal with 3N bytes at a time. The second and third copies can be eliminated by streaming, which reduces the memory usage to N bytes (assuming buffer size ≪ N). Typically N = region size = 96 MB, and the default concurrency is 4, so we're talking about using memory of ~1200 MB vs ~400 MB here.

@yiwu-arbug That said, for tikv/tikv#6209, the advantage of streaming over `read_to_end` is flattening the network usage rather than reducing memory. Suppose we set the rate limit to 10 MB/s. With streaming, we will upload 1 MB every 0.1 s and attain the average speed of 10 MB/s uniformly. With `read_to_end`, we would buffer the whole file first and then upload it in one burst; the average speed is the same, but the instantaneous network usage is much spikier.
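As a rough sketch of the streaming approach described above, assuming a hypothetical `upload_part` hook standing in for whatever multipart-upload API the actual S3/GCS client exposes:

```rust
use std::io::{self, Read};

// Hypothetical multipart-upload hook; real S3/GCS clients expose
// their own equivalents (e.g. multipart upload APIs).
trait PartUploader {
    fn upload_part(&mut self, part: &[u8]) -> io::Result<()>;
}

/// Stream `reader` to `uploader` through a single fixed-size buffer,
/// so peak memory stays at the chunk size instead of the whole file
/// (the `read_to_end` approach buffers all N bytes up front).
fn stream_upload(
    reader: &mut dyn Read,
    uploader: &mut dyn PartUploader,
) -> io::Result<()> {
    let mut buf = vec![0u8; 1024 * 1024]; // 1 MiB chunks
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        uploader.upload_part(&buf[..n])?;
    }
    Ok(())
}
```

With 1 MiB chunks and a 10 MB/s rate limit applied to `reader`, this uploads roughly one chunk every 0.1 s, giving the uniform network usage described above, while per-upload memory drops from N bytes to the chunk size.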
Yes, please get rid of `read_to_end`.
BR support for common cloud storage
Overview
Integrate BR with common cloud object storage (S3, GCS, Azure Blob Storage, etc.).
Problem statement
Currently, BR supports local storage, where backup files are stored in a local directory on each node. But the backup files then need to be collected together and copied to every TiKV node, which is difficult to use in practice, so it is preferable to mount an NFS-like filesystem on every TiKV node and BR node. However, mounting NFS on every node is also difficult to set up and error-prone.
Alternatively, object storage is a better fit for this scenario, especially since it's quite common to back up to S3/GCS on public cloud.
TODO list
S3 Support (2400 points)
GCS Support (1800 points)
TiDB Operator integration (900 points)
Test (2100 points)
Mentors
Recommended skills