advanced sparse file support #14
Just to keep a vague idea around: if the different layers in borg communicated a bit more (e.g. the compression method being flexibly chosen by upper layers, and the decompression result being able to tell whether it is data that needs to be written to disk or just a sparse hole that needs to be seeked over), advanced sparse file support could maybe be implemented as just a special kind of "compression/decompression". Sparse "compression" would store only 2 type bytes for "sparse" plus 8 bytes for the length of the hole. Instead of a hole=X value, it could also be an entry in a more generally useful meta dict, see #765.
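For illustration, a minimal sketch of what such a "sparse" pseudo-compression entry could look like (the SPARSE_TYPE tag value and the function names are hypothetical, not borg's actual format):

```python
import struct

# Hypothetical 2-byte type tag for a "sparse hole" pseudo-compression entry
# (the value is illustrative; borg's real compressor IDs differ).
SPARSE_TYPE = b"\x05\x00"

def pack_hole(length):
    """Encode a hole as 2 type bytes + 8 bytes of hole length (big-endian)."""
    return SPARSE_TYPE + struct.pack(">Q", length)

def unpack_chunk(blob):
    """Return ('hole', length) or ('data', payload) for a stored chunk."""
    if blob[:2] == SPARSE_TYPE:
        return "hole", struct.unpack(">Q", blob[2:10])[0]
    return "data", blob  # real code would dispatch to the actual decompressor

# A 1 MiB hole costs 10 bytes of storage instead of a megabyte of zeros.
assert unpack_chunk(pack_hole(1024 * 1024)) == ("hole", 1024 * 1024)
```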
Advanced sparse file support would also save some time at archive creation, as it would not have to read all the zeros in the sparse holes (simple sparse file support just reads them all).
The less involved feature, not having to chunk-HMAC hole contents, is handled in #1354.
@milkey-mouse bs just told me you have started on this. Maybe first evaluate how big a change this needs: if it is a bigger change, it might not be applicable to 1.1-maint (and might also conflict with 1.2 goals).
Considering that the new "fixed" chunker is much simpler than the "buzhash" chunker, it might be easy to add sparse file support there. As the fixed chunker is better suited for block devices and raw disk files, that is exactly the place where sparse support is needed most.
I'll try to solve this for the fixed-blocksize chunker... |
…ackup#14 implement sparsemap() to generate data/sparse ranges. Ranges are multiples of the fs block size; the tests assume 4 kiB block size.
…ckup#14 a file map can be:
- created internally inside chunkify by calling sparsemap, which uses SEEK_DATA / SEEK_HOLE to determine data and hole ranges inside a seekable sparse file. Usage: borg create --sparse --chunker-params=fixed,BLOCKSIZE ... (BLOCKSIZE is the chunker blocksize here, not the filesystem blocksize!)
- made by some other means and given to the chunkify function. This is not used yet, but in future it could be used to only read the changed parts and seek over the (known) unchanged parts of a file.
sparsemap: the generated range sizes are multiples of the fs block size; the tests assume 4 kiB fs block size.
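A rough sketch of how such a sparsemap() could work, assuming Linux and a filesystem that supports SEEK_DATA / SEEK_HOLE (an illustration of the idea, not borg's actual implementation):

```python
import errno
import os

def sparsemap(fd):
    """Yield (offset, length, is_data) ranges for a seekable file descriptor,
    using SEEK_DATA / SEEK_HOLE. Range boundaries come back as multiples of
    the filesystem block size."""
    curr, file_len = 0, os.fstat(fd).st_size
    while curr < file_len:
        try:
            data_start = os.lseek(fd, curr, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # only a hole remains until EOF
                yield curr, file_len - curr, False
                break
            raise
        if data_start > curr:           # hole before the next data range
            yield curr, data_start - curr, False
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        yield data_start, hole_start - data_start, True
        curr = hole_start
```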
Looks like this bounty is too low / too big in scope. To have a more incremental approach, I created a new issue / new bounty with a narrower scope. So, I guess that one will be solved soon, and later we can continue with this bounty (after way more backing is added to it).
There was some more progress in #5620, limited to the chunker and hasher; compression and storage are not affected yet.
See there for the basics: jborg/attic#256
The current state in borg is that it has simple sparse file support, meaning that it does nothing special on "create", but offers the option to deal with all-zero chunks in 2 ways at "extract" time: a) write zeros to disk (default), or b) just "seek" in the output file, creating a hole in a sparse file (--sparse).
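Roughly, the --sparse behaviour at extract time amounts to something like this simplified sketch (the function name is illustrative):

```python
import os

def write_chunk(out_fd, chunk, sparse=False):
    """Write one extracted chunk; with sparse=True, seek over all-zero chunks
    instead of writing them, which leaves a hole in the output file."""
    if sparse and chunk == b"\0" * len(chunk):
        os.lseek(out_fd, len(chunk), os.SEEK_CUR)   # skip forward, creating a hole
    else:
        os.write(out_fd, chunk)

# Note: after the last chunk, the output file still has to be truncated to its
# full size, otherwise a trailing hole would be lost.
```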
See the original attic ticket: while this always works correctly from a logical file content perspective, it does not extract the data and hole sections of files exactly as they were when the archive was made.
Precise reproduction would need saving the type (data vs. hole), the length, and, in case of data, the binary data for each section in the file. SEEK_DATA and SEEK_HOLE support this.
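For illustration, rebuilding a file from such per-section records could look like this sketch (the record format and names are hypothetical, not a stored borg format):

```python
import os

def restore(records, out_path):
    """Rebuild a file from a sequence of ('data', payload) / ('hole', length)
    records, reproducing the original hole/data layout."""
    with open(out_path, "wb") as f:
        for kind, value in records:
            if kind == "hole":
                f.seek(value, os.SEEK_CUR)  # skip forward, leaving a hole
            else:
                f.write(value)
        f.truncate()  # truncate to current offset, materializing a trailing hole
```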
Attic (and current Borg) just has a simple stream of binary file contents, and as there is no type/length yet, it can't be added in a compatible way on the file contents level.
Update: I put a bounty on this. It is for implementing sparse file support at archive creation time (seeking over holes [not reading them as zeros], storing which parts of a file are holes / contain data) and reproducing sparse files' hole/data layout precisely at archive extraction time.
💰 there is a bounty for this