
advanced sparse file support #14

Open
ThomasWaldmann opened this issue May 15, 2015 · 9 comments

Comments

@ThomasWaldmann
Member

See jborg/attic#256 for the basics.

The current state in borg is simple sparse file support: it does nothing special on "create", but at "extract" time it offers two ways to deal with all-zero chunks: a) write zeros to disk (default), or b) just "seek" in the output file, creating a hole in a sparse file (--sparse).
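A minimal sketch of the two extract-time behaviours, assuming the archived file content arrives as a list of byte chunks (`write_chunks` is a hypothetical helper for illustration, not borg's code). The final `truncate()` matters: if a file ends in a hole, seeking alone does not set the file size.

```python
import os
from typing import BinaryIO, Iterable


def write_chunks(f: BinaryIO, chunks: Iterable[bytes], sparse: bool = False) -> None:
    """Write chunks to f; with sparse=True, seek over all-zero chunks."""
    for chunk in chunks:
        if sparse and chunk.count(0) == len(chunk):  # all-zero chunk
            # b) leave a hole instead of writing zeros
            f.seek(len(chunk), os.SEEK_CUR)
        else:
            # a) default: write the bytes (zeros included) to disk
            f.write(chunk)
    # A trailing seek does not extend the file; truncating at the current
    # position sets the correct final size (the tail becomes a hole).
    f.truncate(f.tell())
```

Both modes give the same logical content; only the on-disk allocation differs, which is exactly the limitation described next.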

See the original attic ticket: while this always works correctly from a logical file content perspective, it does not reproduce the data and hole sections of files exactly as they were when the archive was made.

Precise reproduction would require saving, for each section of the file, its type (data vs. hole), its length and, for data sections, the binary data. SEEK_DATA and SEEK_HOLE (lseek whence values) make it possible to find these sections.
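For illustration, a minimal Python sketch of enumerating data and hole sections with SEEK_DATA / SEEK_HOLE. `sparse_ranges` is a hypothetical name; `os.SEEK_DATA` / `os.SEEK_HOLE` only exist on platforms that support them (e.g. Linux). Ranges come at filesystem granularity (typically whole blocks), and a filesystem without hole support simply reports the whole file as one data range.

```python
import errno
import os
from typing import Iterator, Tuple


def sparse_ranges(fd: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (start, length, is_data) for each section of the file behind fd.

    Note: this moves the fd's file offset; seek before reading afterwards.
    """
    end = os.fstat(fd).st_size
    pos = 0
    while pos < end:
        try:
            # Next offset >= pos that contains data.
            data_start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno != errno.ENXIO:
                raise
            data_start = end  # ENXIO: only a hole remains up to EOF
        if data_start > pos:
            yield pos, data_start - pos, False  # hole section
        if data_start >= end:
            break
        # Next hole at or after data_start; EOF counts as an implicit hole.
        hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
        yield data_start, hole_start - data_start, True  # data section
        pos = hole_start
```

Storing these (type, length) pairs plus the data of the data sections is what precise reproduction at extract time would need.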

Attic (and current Borg) just stores a plain stream of binary file contents. As there is no type/length information yet, this can't be added in a backwards-compatible way at the file contents level.

Update: I put a bounty on this. It is for implementing sparse file support at archive creation time (seeking over holes [not reading them as zeros], storing which parts of a file are holes / contain data) and for precisely reproducing the hole/data layout of sparse files at archive extraction time.


💰 there is a bounty for this

@ThomasWaldmann
Member Author

Just to note down a vague idea:

If the different layers in borg communicated a bit more (e.g. the compression method being flexibly chosen by upper layers, and the decompression result telling whether it is data that needs to be written to disk or just a sparse hole that needs to be "seeked" over), advanced sparse file support could maybe be just a special kind of "compression/decompression".

- sparse compression would just store: 2 type bytes for "sparse" and 8 bytes for the hole length
- sparse decompression would result in: hole=True, hole length
- other decompression would result in: hole=False, data

Instead of a hole=X value, this could also be an entry in a more generally useful meta dict, see #765.
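A rough sketch of this "sparse compression" idea, returning a meta dict as suggested. The 2-byte type IDs below are made up for illustration and are not borg's real compressor IDs; only the layout (2 type bytes + 8-byte hole length) follows the idea above.

```python
import struct
from typing import Dict, Union

# Hypothetical 2-byte type IDs (NOT borg's real compressor IDs).
SPARSE_ID = b"\xfe\x00"
RAW_ID = b"\x00\x00"


def compress_hole(length: int) -> bytes:
    """Encode a hole: 2 type bytes + 8-byte little-endian hole length."""
    return SPARSE_ID + struct.pack("<Q", length)


def compress_raw(data: bytes) -> bytes:
    """Encode ordinary data uncompressed, for comparison."""
    return RAW_ID + data


def decompress(blob: bytes) -> Dict[str, Union[bool, int, bytes]]:
    """Return a meta dict telling the caller whether to write data or seek."""
    type_id, payload = blob[:2], blob[2:]
    if type_id == SPARSE_ID:
        (length,) = struct.unpack("<Q", payload)
        return {"hole": True, "length": length}
    if type_id == RAW_ID:
        return {"hole": False, "data": payload}
    raise ValueError(f"unknown type id: {type_id!r}")
```

At extract time the caller would then do `f.seek(meta["length"], os.SEEK_CUR)` for a hole and `f.write(meta["data"])` otherwise. A hole of any size costs only 10 bytes of stored payload.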

ThomasWaldmann removed this from the 1.0 milestone Aug 9, 2015
@ThomasWaldmann
Member Author

Advanced sparse file support would also save some time at archive creation, as it does not have to read all the zeros in the sparse holes (simple sparse file support just reads them all).

@enkore
Contributor

enkore commented Jul 21, 2016

The less involved feature, not having to chunk-HMAC hole contents, is handled in #1354.
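One way to avoid HMACing hole contents, sketched under assumptions (HMAC-SHA256 chunk IDs and a made-up key; this is not necessarily how #1354 does it): every hole of a given length has identical (all-zero) contents, so its chunk ID can be computed once per size and cached.

```python
import hashlib
import hmac
from functools import lru_cache
from typing import Optional

# Hypothetical key for illustration; a real repository derives it from its key material.
ID_KEY = b"example-id-key"


def chunk_id(data: bytes) -> bytes:
    """Chunk ID as HMAC-SHA256 over the chunk contents (assumed scheme)."""
    return hmac.new(ID_KEY, data, hashlib.sha256).digest()


@lru_cache(maxsize=16)
def zero_chunk_id(size: int) -> bytes:
    """ID of an all-zero chunk of `size` bytes, computed once per size."""
    return chunk_id(bytes(size))


def id_for_range(data: Optional[bytes], size: int) -> bytes:
    """data=None marks a known hole: no reading and no per-chunk HMAC needed."""
    if data is None:
        return zero_chunk_id(size)
    return chunk_id(data)
```

With a fixed chunk size, almost all hole chunks share one size, so the cache is hit nearly every time.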

@ThomasWaldmann
Member Author

@milkey-mouse bs just told me you have started on this. Maybe first evaluate how many changes this needs: if it is a bigger change, it might not be applicable to 1.1-maint (and might also conflict with 1.2 goals).

@milkey-mouse
Contributor

milkey-mouse commented Nov 18, 2017

I wrote some C code to list holes in files here, which could be used to implement #1354. I can think of a backwards-compatible way of implementing it on the borg side: use the method described in #1354, but put a list of hole locations in the chunk metadata (#934).

@ThomasWaldmann
Copy link
Member Author

considering that the new "fixed" chunker is much simpler than the "buzhash" chunker, it might be easy to add sparse file support there.

as the fixed chunker is better suited for block devices and raw disk files, that would be exactly the place where sparse support is most needed.

@ThomasWaldmann
Copy link
Member Author

I'll try to solve this for the fixed-blocksize chunker...

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 11, 2020
…ackup#14

implement sparsemap() to generate data/sparse ranges.
ranges are multiples of fs block size and the tests assume 4kiB block size.
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 12, 2020
…ckup#14

a file map can be:

- created internally inside chunkify by calling sparsemap, which uses
  SEEK_DATA / SEEK_HOLE to determine data and hole ranges inside a
  seekable sparse file.
  Usage: borg create --sparse --chunker-params=fixed,BLOCKSIZE ...
  BLOCKSIZE is the chunker blocksize here, not the filesystem blocksize!

- made by some other means and given to the chunkify function.
  this is not used yet, but in future this could be used to only read
  the changed parts and seek over the (known) unchanged parts of a file.

sparsemap: the generated range sizes are multiples of the fs block size.
           the tests assume 4kiB fs block size.
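A rough sketch of how a fixed-blocksize chunker could consume such a file map. `chunkify_fixed` and the `(start, length, is_data)` map format are assumptions made for illustration, not the signatures from the commit. Chunks are cut in `block_size` steps within each range, so a range boundary that is not a multiple of the block size also ends a chunk; hole ranges are never read.

```python
import os
from typing import BinaryIO, Iterator, List, Optional, Tuple, Union

# Hypothetical file map format: (start, length, is_data) ranges covering the file.
FileMap = List[Tuple[int, int, bool]]


def chunkify_fixed(
    f: BinaryIO, block_size: int, fmap: Optional[FileMap] = None
) -> Iterator[Tuple[bool, Union[bytes, int]]]:
    """Yield (True, data) for data chunks and (False, length) for hole chunks."""
    if fmap is None:
        # No map given: treat the whole file as a single data range.
        size = os.fstat(f.fileno()).st_size
        fmap = [(0, size, True)]
    for start, length, is_data in fmap:
        offset, end = start, start + length
        while offset < end:
            n = min(block_size, end - offset)
            if is_data:
                f.seek(offset)
                yield True, f.read(n)
            else:
                yield False, n  # hole: report its length, do not read zeros
            offset += n
```

A map made by other means (e.g. one listing only the changed parts of a file) could be passed as `fmap` in the same way.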
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 12, 2020
ThomasWaldmann removed their assignment Dec 13, 2020
@ThomasWaldmann
Member Author

Looks like this bounty is too low / too big in scope.

To have a more incremental approach, I created a new issue / new bounty with a narrower scope:

#5565

So, I guess that one will be solved soon, and later we can continue with this bounty (after much more backing has been added to it).

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 17, 2020
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 27, 2020
ThomasWaldmann removed this from the hydrogen milestone Jan 3, 2021
@ThomasWaldmann
Copy link
Member Author

There was some more progress in #5620, limited to the chunker and hasher. Compression and storage are not affected yet.

3 participants