Brotli framing format requirements #462

Closed
eustas opened this issue Oct 27, 2016 · 7 comments

@eustas
Collaborator

eustas commented Oct 27, 2016

Please share your thoughts on what features the Brotli framing format should have.
PS: the framing format should not be a "Swiss Army knife", i.e. it should not try to cover every imaginable use case.

@eustas
Collaborator Author

eustas commented Oct 27, 2016

Easy pick:

  • should not be a valid UTF-8 string
  • should not be a valid brotli stream
  • should have unique "magic bytes" signature

The first two requirements imply that the first "magic byte" must be 0x91.
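A minimal Python sketch of why those two requirements pin down 0x91, assuming "not a valid brotli stream" means a decoder rejects it from the first byte alone. It decodes the WBITS window-size code that opens every brotli stream as described in RFC 7932, section 9.1 (bits read least-significant first, with the pattern 0010001 reserved as invalid), and checks which bytes may start a UTF-8 sequence per RFC 3629. The helper names are illustrative, not code from the brotli project.

```python
def brotli_wbits(first_byte: int):
    """Decode the WBITS code at the start of a brotli stream (RFC 7932, 9.1).

    Returns the window-size exponent, or None for the reserved, invalid
    pattern 0010001. The code occupies at most the low 7 bits.
    """
    if first_byte & 1 == 0:
        return 16                      # pattern "0"
    n = (first_byte >> 1) & 0x7
    if n != 0:
        return 17 + n                  # WBITS 18..24
    m = (first_byte >> 4) & 0x7
    if m == 0:
        return 17                      # pattern "0000001"
    if m == 1:
        return None                    # pattern "0010001": invalid
    return 8 + m                       # WBITS 10..15


def can_start_utf8(byte: int) -> bool:
    """True if the byte may begin a well-formed UTF-8 sequence (RFC 3629)."""
    return byte <= 0x7F or 0xC2 <= byte <= 0xF4


# Bytes rejected both as the start of a brotli stream and as UTF-8.
candidates = [b for b in range(256)
              if brotli_wbits(b) is None and not can_start_utf8(b)]
print([hex(b) for b in candidates])  # ['0x91']
```

Only 0x11 and 0x91 carry the reserved pattern in their low seven bits. 0x11 is an ASCII control character and therefore valid UTF-8, which leaves 0x91 (a UTF-8 continuation byte) as the only byte that satisfies both requirements.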

@dsnet
Contributor

dsnet commented Oct 27, 2016

I came up with a possible design here: brotli-framing-format.md

High level ideas:

  • Designed to be more efficient with random-access decompression by coalescing records together into a unified index (like xz), rather than having the offsets be distributed throughout the file (like lzip).
  • I don't feel strongly that the format should provide ways to encode the original filename, owner, group, modified date, etc. My opinion is that an archive format like tar or zip should be used and that the framed format is only responsible for compression.
  • No option to choose between multiple checksum methods (just use CRC-32). If a user cares about data integrity, they can encrypt and sign the file afterwards.
  • Limited support for parallel decompression when reading the input forwards as a stream. The "sync markers" used for this also provide limited data recovery for partial files.
  • The StaticDict feature is experimental. I'm not sure if this should be part of the format, but it seems like an interesting idea to me.
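To illustrate why a single coalesced index helps random access, here is a minimal Python sketch. The record layout (`IndexRecord`) and `find_record` are hypothetical and not taken from brotli-framing-format.md; the sketch assumes each record describes an independently decodable compressed chunk and that the index is sorted by uncompressed offset with contiguous ranges, so a reader can load the index once and binary-search it instead of walking offsets scattered through the file.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexRecord:
    """Hypothetical index entry for one independently decodable chunk."""
    compressed_offset: int    # where the chunk starts in the framed file
    uncompressed_offset: int  # first uncompressed byte the chunk produces
    uncompressed_size: int    # number of uncompressed bytes in the chunk


def find_record(index: List[IndexRecord], offset: int) -> IndexRecord:
    """Return the chunk that contains the given uncompressed offset.

    Assumes `index` is sorted by uncompressed_offset with contiguous ranges.
    """
    starts = [r.uncompressed_offset for r in index]
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} precedes the first chunk")
    rec = index[i]
    if offset >= rec.uncompressed_offset + rec.uncompressed_size:
        raise ValueError(f"offset {offset} is past the end of the data")
    return rec


index = [
    IndexRecord(compressed_offset=4, uncompressed_offset=0, uncompressed_size=1000),
    IndexRecord(compressed_offset=310, uncompressed_offset=1000, uncompressed_size=1000),
    IndexRecord(compressed_offset=655, uncompressed_offset=2000, uncompressed_size=500),
]
# To read byte 1500, seek to the second chunk and decompress only that chunk.
print(find_record(index, 1500).compressed_offset)  # 310
```

A real format would also need to place the index where a reader can find it cheaply, for example near the end of the file as xz does.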

You may want to consider giving the framed format a completely different name from simply "brotli framing format"; otherwise, there may be much confusion about the difference between the two.

@dsnet
Contributor

dsnet commented Oct 27, 2016

  • should not be a valid UTF-8 string
  • should not be a valid brotli stream

With 0x91 as the first byte, if the second byte doesn't have the highest bit set, we can guarantee that it isn't UTF-8 either.

@buu700

buu700 commented May 21, 2017

@eustas asked me to move my feature request from #549 to this thread:

Would it be practical for a future version of Brotli to detect strings in formats like base64 and hex, decode them to binary before compressing, and re-encode them during decompression?

While comparing compression of a binary file against its base64 equivalent (emscripten-core/emscripten#5104 (comment)), I noticed that the compressed base64 version ended up larger. It seems to me this shouldn't need to be the case, given that the two contain almost exactly the same information.

Since I don't know anything about Brotli's internals, I have nothing more specific to suggest in terms of the framing format design.
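Such a transform is only safe if decoding and re-encoding is lossless. A minimal Python sketch of that check follows; it is purely hypothetical (no such option exists in brotli as discussed in this thread), and the helper names are illustrative. `try_base64_to_binary` accepts a base64 run only when re-encoding the decoded bytes reproduces the input text exactly, so the decompressor can restore it byte-for-byte.

```python
import base64
import binascii
from typing import Optional


def try_base64_to_binary(text: bytes) -> Optional[bytes]:
    """Return the decoded bytes if `text` is canonical base64, else None.

    Hypothetical pre-compression step: only input that re-encodes to the
    identical text is accepted, so the transform can be reversed exactly.
    """
    try:
        raw = base64.b64decode(text, validate=True)
    except binascii.Error:
        return None          # not base64 at all
    if base64.b64encode(raw) != text:
        return None          # decodes, but would not round-trip byte-for-byte
    return raw


def binary_to_base64(raw: bytes) -> bytes:
    """Hypothetical post-decompression step that restores the original text."""
    return base64.b64encode(raw)


original = base64.b64encode(bytes(range(256)))
raw = try_base64_to_binary(original)
assert raw is not None and binary_to_base64(raw) == original
print(len(original), len(raw))  # 344 256
```

The same idea would apply to hex, where, for example, upper-case digits would not survive a round trip through a lower-case encoder.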

@stokito

stokito commented Nov 10, 2018

As I understand it, the format has already been developed and is used internally, right?
Can you confirm that this is the current spec of the format: https://github.com/madler/brotli/blob/master/br-format-v3.txt

@jason-sachs

Since this issue was closed, where is the framing format documented?

@jason-sachs

jason-sachs commented Nov 4, 2020

I have used brotli.exe from the Python distribution of brotli 1.0.9, and it doesn't appear to include framing or to offer any option for it (for example, a zero-byte input compresses to hex A1 01).
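Those two bytes are consistent with a bare brotli stream and no framing. A minimal Python sketch that parses them according to RFC 7932: the WBITS stream header (section 9.1), then the ISLAST and ISLASTEMPTY bits of the first meta-block header, with bits read least-significant first. The function names are illustrative, not part of any brotli API.

```python
def read_bits(data: bytes, pos: int, n: int):
    """Read n bits LSB-first starting at bit position pos; return (value, new_pos)."""
    value = 0
    for i in range(n):
        byte = data[(pos + i) // 8]
        value |= ((byte >> ((pos + i) % 8)) & 1) << i
    return value, pos + n


def parse_stream_start(data: bytes) -> dict:
    """Decode WBITS and the first meta-block's ISLAST/ISLASTEMPTY flags."""
    pos = 0
    bit, pos = read_bits(data, pos, 1)
    if bit == 0:
        wbits = 16
    else:
        n, pos = read_bits(data, pos, 3)
        if n != 0:
            wbits = 17 + n
        else:
            m, pos = read_bits(data, pos, 3)
            if m == 1:
                raise ValueError("invalid WBITS pattern 0010001")
            wbits = 17 if m == 0 else 8 + m
    islast, pos = read_bits(data, pos, 1)
    islastempty = 0
    if islast:
        islastempty, pos = read_bits(data, pos, 1)
    return {"wbits": wbits, "islast": islast, "islastempty": islastempty}


print(parse_stream_start(bytes.fromhex("a101")))
# {'wbits': 10, 'islast': 1, 'islastempty': 1}
```

So A1 01 is a stream with the smallest window RFC 7932 allows (WBITS = 10) followed by an empty last meta-block. Nothing precedes the WBITS field and the stream ends right after ISLASTEMPTY, leaving no room for a magic number, index, or checksum.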
