
Find a way to (efficiently) detect if a resource was compressed using Zopfli #1

Closed
alrra opened this issue Feb 20, 2017 · 0 comments

alrra commented Feb 20, 2017

Since the Zopfli output (for the gzip option) is valid gzip content, there doesn't seem to be a straightforward and foolproof way to identify files compressed with Zopfli.

From an email discussion with @lvandeve:

There is no way to tell for sure. Adding information to the output to indicate zopfli would actually add bits to the output, so such a thing is not done :) Any compressor can set the FLG, MTIME, and so on to anything it wants, and users of zopfli can also change the MTIME bytes that zopfli had output to an actual time.

One heuristic to tell that it was compressed with zopfli or another dense deflate compressor is to compress it with regular gzip -9 (which is fast), and check whether the file under test is, for example, more than 3% smaller.
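A minimal Python sketch of this heuristic, assuming the whole file fits in memory. It uses Python's `gzip` module at `compresslevel=9` as a stand-in for running `gzip -9` (Python's module is built on zlib's deflate rather than GNU gzip's own, so reference sizes may differ slightly). The function name `looks_densely_compressed` is hypothetical, and the 3% default threshold is just the example figure from the quote above.

```python
import gzip


def looks_densely_compressed(gz_bytes: bytes, threshold: float = 0.03) -> bool:
    """Heuristic: True if gz_bytes is noticeably smaller than a level-9 recompression.

    True suggests Zopfli (or another dense deflate compressor); it does not prove it.
    """
    # Recover the original, uncompressed payload.
    raw = gzip.decompress(gz_bytes)
    # Recompress at the maximum level as a reference; mtime=0 keeps the output deterministic.
    reference = gzip.compress(raw, compresslevel=9, mtime=0)
    # The file under test has to beat the reference by more than `threshold`.
    return len(gz_bytes) < len(reference) * (1 - threshold)
```

Note that the slow part is the decompress/recompress round trip, which is why a cheaper pre-check on the header is attractive.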


Other notes:

gzip

A gzip member header (RFC 1952) has the following structure:

  +---+---+---+---+---+---+---+---+---+---+
  |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
  +---+---+---+---+---+---+---+---+---+---+

where:

  • ID1 = 1f and ID2 = 8b (hex) - these are the magic numbers that identify the content as being gzip.

  • CM = 8 - this denotes the deflate compression method, which is the one customarily used by gzip.

  • FLG and MTIME are usually non-zero values.

  • XFL will be either 0, 2, or 4:

    • 0 - default; the compressor used an intermediate compression level (when any of the -2 ... -8 options is used).
    • 2 - the compressor used maximum compression, slowest algorithm (when the -9 or --best option is used).
    • 4 - the compressor used fastest algorithm (when the -1 or --fast option is used).
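The fixed part of the header described above can be read with Python's `struct` module. This is a sketch: the `GzipHeader` and `parse_gzip_header` names are hypothetical, and it only parses the first 10 bytes, ignoring any optional fields that the FLG bits may announce.

```python
import struct
from typing import NamedTuple


class GzipHeader(NamedTuple):
    cm: int     # compression method (8 = deflate)
    flg: int    # flag bits (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT)
    mtime: int  # modification time in Unix seconds (stored little-endian)
    xfl: int    # extra flags (2 = maximum compression, 4 = fastest)
    os: int     # operating system that produced the file (3 = Unix)


def parse_gzip_header(data: bytes) -> GzipHeader:
    """Parse the fixed 10-byte gzip member header."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # "<" = little-endian, no padding; B = 1 byte (ID1, ID2, CM, FLG), I = 4 bytes (MTIME), B B = XFL, OS
    id1, id2, cm, flg, mtime, xfl, os_ = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not gzip: bad magic number")
    return GzipHeader(cm, flg, mtime, xfl, os_)
```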

Zopfli

One thing that Zopfli does is set FLG and MTIME to zero, XFL to 2, and OS to 3, so files compressed with Zopfli will most likely start with 1f8b 0800 0000 0000 0203, unless the user changes those bytes (which in general doesn't seem very likely to happen).
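To see where that byte sequence comes from, the sketch below packs the field values listed above into the 10-byte header layout. The constant and function names are hypothetical, and the check only recognizes Zopfli's default header; as mentioned, a user can rewrite these bytes.

```python
import struct

# Field values Zopfli writes into its gzip header, per the description above.
ID1, ID2 = 0x1F, 0x8B  # gzip magic number
CM = 8                 # deflate
FLG = 0                # no optional fields (no file name, comment, ...)
MTIME = 0              # modification time left unset
XFL = 2                # "maximum compression, slowest algorithm"
OS = 3                 # Unix

# Pack the fixed header in on-disk order (MTIME is little-endian).
ZOPFLI_GZIP_PREFIX = struct.pack("<BBBBIBB", ID1, ID2, CM, FLG, MTIME, XFL, OS)


def has_zopfli_default_header(data: bytes) -> bool:
    """True if data starts with the header Zopfli emits by default."""
    return data.startswith(ZOPFLI_GZIP_PREFIX)
```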

Now, regular gzip output might also start with the same bytes, even though it is less likely to:

  • Most web servers (e.g. Apache, NGINX) do not use the best compression level by default; therefore, their output shouldn't have XFL set to 2.
  • Most utilities that output regular gzip will have non-zero values for MTIME and FLG.

So, if a file doesn't start with 1f8b 0800 0000 0000 0203, that is a good (not perfect) indication that Zopfli wasn't used, and it's a much cheaper check than recompressing the file and comparing sizes. However, if a file does start with that sequence, it could have been produced by either Zopfli or gzip, so no conclusion can be drawn from the header alone.
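Putting the two checks together gives a two-stage triage: the cheap header test runs first, and the recompression heuristic is paid for only when the header is inconclusive. This is a sketch under the same assumptions as before (Python's level-9 `gzip.compress` standing in for `gzip -9`, 3% as an example threshold); the `classify` name and its result strings are hypothetical.

```python
import gzip

# The 10-byte header Zopfli writes by default.
ZOPFLI_GZIP_PREFIX = bytes.fromhex("1f8b 0800 0000 0000 0203")


def classify(gz_bytes: bytes, threshold: float = 0.03) -> str:
    """Cheap header test first; recompress only when the header is inconclusive."""
    if not gz_bytes.startswith(ZOPFLI_GZIP_PREFIX):
        # Fast path: Zopfli's default header is absent.
        return "probably not zopfli"
    # Slow path: both Zopfli and gzip can produce this header, so compare sizes.
    raw = gzip.decompress(gz_bytes)
    reference = gzip.compress(raw, compresslevel=9, mtime=0)
    if len(gz_bytes) < len(reference) * (1 - threshold):
        return "probably zopfli (or another dense deflate compressor)"
    return "inconclusive"
```

For typical traffic, most responses should take the fast path, so the expensive round trip is limited to the ambiguous cases.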
