-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] add support for zstd compression #2277
Conversation
Fixes Makefile.config containing: `HAVE_BROTLI = 1]` Which was apparently harmless (unused, I think) but is good to fix anyway.
allocate out-of-object to keep things neat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done C++ in a while, so feel free to ignore these comments if you disagree with them.
@@ -13,12 +13,37 @@ | |||
#include <brotli/encode.h> | |||
#endif // HAVE_BROTLI | |||
|
|||
#if HAVE_ZSTD | |||
#include <zstd.h> | |||
// Not set by earlier versions, so be sure it's set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to do this check with autoconf, or by making a wrapper-header around zstd.h
?
I think the answer depends on how many files you expect to do zstd stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, I'm not loving the macro usage either so suggestions welcome. Not sure what you're suggesting in first clause-- the HAVE_ZSTD
macro is the result of an autoconf check added in this PR.
The second suggestion, generating a wrapper, has potential re:making things less messy.
We could just make zstd non-optional (same for brotli) but isn't done for various reasons-- I think the most important being that not all distributions Nix wants to support provide these packages...
Suggestions welcome, I'll think about this a bit. Popular alternative is to have build system selectively include files -- not sure if that's preferred or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the first alternative, I mean do a check in configure.ac
:
AC_MSG_CHECKING([for ZSTD_CLEVEL_DEFAULT in zstd.h])
AC_PREPROC_IFELSE([AC_LANG_SOURCE([[#include <zstd.h>
#ifndef ZSTD_CLEVEL_DEFAULT
#error
#endif]])],[AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no]), AC_DEFINE([ZSTD_CLEVEL_DEFAULT], [3])])
A wrapper header is probably better, that includes <zstd.h>
and defines things that might not be present.
@@ -214,14 +281,21 @@ void decompress(const std::string & method, Source & source, Sink & sink) | |||
return decompressBzip2(source, sink); | |||
else if (method == "br") | |||
return decompressBrotli(source, sink); | |||
#if HAVE_ZSTD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need CPP guards, but brotli doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because nix's brotli support is always "possible"-- if not built with libbrotli Nix will look for the brotli executable on PATH and try to use that with fingers crossed. Among other disadvantages, the way this is implemented operates on the entire input at once IIRC, reading into a giant std::string. Doing this for zstd didn't seem like a good idea, thinking it should be supported properly or not at all.
Brotli actually has some preprocessor bits but they're inside the decompressBrotli method instead of around it.
@@ -302,11 +376,11 @@ struct XzSink : CompressionSink | |||
#ifdef HAVE_LZMA_MT | |||
struct ParallelXzSink : public XzSink | |||
{ | |||
ParallelXzSink(Sink &nextSink) : XzSink(nextSink, [this]() { | |||
ParallelXzSink(Sink &nextSink, int level) : XzSink(nextSink, [this](unsigned level) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you capture level
in the lambda, and avoid having to declare/pass the parameter in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First--yikes indeed re:this mess. :(. Apologies.
That out of the way, I actually originally captured level in the lambda but then the checkLevel(..)
was either duplicated or a new function....
Hmm I think I just left this to focus on getting things working first--- but never came back to try to do something reasonable here O:). Good call, thank you.
Match what we do in other compression methods. This is particularly important when using higher compression levels, in order to ensure interrupts are noticed in a somewhat responsive manner. Breaking into chunks as hinted by zstd should help this further, but this makes things much better already. (not following those hints shouldn't be much worse than what we have with other methods)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that a lot of parameters are being passed around everywhere. It would be better to refactor to have a CompressionConfiguration
class.
Otherwise I agree with most points raised by the other reviewer.
Good change overall.
This would be nice to revive as xz issue is getting annoying :) |
Too much has changed in master for a trivial rebase. |
@dtzWill are you going to give this one another go? |
#3333 may be a better way to get zstd support. |
Notes:
Default zstd level is '3' (range: [1,19])
This PR builds on top of add support for specifying 'compression-level' for NAR's #2276, so duplicates that commit
Copying paths to local stores and checking with 'verify' for various closures
... but I'd feel better with some more testing
Additional review of API usage and such would be appreciated,
The sizes of the buffers used are much larger than used for other compression methods-- this might bias throughput tests and such. This is for three reasons:
zstd documentation is pretty clear that stream/context objects should be reused instead of always allocated/initialized-- which isn't what our compression bits are geared for currently, but if we're seeing things like very small files being slow to compress w/zstd this might be the cause.
Fixes #2255.