zstd performance and Reader/Writer reuse #248
Comments
Yes. There is some special handling when you decode a […]. As you note, the streaming en+decoders are not for payloads this small. Also, if you want to eliminate more allocations you can copy the data to a hash function or something, so you avoid the […].
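For reference, a minimal sketch of the `EncodeAll`/`DecodeAll` route mentioned above, assuming the `github.com/klauspost/compress/zstd` package (this is not code from the thread, just an illustration of reusing one encoder and one decoder for small payloads):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// One encoder and one decoder, created once and reused for every payload.
	// With a nil writer/reader they are only used via EncodeAll/DecodeAll.
	enc, _ := zstd.NewWriter(nil, zstd.WithEncoderCRC(false))
	dec, _ := zstd.NewReader(nil)
	defer enc.Close()
	defer dec.Close()

	payload := []byte("small payload that is not worth a full streaming frame")

	// EncodeAll appends to dst, so passing a reused buffer avoids allocations.
	dst := make([]byte, 0, len(payload))
	compressed := enc.EncodeAll(payload, dst)

	out, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(compressed), bytes.Equal(out, payload))
}
```

In a real service the `dst` buffer would typically be pooled as well, so repeated calls allocate close to nothing.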
Thanks for the detailed report 👍
Thanks - now it looks like this: […]
What still looks strange is that […]
No. With `EncodeAll` you always know how much you are compressing. With a stream you don't know whether the user will write more, so you need to buffer the input and cannot start compressing until the stream is closed. With a longer stream you can start compressing once you have enough input for a block.
There is no buffering in my benchmark - I always write all the data with one […]
@vmihailenco But there is no way to know that you will not be writing more until you close. I did add another 'short circuit' though. PR in a minute.
If writing the header (i.e. first block) and it is the final block, use block compression instead of async. Addition for #248
Thanks for your patience and efforts 👍 My totally uneducated guess, based on the code I've read, is that you split the stream into blocks and each block is handled by a separate goroutine, plus it looks like you have separate encoder goroutines that encode the blocks coming from the block goroutines. Synchronizing that is expensive, but it is very likely I got all that wrong. Sorry if that is the case.
Exactly. In streaming mode we collect input until we have enough for the block size we want, by default 64KB. When there is enough for a block we start compressing it in a separate goroutine. This first compresses it into sequences, which are then handed to another goroutine that encodes the block output. Once a block is handed over, the next block can start. This means that long streams are quite fast, but small payloads of course suffer from the overhead of having to hand over the data. #251 simply skips all the goroutines and compresses everything at once if we know we only have to compress a single block (final == true and no frame header written).
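Purely as a conceptual illustration of the handoff described above (this is not the library's code; the `block` type and the two stages are made up for the sketch):

```go
package main

import "fmt"

// block stands in for one ~64KB chunk of the stream; purely illustrative.
type block struct {
	id  int
	src []byte
}

func main() {
	raw := make(chan block)   // blocks collected by the writer
	seqed := make(chan block) // blocks whose sequences have been built
	done := make(chan struct{})

	// Stage 1: build sequences for each block (no real work in this sketch).
	go func() {
		for b := range raw {
			seqed <- b // hand over to the next stage, preserving order
		}
		close(seqed)
	}()

	// Stage 2: encode the block output (stand-in: just count bytes).
	go func() {
		total := 0
		for b := range seqed {
			total += len(b.src)
		}
		fmt.Println("encoded", total, "bytes")
		close(done)
	}()

	// The writer side: as soon as a block is handed over, the next one can be
	// collected. Each handoff has a fixed cost, which a tiny payload never
	// amortizes - hence the single-block short circuit in #251.
	for i := 0; i < 4; i++ {
		raw <- block{id: i, src: make([]byte, 64<<10)}
	}
	close(raw)
	<-done
}
```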
Thanks for confirming - it all makes sense. I am not sure it is such a good idea in the case of an HTTP server (many concurrent clients, each wanting to compress a stream), but I definitely see how it can make a lot of sense if you need to compress one (or several) large streams. I understand it is a matter of the time you are willing to invest (and, in my case, also a lack of knowledge), but some ideas on how to improve this situation for the HTTP web server use case: […]
This will be addressed when I add full concurrent compression. S2 has `Writer.EncodeBuffer`.
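For the HTTP-server case raised above, one common pattern is to pool stream encoders and re-point them at each response with `Encoder.Reset`. A sketch only, assuming the `github.com/klauspost/compress/zstd` API and a client that sent `Accept-Encoding: zstd`; this is not the concurrent-compression work mentioned in the reply, nor the original (lost) list of suggestions:

```go
package main

import (
	"io"
	"log"
	"net/http"
	"strings"
	"sync"

	"github.com/klauspost/compress/zstd"
)

// A pool of stream encoders so each response reuses one instead of creating
// a new Encoder (and its goroutines) per request.
var encoders = sync.Pool{
	New: func() interface{} {
		enc, _ := zstd.NewWriter(nil, zstd.WithEncoderConcurrency(1))
		return enc
	},
}

func handler(w http.ResponseWriter, r *http.Request) {
	enc := encoders.Get().(*zstd.Encoder)
	defer encoders.Put(enc)

	w.Header().Set("Content-Encoding", "zstd")
	enc.Reset(w) // point the pooled encoder at this response

	io.Copy(enc, strings.NewReader("hello, compressed world\n"))
	enc.Close() // flush the frame; the pooled encoder is Reset again next request
}

func main() {
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```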
* zstd: If first block and 'final', encode direct. If writing the header (i.e. first block) and it is the final block, use block compression instead of async. Addition for #248
Includes cespare/xxhash#54 Fixes #248
I have a benchmark that produces the following results: […]

That is roughly 2 times slower (and the results can be worse on smaller payloads), and the allocations are rather big (1646825 to encode & decode 225339 bytes?). I know that I can achieve better results by using `EncodeAll` and `DecodeAll`, but I would like to use the Encoder/Decoder as wrappers over a `bytes.Buffer`. So my question is - am I doing anything wrong here with the Encoder/Decoder? I've disabled CRC for a fairer comparison, based on our previous discussion - I plan to use it in production.
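For context, a sketch of the kind of streaming round trip over a `bytes.Buffer` described in this issue (the actual benchmark code is not preserved in the thread; this only illustrates the usage pattern, assuming the `github.com/klauspost/compress/zstd` API):

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

func main() {
	payload := bytes.Repeat([]byte("some moderately compressible data. "), 100)

	// Compress into a bytes.Buffer through the streaming Encoder,
	// with CRC disabled as mentioned above.
	var compressed bytes.Buffer
	enc, err := zstd.NewWriter(&compressed, zstd.WithEncoderCRC(false))
	if err != nil {
		panic(err)
	}
	if _, err := enc.Write(payload); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil { // the frame is only complete after Close
		panic(err)
	}
	compressedLen := compressed.Len()

	// Decompress through the streaming Decoder.
	dec, err := zstd.NewReader(&compressed)
	if err != nil {
		panic(err)
	}
	defer dec.Close()
	out, err := io.ReadAll(dec)
	if err != nil {
		panic(err)
	}

	fmt.Printf("in=%d compressed=%d out=%d\n", len(payload), compressedLen, len(out))
}
```

As discussed earlier in the thread, for payloads in this size range the `EncodeAll`/`DecodeAll` path avoids the per-block handoff overhead of the streaming frame.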