
zstd performance and Reader/Writer reuse #248

Closed · vmihailenco opened this issue Mar 26, 2020 · 10 comments · Fixed by #249

@vmihailenco commented Mar 26, 2020

I have a benchmark that produces the following results:

```
BenchmarkCZstd-12      	     664	   1793805 ns/op	 125.62 MB/s	  522419 B/op	       9 allocs/op
BenchmarkGoZstd-12    	     374	   3247447 ns/op	  69.39 MB/s	 1646825 B/op	      12 allocs/op
```

That is roughly two times slower (and results can be worse on smaller payloads), and the allocations are rather large (1,646,825 B to encode & decode 225,339 bytes?). I know that I can achieve better results by using EncodeAll and DecodeAll, but I would like to use Encoder/Decoder as wrappers over bytes.Buffer. So my question is: am I doing anything wrong here with Encoder/Decoder? I've disabled CRC for a fairer comparison based on our previous discussion; I plan to enable it in production.
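For reference, a minimal sketch of what such a streaming benchmark might look like (hypothetical names; `payload` stands in for the ~225 KB test data, and the Reset-based reuse pattern is an assumption about the setup):

```go
package zstdbench

import (
	"bytes"
	"io/ioutil"
	"testing"

	"github.com/klauspost/compress/zstd"
)

var payload []byte // ~225 KB of test data (not shown)

func BenchmarkGoZstd(b *testing.B) {
	enc, _ := zstd.NewWriter(nil, zstd.WithEncoderCRC(false))
	dec, _ := zstd.NewReader(nil)
	b.SetBytes(int64(len(payload)))
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		enc.Reset(&buf)    // reuse the encoder, writing into a bytes.Buffer
		enc.Write(payload) // a single Write...
		enc.Close()        // ...followed by Close
		if err := dec.Reset(&buf); err != nil {
			b.Fatal(err)
		}
		if _, err := ioutil.ReadAll(dec); err != nil { // decode back through the Reader
			b.Fatal(err)
		}
	}
}
```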

@klauspost (Owner)

Yes. There is some special handling when you decode a *bytes.Buffer, and it does indeed allocate more than it should. I will send a PR shortly.

As you note, the streaming encoders/decoders are not meant for payloads this small.

Also, if you want to eliminate more allocations, you can copy the data to a hash function (or similar) so you avoid the ioutil.ReadAll.
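A sketch of that suggestion, assuming a reused *zstd.Decoder and a compressed buffer: streaming the decoded output into a hash (sha256 here, purely as an example sink) skips the output-buffer allocation that ioutil.ReadAll makes.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"io"

	"github.com/klauspost/compress/zstd"
)

// hashDecoded decodes buf's contents and feeds them straight into a hash,
// so no decompressed copy is ever materialized.
func hashDecoded(dec *zstd.Decoder, buf *bytes.Buffer) ([]byte, error) {
	if err := dec.Reset(buf); err != nil {
		return nil, err
	}
	h := sha256.New()
	if _, err := io.Copy(h, dec); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}
```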

@klauspost (Owner)

Thanks for the detailed report 👍

@vmihailenco (Author)

Thanks - now it looks like this:

```
BenchmarkCZstd-12      	     628	   1857834 ns/op	 121.29 MB/s	  522429 B/op	       9 allocs/op
BenchmarkGoZstd-12     	     423	   3165288 ns/op	  71.19 MB/s	  873224 B/op	      11 allocs/op
BenchmarkGoZstd2-12    	     523	   2133557 ns/op	 105.62 MB/s	  583841 B/op	       2 allocs/op
```

What still looks strange is that the EncodeAll/DecodeAll version is faster and allocates less than the one using Write/Read. I guess there is a good reason for that, but from a user's POV I don't understand why there is such a big difference; after all, Encoder.Write can easily be a wrapper over EncodeAll.
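The EncodeAll/DecodeAll pattern that BenchmarkGoZstd2 presumably uses looks something like this (a sketch; the actual benchmark body is not shown in the thread). Reusing one Encoder/Decoder pair and recycling the destination slices is what gets it down to 2 allocs/op:

```go
package main

import "github.com/klauspost/compress/zstd"

// roundTrip compresses src and decompresses it again, reusing both the
// Encoder/Decoder pair and the destination slices across calls.
func roundTrip(enc *zstd.Encoder, dec *zstd.Decoder, src, cBuf, dBuf []byte) ([]byte, []byte, error) {
	cBuf = enc.EncodeAll(src, cBuf[:0]) // the full input is known up front
	dBuf, err := dec.DecodeAll(cBuf, dBuf[:0])
	return cBuf, dBuf, err
}
```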

@klauspost (Owner)

> after all, Encoder.Write can easily be a wrapper over EncodeAll.

No. With EncodeAll you always know how much you are compressing. Using a stream, you don't know whether the user will write more, so you need to buffer the input and cannot start compressing until the stream is closed.

With longer streams you can start compressing once you have enough input for a block.
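In other words (an illustrative sketch, not library internals): from the encoder's point of view, the two Writes below are indistinguishable from a single-shot payload until Close arrives.

```go
package main

import (
	"bytes"

	"github.com/klauspost/compress/zstd"
)

// writeTwice shows why a streaming encoder must buffer: at the time of
// the first Write it cannot know whether a second one is coming.
func writeTwice(part1, part2 []byte) (*bytes.Buffer, error) {
	var out bytes.Buffer
	w, err := zstd.NewWriter(&out)
	if err != nil {
		return nil, err
	}
	w.Write(part1)         // might be the whole payload...
	w.Write(part2)         // ...or more may still be coming
	return &out, w.Close() // only now is the input known to be complete
}
```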

@vmihailenco (Author) commented Mar 27, 2020

There is no buffering in my benchmark; I always write all the data with one Write call, so it is one Write plus one Close call. Even if you have to buffer that data in 1 MB blocks and delay compression, it should not be 30% slower. There is something else that makes the Read/Write version significantly slower, something that takes an extra ~1e6 nanoseconds (about 1 ms).

@klauspost (Owner)

@vmihailenco But there is no way to know that you will not be writing more until you close.

I did add another 'short circuit' though. PR in a minute.

klauspost reopened this Mar 27, 2020
klauspost added a commit that referenced this issue Mar 27, 2020
If writing the header (i.e. first block) and it is the final block, use block compression instead of async.

Addition for #248
@vmihailenco (Author)

Thanks for your patience and efforts 👍

My totally uneducated guess, based on the code I've read, is that you split the stream into blocks and each block is handled by a separate goroutine, plus it looks like you have separate encoder goroutines that encode the blocks coming from the block goroutines. Synchronizing all that is expensive. But it is very likely I got all that wrong; sorry if that is the case.

@klauspost (Owner)

> My totally uneducated guess, based on the code I've read, is that you split the stream into blocks and each block is handled by a separate goroutine, plus it looks like you have separate encoder goroutines that encode the blocks coming from the block goroutines. Synchronizing all that is expensive. But it is very likely I got all that wrong; sorry if that is the case.

Exactly. In streaming mode we collect input until we have enough for the block size we want, 64 KB by default.

When there is enough for a block, we start compressing it in a separate goroutine. This first compresses the block into sequences, which are then handed to another goroutine that encodes the block output. Once the block is handed over, the next block can start.

This means that long streams are quite fast, but small payloads of course suffer from the overhead of having to hand over the data. #251 simply skips all the goroutines and compresses at once when we know there is only a single block to compress (final == true and no frame header written yet).
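A toy model of that hand-off (illustrative only; this is not the library's actual code, just the two-stage channel pattern described above, and in real zstd the hand-off carries sequences, not raw bytes):

```go
package main

// pipeline models the two stages described above: stage 1 stands in for
// per-block match finding, stage 2 for entropy coding and output. The
// channel hand-off between them is the synchronization cost that small,
// single-block inputs pay for nothing.
func pipeline(blocks <-chan []byte, done chan<- int) {
	handoff := make(chan []byte, 1)
	go func() { // stage 1: "compress into sequences"
		for b := range blocks {
			handoff <- b
		}
		close(handoff)
	}()
	go func() { // stage 2: "encode the block output"
		total := 0
		for b := range handoff {
			total += len(b) // placeholder for real encoding work
		}
		done <- total
	}()
}
```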

@vmihailenco (Author) commented Mar 27, 2020

Thanks for confirming; it all makes sense. I am not sure it is such a good idea in the case of an HTTP server (many concurrent clients, each wanting to compress a stream), but I definitely see how it can make a lot of sense if you need to compress one (or a few) large streams.

I understand it is a matter of how much time you are willing to invest (and in my case also a lack of knowledge), but here are some ideas for improving the situation for the HTTP web server use case:

  1. completely disable "multi-threading" in the case of zstd.WithEncoderConcurrency(1) (see the sketch after this list)
  2. rework the API
    • ideally make it more like the snappy/s2 API, where NewReader/NewWriter are just wrappers over Encode/Decode with some buffering; multi-threading would be exposed separately as Encoder and Decoder
    • or keep it as is but expose more of the internal guts so NewReader/NewWriter can be implemented without multi-threading in user space, e.g. EncodeBlock(block []byte, final bool)
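For idea 1, the option itself already exists; the request is that a concurrency of one should skip the goroutine pipeline entirely. At the call site it would look like this (a sketch; `responseWriter` is a stand-in for an HTTP handler's writer):

```go
package main

import (
	"io"

	"github.com/klauspost/compress/zstd"
)

// newSyncWriter asks for a single-goroutine encoder. Whether this fully
// bypasses the internal goroutines is exactly what idea 1 proposes.
func newSyncWriter(responseWriter io.Writer) (*zstd.Encoder, error) {
	return zstd.NewWriter(responseWriter, zstd.WithEncoderConcurrency(1))
}
```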

@klauspost (Owner)

This will be addressed when I add full concurrent compression.

S2 has Writer.EncodeBuffer.
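For reference, a minimal usage sketch of that s2 API (EncodeBuffer hands the whole buffer to the writer at once; the caller must not modify buf until Close or Flush returns):

```go
package main

import (
	"io"

	"github.com/klauspost/compress/s2"
)

// compressBuffer writes buf to dst as an s2 stream in one call.
func compressBuffer(dst io.Writer, buf []byte) error {
	w := s2.NewWriter(dst)
	if err := w.EncodeBuffer(buf); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}
```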

klauspost added a commit that referenced this issue Mar 27, 2020
* zstd: If first block and 'final', encode direct.

If writing the header (i.e. first block) and it is the final block, use block compression instead of async.

Addition for #248
klauspost added a commit that referenced this issue Sep 1, 2021
klauspost added a commit that referenced this issue Sep 1, 2021
klauspost reopened this Sep 1, 2021