Clarifications about streaming support #24

golddranks · 2018-04-13T03:16:01Z

Hi, I'm developing a streaming .ZIP library that uses miniz_oxide for decompressing DEFLATE. Streaming is hard because you get the data in smaller chunks and want to decompress incrementally. Because of some peculiarities of the .ZIP format, you don't sometimes even know the size of the data beforehands. Here's my library (WIP): https://github.com/golddranks/stream_zipper

I've managed to get decompression to work when parsing files as one chunk. However, incremental decompression is giving me trouble. I'd like to ask some questions which I'd hope get eventually clarified in documentation:

What's the difference between FailedCannotMakeProgress and NeedsMoreInput states? If I hand miniz_oxide a partial buffer (try git cloning my repo and running cargo test test_partial_decompression -- --nocapture and see the debug printing; it gives 400 bytes instead of the full buffer), I get FailedCannotMakeProgress.
If I want to continue incrementally, what parameters am I supposed to pass? At the moment, I pass the rest of the input buffer (bytes after 400 EDIT: After making sure that it has indeed consumed 400 bytes up to that point!) and make a fresh output buffer, plus reuse the same DecompressorOxide struct, but it returns a Failure. If I pass the input buffer in one go, it manages to decompress the thing. (Edit: Is it already in a failed state after FailedCannotMakeProgress so that's why it returns Failure?)
If I understand right, DEFLATE stream format has a marker when the stream ends, so knowing the size beforehands isn't required. Is this correct?

The text was updated successfully, but these errors were encountered:

Frommi · 2018-04-13T05:28:06Z

Hi!

FailedCannotMakeProgress means there is no way input data is valid (pending any bugs). Calling decompress again will continue to yield this error until init is called. NeedsMoreInput means that input stopped in the middle of the stream and subsequent decompress calls are needed.
Parameters seem fine, but you actually do not reuse DecompressorOxide as there is init every call of decompress_to_vec and it is called from deflate. This means that DecompressorOxide looses all the in-between state and interprets input[400..] as starting of another stream at the second call.
Yes.

Hope this helps.

golddranks · 2018-04-13T07:06:40Z

Hi, thanks for the answer.

First of all, about 2. indeed, that was an oversight! Silly me.

About 1. So, I'm getting this error when handing a partial buffer to the library – even when calling for the first time, so DecompressorOxide is inited correctly. If I hand the whole buffer, it manages to decompress it without problems. Is miniz_oxide supposed to be able to handle data that is truncated at an arbitrary place? Either I've got that wrong, or then I found a bug, I guess?

golddranks · 2018-04-13T07:44:34Z

Actually I don't quite understand this explanation: "More input data was expected, but the caller indicated that there was more data, so the input stream is likely truncated" How is the caller indicating that there was more data? If the input stream is truncated, how is this different from just NeedsMoreInput error?

golddranks · 2018-04-13T08:05:04Z

Ah! I got it finally! One is supposed to pass the TINFL_FLAG_HAS_MORE_INPUT flag in the case of incremental decompression, right?

Frommi · 2018-04-13T08:05:15Z

Is miniz_oxide supposed to be able to handle data that is truncated at an arbitrary place?

Yes, it is able to handle input even byte-by-byte.

I completely forgot about that. There is a flag inflate_flags::TINFL_FLAG_HAS_MORE_INPUT that needs to be enabled when not all of the input was given. If it is not set, decompress assumes invalid data. There certainly is a need for better docs.

golddranks · 2018-04-13T08:06:35Z

It seems that we posted at the same time :D I'll see if I can send some documentation PR's later today!

golddranks · 2018-04-13T09:40:49Z

The first call now always succeeds with NeedsMoreInput, but the second one always ends with a failure. I made two a bit more streamlined tests test_decompression_full and test_decompression_partial here: https://github.com/golddranks/stream_zipper/blob/master/src/deflate.rs#L114

The non-partial tests succeeds whereas the partial one fails.

Can you say if I'm using the API somehow wrong?

Frommi · 2018-04-13T09:53:44Z

Output buffer that is given to decompress should have the old decompressed output data to do the matches. It must span at least TINFL_LZ_DICT_SIZE back or the amount of already decompressed data, that's why it is a Cursor and not just a slice; to indicate the place where decompressor left off.

If the stream is very large it is a good idea to use wrapping buffer to avoid excessive copying.

golddranks · 2018-04-13T10:20:07Z

Ah, that explains it. Thanks again.

oyvindln · 2018-04-13T13:40:40Z

Yup, we need to do some work on the documentation, the API is quite low-level so it can be a bit difficult to figure it out.

It may also be helpful to look at how the library is used by flate2, which wraps miniz_oxide (though the original miniz is used by default) in reader and writer structs as miniz_oxide is originally a rust port of miniz to get rid of the use of c code in flate2.

link2xt · 2024-10-06T22:00:26Z

flate2 also has problems decompressing a stream using miniz_oxide: rust-lang/flate2-rs#434
When using zlib as a backend, it works correctly.

I have also opened an issue in this repository: #158

oyvindln added the enhancement label Jul 13, 2018

Opioid mentioned this issue May 10, 2019

Expose crc32 #44

Closed

Plecra mentioned this issue Aug 26, 2020

Improve Documentation #93

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarifications about streaming support #24

Clarifications about streaming support #24

golddranks commented Apr 13, 2018 •

edited

Loading

Frommi commented Apr 13, 2018 •

edited

Loading

golddranks commented Apr 13, 2018

golddranks commented Apr 13, 2018

golddranks commented Apr 13, 2018

Frommi commented Apr 13, 2018

golddranks commented Apr 13, 2018

golddranks commented Apr 13, 2018

Frommi commented Apr 13, 2018 •

edited

Loading

golddranks commented Apr 13, 2018

oyvindln commented Apr 13, 2018

link2xt commented Oct 6, 2024 •

edited

Loading

Clarifications about streaming support #24

Clarifications about streaming support #24

Comments

golddranks commented Apr 13, 2018 • edited Loading

Frommi commented Apr 13, 2018 • edited Loading

golddranks commented Apr 13, 2018

golddranks commented Apr 13, 2018

golddranks commented Apr 13, 2018

Frommi commented Apr 13, 2018

golddranks commented Apr 13, 2018

golddranks commented Apr 13, 2018

Frommi commented Apr 13, 2018 • edited Loading

golddranks commented Apr 13, 2018

oyvindln commented Apr 13, 2018

link2xt commented Oct 6, 2024 • edited Loading

golddranks commented Apr 13, 2018 •

edited

Loading

Frommi commented Apr 13, 2018 •

edited

Loading

Frommi commented Apr 13, 2018 •

edited

Loading

link2xt commented Oct 6, 2024 •

edited

Loading