Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarifications about streaming support #24

Open
golddranks opened this issue Apr 13, 2018 · 11 comments
Open

Clarifications about streaming support #24

golddranks opened this issue Apr 13, 2018 · 11 comments

Comments

@golddranks
Copy link
Contributor

golddranks commented Apr 13, 2018

Hi, I'm developing a streaming .ZIP library that uses miniz_oxide for decompressing DEFLATE. Streaming is hard because you get the data in smaller chunks and want to decompress incrementally. Because of some peculiarities of the .ZIP format, you don't sometimes even know the size of the data beforehands. Here's my library (WIP): https://github.com/golddranks/stream_zipper

I've managed to get decompression to work when parsing files as one chunk. However, incremental decompression is giving me trouble. I'd like to ask some questions which I'd hope get eventually clarified in documentation:

  1. What's the difference between FailedCannotMakeProgress and NeedsMoreInput states? If I hand miniz_oxide a partial buffer (try git cloning my repo and running cargo test test_partial_decompression -- --nocapture and see the debug printing; it gives 400 bytes instead of the full buffer), I get FailedCannotMakeProgress.

  2. If I want to continue incrementally, what parameters am I supposed to pass? At the moment, I pass the rest of the input buffer (bytes after 400 EDIT: After making sure that it has indeed consumed 400 bytes up to that point!) and make a fresh output buffer, plus reuse the same DecompressorOxide struct, but it returns a Failure. If I pass the input buffer in one go, it manages to decompress the thing. (Edit: Is it already in a failed state after FailedCannotMakeProgress so that's why it returns Failure?)

  3. If I understand right, DEFLATE stream format has a marker when the stream ends, so knowing the size beforehands isn't required. Is this correct?

@Frommi
Copy link
Owner

Frommi commented Apr 13, 2018

Hi!

  1. FailedCannotMakeProgress means there is no way input data is valid (pending any bugs). Calling decompress again will continue to yield this error until init is called. NeedsMoreInput means that input stopped in the middle of the stream and subsequent decompress calls are needed.
  2. Parameters seem fine, but you actually do not reuse DecompressorOxide as there is init every call of decompress_to_vec and it is called from deflate. This means that DecompressorOxide looses all the in-between state and interprets input[400..] as starting of another stream at the second call.
  3. Yes.

Hope this helps.

@golddranks
Copy link
Contributor Author

Hi, thanks for the answer.

First of all, about 2. indeed, that was an oversight! Silly me.

About 1. So, I'm getting this error when handing a partial buffer to the library – even when calling for the first time, so DecompressorOxide is inited correctly. If I hand the whole buffer, it manages to decompress it without problems. Is miniz_oxide supposed to be able to handle data that is truncated at an arbitrary place? Either I've got that wrong, or then I found a bug, I guess?

@golddranks
Copy link
Contributor Author

Actually I don't quite understand this explanation: "More input data was expected, but the caller indicated that there was more data, so the input stream is likely truncated" How is the caller indicating that there was more data? If the input stream is truncated, how is this different from just NeedsMoreInput error?

@golddranks
Copy link
Contributor Author

Ah! I got it finally! One is supposed to pass the TINFL_FLAG_HAS_MORE_INPUT flag in the case of incremental decompression, right?

@Frommi
Copy link
Owner

Frommi commented Apr 13, 2018

Is miniz_oxide supposed to be able to handle data that is truncated at an arbitrary place?

Yes, it is able to handle input even byte-by-byte.

I completely forgot about that. There is a flag inflate_flags::TINFL_FLAG_HAS_MORE_INPUT that needs to be enabled when not all of the input was given. If it is not set, decompress assumes invalid data. There certainly is a need for better docs.

@golddranks
Copy link
Contributor Author

It seems that we posted at the same time :D I'll see if I can send some documentation PR's later today!

@golddranks
Copy link
Contributor Author

The first call now always succeeds with NeedsMoreInput, but the second one always ends with a failure. I made two a bit more streamlined tests test_decompression_full and test_decompression_partial here: https://github.com/golddranks/stream_zipper/blob/master/src/deflate.rs#L114

The non-partial tests succeeds whereas the partial one fails.

Can you say if I'm using the API somehow wrong?

@Frommi
Copy link
Owner

Frommi commented Apr 13, 2018

Output buffer that is given to decompress should have the old decompressed output data to do the matches. It must span at least TINFL_LZ_DICT_SIZE back or the amount of already decompressed data, that's why it is a Cursor and not just a slice; to indicate the place where decompressor left off.

If the stream is very large it is a good idea to use wrapping buffer to avoid excessive copying.

@golddranks
Copy link
Contributor Author

Ah, that explains it. Thanks again.

@oyvindln
Copy link
Collaborator

Yup, we need to do some work on the documentation, the API is quite low-level so it can be a bit difficult to figure it out.

It may also be helpful to look at how the library is used by flate2, which wraps miniz_oxide (though the original miniz is used by default) in reader and writer structs as miniz_oxide is originally a rust port of miniz to get rid of the use of c code in flate2.

@link2xt
Copy link

link2xt commented Oct 6, 2024

flate2 also has problems decompressing a stream using miniz_oxide: rust-lang/flate2-rs#434
When using zlib as a backend, it works correctly.

I have also opened an issue in this repository: #158

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants