Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for partial decompression #436

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

link2xt
Copy link

@link2xt link2xt commented Oct 7, 2024

This adds the test from #434

@link2xt link2xt force-pushed the link2xt/partial-decompression branch from 525e54c to 8599909 Compare October 7, 2024 16:35
@link2xt
Copy link
Author

link2xt commented Oct 7, 2024

Second commit reorders the tests so all the tests with zlib succeed and then the last test with miniz_oxide fails.

@Byron
Copy link
Member

Byron commented Oct 8, 2024

Thanks a lot for putting the PR together, I think it nicely shows the difference in behaviour. However, I would also have hoped that the code tries to continue decode the input until it received the 'finished' status. To my mind, it's fair that an implementation chooses how much of the input to buffer internally, maybe. After all I really don't know enough about how it should work.

CC @oyvindln if they have some thoughts.

@link2xt
Copy link
Author

link2xt commented Oct 8, 2024

Will try calling again until StreamEnd status, if it works then will adjust async-compression crate and close miniz_oxide issue or turn into documentation issue.

@link2xt
Copy link
Author

link2xt commented Oct 8, 2024

Apparently StreamEnd is never returned. This passes:

#[test]
fn deflate_decoder_partial() {
    let input = vec![
        210, 82, 8, 12, 245, 15, 113, 12, 242, 247, 15, 81, 240, 244, 115, 242, 143, 80, 80, 10,
        45, 78, 45, 82, 40, 44, 205, 47, 73, 84, 226, 229, 210, 130, 200, 163, 136, 42, 104, 4,
        135, 248, 7, 57, 186, 187, 42, 152, 155, 41, 24, 27, 152, 27, 25, 24, 104, 242, 114, 57,
        26, 24, 24, 24, 42, 248, 123, 43, 184, 167, 150, 128, 213, 21, 229, 231, 151, 40, 36, 231,
        231, 22, 228, 164, 150, 164, 166, 40, 104, 24, 232, 129, 20, 104, 43, 128, 104, 3, 133,
        226, 212, 228, 98, 77, 61, 94, 46, 0, 0, 0, 0, 255, 255,
    ];
    let expected_output = b"* QUOTAROOT INBOX \"User quota\"\r\n* QUOTA \"User quota\" (STORAGE 76 307200)\r\nA0001 OK Getquotaroot completed (0.001 + 0.000 secs).\r\n";

    // Create very small output buffer.
    let mut output_buf = [0; 8];
    let mut output: Vec<u8> = Vec::new();

    let zlib_header = false;
    let mut decompress = flate2::Decompress::new(zlib_header);

    let flush_decompress = flate2::FlushDecompress::None;
    loop {
        let prev_out = decompress.total_out();
        let status = decompress
            .decompress(&input[decompress.total_in() as usize..], &mut output_buf, flush_decompress)
            .unwrap();
        output.extend_from_slice(&output_buf[..(decompress.total_out() - prev_out) as usize]);
        eprintln!("{}", output.len());

        // IMAP stream never ends.
        assert_ne!(status, flate2::Status::StreamEnd);

        if output.len() == expected_output.len() {
            assert_eq!(status, flate2::Status::Ok);
            break;
        }
    }

    assert_eq!(output.as_slice(), expected_output);
}

But this is not realistic as normally I don't know in advance the size of the output. Could try calling until it emits zero bytes into the output.

@Byron
Copy link
Member

Byron commented Oct 8, 2024

The question here is if StreamEnd is happening in the libz-backend for instance, which is what I would expect.
Further, I'd add an actual assertion on the output, after all, it's known and by the end everything should have been decompressed.

That way, at least with the non-miniz-oxide implementation there is a working version of this. From there it should be possible to figure out what's happening with the miniz_oxide backend.

Thinking about it, gitoxide does something similar, but also knows the size of the input buffer. However, I don't think it relies on it.

@link2xt
Copy link
Author

link2xt commented Oct 8, 2024

The question here is if StreamEnd is happening in the libz-backend for instance, which is what I would expect.

No, it does not happen. I think StreamEnd is really for the case of the sender "closing" the stream and with IMAP this normally never happens unless you explicitly send LOGOUT command.

This is the current state of the test that passes for zlib and miniz_oxide, and there is an explicit assertion that StreamEnd never happens:

#[test]
fn deflate_decoder_partial() {
    let input = vec![
        210, 82, 8, 12, 245, 15, 113, 12, 242, 247, 15, 81, 240, 244, 115, 242, 143, 80, 80, 10,
        45, 78, 45, 82, 40, 44, 205, 47, 73, 84, 226, 229, 210, 130, 200, 163, 136, 42, 104, 4,
        135, 248, 7, 57, 186, 187, 42, 152, 155, 41, 24, 27, 152, 27, 25, 24, 104, 242, 114, 57,
        26, 24, 24, 24, 42, 248, 123, 43, 184, 167, 150, 128, 213, 21, 229, 231, 151, 40, 36, 231,
        231, 22, 228, 164, 150, 164, 166, 40, 104, 24, 232, 129, 20, 104, 43, 128, 104, 3, 133,
        226, 212, 228, 98, 77, 61, 94, 46, 0, 0, 0, 0, 255, 255,
    ];
    let expected_output = b"* QUOTAROOT INBOX \"User quota\"\r\n* QUOTA \"User quota\" (STORAGE 76 307200)\r\nA0001 OK Getquotaroot completed (0.001 + 0.000 secs).\r\n";

    // Create very small output buffer.
    let mut output_buf = [0; 8];
    let mut output: Vec<u8> = Vec::new();

    let zlib_header = false;
    let mut decompress = flate2::Decompress::new(zlib_header);

    let flush_decompress = flate2::FlushDecompress::None;
    loop {
        let prev_out = decompress.total_out();
        let status = decompress
            .decompress(&input[decompress.total_in() as usize..], &mut output_buf, flush_decompress)
            .unwrap();
        output.extend_from_slice(&output_buf[..(decompress.total_out() - prev_out) as usize]);
        eprintln!("{}", output.len());

        // IMAP stream never ends.
        assert_ne!(status, flate2::Status::StreamEnd);

        if output.len() == expected_output.len() {
            assert_eq!(status, flate2::Status::Ok);
            break;
        }
    }

    assert_eq!(output.as_slice(), expected_output);
}

@link2xt
Copy link
Author

link2xt commented Oct 8, 2024

I made the test work with all implementations. So the right condition for stopping is status == flate2::Status::BufError && output_len == 0, which means we did not output anything but there is a buffering problem so we ran out of input and not output.

I guess I can adjust async-compression crate to do similar thing now.

@oyvindln
Copy link
Contributor

oyvindln commented Oct 8, 2024

StreamEnd happens in either backend if the deflate stream is well formed and the decompressor encounters the end of a block with the last block flag set.

At least in case of zlib, BufError is returned if deflate is called with Z_FINISH as flush mode and it encounters the end of the input data but it the last block did not have an end flag. Thus the stream is technically not well formed but I presume it's treated here as not an error since it seems that this is common to encounter in many protocols and treating it as an error in the stream class may cause issues.

miniz_oxide is supposed to behave the same way so if it isn't it's a bug.

As for the choice whether to consume up to the size of the internal buffer, or just as much as would fit in the output I don't really know what's the most "correct behaviour. I have to investigate a bit more whether zlib always does the latter as it seems to at least in this situation or if it depends on some parameter/config. (Also not sure whether C miniz did here, whether it different from zlib or whether it was an accidental change during porting - I suspect the latter.) Can try to alter this behaviour in miniz_oxide if people wish.

@link2xt
Copy link
Author

link2xt commented Oct 8, 2024

I managed to make async-compression work with miniz_oxide backend for flate2: Nullus157/async-compression#294

Still need some cleanup, but I think the change makes sense in any case and depending on how you look at it we may consider miniz_oxide consuming input into internal state NOTABUG/WONTFIX.

flate2 documentation should probably not say that it is "consuming only as much input as needed" here: https://docs.rs/flate2/1.0.34/flate2/struct.Decompress.html#method.decompress
This is only true with zlib backend, while with miniz_oxide it takes up to 32768 bytes into internal state.

@link2xt
Copy link
Author

link2xt commented Oct 8, 2024

@oyvindln

As for the choice whether to consume up to the size of the internal buffer, or just as much as would fit in the output I don't really know what's the most "correct behaviour. I have to investigate a bit more whether zlib always does the latter as it seems to at least in this situation or if it depends on some parameter/config. (Also not sure whether C miniz did here, whether it different from zlib or whether it was an accidental change during porting - I suspect the latter.) Can try to alter this behaviour in miniz_oxide if people wish.

If consuming more data and saving it into internal state is better for performance (no need to parse the same data second time) then I am fine with it, what is needed is better documentation for flate2 (saying that it might consume more than needed and save it into internal state) and maybe miniz_oxide. Also exact meaning for StreamEnd and BufError is not clear.

@oyvindln
Copy link
Contributor

oyvindln commented Oct 8, 2024

I don't really know whether it would make performance worse or not, would have to test, miniz_oxide needs more performance work in any case. The behaviour could maybe also added as a parameter to the deflate function (though that may require a version bump for API change in miniz_oxide, or maybe added as an alternate function)

@Byron
Copy link
Member

Byron commented Oct 9, 2024

Thanks everyone!

It seems the only action that can be taken here is to update the documentation to specifically mention how miniz_oxide behaves right now. Maybe there could be additions to make clearer how StreamEnd and BufError are to be understood.

The behaviour could maybe also added as a parameter to the deflate function (though that may require a version bump for API change in miniz_oxide, or maybe added as an alternate function)

I like the current setup here where calls to backends seem to be the same, without special adaptations depending on the backend. But I might be wrong about that, and if so, flate2 would certainly make use of the flag to get similar behaviour.

As for this PR, it could certainly be merged with the documentation improvements, and with some clear-text description of what the test is validating. For instance, I would expect that the special handling can be removed once miniz_oxide is acting similarly as the other backends.

Does that make sense?

@link2xt
Copy link
Author

link2xt commented Oct 9, 2024

Maybe instead of status == flate2::Status::BufError the test should check that input has not advanced? So if input has not advanced and output has not advanced, then break out of the loop.

@Byron
Copy link
Member

Byron commented Oct 9, 2024

Both work, but I think it requires more comments to explain what is tested, and what the code should rather look like one day so eventually it can be adjusted to reach its final form.

@oyvindln
Copy link
Contributor

I did some testing and it seems the old C miniz backend behaved the same way as current miniz_oxide so while I would have to test the C functions to be 100% sure it does seem like this is a difference in behaviour between how the z_deflate function acts due to how the between zlib and it's C miniz counterpart due to the differences in how they are implemented miniz_oxide just continued the miniz implementation.

zlib (and I think zlib-ng but would have to check) seems to mainly write to the internal buffer window on exit and uses it as a sort of cache and back-buffer, operating mostly on the output buffer directly otherwise, while miniz/miniz_oxide always writes to an internal buffer and only flushes it when needed or on stream end unless deflate is called only once with the Finish flush option in which case it will just write directly to the provided output buffer since there is no point in using the internal buffer.

Need to do some more digging whether it's practical to alter miniz_oxide to limit the output to not write more than the size of the output buffer or whether that would require a substantial redesign of the internals.

@Byron
Copy link
Member

Byron commented Oct 16, 2024

I did some testing and it seems the old C miniz backend behaved the same way as current miniz_oxide so while I would have to test the C functions to be 100% sure it does seem like this is a difference in behaviour between how the z_deflate function acts due to how the between zlib and it's C miniz counterpart due to the differences in how they are implemented miniz_oxide just continued the miniz implementation.

I see - this issue has always been present even in the miniz C implementation which miniz_oxide is derived off.

Need to do some more digging whether it's practical to alter miniz_oxide to limit the output to not write more than the size of the output buffer or whether that would require a substantial redesign of the internals.

I wonder if there is any way to fix this here, or to fix miniz_oxide to be able to detect such a case so that callers can reliably work with it. To my mind, it's OK to work a little differently as long as it's still possible to work with status codes, which is probably what callers do most of the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants