Add test for partial decompression #436

link2xt · 2024-10-07T16:34:23Z

This adds the test from #434

link2xt · 2024-10-07T16:40:06Z

The second commit reorders the tests so that all the tests with zlib succeed first and the last test, with miniz_oxide, fails.

Byron · 2024-10-08T05:51:12Z

Thanks a lot for putting the PR together; I think it nicely shows the difference in behaviour. However, I would also have hoped that the code keeps decoding the input until it receives the 'finished' status. To my mind, it may be fair for an implementation to choose how much of the input it buffers internally. After all, I really don't know enough about how it should work.

CC @oyvindln if they have some thoughts.

link2xt · 2024-10-08T16:59:46Z

I will try calling it again until it returns the StreamEnd status. If that works, I will adjust the async-compression crate and either close the miniz_oxide issue or turn it into a documentation issue.

link2xt · 2024-10-08T17:31:24Z

Apparently StreamEnd is never returned. This passes:

```rust
#[test]
fn deflate_decoder_partial() {
    let input = vec![
        210, 82, 8, 12, 245, 15, 113, 12, 242, 247, 15, 81, 240, 244, 115, 242, 143, 80, 80, 10,
        45, 78, 45, 82, 40, 44, 205, 47, 73, 84, 226, 229, 210, 130, 200, 163, 136, 42, 104, 4,
        135, 248, 7, 57, 186, 187, 42, 152, 155, 41, 24, 27, 152, 27, 25, 24, 104, 242, 114, 57,
        26, 24, 24, 24, 42, 248, 123, 43, 184, 167, 150, 128, 213, 21, 229, 231, 151, 40, 36, 231,
        231, 22, 228, 164, 150, 164, 166, 40, 104, 24, 232, 129, 20, 104, 43, 128, 104, 3, 133,
        226, 212, 228, 98, 77, 61, 94, 46, 0, 0, 0, 0, 255, 255,
    ];
    let expected_output = b"* QUOTAROOT INBOX \"User quota\"\r\n* QUOTA \"User quota\" (STORAGE 76 307200)\r\nA0001 OK Getquotaroot completed (0.001 + 0.000 secs).\r\n";

    // Create very small output buffer.
    let mut output_buf = [0; 8];
    let mut output: Vec<u8> = Vec::new();

    let zlib_header = false;
    let mut decompress = flate2::Decompress::new(zlib_header);

    let flush_decompress = flate2::FlushDecompress::None;
    loop {
        let prev_out = decompress.total_out();
        let status = decompress
            .decompress(&input[decompress.total_in() as usize..], &mut output_buf, flush_decompress)
            .unwrap();
        output.extend_from_slice(&output_buf[..(decompress.total_out() - prev_out) as usize]);
        eprintln!("{}", output.len());

        // IMAP stream never ends.
        assert_ne!(status, flate2::Status::StreamEnd);

        if output.len() == expected_output.len() {
            assert_eq!(status, flate2::Status::Ok);
            break;
        }
    }

    assert_eq!(output.as_slice(), expected_output);
}
```

This is not realistic, though, because normally I don't know the size of the output in advance. I could try calling it until it emits zero bytes into the output.

Byron · 2024-10-08T17:42:30Z

The question here is whether StreamEnd happens with the libz backend, for instance, which is what I would expect.
Further, I'd add an actual assertion on the output; after all, it's known, and by the end everything should have been decompressed.

That way, at least with the non-miniz-oxide implementation there is a working version of this. From there it should be possible to figure out what's happening with the miniz_oxide backend.

Thinking about it, gitoxide does something similar, but also knows the size of the input buffer. However, I don't think it relies on it.

link2xt · 2024-10-08T18:15:04Z

> The question here is whether StreamEnd happens with the libz backend, for instance, which is what I would expect.

No, it does not happen. I think StreamEnd is really for the case of the sender "closing" the stream, and with IMAP this normally never happens unless you explicitly send the LOGOUT command.

This is the current state of the test, and it passes for both zlib and miniz_oxide. It is identical to the version posted in my previous comment, including the explicit assertion that StreamEnd never happens.


link2xt · 2024-10-08T18:21:55Z

I made the test work with all implementations. The right condition for stopping is status == flate2::Status::BufError && output_len == 0: we did not output anything, and BufError signals that no progress was possible, so we ran out of input rather than output space.
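A minimal sketch of that stopping rule in Rust. To stay self-contained it does not use flate2: it assumes a hypothetical WindowedDecoder whose "decompression" is the identity function but which, like miniz_oxide, copies input into an internal window before emitting anything. The Status enum only mirrors the variant names of flate2::Status. The loop never needs to know the output size in advance.

```rust
/// Mirrors the variant names of flate2::Status (redefined so the sketch compiles alone).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    BufError,
    StreamEnd,
}

/// Hypothetical stand-in for a miniz_oxide-style decoder. "Decompression" is the
/// identity, but input is first swallowed into an internal window of `window` bytes.
struct WindowedDecoder {
    window: usize,
    pending: Vec<u8>,
}

impl WindowedDecoder {
    fn new(window: usize) -> Self {
        WindowedDecoder { window, pending: Vec::new() }
    }

    /// Returns (bytes consumed, bytes produced, status).
    fn decompress(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize, Status) {
        // Consume as much input as still fits into the internal window.
        let take = self.window.saturating_sub(self.pending.len()).min(input.len());
        self.pending.extend_from_slice(&input[..take]);
        // Emit as much buffered data as fits into the caller's buffer.
        let give = self.pending.len().min(output.len());
        output[..give].copy_from_slice(&self.pending[..give]);
        self.pending.drain(..give);
        // Like zlib, report BufError when no progress at all was possible.
        let status = if take == 0 && give == 0 { Status::BufError } else { Status::Ok };
        (take, give, status)
    }
}

/// Decodes everything the available input yields, without knowing the output size.
fn decode_available(dec: &mut WindowedDecoder, input: &[u8], buf: &mut [u8]) -> Vec<u8> {
    let mut output = Vec::new();
    let mut pos = 0;
    loop {
        let (consumed, produced, status) = dec.decompress(&input[pos..], buf);
        pos += consumed;
        output.extend_from_slice(&buf[..produced]);
        match status {
            // The sender finished the deflate stream (never happens for IMAP).
            Status::StreamEnd => break,
            // Nothing was emitted and no progress was possible: wait for more input.
            Status::BufError if produced == 0 => break,
            _ => {}
        }
    }
    output
}
```

With a 32768-byte window and an 8-byte buffer, a caller that instead stopped as soon as all input had been consumed would return only the first 8 bytes, because the first call already swallows the whole input into the window.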

I guess I can now adjust the async-compression crate to do a similar thing.

oyvindln · 2024-10-08T19:58:59Z

StreamEnd happens in either backend if the deflate stream is well formed and the decompressor encounters the end of a block with the last block flag set.

At least in the case of zlib, BufError is returned if inflate is called with Z_FINISH as the flush mode and it reaches the end of the input data without the last block having had the end flag set. The stream is then technically not well formed, but I presume it's not treated as an error here because this seems to be common in many protocols, and treating it as an error in the stream class could cause issues.
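To make these rules concrete, here is a Rust sketch of how the return code of a single call could be chosen. It paraphrases, to the best of my understanding, the check near the end of zlib's inflate() in inflate.c; the classify function and both enums are hypothetical and only mirror the names of flate2's Status and FlushDecompress (the Sync variant is left out).

```rust
/// Mirrors the variant names of flate2::Status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    BufError,
    StreamEnd,
}

/// Reduced version of flate2::FlushDecompress (Sync omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Flush {
    None,
    Finish,
}

/// Sketch of the status rules for one call:
/// - StreamEnd once the block carrying the last-block flag has been fully decoded;
/// - BufError if the call made no progress at all, or if Finish was requested but the
///   stream did not end (e.g. an IMAP stream that is never closed);
/// - Ok otherwise.
fn classify(consumed: usize, produced: usize, final_block_done: bool, flush: Flush) -> Status {
    if final_block_done {
        Status::StreamEnd
    } else if (consumed == 0 && produced == 0) || flush == Flush::Finish {
        Status::BufError
    } else {
        Status::Ok
    }
}
```

Under these rules a never-ending IMAP stream can only ever yield Ok or BufError, which is consistent with the test's assertion that StreamEnd never happens.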

miniz_oxide is supposed to behave the same way so if it isn't it's a bug.

As for the choice between consuming up to the size of the internal buffer or just as much as would fit in the output, I don't really know what the most "correct" behaviour is. I have to investigate a bit more whether zlib always does the latter, as it seems to at least in this situation, or whether it depends on some parameter/config. (I'm also not sure what C miniz did here: whether it differed from zlib, or whether this was an accidental change during porting. I suspect the latter.) I can try to alter this behaviour in miniz_oxide if people wish.

link2xt · 2024-10-08T20:45:22Z

I managed to make async-compression work with miniz_oxide backend for flate2: Nullus157/async-compression#294

It still needs some cleanup, but I think the change makes sense in any case, and depending on how you look at it, we may consider miniz_oxide consuming input into its internal state NOTABUG/WONTFIX.

The flate2 documentation should probably not say that it is "consuming only as much input as needed" here: https://docs.rs/flate2/1.0.34/flate2/struct.Decompress.html#method.decompress
This is only true with the zlib backend; with miniz_oxide it takes up to 32768 bytes into its internal state.
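A toy Rust model of that difference. Backend, ModelDecoder and the 32768-byte window are illustrative assumptions, not the backends' real code, and "decompression" is again the identity function. The only point is how far total_in advances after a single call with an 8-byte output buffer.

```rust
/// Hypothetical buffering strategies; not the real backends' code.
#[derive(Debug, Clone, Copy)]
enum Backend {
    /// zlib-like: consume only as much input as fits into the caller's output buffer.
    Direct,
    /// miniz_oxide-like: copy input into an internal window of this many bytes first.
    Windowed(usize),
}

struct ModelDecoder {
    backend: Backend,
    pending: Vec<u8>,
    total_in: u64,
    total_out: u64,
}

impl ModelDecoder {
    fn new(backend: Backend) -> Self {
        ModelDecoder { backend, pending: Vec::new(), total_in: 0, total_out: 0 }
    }

    /// "Decompresses" with the identity function and updates the running totals.
    fn decompress(&mut self, input: &[u8], output: &mut [u8]) {
        let (consumed, produced) = match self.backend {
            Backend::Direct => {
                // Only the bytes that can be written out right now are consumed.
                let n = input.len().min(output.len());
                output[..n].copy_from_slice(&input[..n]);
                (n, n)
            }
            Backend::Windowed(window) => {
                // Input is buffered first; output is served from the buffer.
                let take = window.saturating_sub(self.pending.len()).min(input.len());
                self.pending.extend_from_slice(&input[..take]);
                let give = self.pending.len().min(output.len());
                output[..give].copy_from_slice(&self.pending[..give]);
                self.pending.drain(..give);
                (take, give)
            }
        };
        self.total_in += consumed as u64;
        self.total_out += produced as u64;
    }
}
```

With 100 bytes of input, the Direct model reports total_in == 8 after one call, while the Windowed model reports total_in == 100 and keeps the remaining 92 bytes only in its internal state. A caller that slices its input by total_in therefore has nothing left to pass in and must keep calling with an empty input slice to drain them.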

link2xt · 2024-10-08T21:00:28Z

@oyvindln

> As for the choice between consuming up to the size of the internal buffer or just as much as would fit in the output, I don't really know what the most "correct" behaviour is. I have to investigate a bit more whether zlib always does the latter, as it seems to at least in this situation, or whether it depends on some parameter/config. (I'm also not sure what C miniz did here: whether it differed from zlib, or whether this was an accidental change during porting. I suspect the latter.) I can try to alter this behaviour in miniz_oxide if people wish.

If consuming more data and saving it into the internal state is better for performance (no need to parse the same data a second time), then I am fine with it. What is needed is better documentation for flate2 (saying that it might consume more input than needed and save it into its internal state), and maybe for miniz_oxide. The exact meaning of StreamEnd and BufError is also not clear.

oyvindln · 2024-10-08T22:35:21Z

I don't really know whether it would make performance worse or not; I would have to test. miniz_oxide needs more performance work in any case. The behaviour could maybe also be added as a parameter to the deflate function (though that may require a version bump for an API change in miniz_oxide), or maybe added as an alternate function.

Byron · 2024-10-09T05:20:48Z

Thanks everyone!

It seems the only action that can be taken here is to update the documentation to specifically mention how miniz_oxide behaves right now. Maybe there could be additions to make clearer how StreamEnd and BufError are to be understood.

> The behaviour could maybe also be added as a parameter to the deflate function (though that may require a version bump for an API change in miniz_oxide), or maybe added as an alternate function.

I like the current setup here where calls to backends seem to be the same, without special adaptations depending on the backend. But I might be wrong about that, and if so, flate2 would certainly make use of the flag to get similar behaviour.

As for this PR, it could certainly be merged with the documentation improvements and with a clear-text description of what the test is validating. For instance, I would expect that the special handling can be removed once miniz_oxide behaves like the other backends.

Does that make sense?

link2xt · 2024-10-09T05:59:27Z

Maybe instead of checking status == flate2::Status::BufError, the test should check that the input has not advanced? If neither the input nor the output has advanced, break out of the loop.
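A Rust sketch of that alternative, which ignores the status code entirely and compares total_in and total_out before and after each call. MockDecompress is a hypothetical stand-in that only mimics the shape of flate2::Decompress (total_in(), total_out() and a decompress call); the real method also takes a FlushDecompress argument and returns Result<Status, DecompressError>. Like miniz_oxide, the mock buffers input internally, and its "decompression" is the identity.

```rust
/// Hypothetical stand-in with the same accessor names as flate2::Decompress.
struct MockDecompress {
    window: usize,
    pending: Vec<u8>,
    total_in: u64,
    total_out: u64,
}

impl MockDecompress {
    fn new(window: usize) -> Self {
        MockDecompress { window, pending: Vec::new(), total_in: 0, total_out: 0 }
    }

    fn total_in(&self) -> u64 {
        self.total_in
    }

    fn total_out(&self) -> u64 {
        self.total_out
    }

    /// Buffers up to `window` bytes of input, then emits what fits into `output`.
    fn decompress(&mut self, input: &[u8], output: &mut [u8]) {
        let take = self.window.saturating_sub(self.pending.len()).min(input.len());
        self.pending.extend_from_slice(&input[..take]);
        let give = self.pending.len().min(output.len());
        output[..give].copy_from_slice(&self.pending[..give]);
        self.pending.drain(..give);
        self.total_in += take as u64;
        self.total_out += give as u64;
    }
}

/// Keeps calling the decoder until a call advances neither the input nor the output.
fn decode_until_stalled(d: &mut MockDecompress, input: &[u8], buf: &mut [u8]) -> Vec<u8> {
    let start_in = d.total_in();
    let mut output = Vec::new();
    loop {
        let (prev_in, prev_out) = (d.total_in(), d.total_out());
        // Same slicing trick as the test: skip what the decoder has already consumed.
        d.decompress(&input[(prev_in - start_in) as usize..], buf);
        let produced = (d.total_out() - prev_out) as usize;
        output.extend_from_slice(&buf[..produced]);
        if d.total_in() == prev_in && produced == 0 {
            break; // Neither input nor output advanced: more input is needed.
        }
    }
    output
}
```

Because the condition only looks at the running totals, it does not depend on how a backend maps "no progress" to a status code, and the same decoder can be fed the next chunk of the stream afterwards.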

Byron · 2024-10-09T06:59:47Z

Both work, but I think it requires more comments to explain what is tested, and what the code should rather look like one day so eventually it can be adjusted to reach its final form.

oyvindln · 2024-10-15T19:37:43Z

I did some testing, and it seems the old C miniz backend behaved the same way as the current miniz_oxide. While I would have to test the C functions to be 100% sure, this does seem to be a difference in how the z_deflate function behaves in zlib and in its C miniz counterpart, due to differences in how they are implemented; miniz_oxide just continued the miniz implementation.

zlib (and I think zlib-ng, but I would have to check) seems to write to its internal window buffer mainly on exit, using it as a sort of cache and back-buffer, and otherwise operates mostly on the output buffer directly. miniz/miniz_oxide, by contrast, always writes to an internal buffer and only flushes it when needed or at stream end. The exception is when deflate is called only once with the Finish flush option: then it writes directly to the provided output buffer, since there is no point in using the internal buffer.

I need to do some more digging into whether it's practical to alter miniz_oxide so that it does not write more than the size of the output buffer, or whether that would require a substantial redesign of the internals.

Byron · 2024-10-16T05:29:54Z

> I did some testing, and it seems the old C miniz backend behaved the same way as the current miniz_oxide. While I would have to test the C functions to be 100% sure, this does seem to be a difference in how the z_deflate function behaves in zlib and in its C miniz counterpart, due to differences in how they are implemented; miniz_oxide just continued the miniz implementation.

I see, so this behaviour has always been present, even in the miniz C implementation that miniz_oxide is derived from.

> I need to do some more digging into whether it's practical to alter miniz_oxide so that it does not write more than the size of the output buffer, or whether that would require a substantial redesign of the internals.

I wonder if there is any way to fix this here, or to fix miniz_oxide to be able to detect such a case so that callers can reliably work with it. To my mind, it's OK to work a little differently as long as it's still possible to work with status codes, which is probably what callers do most of the time.

Add failing test

8599909

link2xt force-pushed the link2xt/partial-decompression branch from 525e54c to 8599909 October 7, 2024 16:35

Reorder test without zlib to the end

8769362

link2xt added 3 commits October 8, 2024 18:18

Make partial decompression test work with both zlib and miniz_oxide

57425a6

Revert "Reorder test without zlib to the end"

40790ed

This reverts commit 8769362.

Remove eprintln

1325bb3

rustfmt

8e72245

link2xt mentioned this pull request Oct 8, 2024

miniz_oxide::inflate::stream::inflate consumes more bytes than it actually used Frommi/miniz_oxide#158 (Open)

link2xt mentioned this pull request Oct 8, 2024

fix(tokio): attempt to decode from internal state even if nothing was read Nullus157/async-compression#294 (Merged)

Byron added help wanted and removed help wanted labels Oct 9, 2024
