Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miniz_oxide::inflate::stream::inflate consumes more bytes than it actually used #158

Open
link2xt opened this issue Oct 6, 2024 · 6 comments
Labels

Comments

@link2xt
Copy link

link2xt commented Oct 6, 2024

Here is a failing test:

#[test]
fn partial_decompression_imap() {
    use miniz_oxide::inflate::stream::{inflate, InflateState};
    use miniz_oxide::{DataFormat, MZFlush};

    // Decompresses to
    // "* QUOTAROOT INBOX \"User quota\"\r\n* QUOTA \"User quota\" (STORAGE 76 307200)\r\nA0001 OK Getquotaroot completed (0.001 + 0.000 secs).\r\n"
    let input = vec![
        210, 82, 8, 12, 245, 15, 113, 12, 242, 247, 15, 81, 240, 244, 115, 242, 143, 80, 80, 10,
        45, 78, 45, 82, 40, 44, 205, 47, 73, 84, 226, 229, 210, 130, 200, 163, 136, 42, 104, 4,
        135, 248, 7, 57, 186, 187, 42, 152, 155, 41, 24, 27, 152, 27, 25, 24, 104, 242, 114, 57,
        26, 24, 24, 24, 42, 248, 123, 43, 184, 167, 150, 128, 213, 21, 229, 231, 151, 40, 36, 231,
        231, 22, 228, 164, 150, 164, 166, 40, 104, 24, 232, 129, 20, 104, 43, 128, 104, 3, 133,
        226, 212, 228, 98, 77, 61, 94, 46, 0, 0, 0, 0, 255, 255,
    ];

    let mut inflate_stream = InflateState::new(DataFormat::Raw);
    let mut output = vec![0; 8];
    let result = inflate(&mut inflate_stream, &input, &mut output, MZFlush::None);

    assert!(result.status.is_ok());
    // Should not consume everything, there is not enough space in the buffer for the output.
    assert_ne!(result.bytes_consumed, input.len())
}

Output buffer is only 8 bytes in size, but miniz_oxide consumes the whole input buffer anyway.
This causes the issue in flate2 that does not happen when using another deflate implementations such as zlib, so I think it is a miniz_oxide bug: rust-lang/flate2-rs#434

@link2xt
Copy link
Author

link2xt commented Oct 7, 2024

This underlying decompress function is probably fine, it takes out and should make sure not to overflow it:

pub fn decompress(
r: &mut DecompressorOxide,
in_buf: &[u8],
out: &mut [u8],
out_pos: usize,
flags: u32,
) -> (TINFLStatus, usize, usize) {

But inflate_loop here passes state.dict into decompress as the output buffer:

And this output buffer size is 32768 (TINFL_FZ_DICT_SIZE).

@link2xt
Copy link
Author

link2xt commented Oct 8, 2024

Maybe there is no bug, I made decompression work with flate2 using miniz_oxide backend: rust-lang/flate2-rs#436
Seems the input just got consumed into InflateState and on the next call it gets out. This does not happen with zlib, but still usable.

@link2xt
Copy link
Author

link2xt commented Oct 9, 2024

@oyvindln I made async-compression (which in turn uses flate2) to work: Nullus157/async-compression#294

But this requires trying to inflate even when we have no input anymore to check if we can extract something from the internal state of the decoder. If it is possible to change miniz_oxide to behave like zlib and not consume more input than actually used that would be great as it will make miniz_oxide more difficult to misuse. Otherwise I think this "footgun" should at least be documented in miniz_oxide and flate2 documentation, current flate2 documentation is misleading and this led to bug in async-compression.

@phord
Copy link
Contributor

phord commented Oct 9, 2024

I believe it is possible to have all of your input "consumed" yet still have remaining bytes to retrieve via zlib, too. It may be only a byte or two. But if you're not checking the state variables for "done". you may still be leaving pieces behind.

@oyvindln
Copy link
Collaborator

Hm, would have to investigate that as well then

@oyvindln
Copy link
Collaborator

Seeing as this seems to be how the original miniz behaves and is designed around I think it makes sense to keep this as the default behaviour (though document it properly.

I think it would be doable to make a separate variant of the function by using a non-circular internal buffer that would limit the buffer in the inner call to decompress to the same size as the output buffer so it behaves similarly to zlib though. It's possible it would have slightly different performance characteristics but I can't tell without implementing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants