ZSTD_error_dstSize_tooSmall when stream-decompressing with ZSTD_d_stableOutBuffer #3004

@fgenesis

Related to #2094 and maybe #2723, regarding ZSTD_d_stableOutBuffer. Using zstd 1.5.1 on Windows 10 64-bit, MSVC 2019.

Context:
I'm trying to use streaming zstd in a use case very similar to https://fgiesen.wordpress.com/2011/11/21/buffer-centric-io/ (search the page for "cool thing" for extra context). My primary goal is to stream-decompress small-ish blocks of data to keep memory usage low.

The TL;DR of the link above is that no user output buffer (and thus, no extra copy from the window) is needed when you can directly expose the LZ77 window and give the user a pointer to that.

I allocate a buffer of frameHeader.windowSize bytes for the window, set ZSTD_d_stableOutBuffer, pin the output pointer and size, and then start decompressing. A small file works fine, but as soon as the decompressed data is too big to fit into the window, it fails with ZSTD_error_dstSize_tooSmall just after parsing the header internally.

The error is set here (in zstd_decompress.c):

            /* Check output buffer is large enough for ZSTD_odm_stable. */
            if (zds->outBufferMode == ZSTD_bm_stable
                && zds->fParams.frameType != ZSTD_skippableFrame
                && zds->fParams.frameContentSize != ZSTD_CONTENTSIZE_UNKNOWN
                && (U64)(size_t)(oend-op) < zds->fParams.frameContentSize) {
                RETURN_ERROR(dstSize_tooSmall, "ZSTD_obm_stable passed but ZSTD_outBuffer is too small");
            }

My window size is ~2 MB (as taken from the header; the data was compressed with level 3) and frameContentSize is 18 MB.
Why is this check there? Shouldn't it check the window size instead? The window is large enough to handle any back-references it encounters, so that shouldn't be an issue.
The way the code works now effectively breaks chunked streaming with ZSTD_d_stableOutBuffer when the output size is known.
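
To make the failure concrete, here is a minimal standalone sketch of the quoted condition, assuming stable output mode is already enabled. The names `stable_check_rejects` and `CONTENTSIZE_UNKNOWN` are hypothetical stand-ins for illustration only and are not part of zstd:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ZSTD_CONTENTSIZE_UNKNOWN (assumed here to be the all-ones value). */
#define CONTENTSIZE_UNKNOWN UINT64_MAX

/* Hypothetical model of the quoted check, assuming the stream is in stable
 * output mode. Returns 1 if the check would raise dstSize_tooSmall. */
static int stable_check_rejects(size_t outCapacity,
                                uint64_t frameContentSize,
                                int isSkippableFrame)
{
    return !isSkippableFrame
        && frameContentSize != CONTENTSIZE_UNKNOWN
        && (uint64_t)outCapacity < frameContentSize;
}
```

With an output capacity of roughly 2 MB and a frameContentSize of 18 MB, this predicate is true, which matches the error I see. It would be false if the content size were unknown or if the buffer could hold the whole frame; the window size never enters the comparison.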

Since my buffer is exactly window-sized and externally read-only, it should be possible to decompress one window worth of data, hand a pointer to the user for reading directly from the window, then resume decompression, overwrite the window, hand it to the user again, and so on.

I'm doing this -- pseudo-ish code, simplified, based on the streaming example:

// this function is called whenever the user requests new decompressed data:
decompSome(z) { // z = ... context, pointers, output, etc. heap-allocated. z->dc is the zstd stream
    N = 0; // # of bytes decompressed
    for (;;)
    {
        p = pointer to some compressed data; streamed in
        avail = # bytes available in p; at least 1 byte.

        ZSTD_inBuffer input = { p, avail, 0 };
        for (;;)
        {
            size_t const oldpos = z->output.pos == z->output.size ? 0 : z->output.pos; // handle window wraparound
            size_t const ret = ZSTD_decompressStream(z->dc, &z->output, &input);
            if (ZSTD_isError(ret))
                return fail(....);
            N += z->output.pos - oldpos;
            p += input.pos;
            if (!ret) // all done?
                goto eof;
            if (z->output.pos == z->output.size) // window full? store state for next time
                goto out;
            if (input.pos == input.size) // input consumed, get more
                break;
        }
    }
eof: ...
out: store state, hand the user (z->output.dst, N)
}

Am I overlooking something, misusing the API, or is this a bug?

I have this working with an inflater using miniz and also wrote a quick'n'dirty streaming LZ4 decoder that exposes the window like this.

Would be nice to be able to do the same with zstd.
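
For anyone unfamiliar with the pattern, here is a toy sketch of what "exposing the window" means: decode until the window is full, hand out a pointer into it, then resume and overwrite. This is not zstd, miniz, or LZ4 code. The token format, `ToyDecoder`, and `toy_decode_some` are all made up for illustration, and the window is tiny so the wraparound is easy to follow:

```c
#include <assert.h>
#include <stddef.h>

enum { WINDOW_SIZE = 8 };  /* toy value; every match offset must be <= this */

/* Hypothetical token: length == 0 means "emit literal lit",
 * otherwise "copy length bytes starting offset bytes back". */
typedef struct { unsigned char lit; unsigned offset, length; } Token;

typedef struct {
    unsigned char window[WINDOW_SIZE]; /* doubles as the user-visible output */
    size_t wpos;                       /* next write position in the window */
    const Token *tok;                  /* token stream */
    size_t ntok, itok;                 /* token count and read index */
    unsigned pending, pending_off;     /* match interrupted by a full window */
} ToyDecoder;

/* Decode until the window end is reached or the tokens run out.
 * Returns a pointer into the window to the bytes produced by this call and
 * stores their count in *n. The pointer is only valid until the next call. */
static const unsigned char *toy_decode_some(ToyDecoder *d, size_t *n)
{
    if (d->wpos == WINDOW_SIZE)
        d->wpos = 0;                   /* wrap: start overwriting oldest data */
    size_t const start = d->wpos;

    while (d->wpos < WINDOW_SIZE) {
        if (d->pending == 0) {
            if (d->itok == d->ntok)
                break;                 /* input exhausted */
            const Token *t = &d->tok[d->itok++];
            if (t->length == 0) {      /* literal */
                d->window[d->wpos++] = t->lit;
                continue;
            }
            assert(t->offset >= 1 && t->offset <= WINDOW_SIZE);
            d->pending = t->length;
            d->pending_off = t->offset;
        }
        /* Copy one match byte; the source index wraps circularly, so
         * back-references keep working after the write position wraps. */
        size_t const src = (d->wpos + WINDOW_SIZE - d->pending_off) % WINDOW_SIZE;
        d->window[d->wpos++] = d->window[src];
        d->pending--;
    }
    *n = d->wpos - start;
    return d->window + start;
}
```

The caller just keeps calling `toy_decode_some` and consumes the returned slice each time until it gets back zero bytes. No separate output buffer exists and nothing is copied out of the window, which is the property I'd like to get from zstd as well.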
