Description
Thank you for this amazingly powerful library. I think we (at https://github.com/vectorizedio/redpanda) may have a unique use case which appears to be almost covered by the API, but there is still an open question about a viable solution, and I'm hoping to get some advice on the limits of the current API.
tl;dr: Is it possible to use ZSTD_c_stableOutBuffer with a stable (but fragmented) output buffer? We operate in an environment where memory fragmentation prevents on-demand allocation of contiguous regions (even 2 MB), which our use case would otherwise need for the streaming-decompression history buffer. We don't control the compression process, and while we do impose limits, limits that derive from our inability to acquire contiguous memory regions larger than 1-2 MB are too severe a restriction.
full details of use case
We are using zstd to decompress source data that is stored in a heap-allocated, fragmented buffer (not folly::IOBuf chains, but a similar idea):
source = [buf-0, buf-1, ...]
Because the source data is not stored in contiguous memory we cannot use the single-shot API, so we rely on streaming decompression, where the output data is also stored in a fragmented buffer.
- This all works fine, and so far it seems to be a completely normal use of the zstd streaming API (a rough sketch of the loop is included below).
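For concreteness, a minimal sketch of that loop, assuming a hypothetical struct fragment type and fragment arrays standing in for our fragmented buffers (error handling trimmed):

```c
#include <stddef.h>
#include <zstd.h>

/* Hypothetical fragment type standing in for our fragmented buffers. */
struct fragment { void* data; size_t size; };

/* Feed the fragmented source chunks through ZSTD_decompressStream(),
 * switching to the next output fragment whenever the current one fills. */
static size_t decompress_fragmented(ZSTD_DStream* ds,
                                    const struct fragment* in, size_t in_count,
                                    const struct fragment* out, size_t out_count)
{
    size_t oi = 0;
    ZSTD_outBuffer output = { out[0].data, out[0].size, 0 };

    for (size_t ii = 0; ii < in_count; ++ii) {
        ZSTD_inBuffer input = { in[ii].data, in[ii].size, 0 };
        while (input.pos < input.size) {
            size_t const ret = ZSTD_decompressStream(ds, &output, &input);
            if (ZSTD_isError(ret))
                return ret;                          /* propagate zstd error */
            if (output.pos == output.size) {         /* current fragment full */
                if (++oi == out_count)
                    return (size_t)-1;               /* out of output space */
                output = (ZSTD_outBuffer){ out[oi].data, out[oi].size, 0 };
            }
        }
    }
    return 0;
}
```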
The problem we are facing arises from the size requirements of the history buffer that streaming decompression needs. Specifically, we care about limiting the size of any single contiguous region of memory.
It is probably relevant to state now that we do not control the compression process, so we can't enforce specific limits on the window size. Because of that, my understanding is that even the buffer-less API doesn't help, since the caller must still provide a contiguous memory region large enough for the history, whose size in turn depends on the compression process.
- I would expect the suggestion at this point to be to limit the window size, and we do, up to a reasonable point (see the sketch below).
However, we operate in an environment where memory becomes fragmented over time, and we cannot rely on the OS to provide a contiguous region via virtual memory on demand. The fragmentation becomes bad enough that it can be hard to find even 1-2 MB of contiguous memory. For us this would translate into an unreasonable restriction on our clients' compression process.
- I thought for sure we'd find a solution by looking at the use of zstd in the Linux kernel, but zstd in that environment makes use of vmalloc, so it doesn't have this issue :)
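A minimal sketch of how such a cap can be applied on the decompression side, using the public ZSTD_d_windowLogMax parameter (the value 23, i.e. an 8 MB window, is an arbitrary illustrative number, not our actual limit):

```c
#include <zstd.h>

/* Illustrative cap: frames that declare a window larger than 2^23 bytes
 * are rejected instead of forcing a large contiguous history allocation. */
#define WINDOWLOG_LIMIT 23   /* hypothetical name and value */

static ZSTD_DCtx* make_limited_dctx(void)
{
    ZSTD_DCtx* const dctx = ZSTD_createDCtx();
    if (dctx != NULL)
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, WINDOWLOG_LIMIT);
    return dctx;
}
```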
The current solution is to statically allocate a sufficiently large buffer (say 4 or 8 MB) during boot, when memory is not yet fragmented, and then use zstd's experimental static-initialization API to hand that buffer to the streaming decompression API (sketched below).
- This isn't ideal because we need to reserve based on a worst-case scenario (up to some limit), and more importantly, this memory is pinned forever, even though clients may never send us compressed data.
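A minimal sketch of that workaround, assuming the experimental static-allocation entry points ZSTD_estimateDStreamSize() and ZSTD_initStaticDStream() (behind ZSTD_STATIC_LINKING_ONLY); MAX_WINDOW_SIZE is a placeholder for whatever worst-case window size we decide to support:

```c
#define ZSTD_STATIC_LINKING_ONLY                 /* static-allocation API */
#include <zstd.h>
#include <stdlib.h>

#define MAX_WINDOW_SIZE ((size_t)8 << 20)        /* hypothetical 8 MB cap */

/* Called once at boot, while memory is still unfragmented; the workspace
 * stays pinned for the lifetime of the process even if no client ever
 * sends compressed data. */
static ZSTD_DStream* boot_time_dstream(void)
{
    size_t const wksp_size = ZSTD_estimateDStreamSize(MAX_WINDOW_SIZE);
    void* const wksp = malloc(wksp_size);
    if (wksp == NULL)
        return NULL;
    return ZSTD_initStaticDStream(wksp, wksp_size);  /* ctx lives inside wksp */
}
```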
I recently stumbled upon PR #2094, which introduced ZSTD_c_stableOutBuffer, and from what I can tell it solves the internal-history-buffer issue by relying on a stable output buffer.
- Can ZSTD_c_stableOutBuffer work with an output buffer that is fragmented? The output buffers can themselves remain stable in memory, but the entirety of the decompressed output would need to be fragmented in (potentially) arbitrary ways, though we could add a stricter allocation policy if necessary.
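For context, a minimal sketch of how a stable output buffer is normally provided on the decompression side, assuming the decompression-side parameter ZSTD_d_stableOutBuffer (set via ZSTD_DCtx_setParameter) is the relevant analogue here: the entire destination is handed over up front and the pointer/size must not change between calls, which is exactly the property a fragmented output cannot offer as one region.

```c
#define ZSTD_STATIC_LINKING_ONLY                 /* ZSTD_d_stableOutBuffer */
#include <zstd.h>

/* Decompress one frame into a single, contiguous, caller-owned destination
 * that stays fixed across calls; assumes src holds the complete frame. */
static size_t decompress_stable_out(ZSTD_DCtx* dctx,
                                    void* dst, size_t dst_capacity,
                                    const void* src, size_t src_size)
{
    ZSTD_DCtx_setParameter(dctx, ZSTD_d_stableOutBuffer, 1);
    ZSTD_outBuffer out = { dst, dst_capacity, 0 };   /* never re-pointed */
    ZSTD_inBuffer in = { src, src_size, 0 };
    size_t ret;
    do {
        ret = ZSTD_decompressStream(dctx, &out, &in);
        if (ZSTD_isError(ret))
            return ret;
    } while (ret != 0 && in.pos < in.size);
    return out.pos;                                  /* decompressed bytes */
}
```

The open question above is essentially whether that contract could be relaxed so the destination advances across fragments while still avoiding the internal history buffer.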
Are there other approaches to solving the problem we are facing?