Skip to content

Dynamic ring buffer sizing for BZip2 and ZStandard on non-seekable streams#1269

Closed
Copilot wants to merge 5 commits intomasterfrom
copilot/dynamic-default-ringbuffer
Closed

Dynamic ring buffer sizing for BZip2 and ZStandard on non-seekable streams#1269
Copilot wants to merge 5 commits intomasterfrom
copilot/dynamic-default-ringbuffer

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 31, 2026

BZip2 decompresses entire blocks before producing output. At compression level 9 the first block can hold up to 900 KB uncompressed; with incompressible data the compressed bytes approach that size. Similarly, ZStandard blocks can be up to 128 KB (ZSTD_BLOCKSIZE_MAX). The default 80 KB ring buffer was too small for both formats, causing a rewind overflow during IsTarFile format detection on non-seekable streams.

Changes

  • SharpCompressStream.StartRecording(int? minBufferSize = null) — the ring buffer is now allocated with the correct size at StartRecording time and never reallocated. Passing a minBufferSize greater than Constants.RewindableBufferSize allocates a larger buffer upfront. No-op for seekable streams (which use native seeking).

  • Constants.RewindableBufferSize — reverted to 80 KB (81920 bytes). Formats that require larger buffers declare their own minimum via TarWrapper.MinimumRewindBufferSize.

  • TarWrapper.MinimumRewindBufferSize — new property (opt-in via constructor param) declaring how large a ring buffer format detection requires for a given wrapper:

    • BZip2 sets this to BZip2Constants.baseBlockSize * 9 = 900 000 bytes
    • ZStandard sets this to ZstandardConstants.DStreamInSize = 131 075 bytes (ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeaderSize)
    • All other wrappers default to Constants.RewindableBufferSize
  • ZstandardConstants.BlockSizeMax / DStreamInSize — new constants matching ZSTD_BLOCKSIZE_MAX (131072) and ZSTD_DStreamInSize (131075), used to size the ring buffer for ZStandard format detection.

  • TarWrapper.MaximumRewindBufferSize — new static property: the pre-computed maximum MinimumRewindBufferSize across all wrappers (900 000 bytes, driven by BZip2). Used as the buffer size at stream construction so no format-specific branching is needed at call sites.

  • TarFactory / TarReader.Factory — pass TarWrapper.MaximumRewindBufferSize when creating or starting recording on the stream, ensuring the buffer is correctly sized before any reads occur. Applies to both sync and async paths.

The buffer is sized correctly at construction, so seekable streams are unaffected and non-seekable streams always have enough capacity without any post-construction reallocation.

// Before: throws for non-seekable .tar.bz2 or .tar.zst streams with large first blocks
// ArchiveOperationException: Cannot rewind: recording anchor is 870804 bytes behind
// current position, but ring buffer only holds 81920 bytes.

using var nonSeekable = new ForwardOnlyStream(File.OpenRead("archive.tar.bz2"));
using var reader = ReaderFactory.OpenReader(nonSeekable); // now succeeds

using var nonSeekable2 = new ForwardOnlyStream(File.OpenRead("archive.tar.zst"));
using var reader2 = ReaderFactory.OpenReader(nonSeekable2); // now succeeds

Copilot AI changed the title [WIP] Update RingBuffer for dynamic default sizes based on compression level Dynamic ring buffer sizing for BZip2 (and extensible to other formats) Mar 31, 2026
Copilot AI requested a review from adamhathcock March 31, 2026 09:02
// BZip2 decompresses in whole blocks; the compressed size of the first block
// can be close to the uncompressed maximum (9 × 100 000 = 900 000 bytes).
// The ring buffer must hold all compressed bytes read during format detection.
minimumRewindBufferSize: BZip2Constants.baseBlockSize * 9
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot are there more kinds than just BZip2 to worry about? Zstandard?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only BZip2 needs the enlarged buffer. The other formats all decompress incrementally:

  • ZStandard reads from the source in chunks of ZSTD_DStreamInSize() ≈ 128 KB and produces output from each chunk before reading more. The default 160 KB ring buffer is larger, so no special treatment is needed.
  • GZip (DEFLATE), LZip (LZMA), XZ (LZMA2), and LZW are all streaming/dictionary-based decompressors that produce output while consuming input — well within the default 160 KB buffer.

BZip2 is unique in that it must buffer and decode an entire compressed block before producing any output, and at compression level 9 that block can be up to 900 KB of compressed data.

…(minBufferSize)

Agent-Logs-Url: https://github.com/adamhathcock/sharpcompress/sessions/47398db2-828b-472c-a213-6d5865cb0082

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copilot AI requested a review from adamhathcock April 1, 2026 11:56
@adamhathcock adamhathcock marked this pull request as ready for review April 1, 2026 13:04
Copilot AI review requested due to automatic review settings April 1, 2026 13:04
@adamhathcock
Copy link
Copy Markdown
Owner

@copilot revert the default buffer size to 80k and use the same mechanism for BZip2 for Zstandard to be what is needed

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot bot commented Apr 1, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 0
Changes Verified (3 files)
  1. src/SharpCompress/Common/Constants.cs — Reverted default buffer from 160KB to 80KB (81920 bytes)
  2. src/SharpCompress/Compressors/ZStandard/ZstandardConstants.cs — Added BlockSizeMax (128KB) and DStreamInSize (131075 bytes) constants
  3. src/SharpCompress/Factories/TarWrapper.cs — Added minimumRewindBufferSize: ZstandardConstants.DStreamInSize to ZStandard wrapper entry
Implementation Verification

The PR owner request has been fully implemented:

  • ✅ Default RewindableBufferSize reverted from 163840 to 81920 (80KB)
  • ✅ ZStandard now uses minimumRewindBufferSize parameter in TarWrapper, same mechanism as BZip2
  • ZstandardConstants.DStreamInSize = 131075 bytes (128KB block + 3-byte header)
  • TarWrapper.MaximumRewindBufferSize correctly computed at 900000 (BZip2's requirement)
  • TarFactory passes MaximumRewindBufferSize at stream construction, ensuring all formats work correctly

The implementation is correct:

  • Default buffer is smaller (80KB) for memory efficiency
  • BZip2 and ZStandard get appropriately sized buffers via their wrapper's MinimumRewindBufferSize
  • Format detection will work correctly for all tar compression formats on non-seekable streams
Files Reviewed
  • src/SharpCompress/Common/Constants.cs - Default buffer size reverted to 80KB
  • src/SharpCompress/Compressors/ZStandard/ZstandardConstants.cs - New constants added
  • src/SharpCompress/Factories/TarWrapper.cs - ZStandard now declares minimum buffer size

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses rewind buffer overflows during TAR format detection on non-seekable .tar.bz2 streams by ensuring the rewind ring buffer is large enough to accommodate BZip2’s large first-block reads during probing.

Changes:

  • Adds an optional minimum buffer size parameter to SharpCompressStream.StartRecording(...) to allow upfront ring-buffer sizing.
  • Introduces TarWrapper.MinimumRewindBufferSize plus a computed TarWrapper.MaximumRewindBufferSize (BZip2 sets 900,000 bytes) and wires this into TAR reader/detection paths.
  • Adds regression and sizing tests for TAR+BZip2 non-seekable scenarios and StartRecording(minBufferSize) behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/SharpCompress.Test/Tar/TarReaderTests.cs Adds regression test for non-seekable .tar.bz2 and asserts wrapper minimum rewind sizes.
tests/SharpCompress.Test/Streams/SharpCompressStreamSeekTest.cs Adds tests intended to cover StartRecording(minBufferSize) and default behavior.
src/SharpCompress/Readers/Tar/TarReader.Factory.cs Ensures TAR reader stream is created with a rewind buffer sized to at least the wrapper maximum.
src/SharpCompress/packages.lock.json Updates locked versions for Microsoft.NET.ILLink.Tasks for net8.0/net10.0 targets.
src/SharpCompress/IO/SharpCompressStream.cs Extends StartRecording with optional minimum buffer size and allocates ring buffer accordingly.
src/SharpCompress/IO/SeekableSharpCompressStream.cs Updates override signature for the new StartRecording(int? ...) API.
src/SharpCompress/Factories/TarWrapper.cs Adds per-wrapper minimum rewind sizing and computes a global maximum across wrappers.
src/SharpCompress/Factories/TarFactory.cs Uses TarWrapper.MaximumRewindBufferSize when starting recording for TAR detection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 188 to 211
@@ -190,10 +200,14 @@ public virtual void StartRecording()
);
}

// Ensure ring buffer exists
// Allocate ring buffer with the requested minimum size (at least the global default).
if (_ringBuffer is null)
{
_ringBuffer = new RingBuffer(Constants.RewindableBufferSize);
var size =
minBufferSize.GetValueOrDefault() > Constants.RewindableBufferSize
? minBufferSize.GetValueOrDefault()
: Constants.RewindableBufferSize;
_ringBuffer = new RingBuffer(size);
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new StartRecording(minBufferSize) only allocates a ring buffer when _ringBuffer is null. For non-seekable streams created via SharpCompressStream.Create(...), _ringBuffer is typically preallocated in the factory constructor, so a larger minBufferSize passed here will be silently ignored even if the existing buffer is smaller than the requested minimum. Consider either (a) validating that the existing ring buffer length meets the requested minimum and throwing a clear exception if not, or (b) ensuring callers that need a larger buffer always pass it at Create-time and documenting that StartRecording cannot increase an already-allocated buffer.

Copilot uses AI. Check for mistakes.
Comment on lines +68 to +69
// because EnsureMinimumRewindBufferSize expands the ring buffer
// to hold the BZip2 block before calling IsTarFile.
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regression test comment references EnsureMinimumRewindBufferSize expanding the ring buffer, but there is no such API in the codebase and the implementation appears to rely on allocating the correct buffer size up-front (e.g., via TarWrapper.MaximumRewindBufferSize / StartRecording(minBufferSize)). Please update the comment to match the actual mechanism so the test remains an accurate explanation of the failure mode and fix.

Suggested change
// because EnsureMinimumRewindBufferSize expands the ring buffer
// to hold the BZip2 block before calling IsTarFile.
// because the rewind ring buffer is allocated with enough capacity
// up front (e.g. via MaximumRewindBufferSize / StartRecording) to hold
// the BZip2 block before calling IsTarFile.

Copilot uses AI. Check for mistakes.
Comment on lines +148 to +150
var stream = SharpCompressStream.Create(nonSeekableMs, largeSize);

// Pass the required size upfront — no expansion needed later
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn’t currently exercise the new StartRecording(minBufferSize) allocation path: SharpCompressStream.Create(nonSeekableMs, largeSize) already preallocates the ring buffer with largeSize, so StartRecording(largeSize) is effectively redundant. To actually validate the new behavior, construct the stream in a way that leaves _ringBuffer null until StartRecording (or start with a smaller buffer and assert the expected behavior when a larger minimum is requested).

Suggested change
var stream = SharpCompressStream.Create(nonSeekableMs, largeSize);
// Pass the required size upfront — no expansion needed later
var stream = SharpCompressStream.Create(nonSeekableMs);
// Pass the required size upfront — forces allocation to at least largeSize

Copilot uses AI. Check for mistakes.
Comment on lines +95 to 99
bufferSize: Math.Max(
readerOptions.RewindableBufferSize ?? 0,
TarWrapper.MaximumRewindBufferSize
)
);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TarReader.Factory now forces the rewind buffer size to at least TarWrapper.MaximumRewindBufferSize (currently 900,000 bytes) on non-seekable streams, even when the caller explicitly sets ReaderOptions.RewindableBufferSize smaller. This is a noticeable memory increase for non-BZip2 tar streams (e.g., tar.gz). If possible, consider honoring a smaller caller-specified size for formats that don’t require the maximum, or document that TarReader will always allocate at least this amount for non-seekable streams.

Suggested change
bufferSize: Math.Max(
readerOptions.RewindableBufferSize ?? 0,
TarWrapper.MaximumRewindBufferSize
)
);
bufferSize: readerOptions.RewindableBufferSize ?? TarWrapper.MaximumRewindBufferSize
);

Copilot uses AI. Check for mistakes.
…(DStreamInSize=131075)

Agent-Logs-Url: https://github.com/adamhathcock/sharpcompress/sessions/d5a696df-2906-4262-acf9-06cc174e91fc

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 1, 2026

@copilot revert the default buffer size to 80k and use the same mechanism for BZip2 for Zstandard to be what is needed

Done in 456a7df. Constants.RewindableBufferSize is back to 81920 (80KB). ZStandard now declares minimumRewindBufferSize: ZstandardConstants.DStreamInSize (131075 bytes = ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeaderSize) in its TarWrapper entry, using the same mechanism as BZip2. MaximumRewindBufferSize (used at stream construction) is still driven by BZip2 at 900 000 bytes.

Copilot AI changed the title Dynamic ring buffer sizing for BZip2 (and extensible to other formats) Dynamic ring buffer sizing for BZip2 and ZStandard on non-seekable streams Apr 1, 2026
@adamhathcock
Copy link
Copy Markdown
Owner

Closing in favor of #1273

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dynamic default RingBuffer (for BZip2 and possibly other formats)

3 participants