Dynamic ring buffer sizing for BZip2 and ZStandard on non-seekable streams #1273

Merged
adamhathcock merged 3 commits into release from copilot/dynamic-default-ringbuffer-bzip2 on Apr 3, 2026

Copilot AI commented Apr 2, 2026

BZip2 decompresses in whole blocks: at level 9 the compressed first block can approach 900 KB, and the decoder must read that entire block before it yields any tar header bytes. ZStandard blocks can be up to 128 KB. During IsTarFile format detection on a non-seekable stream, both formats therefore read past the default 80 KB rewind ring buffer, and the subsequent rewind overflows.
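The overflow comes down to a capacity comparison: a rewind succeeds only if the distance from the recording anchor back to the current position fits in the ring buffer. The sketch below checks the 870 804-byte distance from the error message quoted in this PR against the old 80 KB default and the new wrapper minimums. `CanRewind` is a hypothetical helper written for illustration, not a SharpCompress API.

```csharp
using System;

// Sizes quoted in this PR description.
const int DefaultRewindBufferSize = 81_920;     // Constants.RewindableBufferSize (80 KB)
const int BZip2MinimumRewindSize = 100_000 * 9; // BZip2Constants.baseBlockSize * 9
const int ZstdMinimumRewindSize = 131_072 + 3;  // BlockSizeMax + 3-byte block header

// Distance reported by the pre-fix ArchiveOperationException message.
const long ObservedDistance = 870_804;

// Hypothetical helper (not a SharpCompress API): a ring buffer can replay at
// most `capacity` bytes, so rewinding to the anchor works only within that range.
static bool CanRewind(long bytesBehind, int capacity) => bytesBehind <= capacity;

Console.WriteLine($"BZip2 block vs 80 KB default: {CanRewind(ObservedDistance, DefaultRewindBufferSize)}"); // False
Console.WriteLine($"BZip2 block vs 900 000 bytes: {CanRewind(ObservedDistance, BZip2MinimumRewindSize)}"); // True
Console.WriteLine($"Max ZStd input vs 80 KB default: {CanRewind(ZstdMinimumRewindSize, DefaultRewindBufferSize)}"); // False
```

The ZStandard line shows that a single maximum-size block plus its header (131 075 bytes) already exceeds 81 920 bytes, even before any tar data is decoded.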

Changes

  • TarWrapper — new optional minimumRewindBufferSize constructor param and MinimumRewindBufferSize property (defaults to Constants.RewindableBufferSize). Static MaximumRewindBufferSize pre-computes the largest value across all wrappers.

    • BZip2: BZip2Constants.baseBlockSize * 9 = 900 000 bytes
    • ZStandard: ZstandardConstants.DStreamInSize = 131 075 bytes
  • ZstandardConstants — new BlockSizeMax (131072) and DStreamInSize (131075) constants matching ZSTD_BLOCKSIZE_MAX and ZSTD_DStreamInSize.

  • SharpCompressStream.StartRecording(int? minBufferSize = null) — ring buffer is allocated at recording time with max(minBufferSize, Constants.RewindableBufferSize), avoiding post-construction reallocation.

  • SeekableSharpCompressStream.StartRecording — signature updated to match; parameter ignored (uses native seeking).

  • Constants.RewindableBufferSize — reverted to 81920 (80 KB). Format-specific overrides are now declared at the wrapper level rather than inflating the global default.

  • TarFactory and TarReader.Factory — all StartRecording calls and SharpCompressStream.Create buffer sizes now use TarWrapper.MaximumRewindBufferSize, ensuring the buffer is correctly sized before any reads occur.
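The sizing rules listed above reduce to two small computations: take the largest per-wrapper minimum once, then allocate max(requested, default) when recording starts. The sketch below models them with plain values. The wrapper list, `RingBufferCapacity`, and the variable names are illustrative assumptions, not the actual TarWrapper or SharpCompressStream code.

```csharp
using System;
using System.Linq;

const int RewindableBufferSize = 81_920; // Constants.RewindableBufferSize

// Illustrative per-wrapper minimums: wrappers without an override use the default.
var wrapperMinimums = new (string Name, int MinimumRewindBufferSize)[]
{
    ("uncompressed", RewindableBufferSize),
    ("gzip", RewindableBufferSize),
    ("bzip2", 100_000 * 9), // BZip2Constants.baseBlockSize * 9
    ("zstd", 131_075),      // ZstandardConstants.DStreamInSize
};

// Models TarWrapper.MaximumRewindBufferSize: the largest minimum across all
// wrappers, computed once so every detection probe can share one buffer size.
int maximumRewindBufferSize = wrapperMinimums.Max(w => w.MinimumRewindBufferSize);

// Models the rule described for StartRecording(int? minBufferSize):
// capacity = max(minBufferSize, default). Null falls back to the default,
// and so does any smaller (including negative) request.
static int RingBufferCapacity(int? minBufferSize, int defaultSize) =>
    Math.Max(minBufferSize ?? defaultSize, defaultSize);

Console.WriteLine(maximumRewindBufferSize);                                          // 900000
Console.WriteLine(RingBufferCapacity(null, RewindableBufferSize));                    // 81920
Console.WriteLine(RingBufferCapacity(maximumRewindBufferSize, RewindableBufferSize)); // 900000
```

Allocating at recording time with the final capacity means the probe never has to grow the buffer after bytes have already been consumed.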

```csharp
// Before: throws ArchiveOperationException for non-seekable .tar.bz2 or .tar.zst
// "Cannot rewind: recording anchor is 870804 bytes behind current position,
//  but ring buffer only holds 81920 bytes."
using System.IO;
using SharpCompress.Readers;

// ForwardOnlyStream wraps the file stream so that it cannot seek.
using var nonSeekable = new ForwardOnlyStream(File.OpenRead("archive.tar.bz2"));
using var reader = ReaderFactory.OpenReader(nonSeekable); // now succeeds
```

Copilot AI and others added 2 commits April 2, 2026 10:56
Copilot AI changed the title from "[WIP] Implement dynamic default RingBuffer for BZip2" to "Dynamic ring buffer sizing for BZip2 and ZStandard on non-seekable streams" Apr 2, 2026
Copilot AI requested a review from adamhathcock April 2, 2026 11:01
@adamhathcock adamhathcock marked this pull request as ready for review April 2, 2026 11:14

kilo-code-bot bot commented Apr 2, 2026

Code Review Summary

Status: 1 Suggestion Found | Recommendation: Consider for merge

Overview

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 1
Issue Details

SUGGESTION

src/SharpCompress/IO/SharpCompressStream.cs, line 200: Consider adding a null check or using Math.Max for the minBufferSize value to prevent potential issues with negative values.
Files Reviewed (8 files)
  • src/SharpCompress/Common/Constants.cs
  • src/SharpCompress/Compressors/ZStandard/ZstandardConstants.cs
  • src/SharpCompress/Factories/TarFactory.cs
  • src/SharpCompress/Factories/TarWrapper.cs
  • src/SharpCompress/IO/SeekableSharpCompressStream.cs
  • src/SharpCompress/IO/SharpCompressStream.cs
  • src/SharpCompress/Readers/Tar/TarReader.Factory.cs
  • tests/SharpCompress.Test/Tar/TarReaderTests.cs

Overall Assessment: The PR implements a robust solution for handling BZip2 and ZStandard compressed tar files on non-seekable streams. The changes are well-documented and the approach of using format-specific minimum buffer sizes is clean. The implementation correctly sizes the ring buffer at construction time to avoid reallocation during format detection. Tests are comprehensive and cover the key scenarios.

The one suggestion is minor: a potential edge case with negative values for minBufferSize. This would be a very unusual calling pattern, and the code would still behave correctly (it would simply use the default buffer size).
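If stricter behavior were preferred, a negative minimum could be rejected up front instead of silently falling back to the default. The sketch below shows that alternative; `ValidatedCapacity` is a hypothetical helper, and nothing here claims the merged PR does this.

```csharp
using System;

const int DefaultSize = 81_920; // Constants.RewindableBufferSize

// Hypothetical strict variant: reject a negative minimum instead of silently
// using the default. Null still means "use the default".
static int ValidatedCapacity(int? minBufferSize, int defaultSize)
{
    if (minBufferSize is < 0)
    {
        throw new ArgumentOutOfRangeException(
            nameof(minBufferSize), minBufferSize, "Minimum buffer size must be non-negative.");
    }
    return Math.Max(minBufferSize ?? defaultSize, defaultSize);
}

Console.WriteLine(ValidatedCapacity(null, DefaultSize));    // 81920
Console.WriteLine(ValidatedCapacity(900_000, DefaultSize)); // 900000
try
{
    ValidatedCapacity(-1, DefaultSize);
}
catch (ArgumentOutOfRangeException)
{
    Console.WriteLine("negative minimum rejected");
}
```

The trade-off is between surfacing a caller bug early (throwing) and tolerating it (falling back), which is the behavior the review describes as already acceptable.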


Reviewed by minimax-m2.5-20260211 · 516,877 tokens


Copilot AI left a comment


Pull request overview

This PR addresses rewind buffer overflows during TAR format detection on non-seekable streams by ensuring the rewind ring buffer is sized for worst-case compressed blocks (notably BZip2 and ZStandard) before any probe reads occur.

Changes:

  • Add per-wrapper minimum rewind buffer sizing to TarWrapper, plus a computed MaximumRewindBufferSize for TAR detection paths.
  • Update TAR reader/factory creation to allocate rewind buffers using TarWrapper.MaximumRewindBufferSize, and add StartRecording(int? minBufferSize = null).
  • Add/adjust constants and regression tests covering non-seekable .tar.bz2 and larger rewind scenarios.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:

  • tests/SharpCompress.Test/Tar/TarReaderTests.cs: Adds regression tests for non-seekable .tar.bz2 reading and wrapper minimum buffer sizing.
  • tests/SharpCompress.Test/Streams/SharpCompressStreamSeekTest.cs: Adds tests for StartRecording(minBufferSize) behavior and default sizing.
  • src/SharpCompress/Readers/Tar/TarReader.Factory.cs: Ensures SharpCompressStream.Create uses at least TarWrapper.MaximumRewindBufferSize for TAR probing.
  • src/SharpCompress/IO/SharpCompressStream.cs: Extends StartRecording with optional minimum buffer sizing and allocates the ring buffer on demand.
  • src/SharpCompress/IO/SeekableSharpCompressStream.cs: Updates the StartRecording override signature to match the base class.
  • src/SharpCompress/Factories/TarWrapper.cs: Introduces MinimumRewindBufferSize and MaximumRewindBufferSize, with BZip2/ZStandard-specific minima.
  • src/SharpCompress/Factories/TarFactory.cs: Uses StartRecording(TarWrapper.MaximumRewindBufferSize) for TAR detection paths.
  • src/SharpCompress/Compressors/ZStandard/ZstandardConstants.cs: Adds BlockSizeMax and DStreamInSize constants.
  • src/SharpCompress/Common/Constants.cs: Reverts the global default rewind buffer size to 80 KB and documents wrapper-specific overrides.


adamhathcock (Owner)

This will fix buffer size issues for BZip2 and Zstd when reading tar archives. It also reduces the default ring buffer size back to 80 KB.



Development

Successfully merging this pull request may close these issues.

Dynamic default RingBuffer (for BZip2 and possibly other formats)

3 participants