Fix ZIP64 stream bounding and WinZip AES read-state corruption in ZIP reader #1247

Closed
fsdsabel wants to merge 5 commits into adamhathcock:master from fsdsabel:sabel/fix-zip64-and-zipzstd

Conversation

@fsdsabel commented Mar 9, 2026

Summary

One thing upfront: This PR has been created with extensive help from Codex. I reviewed the code to the best of my knowledge but I'm in no way an expert in ZIP file handling. The code looks plausible to me though. I tried to keep the changes aligned to the repo standards. And from one dev to the other: Thank you for your work!

This change fixes two independent ZIP reader issues in SharpCompress that affect interoperability with ZIP archives using Zstandard compression, ZIP64, and WinZip AES.

The failures were reproduced with:

  • a mixed archive fixture containing Zstandard entries with both encrypted and unencrypted payloads
  • a generated large ZIP64 archive used to validate stream bounding for entries larger than 4 GiB

Problem

1. ZIP64 entries could over-read into the next ZIP structures

When a ZIP entry was marked as ZIP64, SharpCompress did not always bound the input stream to the entry's compressed size before passing it to the decompressor.

As a result, a decompressor such as Zstandard could continue reading past the end of the current entry and into the next local header or central directory bytes. In practice this could surface as errors like:

  • SharpCompress.Compressors.ZStandard.ZstdException: Unknown frame descriptor

The issue was not the entry start position. The compressed payload began at the correct offset. The problem was that the stream exposed more than the current entry.

2. WinZip AES decryption could corrupt payloads across non-aligned reads

WinzipAesCryptoStream implements CTR-style decryption using 16-byte counter blocks.

The previous implementation did not preserve partially consumed keystream bytes across multiple Read / ReadAsync calls. If a read ended in the middle of a 16-byte counter block, the remaining keystream bytes were discarded and the next read advanced to the next counter block too early.

That breaks CTR state continuity and corrupts the remainder of the decrypted payload. With Zstandard-compressed encrypted entries this could surface as:

  • SharpCompress.Compressors.ZStandard.ZstdException: Data corruption detected

Root Cause

ZIP64 stream bounding

In ZipFilePart.GetCryptoStream and GetCryptoStreamAsync, ZIP64 entries were treated as if they required an unbounded stream. That skipped the ReadOnlySubStream wrapper even when the compressed size was known.

For sized entries, including ZIP64 entries, that behavior is incorrect.
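
As a rough illustration of the decision described above, the sketch below models the bounding condition before and after the change. It is written in Python for brevity and is not the SharpCompress code: `EntryHeader`, its field names, and both helper functions are hypothetical, and the exact shape of the old condition is an assumption inferred from the removal of `|| Header.IsZip64`.

```python
from dataclasses import dataclass


@dataclass
class EntryHeader:
    """Hypothetical stand-in for the header fields the decision depends on."""
    compressed_size: int
    uses_data_descriptor: bool
    is_zip64: bool


def is_unbounded_before(h: EntryHeader) -> bool:
    # Assumed old behavior: ZIP64 alone was enough to skip bounding.
    return (h.uses_data_descriptor and h.compressed_size == 0) or h.is_zip64


def is_unbounded_after(h: EntryHeader) -> bool:
    # New behavior: only a genuinely unknown size keeps the stream unbounded.
    return h.uses_data_descriptor and h.compressed_size == 0


# A ZIP64 entry whose compressed size (5 GiB) is known up front.
big = EntryHeader(compressed_size=5 * 2**30, uses_data_descriptor=False, is_zip64=True)
# A streamed entry whose size only appears in the trailing data descriptor.
streamed = EntryHeader(compressed_size=0, uses_data_descriptor=True, is_zip64=False)

print(is_unbounded_before(big), is_unbounded_after(big))
print(is_unbounded_before(streamed), is_unbounded_after(streamed))
```

Under these assumptions, only the sized ZIP64 entry changes behavior; the streamed entry stays unbounded in both versions.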

AES CTR state handling

In WinzipAesCryptoStream, the current counter block was effectively treated as fully consumed at the end of each read operation, even if only part of its keystream had actually been used.

CTR mode requires byte-accurate continuation within the current counter block until all 16 bytes are consumed.
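
To make the failure mode concrete, here is a small Python model of a reader that starts a fresh counter block on every read. It is a sketch only: `keystream_block` uses SHA-256 as a stand-in for the AES block operation, the counter encoding is an assumption, and `LossyCtrReader` is a hypothetical name rather than the previous SharpCompress implementation.

```python
import hashlib

BLOCK = 16


def keystream_block(key: bytes, counter: int) -> bytes:
    # Stand-in keystream generator (NOT real AES); one 16-byte block per counter.
    return hashlib.sha256(key + counter.to_bytes(BLOCK, "little")).digest()[:BLOCK]


def ctr_xor(key: bytes, data: bytes) -> bytes:
    """Reference CTR transform applied to the whole buffer in one pass."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        ks = keystream_block(key, 1 + i // BLOCK)
        out += bytes(b ^ k for b, k in zip(data[i:i + BLOCK], ks))
    return bytes(out)


class LossyCtrReader:
    """Models the bug: leftover keystream bytes are dropped between reads."""

    def __init__(self, key: bytes, ciphertext: bytes):
        self.key, self.data, self.pos, self.counter = key, ciphertext, 0, 1

    def read(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        out = bytearray()
        for i in range(0, len(chunk), BLOCK):
            ks = keystream_block(self.key, self.counter)
            self.counter += 1  # advanced even when fewer than 16 bytes were used
            out += bytes(b ^ k for b, k in zip(chunk[i:i + BLOCK], ks))
        return bytes(out)


key = b"demo-key"
plaintext = bytes(range(40))
ciphertext = ctr_xor(key, plaintext)

reader = LossyCtrReader(key, ciphertext)
aligned = reader.read(16) + reader.read(16) + reader.read(8)

reader = LossyCtrReader(key, ciphertext)
unaligned = reader.read(10) + reader.read(30)

print(aligned == plaintext)    # block-aligned reads happen to work
print(unaligned == plaintext)  # a 10-byte first read desynchronizes the rest
```

In this model the first 10 bytes of the unaligned run still decrypt correctly; everything after the mid-block boundary is XORed against the wrong keystream.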

Fix

1. Bound all entries with a known compressed size

SharpCompress now keeps the stream unbounded only for the real special case:

  • the compressed size is unknown at header-read time
  • and a post-data descriptor is used

ZIP64 no longer disables stream bounding by itself. If the compressed size is known, the entry stream is wrapped in ReadOnlySubStream, regardless of whether the entry uses ZIP64.

This prevents decompression code from reading into the next ZIP structures.
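
The effect of bounding can be sketched with a minimal length-limited wrapper. This is a Python illustration of the idea behind a read-only sub-stream, not the `ReadOnlySubStream` class itself; `BoundedReader` and the sample bytes are hypothetical.

```python
import io


class BoundedReader:
    """Exposes at most `length` bytes of the wrapped stream."""

    def __init__(self, inner, length: int):
        self._inner = inner
        self._remaining = length

    def read(self, n: int = -1) -> bytes:
        # Clamp every request to what is left of the current entry.
        if n < 0 or n > self._remaining:
            n = self._remaining
        data = self._inner.read(n)
        self._remaining -= len(data)
        return data


payload = b"compressed-bytes-of-entry-1"
next_header = b"PK\x03\x04...next local header..."  # illustrative bytes only
archive = payload + next_header

# Unbounded: a greedy consumer reads straight into the next structure.
greedy = io.BytesIO(archive).read()

# Bounded: the same greedy read stops at the end of the entry.
inner = io.BytesIO(archive)
bounded = BoundedReader(inner, len(payload)).read()

print(len(greedy) > len(payload))
print(bounded == payload, inner.tell() == len(payload))
```

A decompressor that keeps asking for input until end of stream sees a clean end of data in the bounded case instead of header bytes it cannot parse.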

2. Preserve unused CTR keystream bytes across reads

WinzipAesCryptoStream now tracks the offset within the current 16-byte keystream block.

Instead of generating a new counter block for each transform step, it now:

  • consumes any remaining bytes from the current keystream block first
  • advances the counter only after the full 16-byte block has been used

This makes decryption correct for arbitrary sync and async read sizes.
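
The offset-tracking approach can be sketched as follows. This is a Python model under stated assumptions, not the actual `WinzipAesCryptoStream` code: SHA-256 stands in for the AES block operation, the counter encoding is assumed, and `CtrState`, `decrypt_in_chunks`, and the field names are hypothetical (the PR's real field is `_counterOutOffset`).

```python
import hashlib

BLOCK = 16


def keystream_block(key: bytes, counter: int) -> bytes:
    # Stand-in keystream generator (NOT real AES); one 16-byte block per counter.
    return hashlib.sha256(key + counter.to_bytes(BLOCK, "little")).digest()[:BLOCK]


class CtrState:
    """Keeps the current keystream block and how much of it has been used."""

    def __init__(self, key: bytes):
        self.key = key
        self.counter = 1
        self.block = keystream_block(key, self.counter)
        self.offset = 0  # bytes of self.block already consumed

    def transform(self, data: bytes) -> bytes:
        out = bytearray(len(data))
        for i, b in enumerate(data):
            if self.offset == BLOCK:
                # Advance the counter only once all 16 bytes have been used.
                self.counter += 1
                self.block = keystream_block(self.key, self.counter)
                self.offset = 0
            out[i] = b ^ self.block[self.offset]
            self.offset += 1
        return bytes(out)


def decrypt_in_chunks(key: bytes, ciphertext: bytes, size: int) -> bytes:
    state = CtrState(key)
    parts = [state.transform(ciphertext[i:i + size])
             for i in range(0, len(ciphertext), size)]
    return b"".join(parts)


key = b"demo-key"
plaintext = bytes(range(100))
ciphertext = CtrState(key).transform(plaintext)

for size in (1, 5, 16, 17, 33):
    print(size, decrypt_in_chunks(key, ciphertext, size) == plaintext)
```

Because the state carries the in-block offset between calls, the result no longer depends on how the caller slices its reads.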

Tests

Streams/WinzipAesCryptoStreamTests

Added stream-level tests for:

  • successful decryption with aligned reads
  • successful decryption across non-aligned synchronous reads
  • successful decryption across non-aligned asynchronous reads
  • stopping exactly at the encrypted payload length without exposing trailing authentication bytes

These tests cover the generic AES stream behavior and specifically protect the CTR state handling that caused the corruption.

Zip/ZipFilePartTests

Added targeted tests for ZipFilePart stream construction:

  • ZIP64 entries with a known compressed size are bounded with ReadOnlySubStream
  • entries using a post-data descriptor with unknown size remain unbounded as intended

These tests cover the stream-shaping logic that controls whether decompression sees exactly one entry or the rest of the archive.

Zip/ZipArchiveTests

Added integration coverage for two scenarios:

  • a committed fixture archive with mixed Zstandard entries, including WinZip AES encrypted entries
  • a generated large ZIP64 archive test that validates entry stream bounding for archives larger than 4 GiB

The mixed archive fixture uses the same test password convention as the existing ZIP tests.

The large ZIP64 test is marked as explicit because it writes a real archive larger than 4 GiB to disk and is not intended for the default test run.

Validation

The fix was validated in several ways:

  • targeted unit tests for WinzipAesCryptoStream and ZipFilePart
  • archive-level tests using a checked-in mixed Zstandard / WinZip AES fixture
  • a generated large ZIP64 archive written through ZipWriter and read back through ZipArchive
  • a real-world test in my own use case, using real-world data

After the fix:

  • encrypted Zstandard entries can be read successfully
  • ZIP64 entries are bounded correctly and no longer over-read into following ZIP structures
  • mixed encrypted and unencrypted archives can be read end-to-end

Impact

This improves ZIP interoperability for archives that combine:

  • ZIP64
  • Zstandard
  • WinZip AES
  • mixed encrypted and unencrypted entries

The change is internal to stream construction and AES read-state handling and does not alter the public API.

@fsdsabel changed the title from "Sabel/fix zip64 and zipzstd" to "Fix ZIP64 stream bounding and WinZip AES read-state corruption in ZIP reader" Mar 9, 2026
@adamhathcock (Owner)

Thanks for this. This is also a bit beyond me so I'm relying on AI and tests to help out. It smells right though

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes two independent ZIP reader issues in SharpCompress:

  1. ZIP64 stream bounding: Previously, ZIP64 entries were treated as unbounded streams (like data-descriptor entries with unknown size), allowing decompressors to read past entry boundaries into subsequent ZIP structures. The fix ensures only entries that truly have unknown size (post-data descriptor with CompressedSize == 0) are unbounded; ZIP64 entries with known compressed sizes are now properly bounded with ReadOnlySubStream.

  2. WinZip AES CTR state preservation: The WinzipAesCryptoStream CTR mode implementation discarded partially consumed keystream bytes between reads. The fix introduces _counterOutOffset to track the position within the current 16-byte keystream block, ensuring correct byte-accurate CTR continuation across arbitrary read sizes.

Changes:

  • Removed || Header.IsZip64 from the stream-bounding condition in both sync and async GetCryptoStream methods
  • Replaced the per-block ReadTransformOneBlock / _isFinalBlock pattern in WinzipAesCryptoStream with offset-tracked CTR keystream handling
  • Added comprehensive unit tests for both fixes

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

File - Description

  • src/SharpCompress/Common/Zip/ZipFilePart.cs - Removes `
  • src/SharpCompress/Common/Zip/ZipFilePart.Async.cs - Same fix for async path
  • src/SharpCompress/Common/Zip/WinzipAesCryptoStream.cs - Replaces _isFinalBlock with _counterOutOffset tracking for correct CTR state across reads (sync byte[] path)
  • src/SharpCompress/Common/Zip/WinzipAesCryptoStream.Async.cs - Same CTR fix for async Span<byte> path
  • tests/SharpCompress.Test/Zip/ZipFilePartTests.cs - New unit tests for stream-bounding logic
  • tests/SharpCompress.Test/Zip/ZipArchiveTests.cs - Integration tests for mixed Zstandard/AES archives and generated ZIP64 archives
  • tests/SharpCompress.Test/Streams/WinzipAesCryptoStreamTests.cs - Unit tests for CTR state preservation across aligned and non-aligned reads

@fsdsabel (Author)

Good catch. The async Memory<byte> path was not actually mirroring the sync implementation after the CTR state fix.

ReadTransformOneBlock now interprets its third argument as the remaining byte count, so passing the original count on every iteration was incorrect once posn had advanced. That could cause the async span-based path to XOR beyond the unread portion of the current buffer.

I fixed this by restructuring ReadTransformBlocks(Span<byte>, int) to match the sync path and track a decreasing remaining value per iteration. I also added a dedicated test for the ReadAsync(Memory<byte>) overload, since the existing async test only covered the byte[] overload.
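
The loop issue described above can be modeled in a few lines. This is a Python sketch, not the C# code from the PR: `transform_one` plays the role of `ReadTransformOneBlock`, a constant XOR mask stands in for the keystream, and all names and signatures here are hypothetical.

```python
BLOCK = 16


def transform_one(buf: bytearray, posn: int, remaining: int, mask: int) -> int:
    """XOR up to one block starting at posn; returns the bytes processed."""
    n = min(BLOCK, remaining)
    for i in range(posn, posn + n):
        buf[i] ^= mask
    return n


def transform_buggy(buf: bytearray, count: int, mask: int = 0xFF) -> None:
    posn = 0
    while posn < count:
        # Bug: passes the original count, not what is left after posn.
        posn += transform_one(buf, posn, count, mask)


def transform_fixed(buf: bytearray, count: int, mask: int = 0xFF) -> None:
    posn, remaining = 0, count
    while remaining > 0:
        n = transform_one(buf, posn, remaining, mask)
        posn += n
        remaining -= n  # shrink the budget on every iteration


# A 32-byte buffer of which only the first 20 bytes should be transformed.
bad = bytearray(32)
transform_buggy(bad, 20)

good = bytearray(32)
transform_fixed(good, 20)

print(bad[20:] == bytes(12))   # False: bytes past `count` were modified
print(good[20:] == bytes(12))  # True: the tail is left untouched
```

With the decreasing `remaining` value, the last iteration processes only the 4 bytes that are left instead of a full 16-byte block.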

@kilo-code-bot (Contributor) commented Mar 11, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

This PR fixes two significant bugs in the ZIP reader, one in the WinZip AES crypto stream and one in ZIP64 entry stream bounding:

  1. WinzipAesCryptoStream Loop Bug (Fixed): The OLD code had a bug in ReadTransformBlocks where the remaining variable was never decremented in the loop. This caused incorrect behavior when reading encrypted streams across multiple blocks. The NEW code correctly tracks and decrements remaining.

  2. ZipFilePart Zip64 Boundary Bug (Fixed): The OLD code incorrectly treated Zip64 entries as unbounded, but Zip64 entries actually have a known size. The NEW code properly bounds them with ReadOnlySubStream.

Files Reviewed (8 files)

  • src/SharpCompress/Common/Zip/WinzipAesCryptoStream.Async.cs - Bug fix for loop counter
  • src/SharpCompress/Common/Zip/WinzipAesCryptoStream.cs - Refactored to use _counterOutOffset pattern
  • src/SharpCompress/Common/Zip/ZipFilePart.Async.cs - Minor changes
  • src/SharpCompress/Common/Zip/ZipFilePart.cs - Bug fix for Zip64 boundary handling
  • tests/SharpCompress.Test/Streams/WinzipAesCryptoStreamTests.cs - New comprehensive tests
  • tests/SharpCompress.Test/Zip/ZipArchiveTests.cs - New tests for mixed archives and Zip64
  • tests/SharpCompress.Test/Zip/ZipFilePartTests.cs - New tests for boundary conditions
  • tests/TestArchives/Archives/Zip.zstd.WinzipAES.mixed.zip - New test archive

Verification

  • The existing inline comments about the remaining parameter bug and unused using statement refer to issues in the OLD code that have been FIXED by this PR
  • The code changes are consistent between sync and async versions
  • Test coverage is comprehensive and validates both bug fixes
  • Changes follow existing code patterns in the codebase

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.


adamhathcock and others added 2 commits March 11, 2026 11:53
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@fsdsabel (Author)

Is there anything I can do to move this PR forward? I just fixed a build error that I had somehow missed. However, there were performance test issues that I can't really relate to my code changes.

@adamhathcock (Owner)

I'll review tomorrow again. Do you need a 0.47.1 release to cover this?

@fsdsabel (Author)

Yes, a 0.47.1 would be really helpful. Otherwise we would need to fork the project and roll our custom nugets. I'm not a fan of that ;) Thanks again for your efforts!

@adamhathcock (Owner)

Fixed in #1253
