Fix ZIP64 stream bounding and WinZip AES read-state corruption in ZIP reader#1247
Fix ZIP64 stream bounding and WinZip AES read-state corruption in ZIP reader#1247fsdsabel wants to merge 5 commits intoadamhathcock:masterfrom
Conversation
|
Thanks for this. This is also a bit beyond me so I'm relying on AI and tests to help out. It smells right though |
There was a problem hiding this comment.
Pull request overview
This PR fixes two independent ZIP reader issues in SharpCompress:
-
ZIP64 stream bounding: Previously, ZIP64 entries were treated as unbounded streams (like data-descriptor entries with unknown size), allowing decompressors to read past entry boundaries into subsequent ZIP structures. The fix ensures only entries that truly have unknown size (post-data descriptor with
CompressedSize == 0) are unbounded; ZIP64 entries with known compressed sizes are now properly bounded withReadOnlySubStream. -
WinZip AES CTR state preservation: The
WinzipAesCryptoStreamCTR mode implementation discarded partially consumed keystream bytes between reads. The fix introduces_counterOutOffsetto track the position within the current 16-byte keystream block, ensuring correct byte-accurate CTR continuation across arbitrary read sizes.
Changes:
- Removed
|| Header.IsZip64from the stream-bounding condition in both sync and asyncGetCryptoStreammethods - Replaced the per-block
ReadTransformOneBlock/_isFinalBlockpattern inWinzipAesCryptoStreamwith offset-tracked CTR keystream handling - Added comprehensive unit tests for both fixes
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/SharpCompress/Common/Zip/ZipFilePart.cs |
Removes ` |
src/SharpCompress/Common/Zip/ZipFilePart.Async.cs |
Same fix for async path |
src/SharpCompress/Common/Zip/WinzipAesCryptoStream.cs |
Replaces _isFinalBlock with _counterOutOffset tracking for correct CTR state across reads (sync byte[] path) |
src/SharpCompress/Common/Zip/WinzipAesCryptoStream.Async.cs |
Same CTR fix for async Span<byte> path |
tests/SharpCompress.Test/Zip/ZipFilePartTests.cs |
New unit tests for stream-bounding logic |
tests/SharpCompress.Test/Zip/ZipArchiveTests.cs |
Integration tests for mixed Zstandard/AES archives and generated ZIP64 archives |
tests/SharpCompress.Test/Streams/WinzipAesCryptoStreamTests.cs |
Unit tests for CTR state preservation across aligned and non-aligned reads |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Good catch. The async
I fixed this by restructuring |
Code Review SummaryStatus: No Issues Found | Recommendation: Merge OverviewThis PR fixes two significant bugs in the WinZip AES crypto stream handling:
Files Reviewed (8 files)
Verification
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Is there anything I can do to bring this PR forward? I just fixed a build error that I somehow missed. However there were performance test issues that I can't really relate to my code changes? |
|
I'll review tomorrow again. Do you need a 0.47.1 release to cover this? |
|
Yes, a 0.47.1 would be really helpful. Otherwise we would need to fork the project and roll our custom nugets. I'm not a fan of that ;) Thanks again for your efforts! |
|
Fixed in #1253 |
Summary
One thing upfront: This PR has been created with extensive help from Codex. I reviewed the code to the best of my knowledge but I'm in no way an expert in ZIP file handling. The code looks plausible to me though. I tried to keep the changes aligned to the repo standards. And from one dev to the other: Thank you for your work!
This change fixes two independent ZIP reader issues in SharpCompress that affect interoperability with ZIP archives using Zstandard compression, ZIP64, and WinZip AES.
The failures were reproduced with:
Problem
1. ZIP64 entries could over-read into the next ZIP structures
When a ZIP entry was marked as ZIP64, SharpCompress did not always bound the input stream to the entry's compressed size before passing it to the decompressor.
As a result, a decompressor such as Zstandard could continue reading past the end of the current entry and into the next local header or central directory bytes. In practice this could surface as errors like:
SharpCompress.Compressors.ZStandard.ZstdException: Unknown frame descriptorThe issue was not the entry start position. The compressed payload began at the correct offset. The problem was that the stream exposed more than the current entry.
2. WinZip AES decryption could corrupt payloads across non-aligned reads
WinzipAesCryptoStreamimplements CTR-style decryption using 16-byte counter blocks.The previous implementation did not preserve partially consumed keystream bytes across multiple
Read/ReadAsynccalls. If a read ended in the middle of a 16-byte counter block, the remaining keystream bytes were discarded and the next read advanced to the next counter block too early.That breaks CTR state continuity and corrupts the remainder of the decrypted payload. With Zstandard-compressed encrypted entries this could surface as:
SharpCompress.Compressors.ZStandard.ZstdException: Data corruption detectedRoot Cause
ZIP64 stream bounding
In
ZipFilePart.GetCryptoStreamandGetCryptoStreamAsync, ZIP64 entries were treated as if they required an unbounded stream. That skipped theReadOnlySubStreamwrapper even when the compressed size was known.For sized entries, including ZIP64 entries, that behavior is incorrect.
AES CTR state handling
In
WinzipAesCryptoStream, the current counter block was effectively treated as fully consumed at the end of each read operation, even if only part of its keystream had actually been used.CTR mode requires byte-accurate continuation within the current counter block until all 16 bytes are consumed.
Fix
1. Bound all entries with a known compressed size
SharpCompress now keeps the stream unbounded only for the real special case:
ZIP64 no longer disables stream bounding by itself. If the compressed size is known, the entry stream is wrapped in
ReadOnlySubStream, regardless of whether the entry uses ZIP64.This prevents decompression code from reading into the next ZIP structures.
2. Preserve unused CTR keystream bytes across reads
WinzipAesCryptoStreamnow tracks the offset within the current 16-byte keystream block.Instead of generating a new counter block for each transform step, it now:
This makes decryption correct for arbitrary sync and async read sizes.
Tests
Streams/WinzipAesCryptoStreamTestsAdded stream-level tests for:
These tests cover the generic AES stream behavior and specifically protect the CTR state handling that caused the corruption.
Zip/ZipFilePartTestsAdded targeted tests for
ZipFilePartstream construction:ReadOnlySubStreamThese tests cover the stream-shaping logic that controls whether decompression sees exactly one entry or the rest of the archive.
Zip/ZipArchiveTestsAdded integration coverage for two scenarios:
The mixed archive fixture uses the same test password convention as the existing ZIP tests.
The large ZIP64 test is marked as explicit because it writes a real archive larger than 4 GiB to disk and is not intended for the default test run.
Validation
The fix was validated in several ways:
WinzipAesCryptoStreamandZipFilePartZipWriterand read back throughZipArchiveAfter the fix:
Impact
This improves ZIP interoperability for archives that combine:
The change is internal to stream construction and AES read-state handling and does not alter the public API.