
DotNet 7 Regression when reading large Zip files #77159

Closed
fitdev opened this issue Oct 18, 2022 · 5 comments · Fixed by #77181

Comments

@fitdev

fitdev commented Oct 18, 2022

Description

When trying to read ZipArchiveEntry instances from an 8 GB zip file that contains ~8K files of ~1 MB each, I start getting an "A local file header is corrupt." exception. This happens only once I get past the ~4 GB mark: entries located within the first 4 GB are all read just fine, but all the rest are unreadable.

After some research, I realized that the most likely culprit is in src/libraries/System.IO.Compression/src/System/IO/Compression/ZipBlocks.cs, likely introduced by this PR (commit). I traced the issue by stepping through in VS to this place:

_archive.ArchiveStream.Seek(_offsetOfLocalHeader, SeekOrigin.Begin);
if (!ZipLocalFileHeader.TrySkipBlock(_archive.ArchiveReader))
{
    message = SR.LocalFileHeaderCorrupt;
    return false;
}

_offsetOfLocalHeader is set to 0xFFFFFFFF and is passed directly to the reader, which then tries to read data at that offset. That is obviously wrong: 0xFFFFFFFF is the Zip64 sentinel value, so the reader should instead use the 64-bit offset stored in the Zip64 extended information extra field.
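For reference, here is a rough sketch of the lookup a ZIP64-aware reader has to perform, following the ZIP APPNOTE layout of the 0x0001 extra field. The class, method, and parameter names below are illustrative only and are not the runtime's actual code:

using System;
using System.Buffers.Binary;
using System.IO;

internal static class Zip64Sketch
{
    // Resolve the local-header offset for a central directory entry.
    // 0xFFFFFFFF in the 32-bit field means "read the real 64-bit offset
    // from the Zip64 extended information extra field (ID 0x0001)".
    internal static long ResolveLocalHeaderOffset(
        uint offset32,
        ReadOnlySpan<byte> extraField,
        bool uncompressedSizeMasked,
        bool compressedSizeMasked)
    {
        if (offset32 != 0xFFFFFFFF)
            return offset32; // plain 32-bit offset, no Zip64 lookup needed

        // The extra-field area is a sequence of blocks: <2-byte id><2-byte size><data>.
        int pos = 0;
        while (pos + 4 <= extraField.Length)
        {
            ushort id = BinaryPrimitives.ReadUInt16LittleEndian(extraField.Slice(pos));
            ushort size = BinaryPrimitives.ReadUInt16LittleEndian(extraField.Slice(pos + 2));
            ReadOnlySpan<byte> data = extraField.Slice(pos + 4, size);

            if (id == 0x0001) // Zip64 extended information
            {
                // Only the masked fields are present, in this fixed order:
                // uncompressed size (8), compressed size (8),
                // local header offset (8), disk start number (4).
                int fieldPos = 0;
                if (uncompressedSizeMasked) fieldPos += 8;
                if (compressedSizeMasked) fieldPos += 8;
                return BinaryPrimitives.ReadInt64LittleEndian(data.Slice(fieldPos));
            }

            pos += 4 + size;
        }

        throw new InvalidDataException("Zip64 extra field with a local header offset was not found.");
    }
}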

I have verified with multiple tools, such as 7-Zip, that the zip file is completely valid. I have also repeatedly reproduced the issue by producing various >4 GB zip files in various tools, all with the same result: entries beyond the 4 GB offset are not readable and all throw the "A local file header is corrupt." exception.

Reproduction Steps

Just call ZipArchiveEntry.Open() on an entry of a zip archive that is located beyond the 4 GB mark in the archive file.
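A minimal repro sketch, assuming an existing >4 GB archive at a hypothetical path:

using System;
using System.IO;
using System.IO.Compression;

// On .NET 7 RC1, every entry whose local header sits beyond the 4 GB offset
// fails with InvalidDataException ("A local file header is corrupt.").
using ZipArchive archive = ZipFile.OpenRead(@"C:\temp\large.zip"); // hypothetical path
foreach (ZipArchiveEntry entry in archive.Entries)
{
    try
    {
        using Stream stream = entry.Open(); // throws for entries past ~4 GB
        Console.WriteLine($"OK:     {entry.FullName}");
    }
    catch (InvalidDataException ex)
    {
        Console.WriteLine($"FAILED: {entry.FullName} -> {ex.Message}");
    }
}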

Expected behavior

Same as in .NET 6 (the last version I used where this worked): it should be possible to read entries in a zip archive beyond the 4 GB mark.

Actual behavior

None of the zip entries beyond the 4GB mark can be opened.

Regression?

Yes. I tried it on RC1, but it looks like the regression was introduced at least as early as Preview 5, possibly earlier. By the looks of it, it is also present in RC2, since the affected files have not changed for RC2.

Known Workarounds

Use another library (not really a workaround, though).
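For completeness, another interim option (not a fix) is to shell out to an external tool such as the 7-Zip command line for the affected archives. A rough sketch, assuming 7z is installed and on PATH; the paths are placeholders:

using System.Diagnostics;

// Workaround sketch: extract an affected archive with the 7-Zip CLI
// instead of System.IO.Compression.
var psi = new ProcessStartInfo
{
    FileName = "7z",
    // "x" extracts with full paths; -o sets the output directory; -y answers prompts with "yes".
    Arguments = "x \"C:\\temp\\large.zip\" -o\"C:\\temp\\extracted\" -y", // hypothetical paths
    UseShellExecute = false
};
using var process = Process.Start(psi)!;
process.WaitForExit();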

Configuration

.NET SDK:
 Version:   7.0.100-rc.1.22431.12
 Commit:    f1cf61e1c0

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.17134
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\7.0.100-rc.1.22431.12\

Host:
  Version:      7.0.0-rc.1.22426.10
  Architecture: x64
  Commit:       06aceb7015

.NET SDKs installed:
  5.0.400-preview.21328.4 [C:\Program Files\dotnet\sdk]
  6.0.100-preview.5.21302.13 [C:\Program Files\dotnet\sdk]
  6.0.400 [C:\Program Files\dotnet\sdk]
  7.0.100-rc.1.22431.12 [C:\Program Files\dotnet\sdk]

Other information

Hopefully either @danmoseley or @adamsitnik can take a look ASAP. This is a serious regression that blocks a whole lot of valid scenarios!

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 18, 2022
@ghost

ghost commented Oct 18, 2022

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

@adamsitnik adamsitnik added this to the 7.0.0 milestone Oct 18, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Oct 18, 2022
@adamsitnik adamsitnik self-assigned this Oct 18, 2022
@danmoseley
Member

It's not related, but I notice I had a typo in one of the original tests here; it should be uncompressed: https://github.com/dotnet/runtime/pull/68106/files#diff-ea21d1af009443a658bc821a6eb47860f7887ff70155a68f11990ec64875a2dcR860

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 18, 2022
@fitdev
Author

fitdev commented Oct 28, 2022

Thank you for looking into this and acting quickly. I noticed the PR is still not merged. Is there a chance it will nonetheless make it into RTM of DotNet 7? This is a serious blocking issue.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 28, 2022
@adamsitnik
Member

Thank you for looking into this and acting quickly. I noticed the PR is still not merged. Is there a chance it will nonetheless make it into RTM of DotNet 7? This is a serious blocking issue.

It just got merged, and we are going to backport it to 7.0: #77605. I expect it will be included in the first servicing update in January (cc @jeffhandley).

Once again thank you for trying our RC builds and reporting the issue!

@jeffhandley
Member

in the first servicing update in January

Minor correction: The first servicing update is in December. This backport will be considered in Tactics next Tuesday.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 27, 2022