
DotNet 7 Regression when reading large Zip files #77159

Closed
fitdev opened this issue Oct 18, 2022 · 5 comments · Fixed by #77181

Comments

@fitdev

fitdev commented Oct 18, 2022

Description

When trying to read ZipArchiveEntry instances from an 8 GB zip file that contains ~8K files of ~1 MB each, I start getting an "A local file header is corrupt." exception. This happens only once I get past the ~4 GB mark: entries located within the first 4 GB are all read just fine, but all the rest are unreadable.

After some research, I realized that the most likely culprit is in src/libraries/System.IO.Compression/src/System/IO/Compression/ZipBlocks.cs, likely introduced by this PR (commit). I traced the issue by stepping through in VS to this place:

_archive.ArchiveStream.Seek(_offsetOfLocalHeader, SeekOrigin.Begin);
if (!ZipLocalFileHeader.TrySkipBlock(_archive.ArchiveReader))
{
    message = SR.LocalFileHeaderCorrupt;
    return false;
}

_offsetOfLocalHeader is set to 0xFFFFFFFF and is passed directly to the reader, which then tries to read data at that offset. That is obviously wrong: 0xFFFFFFFF is the Zip64 sentinel value, so the reader should instead use the 64-bit offset stored in the Zip64 extended information extra field.
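For reference, here is a rough sketch of the lookup a ZIP64-aware reader has to perform, following the ZIP APPNOTE layout of the 0x0001 extra field. The class, method, and parameter names below are illustrative only and are not the runtime's actual code:

using System;
using System.Buffers.Binary;
using System.IO;

internal static class Zip64Sketch
{
    // Resolve the local-header offset for a central directory entry.
    // 0xFFFFFFFF in the 32-bit field means "read the real 64-bit offset
    // from the Zip64 extended information extra field (ID 0x0001)".
    internal static long ResolveLocalHeaderOffset(
        uint offset32,
        ReadOnlySpan<byte> extraField,
        bool uncompressedSizeMasked,
        bool compressedSizeMasked)
    {
        if (offset32 != 0xFFFFFFFF)
            return offset32; // plain 32-bit offset, no Zip64 lookup needed

        // The extra-field area is a sequence of blocks: <2-byte id><2-byte size><data>.
        int pos = 0;
        while (pos + 4 <= extraField.Length)
        {
            ushort id = BinaryPrimitives.ReadUInt16LittleEndian(extraField.Slice(pos));
            ushort size = BinaryPrimitives.ReadUInt16LittleEndian(extraField.Slice(pos + 2));
            ReadOnlySpan<byte> data = extraField.Slice(pos + 4, size);

            if (id == 0x0001) // Zip64 extended information
            {
                // Only the masked fields are present, in this fixed order:
                // uncompressed size (8), compressed size (8),
                // local header offset (8), disk start number (4).
                int fieldPos = 0;
                if (uncompressedSizeMasked) fieldPos += 8;
                if (compressedSizeMasked) fieldPos += 8;
                return BinaryPrimitives.ReadInt64LittleEndian(data.Slice(fieldPos));
            }

            pos += 4 + size;
        }

        throw new InvalidDataException("Zip64 extra field with a local header offset was not found.");
    }
}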

I have verified with multiple tools, such as 7-Zip, that the zip file is completely valid. I have also repeatedly reproduced the issue by producing various >4 GB zip files in various tools, all with the same result: entries beyond the 4 GB offset are not readable and all throw the "A local file header is corrupt." exception.

Reproduction Steps

Just call ZipArchiveEntry.Open() on an entry of a zip archive that is located beyond the 4 GB mark in the archive file.
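A minimal repro sketch, assuming an existing >4 GB archive at a hypothetical path:

using System;
using System.IO;
using System.IO.Compression;

// On .NET 7 RC1, every entry whose local header sits beyond the 4 GB offset
// fails with InvalidDataException ("A local file header is corrupt.").
using ZipArchive archive = ZipFile.OpenRead(@"C:\temp\large.zip"); // hypothetical path
foreach (ZipArchiveEntry entry in archive.Entries)
{
    try
    {
        using Stream stream = entry.Open(); // throws for entries past ~4 GB
        Console.WriteLine($"OK:     {entry.FullName}");
    }
    catch (InvalidDataException ex)
    {
        Console.WriteLine($"FAILED: {entry.FullName} -> {ex.Message}");
    }
}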

Expected behavior

Same as in .NET 6 (the last version I used where this worked): it should be possible to read entries in a zip archive beyond the 4 GB mark.

Actual behavior

None of the zip entries beyond the 4GB mark can be opened.

Regression?

Yes. I tried it on RC1, but it looks like the regression was introduced at least as early as Preview 5, possibly earlier. By the looks of it, it is also present in RC2, since the affected files have not changed for RC2.

Known Workarounds

Use another library (not really a workaround, though).
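For completeness, another interim option (not a fix) is to shell out to an external tool such as the 7-Zip command line for the affected archives. A rough sketch, assuming 7z is installed and on PATH; the paths are placeholders:

using System.Diagnostics;

// Workaround sketch: extract an affected archive with the 7-Zip CLI
// instead of System.IO.Compression.
var psi = new ProcessStartInfo
{
    FileName = "7z",
    // "x" extracts with full paths; -o sets the output directory; -y answers prompts with "yes".
    Arguments = "x \"C:\\temp\\large.zip\" -o\"C:\\temp\\extracted\" -y", // hypothetical paths
    UseShellExecute = false
};
using var process = Process.Start(psi)!;
process.WaitForExit();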

Configuration

.NET SDK:
 Version:   7.0.100-rc.1.22431.12
 Commit:    f1cf61e1c0

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.17134
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\7.0.100-rc.1.22431.12\

Host:
  Version:      7.0.0-rc.1.22426.10
  Architecture: x64
  Commit:       06aceb7015

.NET SDKs installed:
  5.0.400-preview.21328.4 [C:\Program Files\dotnet\sdk]
  6.0.100-preview.5.21302.13 [C:\Program Files\dotnet\sdk]
  6.0.400 [C:\Program Files\dotnet\sdk]
  7.0.100-rc.1.22431.12 [C:\Program Files\dotnet\sdk]

Other information

Hopefully either @danmoseley or @adamsitnik can take a look ASAP. This is a serious regression that blocks a whole lot of valid scenarios!

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 18, 2022
@ghost

ghost commented Oct 18, 2022

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

@adamsitnik adamsitnik added this to the 7.0.0 milestone Oct 18, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Oct 18, 2022
@adamsitnik adamsitnik self-assigned this Oct 18, 2022
@danmoseley
Member

It's not related, but I notice I had a typo in one of the original tests here; it should be uncompressed: https://github.com/dotnet/runtime/pull/68106/files#diff-ea21d1af009443a658bc821a6eb47860f7887ff70155a68f11990ec64875a2dcR860

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 18, 2022
@fitdev
Author

fitdev commented Oct 28, 2022

Thank you for looking into this and acting quickly. I noticed the PR is still not merged. Is there a chance it will nonetheless make it into RTM of DotNet 7? This is a serious blocking issue.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 28, 2022
@adamsitnik
Member

Thank you for looking into this and acting quickly. I noticed the PR is still not merged. Is there a chance it will nonetheless make it into RTM of DotNet 7? This is a serious blocking issue.

It just got merged, and we are going to backport it to 7.0: #77605. I expect it will be included in the first servicing update in January (cc @jeffhandley).

Once again thank you for trying our RC builds and reporting the issue!

@jeffhandley
Member

in the first servicing update in January

Minor correction: The first servicing update is in December. This backport will be considered in Tactics next Tuesday.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 27, 2022