
Fix unzipping 4GB+ zip files #77181

Merged: 8 commits merged into dotnet:main on Oct 28, 2022

Conversation

@adamsitnik (Member) commented Oct 18, 2022

fixes #77159 without reverting #68106

I've tried to keep the changes as small as possible to make it easier to review and backport.

Explanation: for a file that is over 4 GB after zipping, some of the entries were reporting extraField.Size == 8, but readUncompressedSize, readCompressedSize, and readStartDiskNumber were false while readLocalHeaderOffset was true. So the code was reading the first Int64, not using it, and exiting:

long value64 = reader.ReadInt64();
if (readUncompressedSize)
    zip64Block._uncompressedSize = value64;
if (ms.Position > extraField.Size - sizeof(long))
    return true;

The fix is to allow advancing the stream (by reading from it) only when all four fields are provided (extraField.Size >= 28, the combined size of the fields) or when they are explicitly requested (the corresponding boolean flags set to true).
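For reference, a condensed sketch of the same spot after the fix (taken from the diff discussed in the review below):

// Advance the stream only when the field was explicitly requested, or when the
// extra field is large enough to hold all four values (3 x Int64 + 1 x Int32 = 28 bytes).
bool readAllFields = extraField.Size >= sizeof(long) + sizeof(long) + sizeof(long) + sizeof(int);

long value64 = readUncompressedSize || readAllFields ? reader.ReadInt64() : -1;
if (readUncompressedSize)
    zip64Block._uncompressedSize = value64;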

@ghost commented Oct 18, 2022

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

fixes #77159 without reverting #68106

Author: adamsitnik
Assignees: -
Labels: area-System.IO.Compression
Milestone: -

@ghost assigned adamsitnik on Oct 18, 2022
@adamsitnik (Member, Author):

/azp list

@azure-pipelines:

CI/CD Pipelines for this repository:

@adamsitnik (Member, Author):

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

@danmoseley (Member):

Seems reasonable but I'd rather someone on the IO crew sign off on the change.

BTW, if you touch this PR again, maybe you could fix this line, which I believe should check the uncompressed length:
https://github.com/dotnet/runtime/pull/68106/files#diff-ea21d1af009443a658bc821a6eb47860f7887ff70155a68f11990ec64875a2dcR860

@adamsitnik (Member, Author):

Seems reasonable but I'd rather someone on the IO crew sign off on the change.

@jozkee PTAL

@jozkee (Member) commented Oct 24, 2022

Taking a look...

[OuterLoop("It requires almost 12 GB of free disk space")]
public static void UnzipOver4GBZipFile()
{
    byte[] buffer = GC.AllocateUninitializedArray<byte>(1_000_000_000); // 1 GB
Member:

Technically, this is not 1 GB, but since you are creating 6 files, the test is still valid.

I would rather create two tests: one that succeeds at the limit (say 3.999 GB) without the fix and one that fails, and then verify that both pass with your fix.

Since this is urgent for a backport, we can defer that.

Member Author:

Technically, this is not 1 GB, but since you are creating 6 files, the test is still valid.

You mean that $1000^3$ is not 1 GB but $1024^3$ is?

I would rather create two tests: one that succeeds at the limit (say 3.999 GB) without the fix and one that fails, and then verify that both pass with your fix.

My typical workflow is to add a test that fails first, then make it pass. We have plenty of tests that verify < 4 GB archives so I don't see the value in adding it (also it would require us to use even more disk space and take longer to execute).

In the long term we could add more tests, especially ones that verify that decompress(compress(x)) == x (which this test avoids, as it took 10 minutes to do so on my beefy PC).
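As a rough illustration of such a round-trip check (a sketch only, not part of this PR; names and buffer size are illustrative):

// Compress a buffer into an in-memory archive, read it back, and compare.
byte[] original = new byte[1024];
new Random(42).NextBytes(original);

using var archiveStream = new MemoryStream();
using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, leaveOpen: true))
{
    using Stream entryStream = archive.CreateEntry("data.bin").Open();
    entryStream.Write(original, 0, original.Length);
}

archiveStream.Position = 0;
using (var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read))
{
    using Stream entryStream = archive.GetEntry("data.bin").Open();
    using var roundTripped = new MemoryStream();
    entryStream.CopyTo(roundTripped);
    Assert.Equal(original, roundTripped.ToArray()); // decompress(compress(x)) == x
}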

Member:

You mean that $1000^3$ is not 1 GB but $1024^3$ is?

Yes.

We have plenty of tests that verify < 4 GB archives so I don't see the value in adding it (also it would require us to use even more disk space and take longer to execute).

I meant to verify the edge cases of the bug. In this case that would be 4 GB - 1, 4 GB, and 4 GB + 1, rather than a 6 GB case. But yes, that would multiply test execution time, which is already quite long with this new test.

Comment on lines 202 to 207
// Advancing the stream (by reading from it) is possible only when:
// 1. There is an explicit ask to do that (valid files, corresponding boolean flag(s) set to true, #77159).
// 2. When the size indicates that all the information is available ("slightly invalid files", #49580).
bool readAllFields = extraField.Size >= sizeof(long) + sizeof(long) + sizeof(long) + sizeof(int);

long value64 = readUncompressedSize || readAllFields ? reader.ReadInt64() : -1;
Member:

Could this fail if only 2 or 3 fields are available for reading but the bool params are false?

I.e., what if extraField.Size = 16 and readUncompressedSize = false and readCompressedSize = false?

Member Author:

I don't think that this is possible. @danmoseley has added the following comment:

// The spec section 4.5.3:
// The order of the fields in the zip64 extended
// information record is fixed, but the fields MUST
// only appear if the corresponding Local or Central
// directory record field is set to 0xFFFF or 0xFFFFFFFF.
// However tools commonly write the fields anyway; the prevailing convention
// is to respect the size, but only actually use the values if their 32 bit
// values were all 0xFF.

My understanding is that we have two options:

  1. Correct archives: the fields are provided when the corresponding directory record fields are set (the right booleans are set to true, and the size equals their total size).
  2. "Slightly incorrect archives": the boolean fields are not set, but the provided size suggests that all the fields are present. Tests added in Fix reading slightly incorrect Zip files #68106 use this approach.

@adamsitnik (Member, Author):

@jozkee @stephentoub @danmoseley ping

else if (readAllFields)
{
    _ = reader.ReadInt32();
}

// original values are unsigned, so implies value is too big to fit in signed integer
if (zip64Block._uncompressedSize < 0) throw new InvalidDataException(SR.FieldTooBigUncompressedSize);
Member:

These validations only occur if extraField.Size >= 28 since you are now returning if there's less than that.

Shouldn't we validate when reader.ReadInt* is called and/or when the read of the field was requested?

Member Author:

This is how it always was; I would prefer not to change it now, as I intend to backport this to 7.0 and am trying to keep it as defensive/safe as possible.

Member:

The reason I was asking is that it wasn't always like that. It was changed to how it is now in #68106, which is the PR that introduced the regression.

[OuterLoop("It requires almost 12 GB of free disk space")]
public static void UnzipOver4GBZipFile()
{
    byte[] buffer = GC.AllocateUninitializedArray<byte>(1_000_000_000); // 1 GB
Member:

Is there a reasonable chance this will OOM? If so, consider catching that and skipping the test, rather than failing it.

Member Author:

Is there a reasonable chance this will OOM?

It should be executed only in Outerloop runs, for 64-bit "fast runtimes", and not in parallel with other tests, so I don't think it's needed (at least for now):

[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.IsSpeedOptimized), nameof(PlatformDetection.Is64BitProcess))] // don't run it on slower runtimes

[Collection(nameof(DisableParallelization))]

public static bool IsSpeedOptimized => !IsSizeOptimized;
public static bool IsSizeOptimized => IsBrowser || IsAndroid || IsAppleMobile;
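Put together, the gating looks roughly like this (a sketch only; the class name is hypothetical, and the [Collection] attribute sits on the test class):

[Collection(nameof(DisableParallelization))]
public class LargeZipFileTests // hypothetical class name
{
    [ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.IsSpeedOptimized), nameof(PlatformDetection.Is64BitProcess))] // don't run it on slower runtimes
    [OuterLoop("It requires almost 12 GB of free disk space")]
    public static void UnzipOver4GBZipFile()
    {
        // ... create several ~1 GB entries, zip them, and unzip the resulting 4 GB+ archive ...
    }
}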

@adamsitnik merged commit 399c6dc into dotnet:main on Oct 28, 2022
@adamsitnik (Member, Author):

/backport to release/7.0

@github-actions (Contributor):

@adamsitnik (Member, Author):

@danmoseley @jozkee @stephentoub thank you for the reviews!

@carlossanlop (Member) left a comment:

Assuming the CI is green, this LGTM. I'd also wait for a sign-off from @jozkee.

Edit: Nevermind. This was merged while I was still adding a review. :)

}
else if (readAllFields)
{
    _ = reader.ReadInt64();
Member:

For my own education: is there a perf improvement when using _ = ReadInt64(); to discard the value, compared to just not using _? Or is the idea here to explicitly signal that we are ignoring the value on purpose?

Member:

It shouldn't have a runtime effect; it's self-documenting, both for humans and for linters/analyzers that might otherwise flag an unused method result.
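As a small illustration (not from the PR, just the general idiom):

reader.ReadInt64();     // reads and skips 8 bytes; some analyzers may flag the ignored result
_ = reader.ReadInt64(); // same effect at runtime, but the discard makes the intent explicit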

@ghost locked as resolved and limited conversation to collaborators on Nov 28, 2022
Successfully merging this pull request may close these issues:

DotNet 7 Regression when reading large Zip files

6 participants