Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ported interleaving support to System.IO.Packaging #97898

Merged
merged 12 commits into from
Jul 16, 2024

Conversation

edwardneal
Copy link
Contributor

Contributes to (and appears to fix) #51929.
Also contributes to (and appears to fix) dotnet/wpf#3546.

This was previously part of PR #78374, (originally by @KevinCathcart 's work) which was closed due to poor test coverage in System.IO.Packaging. I've added a set of tests for my changes, (although there's some overlap with normal functionality) and corrected a few deviations.

To try to keep the internal state of ZipPackage manageable, I've introduced a behaviour change: users can no longer try to create an atomic part (i.e. a part backed by a single-ZipArchiveEntry) with a URI which belongs to an interleaved part; this means that trying to pass a URI of /[Content_Types].xml/[0].piece to Package.CreatePart will fail. I don't think this'll actually have an impact though - it's a violation of the OPC spec, and would have resulted in inconsistent behaviour anyway.

The OPC spec also states that the piece number should be a single digit. ZipPackage is more forgiving in this PR, for two reasons:

  1. Compatibility with the .NET Framework (which has the same behaviour)
  2. The XPS Document Writer (and presumably, other applications) generates packages which have many pieces. I tested with one of the files in this comment and found parts with thousands of pieces

I've made a few changes to KevinCathcart's original code, but most of them are stylistic. The only key differences are in the InterleavedZipPackagePartStream - it would try to seek the underlying streams, and calling SetLength didn't always work because ZipArchiveEntry had a stream left open.

This is essentially KevinCathcart's work, with a test suite and changes required to ensure that these tests pass.
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 2, 2024
@ghost
Copy link

ghost commented Feb 2, 2024

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to (and appears to fix) #51929.
Also contributes to (and appears to fix) dotnet/wpf#3546.

This was previously part of PR #78374, (originally by @KevinCathcart 's work) which was closed due to poor test coverage in System.IO.Packaging. I've added a set of tests for my changes, (although there's some overlap with normal functionality) and corrected a few deviations.

To try to keep the internal state of ZipPackage manageable, I've introduced a behaviour change: users can no longer try to create an atomic part (i.e. a part backed by a single-ZipArchiveEntry) with a URI which belongs to an interleaved part; this means that trying to pass a URI of /[Content_Types].xml/[0].piece to Package.CreatePart will fail. I don't think this'll actually have an impact though - it's a violation of the OPC spec, and would have resulted in inconsistent behaviour anyway.

The OPC spec also states that the piece number should be a single digit. ZipPackage is more forgiving in this PR, for two reasons:

  1. Compatibility with the .NET Framework (which has the same behaviour)
  2. The XPS Document Writer (and presumably, other applications) generates packages which have many pieces. I tested with one of the files in this comment and found parts with thousands of pieces

I've made a few changes to KevinCathcart's original code, but most of them are stylistic. The only key differences are in the InterleavedZipPackagePartStream - it would try to seek the underlying streams, and calling SetLength didn't always work because ZipArchiveEntry had a stream left open.

Author: edwardneal
Assignees: -
Labels:

area-System.IO, community-contribution

Milestone: -

Also removed other unused "using" statements.
Removed use of string.CompareOrdinal when processing part pieces.
Replaced byte allocations with ArrayPool Rent/Returns.
Made InterleavedZipPackagePartStream use Spans instead of byte arrays. This has increased memory usage when .NET Framework calls Read(byte[], int, int) and Write(byte[], int, int).
Split Read and ReadCore apart, so that a new Read(Span<byte>) method doesn't appear on the .NET Standard 2.0 path. Also adjusted the conditions from "NETCOREAPP2_1_OR_GREATER" to "!NETFRAMEWORK && !NETSTANDARD2_0", to align with the rest of .NET.
…poses

I can't see a simple way to do this in the Write() path - we rent and return across some loop iterations
@KevinCathcart
Copy link
Contributor

I should probably mention that my work here was mostly just porting back the code from .NET Framework. While essentially all of that code got contributed under MIT between the WPF repository, and this repository, portions were ifdefed out, and some was modified in the initial commits, so I did need to reference the actual .NET Framework sources in a few spots (via the referencesource site, not the MIT reference source repo, which lacked needed libraries), to verify what the code originally looked like. I feel like anything taken form there is likely too small to matter, and probably was authorized to be relicensed anyway. Presumably the changes in the initial commits vs framework were likely not for licensing reasons, but stemmed from a desire to have the initially checked in code able to actually compile.

Of course there was some original work to get things to fit on top of ZipStream instead of MS.Internal.IO.Zip, and any original contributions of mine there are covered under MIT, and my CLA with the .NET Foundation.

@wstaelens
Copy link

@huoyaoyuan can you review please?

@huoyaoyuan
Copy link
Member

I'm not area owner or experienced about System.IO.Packaging. I can only provide review for general practices. Please wait for the area owners.

@edwardneal
Copy link
Contributor Author

Thanks - I agree with your comments. The Read logic makes sense, not quite sure what the thought process was there! I'll take a look this evening and resolve them.

No worries with the PR timing. It's a little larger than I'd normally want to submit, it's a feature implementation rather than a bugfix and I imagine the preview release has taken some of the teams' bandwidth, so I appreciate it'll take longer to review.

@danmoseley
Copy link
Member

@richlander the code on referencesource - is it all under MIT like the reference source repo is?

* Corrections to comments and spacing.
* Switched to the use of an EndOfStreamException in the correct situation.
* Adjusted the SeekUnderlyingPieceStream method to slightly tweak the mechanism and to make use of a readBuffer which might be larger than BufferSize.
* Small refactor of part piece deletion.
@edwardneal
Copy link
Contributor Author

The post-code review changes are completed now, thanks @huoyaoyuan. I'm not convinced that the buffer length calculation in SeekUnderlyingPieceStream was incorrect, but it was definitely unclear and it wouldn't always use the entire buffer (if ArrayPool.Rent returned an array which was larger than expected.) I've changed it to make it a little clearer.

@wstaelens
Copy link

wstaelens commented Feb 20, 2024

The post-code review changes are completed now, thanks @huoyaoyuan. I'm not convinced that the buffer length calculation in SeekUnderlyingPieceStream was incorrect, but it was definitely unclear and it wouldn't always use the entire buffer (if ArrayPool.Rent returned an array which was larger than expected.) I've changed it to make it a little clearer.

https://github.com/dotnet/runtime/blob/e9218122ae717a393198be04c611061a14f6ee88/src/libraries/System.IO.Packaging/src/System/IO/Packaging/PackUriHelper.cs#L81#L83

shouldn't this (and others in the class) be https://learn.microsoft.com/en-us/dotnet/api/system.argumentnullexception.throwifnull?view=net-7.0 or is it because of compatibility with .net standard?

https://github.com/dotnet/runtime/blob/e9218122ae717a393198be04c611061a14f6ee88/src/libraries/System.IO.Packaging/src/System/IO/Packaging/PackUriHelper.cs#L142#L144
pattern match to avoid casting?

@edwardneal
Copy link
Contributor Author

That's correct, yes - the code needs to compile under .NET Standard 2.0, which excludes that particular helper method.

A pattern match would definitely look a bit better, but it wouldn't avoid a cast. This:

if (!(partUri is ValidatedPartUri validatedUri))
    Console.WriteLine("Hello, World!");

will actually compile down to:

ValidatedPartUri validatedUri = partUri as ValidatedPartUri;
if (validatedUri != null)
    Console.WriteLine("Hello, World!");

Eliminating the second cast could be helpful though; I'll make the changes tomorrow morning to incorporate any intermediary changes.

@edwardneal
Copy link
Contributor Author

I think all of the comments from the initial code review have been completed. Is anything preventing this going to the @dotnet/area-system-io area owners for review besides a response from @richlander to danmoseley's question about licensing for the .NET Reference Source?

It's a larger PR, so I expect it'll take a while to review. As far as I know though, there's nothing more that I need to do to enable it to enter that particular queue. Once it's reviewed, I'll follow through with any requested changes as normal.

@danmoseley
Copy link
Member

Maybe @terrajobst can answer the license question, but I doubt it's a problem

@wstaelens
Copy link

@edwardneal what is the status of this? Will it be in .net 9 (and maybe backported?)

@edwardneal
Copy link
Contributor Author

I've not got a status update to share unfortunately. I'd be glad to see this in .NET 9, but it's currently waiting on code review from the System.IO area owners.

A .NET 6.0/7.0/8.0 project could still reference the updated System.IO.Packaging NuGet package; as a result, the older project should use the .NET Standard 2.0 build within the package and be able to use the fixed functionality. This'll hopefully do what you're looking for, but if it doesn't and the .NET team approve the PR for backport, I'm happy enough to help where needed.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

This was only used to reference ZipPackagePartPiece.TryParse and .Index, so switched the tests to use reflection.
Copy link
Member

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your change, @edwardneal . I left some comments for you to consider. The one I care the most about is making sure we are able to handle unseekable streams properly. I left a comment in the test file with a suggestion.

* Using exception helpers for .NET
* Removing #regions
* Fixing nullability in references to ZipPackagePartPiece.PieceDescriptors
* Spacing/spelling/grammar changes to comments
Replaced references to NETFRAMEWORK and NETSTANDARD2_0 with NET and !NET as appropriate.
@richlander
Copy link
Member

When is the P7 snap @carlossanlop? We should try to get this one in.

Copy link
Member

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready to merge. P7 snap is next week. This will be part of it.

Thanks so much @edwardneal for working on this, and also thank you for your patience.

Edit: I was just waiting for the CI re-runs to pass, there was a ton of noise.

@carlossanlop carlossanlop merged commit f49add9 into dotnet:main Jul 16, 2024
73 of 83 checks passed
@danmoseley
Copy link
Member

For the folks tracking this -- it would be great if you could get the preview 7 build as soon as its out (or any daily build that has this in) and validate that you're getting good results.

@ericstj
Copy link
Member

ericstj commented Jul 22, 2024

You can find the latest package with this fix here: https://dev.azure.com/dnceng/public/_artifacts/feed/dotnet9/NuGet/System.IO.Packaging/overview/9.0.0-preview.7.24371.6

NuGet.Config

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="dotnet9" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet9/nuget/v3/index.json" />
  </packageSources>
</configuration>

Package reference:

<PackageReference Include="System.IO.Packaging" Version="9.0.0-preview.7.24371.6" />

Please let us know if it works for you!

@edwardneal
Copy link
Contributor Author

Sorry for the delay here, but I've successfully tested with a known-interleaved XPS document from @wstaelens original issue dotnet/wpf#3546 - the file in 02 from this comment.

Sample code:

using System.IO.Packaging;

const string FileName = "02.xps";
const string InterleavedPart = "/FixedDocumentSequence.fdseq";

using var package = Package.Open(FileName);
var parts = package.GetParts();
var namedPart = package.GetPart(new Uri(InterleavedPart, UriKind.Relative));

Console.WriteLine($"Loaded file {Path.GetFileName(FileName)}");
Console.WriteLine($"Parts: {parts.Count()}");
Console.WriteLine($"Found named part \"{InterleavedPart}\": {namedPart is not null}");

if (namedPart is not null)
{
    using var partStream = namedPart.GetStream(FileMode.Open, FileAccess.Read);
    using var reader = new StreamReader(partStream);

    Console.WriteLine("Contents of named part:");
    Console.WriteLine(reader.ReadToEnd());
}

Thanks to everyone for your reviews, and for pushing the merge through @carlossanlop.

@edwardneal edwardneal deleted the issue-51929 branch July 24, 2024 05:42
@wstaelens
Copy link

Currently on holiday without pc access and limited internet access.
I could have tested the same code as in my original ticket as @edwardneal did to see if that worked.

(Sent already to my colleagues but did not hear from them yet)

@danmoseley
Copy link
Member

Sorry for the delay here, but I've successfully tested with a known-interleaved XPS document from @wstaelens original issue dotnet/wpf#3546 - the file in 02 from dotnet/wpf#3546 (comment).

I want to add a woohoo here, I moved to ASP.NET since this issue was opened but really happy to see it happen, and kudos again for the hard work and contributions of everyone here.

cc @pchaurasia14

@github-actions github-actions bot locked and limited conversation to collaborators Aug 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO.Compression community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.