Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Comment in ZipArchive and ZipArchiveEntry #59442

Merged
merged 25 commits into from
Feb 4, 2022

Conversation

carlossanlop
Copy link
Member

@carlossanlop carlossanlop commented Sep 22, 2021

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Sep 22, 2021

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #1547

Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.IO.Compression

Milestone: 7.0.0

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlossanlop thank you for your contribution! PTAL at my comments

@carlossanlop carlossanlop force-pushed the zipComment branch 2 times, most recently from 57cfa5a to cdb57d5 Compare December 4, 2021 03:10
@carlossanlop
Copy link
Member Author

@adamsitnik I addressed your last comments.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @carlossanlop !

…long characters like emojis.

Simplify tests.
@ericstj
Copy link
Member

ericstj commented Jan 31, 2022

Is this one ready to merge?

@carlossanlop
Copy link
Member Author

Is this one ready to merge?

I'd like @jozkee to take a last look, since it was his comments which I addressed in my last commit.

Comment on lines 388 to 389
protected const string LowerCaseOUmlautChar = "\u00F6";
protected const string CopyrightChar = "\u00A9";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's special about these versus '1' or 'a'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed chars that cannot be represented in ASCII.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you express that in the name of the const or in a comment?

Comment on lines +413 to +416
foreach (object[] e in SharedComment_Data())
{
yield return e;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to the end of the method to avoid mixing what is being reused from what is unique to this method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to yield the shared cases first.

Comment on lines 405 to 406
string asciiExpectedExactMaxLength = new('a', ushort.MaxValue);
string asciiOriginalOverMaxLength = asciiExpectedExactMaxLength + "aaa";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these prefixed as Ascii considering that the method name is Utf8Comment_Data?

Suggested change
string asciiExpectedExactMaxLength = new('a', ushort.MaxValue);
string asciiOriginalOverMaxLength = asciiExpectedExactMaxLength + "aaa";
string utf8ExpectedExactMaxLength = new('a', ushort.MaxValue);
string utf8OriginalOverMaxLength = asciiExpectedExactMaxLength + "aaa";

Copy link
Member Author

@carlossanlop carlossanlop Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's test data. This method return test cases for both ASCII and UTF8. Renaming these variables would be incorrect.

string asciiOriginalOverMaxLength = asciiExpectedExactMaxLength + "aaa";

// A smiley emoji code point consists of two characters,
// meaning the whole emoji should be fully truncated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a case where the emoji does not get truncated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only case where the string is truncated is when it's length is greater than expected. The contents don't matter. So if a string is smaller than the max length, it should remain untouched, regardless of its contents.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Comment on lines +240 to +241
bytes = encoding.GetBytes(text);
return maxBytes < bytes.Length ? bytes[0..maxBytes] : bytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes[0..maxBytes] is an allocation of it's own, isn't it? If so, consider implementing ArrayPool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed before. We don't expect either the Comment setter or the EntryName setter to be called in a loop. That would be weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is being called in a foreach loop in your own tests.

Assert.Equal(expectedCreateComment, entry.Comment);

I don't know if enumeration of entries should care or not about one allocation, but it can definitely be improved to my suggestion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened issue #64767 to follow up on this.

[Theory]
[MemberData(nameof(Utf8Comment_Data))]
public static void Create_Comment_Utf8EntryName_NullEncoding(string originalComment, string expectedComment) =>
Create_Comment_EntryName_Encoding_Internal($"{SmileyEmoji}.txt", originalComment, expectedComment, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the entry name really need to be tested? It is not something that is being added in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entry name is specifically being affected by this PR, because both the comment and the entry name share the same encoding the user specifies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting that there were already tests that covered EntryName with multiple encodings, if that's not the case then it is fine.

[Theory]
[MemberData(nameof(Latin1Comment_Data))]
public static void Create_Comment_Utf8EntryName_Latin1Encoding(string originalComment, string expectedComment) =>
// Emoji not supported by latin1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you pass an emoji with latin1 encoding? I assume it just encodes it as something meaningless. We should assert that the latin1-encoded-emoji still gets truncated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I would assert that in the tests. I just wanted to make sure the string got truncated, even if the final outputted string, when viewed by the user, will be garbage where the emoji was saved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I would assert that in the tests.

On Latin1Comment_Data add something like this:

string latin1ExpectedALettersAndTruncatedEmoji = new string('a', ushort.MaxValue - 1) + SmileyEmoji[0];
string latin1OriginalALettersAndEmoji = new string('a', ushort.MaxValue - 1) + SmileyEmoji;

My point being that the truncation logic only runs when it makes sense (UTF8).

public static void Update_Comment_AsciiEntryName_NullEncoding(string originalComment, string expectedComment) =>
Update_Comment_EntryName_Encoding_Internal("file.txt",
originalComment, expectedComment, null,
new string('a', ushort.MaxValue - 1) + $"{CopyrightChar}", new string('a', ushort.MaxValue - 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing Utf8Comment_Data.Count() times the string new string('a', ushort.MaxValue - 1) + $"{CopyrightChar}", you should create another Utf8Comment_Data that includes this as a parameter and remove the redundancy.

Comment on lines 63 to 76
using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true, encoding))
{
ZipArchiveEntry entry = zip.CreateEntry(entryName, CompressionLevel.NoCompression);
entry.Comment = comment;

Assert.Equal(entryName, entry.FullName);
Assert.Equal(comment, entry.Comment);
entry.Comment = originalCreateComment;
Assert.Equal(expectedCreateComment, entry.Comment);
}

// Open with no encoding to verify
using (var zip = new ZipArchive(testStream, ZipArchiveMode.Read, leaveOpen: true))
using (var zip = new ZipArchive(ms, ZipArchiveMode.Read, leaveOpen: true, encoding))
{
ZipArchiveEntry entry = zip.Entries.FirstOrDefault();

Assert.Equal(entryName, entry.FullName);
Assert.Equal(comment, entry.Comment);
foreach (ZipArchiveEntry entry in zip.Entries)
{
Assert.Equal(expectedCreateComment, entry.Comment);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call Create_Comment_EntryName_Encoding_Internal instead to avoid code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better, you could test ZipArchiveMode.Update inside Create_Comment_EntryName_Encoding_Internal and remove zip_UpdateTests.Comments.cs entirely.

@@ -383,5 +383,57 @@ internal static void AddEntry(ZipArchive archive, string name, string contents,
w.WriteLine(contents);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be in tests folder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I don't know why it was created there. Maybe we used to share something with another assembly, but it got removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah search didn't return the other file that consumes ZipTestHelper. It's the ZipFile.csproj. I'll revert the move of this file.

Copy link
Member

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the test nits; LGTM.

@carlossanlop carlossanlop merged commit 7065279 into dotnet:main Feb 4, 2022
@carlossanlop carlossanlop deleted the zipComment branch February 4, 2022 03:35
@carlossanlop
Copy link
Member Author

FYI I ran the MacOS CI leg twice but it failed with errors unrelated to this change:

runtime 20220203.90 / Libraries Test Run release coreclr OSX x64 Debug
[Log] The job running on agent Azure Pipelines 5 ran longer than the maximum time of 150 minutes.
runtime 20220203.90 / Libraries Test Run release coreclr OSX x64 Debug / Send to Helix
[Log] The operation was canceled.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ArchiveComment property for ZipArchive class
6 participants