Expose Comment in ZipArchive and ZipArchiveEntry #59442

Merged
merged 25 commits into from
Feb 4, 2022
08f2a8b
ref: Add public APIs.
carlossanlop Sep 22, 2021
973fc8c
src: Expose the archive and entry comments.
carlossanlop Sep 22, 2021
be54e4a
tests: Add update tests for archives and for entries. They cover crea…
carlossanlop Sep 22, 2021
28467ae
Fix encoding detection feedback
carlossanlop Sep 28, 2021
297b9e5
Fix encoding detection feedback
carlossanlop Sep 28, 2021
338b184
Address suggestions
carlossanlop Oct 6, 2021
31a2227
Switch names of archive comment fields.
carlossanlop Oct 18, 2021
78d7e37
Address unicode bit flag sharing problem.
carlossanlop Oct 19, 2021
0a29149
Add more test cases
carlossanlop Oct 19, 2021
b93305b
Adjust tests
carlossanlop Oct 19, 2021
7496628
Add newline so comment only applies to one line
carlossanlop Oct 19, 2021
a949883
Ensure string byte truncation is aligned to encoding's char size.
carlossanlop Nov 27, 2021
9738c30
Remove empty check for non-nullable string. Also remove unnecessary D…
carlossanlop Nov 29, 2021
5179ea2
Defer calculation of truncated encoding string to getter and to writ…
carlossanlop Dec 4, 2021
0e47de4
Rename test arguments
carlossanlop Dec 6, 2021
18bbcd4
Only use bytes[]
carlossanlop Dec 7, 2021
31602b3
Remove unnecessary bit comment
carlossanlop Dec 7, 2021
0ae8a64
Remove unnecessary length check
carlossanlop Dec 7, 2021
e9d5b67
Address feedback
carlossanlop Dec 8, 2021
77b6ee5
Suggestion by adamsitnik: write only if length > 0
carlossanlop Dec 7, 2021
8daedcf
Simplify EntryName code
carlossanlop Dec 8, 2021
1596bb3
Move entryName code to original location
carlossanlop Dec 8, 2021
3444631
In UTF8, use Runes to detect code point length to prevent truncating …
carlossanlop Jan 19, 2022
0749c84
Address suggestions
carlossanlop Feb 3, 2022
b08f0ac
Move ZipTestHelper back to its original position because S.IO.Compres…
carlossanlop Feb 3, 2022
64 changes: 64 additions & 0 deletions src/libraries/Common/tests/System/IO/Compression/ZipTestHelper.cs
@@ -383,5 +383,69 @@ internal static void AddEntry(ZipArchive archive, string name, string contents,
w.WriteLine(contents);
Member

This file should be in tests folder.

Member Author

Good point. I don't know why it was created there. Maybe we used to share something with another assembly, but it got removed.

Member Author

Ah, the search didn't return the other project that consumes ZipTestHelper: ZipFile.csproj. I'll revert the move of this file.

}
}

protected const string Utf8SmileyEmoji = "\ud83d\ude04";
protected const string Utf8LowerCaseOUmlautChar = "\u00F6";
protected const string Utf8CopyrightChar = "\u00A9";
protected const string AsciiFileName = "file.txt";
// The o with umlaut is a character that exists in both latin1 and utf8
protected const string Utf8AndLatin1FileName = $"{Utf8LowerCaseOUmlautChar}.txt";
// emojis only make sense in utf8
protected const string Utf8FileName = $"{Utf8SmileyEmoji}.txt";
protected static readonly string ALettersUShortMaxValueMinusOne = new string('a', ushort.MaxValue - 1);
protected static readonly string ALettersUShortMaxValue = ALettersUShortMaxValueMinusOne + 'a';
protected static readonly string ALettersUShortMaxValueMinusOneAndCopyRightChar = ALettersUShortMaxValueMinusOne + Utf8CopyrightChar;
protected static readonly string ALettersUShortMaxValueMinusOneAndTwoCopyRightChars = ALettersUShortMaxValueMinusOneAndCopyRightChar + Utf8CopyrightChar;

// Returns pairs that are returned the same way by Utf8 and Latin1
// Returns: originalComment, expectedComment
private static IEnumerable<object[]> SharedComment_Data()
{
yield return new object[] { null, string.Empty };
yield return new object[] { string.Empty, string.Empty };
yield return new object[] { "a", "a" };
yield return new object[] { Utf8LowerCaseOUmlautChar, Utf8LowerCaseOUmlautChar };
}

// Returns pairs as expected by Utf8
// Returns: originalComment, expectedComment
public static IEnumerable<object[]> Utf8Comment_Data()
{
string asciiOriginalOverMaxLength = ALettersUShortMaxValue + "aaa";

// A smiley emoji code point consists of two characters,
// meaning the whole emoji should be fully truncated
Member

You should add a case where the emoji does not get truncated.

Member Author

The only case where the string is truncated is when its length is greater than the maximum allowed. The contents don't matter. So if a string is shorter than the max length, it should remain untouched, regardless of its contents.

Member Author

Added.
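
For readers following this thread, here is a minimal sketch of the behavior being discussed: truncating a string so its UTF-8 encoding fits a byte limit without splitting a code point. The class `Utf8Truncation` and the method `TruncateToUtf8ByteLimit` are hypothetical names chosen for illustration; this is not the PR's `ZipHelper.GetEncodedTruncatedBytesFromString`, only an assumption about the general technique that the commit "In UTF8, use Runes to detect code point length" points at.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not the implementation in this PR.
public static class Utf8Truncation
{
    // Returns the longest prefix of 'text' whose UTF-8 encoding is at most
    // 'maxBytes' bytes long, cutting only between whole code points.
    public static string TruncateToUtf8ByteLimit(string text, int maxBytes)
    {
        int byteCount = 0; // UTF-8 bytes accepted so far
        int charCount = 0; // UTF-16 chars accepted so far

        foreach (Rune rune in text.EnumerateRunes())
        {
            // Stop before a code point that would push us past the limit,
            // so a surrogate pair is either kept whole or dropped whole.
            if (byteCount + rune.Utf8SequenceLength > maxBytes)
            {
                break;
            }

            byteCount += rune.Utf8SequenceLength;
            charCount += rune.Utf16SequenceLength;
        }

        return text.Substring(0, charCount);
    }
}
```

With a limit of `ushort.MaxValue`, a string of `ushort.MaxValue - 1` letters followed by a smiley emoji loses the whole emoji, while `"aaaaa"` plus the same emoji is returned unchanged, which matches the two test cases in this file.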

string utf8OriginalALettersAndOneEmojiDoesNotFit = ALettersUShortMaxValueMinusOne + Utf8SmileyEmoji;

// A smiley emoji code point consists of two characters,
// so it should not be truncated if it's the last character and the total length is not over the limit.
string utf8OriginalALettersAndOneEmojiFits = "aaaaa" + Utf8SmileyEmoji;

yield return new object[] { asciiOriginalOverMaxLength, ALettersUShortMaxValue };
yield return new object[] { utf8OriginalALettersAndOneEmojiDoesNotFit, ALettersUShortMaxValueMinusOne };
yield return new object[] { utf8OriginalALettersAndOneEmojiFits, utf8OriginalALettersAndOneEmojiFits };

foreach (object[] e in SharedComment_Data())
{
yield return e;
}
Comment on lines +428 to +431
Member

Move this to the end of the method to avoid mixing what is being reused from what is unique to this method.

Member Author

I prefer to yield the shared cases first.

}

// Returns pairs as expected by Latin1
// Returns: originalComment, expectedComment
public static IEnumerable<object[]> Latin1Comment_Data()
{
// In Latin1, all characters are exactly 1 byte

string latin1ExpectedALettersAndOneOUmlaut = ALettersUShortMaxValueMinusOne + Utf8LowerCaseOUmlautChar;
string latin1OriginalALettersAndTwoOUmlauts = latin1ExpectedALettersAndOneOUmlaut + Utf8LowerCaseOUmlautChar;

yield return new object[] { latin1OriginalALettersAndTwoOUmlauts, latin1ExpectedALettersAndOneOUmlaut };

foreach (object[] e in SharedComment_Data())
{
yield return e;
}
}
}
}
@@ -94,6 +94,8 @@ public ZipArchive(System.IO.Stream stream) { }
public ZipArchive(System.IO.Stream stream, System.IO.Compression.ZipArchiveMode mode) { }
public ZipArchive(System.IO.Stream stream, System.IO.Compression.ZipArchiveMode mode, bool leaveOpen) { }
public ZipArchive(System.IO.Stream stream, System.IO.Compression.ZipArchiveMode mode, bool leaveOpen, System.Text.Encoding? entryNameEncoding) { }
[System.Diagnostics.CodeAnalysis.AllowNull]
public string Comment { get { throw null; } set { } }
public System.Collections.ObjectModel.ReadOnlyCollection<System.IO.Compression.ZipArchiveEntry> Entries { get { throw null; } }
public System.IO.Compression.ZipArchiveMode Mode { get { throw null; } }
public System.IO.Compression.ZipArchiveEntry CreateEntry(string entryName) { throw null; }
@@ -106,6 +108,8 @@ public partial class ZipArchiveEntry
{
internal ZipArchiveEntry() { }
public System.IO.Compression.ZipArchive Archive { get { throw null; } }
[System.Diagnostics.CodeAnalysis.AllowNull]
public string Comment { get { throw null; } set { } }
public long CompressedLength { get { throw null; } }
[System.CLSCompliantAttribute(false)]
public uint Crc32 { get { throw null; } }
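
As a rough usage sketch of the two `Comment` properties declared in the reference hunks above: the code below writes an archive comment and an entry comment in `Create` mode, then reopens the stream in `Read` mode to read them back. The class name `ZipCommentExample`, the method `RoundTrip`, and the entry name `file.txt` are assumptions made for the example, and it assumes a runtime that ships these properties (.NET 7 or later).

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical example type; only ZipArchive.Comment and
// ZipArchiveEntry.Comment come from the API added in this PR.
public static class ZipCommentExample
{
    public static (string ArchiveComment, string EntryComment) RoundTrip(
        string archiveComment, string entryComment)
    {
        using var stream = new MemoryStream();

        // Write phase: comments are set on the archive and on one entry.
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Create, leaveOpen: true))
        {
            archive.Comment = archiveComment;

            ZipArchiveEntry entry = archive.CreateEntry("file.txt");
            entry.Comment = entryComment;

            using (var writer = new StreamWriter(entry.Open()))
            {
                writer.Write("hello");
            }
        } // Disposing the archive writes the central directory, including comments.

        // Read phase: reopen the same bytes and read the comments back.
        stream.Position = 0;
        using var readArchive = new ZipArchive(stream, ZipArchiveMode.Read);
        return (readArchive.Comment, readArchive.Entries[0].Comment);
    }
}
```

Because both properties are annotated with `[AllowNull]`, assigning `null` is accepted by the setter; per the test data in this PR, a `null` comment is expected to read back as an empty string.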
@@ -218,8 +218,8 @@
<data name="EntriesInCreateMode" xml:space="preserve">
<value>Cannot access entries in Create mode.</value>
</data>
<data name="EntryNameEncodingNotSupported" xml:space="preserve">
<value>The specified entry name encoding is not supported.</value>
<data name="EntryNameAndCommentEncodingNotSupported" xml:space="preserve">
<value>The specified encoding is not supported for entry names and comments.</value>
</data>
<data name="EntryNamesTooLong" xml:space="preserve">
<value>Entry names cannot require more than 2^16 bits.</value>
@@ -7,6 +7,7 @@
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Text;

namespace System.IO.Compression
@@ -27,8 +28,8 @@ public class ZipArchive : IDisposable
private uint _numberOfThisDisk; //only valid after ReadCentralDirectory
private long _expectedNumberOfEntries;
private Stream? _backingStream;
private byte[]? _archiveComment;
private Encoding? _entryNameEncoding;
private byte[] _archiveComment;
private Encoding? _entryNameAndCommentEncoding;

#if DEBUG_FORCE_ZIP64
public bool _forceZip64;
@@ -121,7 +122,7 @@ public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding?
if (stream == null)
throw new ArgumentNullException(nameof(stream));

EntryNameEncoding = entryNameEncoding;
EntryNameAndCommentEncoding = entryNameEncoding;
Stream? extraTempStream = null;

try
@@ -173,7 +174,7 @@ public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding?
_centralDirectoryStart = 0; // invalid until ReadCentralDirectory
_isDisposed = false;
_numberOfThisDisk = 0; // invalid until ReadCentralDirectory
_archiveComment = null;
_archiveComment = Array.Empty<byte>();

switch (mode)
{
@@ -211,6 +212,20 @@ public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding?
}
}

/// <summary>
/// Gets or sets the optional archive comment.
/// </summary>
/// <remarks>
/// The comment encoding is determined by the <c>entryNameEncoding</c> parameter of the <see cref="ZipArchive(Stream,ZipArchiveMode,bool,Encoding?)"/> constructor.
/// If the comment byte length is larger than <see cref="ushort.MaxValue"/>, it will be truncated when disposing the archive.
/// </remarks>
[AllowNull]
public string Comment
{
get => (EntryNameAndCommentEncoding ?? Encoding.UTF8).GetString(_archiveComment);
set => _archiveComment = ZipHelper.GetEncodedTruncatedBytesFromString(value, EntryNameAndCommentEncoding, ZipEndOfCentralDirectoryBlock.ZipFileCommentMaxLength, out _);
}

/// <summary>
/// The collection of entries that are currently in the ZipArchive. This may not accurately represent the actual entries that are present in the underlying file or stream.
/// </summary>
@@ -345,9 +360,9 @@ public void Dispose()

internal uint NumberOfThisDisk => _numberOfThisDisk;

internal Encoding? EntryNameEncoding
internal Encoding? EntryNameAndCommentEncoding
{
get { return _entryNameEncoding; }
get => _entryNameAndCommentEncoding;

private set
{
@@ -370,10 +385,10 @@ private set
(value.Equals(Encoding.BigEndianUnicode)
|| value.Equals(Encoding.Unicode)))
{
throw new ArgumentException(SR.EntryNameEncodingNotSupported, nameof(EntryNameEncoding));
throw new ArgumentException(SR.EntryNameAndCommentEncodingNotSupported, nameof(EntryNameAndCommentEncoding));
}

_entryNameEncoding = value;
_entryNameAndCommentEncoding = value;
}
}

@@ -547,9 +562,7 @@ private void ReadEndOfCentralDirectory()

_expectedNumberOfEntries = eocd.NumberOfEntriesInTheCentralDirectory;

// only bother saving the comment if we are in update mode
if (_mode == ZipArchiveMode.Update)
_archiveComment = eocd.ArchiveComment;
_archiveComment = eocd.ArchiveComment;

TryReadZip64EndOfCentralDirectory(eocd, eocdStart);
