Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ArchiveComment property for ZipArchive class #1547

Closed
Tracked by #62658
Aimeast opened this issue Jan 29, 2017 · 4 comments · Fixed by #59442
Closed
Tracked by #62658

Add the ArchiveComment property for ZipArchive class #1547

Aimeast opened this issue Jan 29, 2017 · 4 comments · Fixed by #59442
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.IO.Compression
Milestone

Comments

@Aimeast
Copy link

Aimeast commented Jan 29, 2017

Edit by carlossanlop: API Proposal here


When we make a zip package, the comment is very important function that we use it very often. The corefx has already implemented it, see the definition _archiveComment field at ZipArchive.cs#L32, the reader at ZipArchive.cs#L561 and the writter at ZipArchive.cs#L693.

We can inbreak the ZipArchive instance by reflect and modify the _archiveComment field, then saved zip file is fine.

Proposed API

Just make private field _archiveComment as public property ArchiveComment, or other advises.

@ianhays
Copy link
Contributor

ianhays commented Jan 30, 2017

related to https://github.com/dotnet/corefx/issues/14853 and (to a lesser extent) https://github.com/dotnet/corefx/issues/12066 and https://github.com/dotnet/corefx/issues/10223

There are many fields in the ZipArchive header that I would like to see exposed in some manner eventually. I'm a bit hesitant to start exposing them one-by-one as we may end up with a messy API that we wish we could change but can't because of compat. I'd rather we look at the entire Zip header and decide what to expose - and how - all at once. https://github.com/dotnet/corefx/issues/14853 is one possible way of doing this.

@carlossanlop
Copy link
Member

carlossanlop commented Sep 13, 2019

Python exposes the zip archive comment in their public APIs, for both the ZipFile (the actual archive) and the ZipInfo (an individual archive member).

Java offers only a get method for the ZipFile comment, but has a get and set methods for the ZipEntry.

WinZip and WinRAR have the ability to show it too. Theunzip tool in Unix automatically prints the zip archive comment right before starting to extract files.

7-zip, Windows Explorer and corefx do not show the zip archive comment, although Windows Explorer allows hovering on a zip file and show a snippet of that comment, but not all.

@carlossanlop carlossanlop transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO.Compression untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@carlossanlop carlossanlop added api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@carlossanlop carlossanlop added this to the Future milestone Jan 9, 2020
@carlossanlop
Copy link
Member

carlossanlop commented Sep 21, 2021

We are already collecting the archive comment:

And the file comment:

So making these fields public is just a matter of ensuring the setters are properly delimited according to the spec.

The API proposal may look like this:

namespace System.IO.Compression
{
    public partial class ZipArchive : IDisposable
    {
        public ZipArchive(Stream stream);
        public ZipArchive(Stream stream, ZipArchiveMode mode);
        public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen);
        public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding? entryNameEncoding);
+       public string? Comment { get; set; }
        public ReadOnlyCollection<ZipArchiveEntry> Entries { get; }
        public ZipArchiveMode Mode { get; }
        public ZipArchiveEntry CreateEntry(string entryName);
        public ZipArchiveEntry CreateEntry(string entryName, CompressionLevel compressionLevel);
        public void Dispose();
        protected virtual void Dispose(bool disposing);
        public ZipArchiveEntry? GetEntry(string entryName);
    }
    public partial class ZipArchiveEntry
    {
        internal ZipArchiveEntry() { }
        public ZipArchive Archive { get ; }
+       public string? Comment { get; set; }
        public long CompressedLength { get ; }
        public uint Crc32 { get; }
        public int ExternalAttributes { get; set; }
        public string FullName { get; }
        public DateTimeOffset LastWriteTime { get; set; }
        public long Length { get; }
        public string Name { get; }
        public void Delete();
        public Stream Open();
        public override string ToString();
    }
}

Questions

  • For ZipArchive, should we also add a constructor that takes a comment? The question does not apply for ZipArchiveEntry because it only has an internal default constructor.
  • Should the Comment default value be a string? or a string?

Additional details

The ZIP file format spec indicates that:

  • The comment of each individual file is defined by two fields in the file header: a 2-byte comment length, and the comment itself.
  • The comment of the archive is defined in the end of the central directory record, also as a 2-byte comment length, then the actual comment itself.

Archive comment restrictions

Archive comment spec definition
   4.3.16  End of central directory record:

      end of central dir signature    4 bytes  (0x06054b50)
      number of this disk             2 bytes
      number of the disk with the
      start of the central directory  2 bytes
      total number of entries in the
      central directory on this disk  2 bytes
      total number of entries in
      the central directory           2 bytes
      size of the central directory   4 bytes
      offset of start of central
      directory with respect to
      the starting disk number        4 bytes
      .ZIP file comment length        2 bytes
      .ZIP file comment       (variable size)
Comment encoding
Bit 11: Language encoding flag (EFS).  If this bit is set,
                the filename and comment fields for this file
                MUST be encoded using UTF-8. (see APPENDIX D)

And also:

D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform 
to the original ZIP character encoding.  If general purpose bit 11 is set, the 
filename and comment MUST support The Unicode Standard, Version 4.1.0 or 
greater using the character encoding form defined by the UTF-8 storage 
specification.  The Unicode Standard is published by the The Unicode
Consortium (www.unicode.org).  UTF-8 encoded data stored within ZIP files 
is expected to not include a byte order mark (BOM).
Max size of comment
   4.4.10 file name length: (2 bytes)
   4.4.11 extra field length: (2 bytes)
   4.4.12 file comment length: (2 bytes)

       The length of the file name, extra field, and comment
       fields respectively.  The combined length of any
       directory record and these three fields SHOULD NOT
       generally exceed 65,535 bytes.  If input came from standard
       input, the file name length is set to zero.  

File comment restrictions

File comment spec definition
4.3.12  Central directory structure:

      [central directory header 1]
      .
      .
      . 
      [central directory header n]
      [digital signature] 

      File header:

        central file header signature   4 bytes  (0x02014b50)
        version made by                 2 bytes
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes
        file comment length             2 bytes
        disk number start               2 bytes
        internal file attributes        2 bytes
        external file attributes        4 bytes
        relative offset of local header 4 bytes

        file name (variable size)
        extra field (variable size)
        file comment (variable size)
Non-encryption of archive comment
4.4.26 .ZIP file comment: (Variable)

      The comment for this .ZIP file.  ZIP file comment data
      is stored unsecured.  No encryption or data authentication
      is applied to this area at this time.  Confidential information
      SHOULD NOT be stored in this section.
Max size of comment
 Note: As stated above, the size of the entire .ZIP file
         header, including the file name, comment, and extra
         field SHOULD NOT exceed 64K in size.

Note

There's mention of a Info-ZIP Unicode Comment Extra Field in the Third party mappings section of the spec. This is not related to the standard comments mentioned above.

@carlossanlop carlossanlop added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Sep 21, 2021
@carlossanlop carlossanlop modified the milestones: Future, 7.0.0 Sep 21, 2021
@bartonjs
Copy link
Member

bartonjs commented Sep 21, 2021

Video

Since the spec says that there's always a comment (but it can be zero length) we don't think the properties should be nullable (null-returning). The setters can certainly accept null to normalize to whatever the default we have historically done. (string.Empty / "Built with .NET", whatever it is we do...)

If you don't want to do null-normalization, then remove the [AllowNull]s, but it's probably nice to do.

namespace System.IO.Compression
{
    public partial class ZipArchive : IDisposable
    {
        public ZipArchive(Stream stream);
        public ZipArchive(Stream stream, ZipArchiveMode mode);
        public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen);
        public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding? entryNameEncoding);
+       [AllowNull]
+       public string Comment { get; set; }
        public ReadOnlyCollection<ZipArchiveEntry> Entries { get; }
        public ZipArchiveMode Mode { get; }
        public ZipArchiveEntry CreateEntry(string entryName);
        public ZipArchiveEntry CreateEntry(string entryName, CompressionLevel compressionLevel);
        public void Dispose();
        protected virtual void Dispose(bool disposing);
        public ZipArchiveEntry? GetEntry(string entryName);
    }
    public partial class ZipArchiveEntry
    {
        internal ZipArchiveEntry() { }
        public ZipArchive Archive { get ; }
+       [AllowNull]
+       public string Comment { get; set; }
        public long CompressedLength { get ; }
        public uint Crc32 { get; }
        public int ExternalAttributes { get; set; }
        public string FullName { get; }
        public DateTimeOffset LastWriteTime { get; set; }
        public long Length { get; }
        public string Name { get; }
        public void Delete();
        public Stream Open();
        public override string ToString();
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 21, 2021
@carlossanlop carlossanlop self-assigned this Sep 21, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 22, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 4, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.IO.Compression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants