add encoding parameter to creating tar entry #364

itn3000 · 2019-06-25T04:53:58Z

This PR is similar to #182, but it adds a System.Text.Encoding parameter to the TarEntry-related APIs.
The default matches the current master behavior (omit the upper byte of each character), because that is faster than going through an Encoding when the archive is guaranteed to contain only ASCII file names.
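To make the trade-off concrete, here is a minimal Python sketch (not SharpZipLib code) of the two ways a name can be turned into the 100-byte name field of a tar header. The helpers `name_field_bytes` and `decode_name_field`, and the convention that `encoding=None` means "omit the upper byte", are hypothetical and chosen only for illustration.

```python
from typing import Optional

NAME_FIELD_LENGTH = 100  # size of the name field in a ustar/GNU tar header


def name_field_bytes(name: str, encoding: Optional[str] = None) -> bytes:
    """Build the NUL-padded name field of a tar header (hypothetical helper).

    encoding=None mimics the "omit upper byte" default: every character keeps
    only its low 8 bits, which is lossless for ASCII names only. (Python
    iterates code points; .NET chars are UTF-16 units, which coincide for
    BMP text such as the Japanese example below.)
    """
    if encoding is None:
        raw = bytes(ord(ch) & 0xFF for ch in name)
    else:
        raw = name.encode(encoding)
    if len(raw) > NAME_FIELD_LENGTH:
        raise ValueError("name does not fit in the 100-byte name field")
    return raw.ljust(NAME_FIELD_LENGTH, b"\0")


def decode_name_field(field: bytes, encoding: str) -> str:
    """Decode a name field: the name ends at the first NUL (or fills the field)."""
    return field.split(b"\0", 1)[0].decode(encoding)


ascii_name = "docs/readme.txt"
# For ASCII-only names both strategies yield identical bytes, so the
# cheaper byte cast loses nothing.
assert name_field_bytes(ascii_name) == name_field_bytes(ascii_name, "utf-8")

jp_name = "日本語.txt"
# An explicit encoding (UTF-8 or cp932) round-trips the name...
assert decode_name_field(name_field_bytes(jp_name, "utf-8"), "utf-8") == jp_name
assert decode_name_field(name_field_bytes(jp_name, "cp932"), "cp932") == jp_name
# ...while the byte cast keeps only the low byte of each kanji.
print(name_field_bytes(jp_name)[:7])  # b'\xe5,\x9e.txt'
```

The byte cast turns 本 (U+672C) into a plain comma, which shows why the cast is only safe when names are known to be ASCII.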

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.


itn3000 · 2019-08-06T01:49:51Z

Codacy says "TarInputStream.CreateEntry(byte[] headerBuffer, Encoding enc) should be a static method".
But the other TarInputStream.CreateEntry overloads are not static. Should I make CreateEntry(byte[] headerBuffer, Encoding enc) static?

piksel · 2019-08-06T11:28:44Z

The problem right now with Codacy is that the current code is way beyond bad.
I think it's better to stay consistent here than to follow Codacy, since a full cleanup of all the functions is better done in a separate commit.

piksel · 2020-04-12T11:21:24Z

As stated in #182:
The specification suggests using extended headers for encodings other than ASCII, but GNU Tar writes the UTF8-encoded bytes as the Name so I guess we should too.

This should be fine.
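The difference between the two approaches can be seen with Python's standard `tarfile` module, used here only as a convenient reference writer (it has nothing to do with SharpZipLib, and its GNU_FORMAT output is assumed to be analogous to, not identical with, what GNU Tar writes). In GNU format the UTF-8 bytes go straight into the name field; in PAX format a non-ASCII name triggers an extended header block instead. `header_bytes` is a hypothetical helper.

```python
import tarfile

NAME = "日本語.txt"


def header_bytes(name: str, fmt: int) -> bytes:
    """Serialize a header-only TarInfo for `name` in the given tar format."""
    info = tarfile.TarInfo(name)
    return info.tobuf(format=fmt, encoding="utf-8")


gnu = header_bytes(NAME, tarfile.GNU_FORMAT)
pax = header_bytes(NAME, tarfile.PAX_FORMAT)

# GNU format: one 512-byte header whose name field holds the raw UTF-8 bytes.
gnu_name = gnu[:100].split(b"\0", 1)[0]
print(len(gnu), gnu_name == NAME.encode("utf-8"))  # 512 True

# PAX format: the first block is an extended header (typeflag b"x" at
# offset 156) carrying the path; the ustar header after it only holds an
# ASCII-replaced fallback name.
print(pax[156:157], len(pax) > 512)  # b'x' True
```

Writing the encoded bytes directly keeps each entry to a single header block, which is the layout this PR follows.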

piksel

Great implementation! My argument-name change suggestion resulted in 40+ changes; I hope that doesn't seem too daunting (posting them as suggestions was meant to make them quick to apply).

As for the defaults, I would mark the encoding-less constructors as

[Obsolete("No Encoding for Name field is specified, any non-ASCII bytes will be discarded")]

And on that note, the behaviour of passing null as the encoding is not specified. It should be documented.

And one last thing. The test only checks UTF-8 in, UTF-8 out; it does not verify the actual bytes in the name. The reason to test this is that Encoding.UTF8 emits a BOM, which I don't think we want in the name bytes (but I may be wrong here; I will check what GNU Tar does).
If UTF8Encoding() is used instead, no BOM is emitted. Maybe we should guide the consumer toward the right choice here as well?

Thanks for your contribution!

src/ICSharpCode.SharpZipLib/Tar/TarArchive.cs

src/ICSharpCode.SharpZipLib/Tar/TarOutputStream.cs

test/ICSharpCode.SharpZipLib.Tests/Tar/TarTests.cs

itn3000 · 2020-04-14T02:57:08Z

Thank you for the review; I'll incorporate the feedback.

> And on that note, the behaviour of passing null as the encoding is not specified. It should be documented.

Should it go in the README or in the API doc comment?
It may be too long a description for an API comment.

> And one last thing. The test only checks UTF-8 in, UTF-8 out; it does not verify the actual bytes in the name. The reason to test this is that Encoding.UTF8 emits a BOM, which I don't think we want in the name bytes (but I may be wrong here; I will check what GNU Tar does).
> If UTF8Encoding() is used instead, no BOM is emitted. Maybe we should guide the consumer toward the right choice here as well?

Asserting the raw bytes of the name is necessary; I will add that.
This implementation uses Encoding.GetBytes/GetString to manipulate the name bytes and does not use StreamReader/StreamWriter.
Encoding.UTF8.GetBytes() does not write a BOM (I checked the output tar: there is no 0xEF 0xBB 0xBF sequence in the binary), so specifying Encoding.UTF8 as the encoding parameter is fine.
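A header-bytes test along those lines could look like the following Python sketch. It is hypothetical (the PR's actual tests are C# and are not reproduced here) and uses Python's `tarfile` GNU format as a stand-in writer. It checks the raw name field against the expected encoding and scans it for the UTF-8 BOM. Python's plain `utf-8` codec, like `Encoding.GetBytes`, never emits a BOM; the `utf-8-sig` codec appears only to show what a BOM-prefixed name would look like. `name_field` and `check_name_bytes` are illustrative helpers.

```python
import codecs
import tarfile


def name_field(name: str, encoding: str) -> bytes:
    """Return the raw (NUL-trimmed) name field of a GNU-format header."""
    header = tarfile.TarInfo(name).tobuf(format=tarfile.GNU_FORMAT, encoding=encoding)
    return header[:100].split(b"\0", 1)[0]


def check_name_bytes(name: str, encoding: str) -> None:
    raw = name_field(name, encoding)
    # The name field must be exactly the encoded name...
    assert raw == name.encode(encoding), raw
    # ...and must not start with a UTF-8 byte order mark (EF BB BF).
    assert not raw.startswith(codecs.BOM_UTF8), "BOM found in name field"


for enc in ("utf-8", "cp932"):
    check_name_bytes("日本語.txt", enc)

# For contrast: the BOM-emitting codec prepends EF BB BF on a one-shot encode().
print("日本語.txt".encode("utf-8-sig")[:3] == codecs.BOM_UTF8)  # True
```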

piksel · 2020-04-15T15:42:44Z

Regarding documentation:
Perhaps something short, like "or null for ASCII"?

Regarding Encoding.UTF8:
That's odd. The documentation for that property explicitly warns that it emits one:
https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.utf8?view=netframework-4.8#remarks
Edit: Never mind. I found the reason here:
https://docs.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.getpreamble?view=netframework-4.8#remarks (the notice in the bottom).


itn3000 · 2020-04-27T15:27:52Z

@piksel I've almost finished the requested changes; please review.

src/ICSharpCode.SharpZipLib/Tar/TarArchive.cs

add encoding parameter to creating tar entry

96f78f8

default is same as current master behavior(omit upper byte)

piksel mentioned this pull request Jul 29, 2019

I think we need the IsUnicodeText property in TarEntry #156

Closed

piksel requested changes Apr 12, 2020


Merge branch 'master' into add-tar-encoding-pr

4405dbd

itn3000 added 3 commits April 24, 2020 09:13

add encoding tests(cp932) and add doc comment(icsharpcode#364)

59b1cba

add header bytes test and mark obsolete methods without encodings(ics…

aff780a

…harpcode#364) but IEntryFactory does not considering name encoding.

forget to mark as obsoleting(icsharpcode#364)

5457905

Numpsy reviewed May 20, 2020


src/ICSharpCode.SharpZipLib/Tar/TarArchive.cs (outdated, resolved)

src/ICSharpCode.SharpZipLib/Tar/TarArchive.cs (outdated, resolved)

add doc comment for name encoding parameter (icsharpcode#364)

80fd970

piksel self-requested a review May 28, 2020 18:49

piksel approved these changes Jun 19, 2020


piksel merged commit 4bbcb4b into icsharpcode:master Jun 19, 2020

This was referenced Jun 19, 2020

Tar filenames with national characters not correctly stored. #26

Closed

writing correct Tar UTF8 filenames #182

Closed
