OS specific invalid characters are causing extraction to corrupt file #858

DineshSolanki · 2024-07-24T06:05:06Z

https://www.deviantart.com/zenoasis/art/Japanese-TV-Dorama-folder-icon-pack-162-1077192465
Download zip from there

Extract it using SharpCompress.
The files REAL 恋愛殺人捜査班 : Real - Renai Satsujin Sosa Han.png and あの子の子ども : My Girlfriend's Child.png are extracted as 0-byte files with the following names: REAL µüïµä¢µ«║Σ║║µì£µƒ╗τÅ¡ , πüéπü«σ¡Éπü«σ¡Éπü¿πéÖπéé

I also tried the IBM437 encoding, with the same result.

However, when you extract with 7zip, the files extract fine. 7zip changes the file names, apparently replacing the : character with _. That character might be either U+A789 or U+2236.

File names from the 7zip extraction: REAL 恋愛殺人捜査班 _ Real - Renai Satsujin Sosa Han.png, あの子の子ども _ My Girlfriend's Child.png


Morilli · 2024-07-24T21:43:58Z

There is no validation of the destination file name, so an attempt is made to write to a filestream with a filename containing :.
Interestingly this seems to somewhat work and no library function throws an exception (not even Path.GetFullPath even though it's clearly documented to do so), but the output is kinda garbage.

While I'm actually kind of interested in figuring out what actually happens when this is attempted, the fix here is most likely to sanitize the output path before attempting to open any output:

diff --git a/src/SharpCompress/Common/ExtractionMethods.cs b/src/SharpCompress/Common/ExtractionMethods.cs
index 27d4164..80b8e9d 100644
--- a/src/SharpCompress/Common/ExtractionMethods.cs
+++ b/src/SharpCompress/Common/ExtractionMethods.cs
@@ -37,6 +37,7 @@ internal static class ExtractionMethods
         options ??= new ExtractionOptions() { Overwrite = true };

         var file = Path.GetFileName(entry.Key.NotNull("Entry Key is null")).NotNull("File is null");
+        file = string.Join("_", file.Split(Path.GetInvalidFileNameChars()));
         if (options.ExtractFullPath)
         {
             var folder = Path.GetDirectoryName(entry.Key.NotNull("Entry Key is null"))
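
The one-line change above can be tried in isolation. The sketch below is a minimal standalone version, assuming a hypothetical helper class `FileNameSanitizer` (not part of SharpCompress). It takes the invalid character set as a parameter because `Path.GetInvalidFileNameChars()` returns a different set on each OS, so on Linux or macOS the default overload would leave a `:` untouched.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of SharpCompress.
public static class FileNameSanitizer
{
    // Uses the invalid-character set of the OS the code is running on.
    public static string ReplaceInvalidChars(string fileName) =>
        ReplaceInvalidChars(fileName, Path.GetInvalidFileNameChars());

    // Splits the name on every invalid character and rejoins the pieces
    // with "_", which is the same approach as the proposed patch.
    public static string ReplaceInvalidChars(string fileName, char[] invalidChars) =>
        string.Join("_", fileName.Split(invalidChars));
}
```

Calling `FileNameSanitizer.ReplaceInvalidChars("a : b.png", new[] { ':' })` yields `a _ b.png`, which matches the shape of the names 7zip produced.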

DineshSolanki · 2024-07-25T05:00:50Z

Yes, that seems to be the solution. I wonder if 7zip handles it the same way.
To answer what happens when we try it: it seems to truncate the file name, and the stream is consumed somewhere else, as it's definitely not writing to the file with the truncated name.
Even Windows Explorer can't extract it.

adamhathcock · 2024-07-25T07:34:37Z

Could it be as simple as putting the string through the encoding?

DineshSolanki · 2024-07-25T16:45:52Z

> Could it be as simple as put the string through the encoding?

That didn't work. I tried UTF-8 and IBM437:

var file = Path.GetFileName(entry.Key.NotNull("Entry Key is null")).NotNull("File is null");
var encoding = Encoding.UTF8;
file = encoding.GetString(encoding.GetBytes(file)) ?? "";

However, a 7zip discussion also mentions the encoding: https://sourceforge.net/p/sevenzip/discussion/45798/thread/82ae0f9c/
Maybe I'm encoding it the wrong way?
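
A short check shows why the round trip above cannot help: encoding a well-formed .NET string to bytes and decoding it with the same encoding gives back the same string, so the `:` is still there afterwards. The sketch below assumes a hypothetical `EncodingRoundTrip` helper and uses UTF-8 only.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class EncodingRoundTrip
{
    // Encodes the string to bytes and decodes it again with the same encoding.
    public static string RoundTrip(string value, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(value));

    // True when the round trip left the string unchanged.
    public static bool IsUnchanged(string value, Encoding encoding) =>
        RoundTrip(value, encoding) == value;
}
```

For a name such as `"あの子の子ども : My Girlfriend's Child.png"`, `IsUnchanged(name, Encoding.UTF8)` is true, so the result still contains the character that Windows rejects in file names.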

DineshSolanki · 2024-07-25T16:46:54Z

The issue seems to be with the invalid character itself rather than the encoding. I was wrong in saying that the colon was probably Unicode U+A789.

The following code runs into the same issue:

File.WriteAllText("あの子の子ども : My Girlfriend's Child.txt", "test");

So we have to remove the invalid characters, as @Morilli suggested. I tried his solution and it works.

7zip does this here: https://github.com/ip7z/7zip/blob/a7a1d4a241492e81f659a920f7379c193593ebc6/CPP/7zip/UI/Common/ExtractingFilePath.cpp#L23

SevenZipSharp also seems to do the same: https://github.com/StevenBonePgh/SevenZipSharp/blob/d0bf5c4a3d65ea5e85ffe7ef94312354475f5714/SevenZip/ArchiveExtractCallback.cs#L624C13-L641C14
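
What happens to the written data is not settled in this thread. One plausible explanation, an assumption not confirmed here, is that Windows treats the part of the name after `:` as an NTFS alternate data stream, which would leave a visible 0-byte file under the truncated name. The sketch below uses a hypothetical `ColonWriteProbe` helper to write a name of the same shape into an empty directory and list what actually lands on disk, so the behaviour can be checked per platform.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical probe for illustration only.
public static class ColonWriteProbe
{
    // Writes `content` to `name` inside `dir`, then returns one line per
    // file found in `dir`, in the form "<file name> (<length> bytes)".
    public static string Describe(string dir, string name, string content)
    {
        File.WriteAllText(Path.Combine(dir, name), content);
        return string.Join(
            Environment.NewLine,
            Directory.GetFiles(dir)
                .Select(p => $"{Path.GetFileName(p)} ({new FileInfo(p).Length} bytes)"));
    }
}
```

On file systems where `:` is an ordinary character, the listing shows the full name with the full content length. On Windows, comparing the listing with the name that was requested shows whether the name was truncated and whether the visible file is empty.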

adamhathcock · 2024-07-26T12:36:36Z

Your fix makes sense: we shouldn't put invalid path characters in... but it seems like it wouldn't work for everything? I'm inclined to accept it, as other tools do the same.
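
One case the one-line change may not cover is directory components: the diff sanitizes only the file name, while the folder is taken from `Path.GetDirectoryName(entry.Key)`. Whether that matters in practice is an assumption the thread does not verify. The sketch below, with a hypothetical `EntryPathSanitizer` helper, applies the same replacement to every segment of an entry key.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of SharpCompress.
public static class EntryPathSanitizer
{
    // Splits the entry key on '/' and '\', replaces invalid characters in
    // each segment with "_", and rejoins the segments with the platform's
    // directory separator. It does not handle ".." or reserved device names.
    public static string SanitizeSegments(string entryKey, char[] invalidChars)
    {
        var segments = entryKey
            .Split(new[] { '/', '\\' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(segment => string.Join("_", segment.Split(invalidChars)))
            .ToArray();
        return Path.Combine(segments);
    }
}
```

The invalid character set is a parameter here for the same reason as before: `Path.GetInvalidFileNameChars()` depends on the OS, so passing it explicitly keeps the behaviour predictable when testing.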

DineshSolanki · 2024-07-26T12:45:53Z

@adamhathcock It surely works for the issue we are having; it's the same logic that 7zip uses.

…-in-filename Fix #858 - Replaces invalid filename characters

--- SharpCompress allowed files with invalid characters (according to Windows) to be extracted incorrectly, causing data loss. Therefore, fixing issue adamhathcock/sharpcompress#858 is important. --- Type: upd Breaking: False Doc Required: False Backport Required: False Part: 1/1

DineshSolanki added a commit to DineshSolanki/FoliCon that referenced this issue Jul 24, 2024

fix character encoding issues in extraction - refer to adamhathcock/s…

6a42266

…harpcompress#858

adamhathcock added bug up for grabs labels Jul 24, 2024

DineshSolanki changed the title ~~unicode characters are causing file to Currupt at extraction~~ OS specific invalid characters are causing extraction to corrupt file Jul 25, 2024

adamhathcock closed this as completed in 2d10df8 Jul 30, 2024

adamhathcock added a commit that referenced this issue Jul 30, 2024

Merge pull request #859 from DineshSolanki/#858-fix-invalid-character…

06a983e

…-in-filename Fix #858 - Replaces invalid filename characters
