OS specific invalid characters are causing extraction to corrupt file #858
Comments
There is no validation of the destination file name, so an attempt is made to write to a filestream with a filename containing invalid characters. While I'm kind of interested in figuring out what actually happens when this is attempted, the fix here is most likely to sanitize the output path before attempting to open any output:
diff --git a/src/SharpCompress/Common/ExtractionMethods.cs b/src/SharpCompress/Common/ExtractionMethods.cs
index 27d4164..80b8e9d 100644
--- a/src/SharpCompress/Common/ExtractionMethods.cs
+++ b/src/SharpCompress/Common/ExtractionMethods.cs
@@ -37,6 +37,7 @@ internal static class ExtractionMethods
         options ??= new ExtractionOptions() { Overwrite = true };
         var file = Path.GetFileName(entry.Key.NotNull("Entry Key is null")).NotNull("File is null");
+        file = string.Join("_", file.Split(Path.GetInvalidFileNameChars()));
         if (options.ExtractFullPath)
         {
             var folder = Path.GetDirectoryName(entry.Key.NotNull("Entry Key is null"))
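As a rough illustration of the Split/Join approach in the diff above, here is a minimal sketch. `SanitizeFileName` and `windowsInvalidChars` are hypothetical names, not part of SharpCompress. The sketch uses a fixed, partial list of characters that Windows rejects instead of `Path.GetInvalidFileNameChars()`, because that method returns a different set depending on the OS it runs on, so the result here is the same everywhere.

```csharp
using System;

// Assumption: a partial list of characters Windows rejects in file names
// (control characters are omitted). Hard-coded so the output is identical on any OS.
char[] windowsInvalidChars = { '<', '>', ':', '"', '/', '\\', '|', '?', '*' };

// Hypothetical helper: splits the name at every invalid character and
// joins the pieces with '_', so each invalid character becomes one '_'.
static string SanitizeFileName(string name, char[] invalidChars) =>
    string.Join("_", name.Split(invalidChars));

var original = "あの子の子ども : My Girlfriend's Child.png";
var sanitized = SanitizeFileName(original, windowsInvalidChars);

Console.WriteLine(sanitized);
// prints: あの子の子ども _ My Girlfriend's Child.png
```

With this input the result matches the name that 7zip produces for the same entry. Note that distinct entry names can map to the same sanitized name, which the sketch does not handle.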
Could it be as simple as putting the string through the encoding?
That didn't work; I tried UTF-8 and IBM437.
However, a 7zip discussion also mentions the encoding: https://sourceforge.net/p/sevenzip/discussion/45798/thread/82ae0f9c/
The issue seems to be with the invalid character itself rather than the encoding. I was wrong in saying that the colon was probably Unicode. The following code runs into the same issue: File.WriteAllText("あの子の子ども : My Girlfriend's Child.txt", "test"); So we have to remove invalid characters as @Morilli suggested. I tried his solution and it works.
Your fix makes sense: we shouldn't put invalid path characters in... but it seems like it wouldn't work for everything? I'm inclined to accept it, as other things do it.
@adamhathcock It surely works for the issue we are having; it's the same logic that 7zip is using.
…-in-filename Fix #858 - Replaces invalid filename characters
--- SharpCompress allowed files with invalid characters (according to Windows) to be extracted incorrectly, causing data loss. Therefore, fixing issue adamhathcock/sharpcompress#858 is important. --- Type: upd Breaking: False Doc Required: False Backport Required: False Part: 1/1
https://www.deviantart.com/zenoasis/art/Japanese-TV-Dorama-folder-icon-pack-162-1077192465

Download the zip from there.
Extract it using SharpCompress.
We'll see that the files
REAL 恋愛殺人捜査班 : Real - Renai Satsujin Sosa Han.png
and
あの子の子ども : My Girlfriend's Child.png
are extracted as 0-byte files with the following names:
REAL 恋愛殺人捜査班
πüéπü«σ¡Éπü«σ¡Éπü¿πéÖπéé
I also tried with IBM437 encoding, but got the same result.
However, when you extract using 7zip, it extracts fine, and 7zip makes a change to the file names, which seems to be removing the : character, which might be either U+A789 or U+2236.
File names from the 7zip extraction:
REAL 恋愛殺人捜査班 _ Real - Renai Satsujin Sosa Han.png
あの子の子ども _ My Girlfriend's Child.png
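To tell which colon an entry name actually contains, one option is to dump the code points of the name. The sketch below is only an illustration; `CodePoints` is a hypothetical helper, and the sample strings are made up for the comparison rather than taken from the archive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns each code point of the text as "U+XXXX".
static List<string> CodePoints(string text)
{
    var result = new List<string>();
    // Advance by 2 over surrogate pairs so each code point is reported once.
    for (var i = 0; i < text.Length; i += char.IsSurrogatePair(text, i) ? 2 : 1)
    {
        result.Add($"U+{char.ConvertToUtf32(text, i):X4}");
    }
    return result;
}

var asciiColon = string.Join(" ", CodePoints("a:b"));
var modifierColon = string.Join(" ", CodePoints("a\uA789b"));

Console.WriteLine(asciiColon);    // prints: U+0061 U+003A U+0062
Console.WriteLine(modifierColon); // prints: U+0061 U+A789 U+0062
```

If the dump shows U+003A, the name contains the plain ASCII colon, which Windows does not allow in file names.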