Pass Windows clarified filesystem path. #348

Closed
indrakaw opened this issue Jul 23, 2019 · 8 comments

Comments

@indrakaw

Uh, oh. I don't think the title would describe it correctly. Sorry, my English is bad.

To the point: let's say I'm running a Unix-like OS and point the download folder at a partition with a Windows filesystem (e.g. NTFS, FAT). This causes problems, because Windows restricts certain characters in directory and file names.

Here is an example:

I download a gallery from danbooru with the tag score:>100. On a Linux host, the name is used as is. On a Windows host (where gallery-dl is installed and run), the restricted characters are replaced with compatible ones.

The problem occurs when I remove or rename the directory on the NTFS partition from the Linux host: it shows "cannot remove 'some_directory': Directory not empty". I also expect problems when I have to access the files from a Windows host.

It would be nice to have an option to fix this: on Linux, save file paths in a Windows-compatible form.

@indrakaw indrakaw changed the title Pass Windows classified filesystem path. Pass Windows clarified filesystem path. Jul 23, 2019
@indrakaw
Author

Another example: wget has --restrict-file-names=windows.

mikf added a commit that referenced this issue Jul 23, 2019
@mikf
Owner

mikf commented Jul 23, 2019

b1bea8a adds a restrict-filenames config option that lets you specify which characters in directory and file names should be "escaped" by replacing them with an underscore. You can either set it to a string of characters you want to be replaced, or use auto, unix, or windows as special values. In your case gallery-dl -o restrict-filenames=windows ... should do the trick.

@Hrxn
Contributor

Hrxn commented Jul 24, 2019

Okay, just to get this right, this setting is a new (global) default, i.e. for all extractors?
Because restrict-filenames is now used everywhere instead of the old clean_path function?

self.clean_path = self._build_cleanfunc(restrict)

@mikf
Owner

mikf commented Jul 24, 2019

This option can be applied globally or per-extractor, like all the extractor.*.… options. Its default has the same behavior as before: replace <>:"\|/?* on Windows and / on everything else.

All this commit did was move the old text.clean_path() functions into the class responsible for building filesystem paths, add a way to select/customize their behavior, and optimize them a bit (pre-compiled regex, no try/except block).
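
For readers following along, here is a minimal sketch of how such a cleaning function could be built from the option value. It is not the actual gallery-dl code: the name build_cleanfunc, the is_windows parameter, and the mapping of "auto" to the platform default are assumptions made for illustration. Only the character sets come from the comment above.

```python
import re

# Character sets taken from the defaults described in this thread.
SPECIAL_VALUES = {
    "windows": '<>:"\\|/?*',
    "unix": "/",
}


def build_cleanfunc(restrict, is_windows=False):
    """Return a function that replaces restricted characters with '_'.

    Hypothetical helper: 'restrict' is either a special value
    ("auto", "unix", "windows") or a string of characters to replace.
    """
    if restrict == "auto":
        # Assumption: "auto" picks the set matching the host platform.
        restrict = "windows" if is_windows else "unix"
    chars = SPECIAL_VALUES.get(restrict, restrict)

    if not chars:
        # Nothing to replace.
        return lambda path: path
    if len(chars) == 1:
        # A single character needs no regex.
        return lambda path: path.replace(chars, "_")

    # Compile the character class once and reuse it for every path.
    pattern = re.compile("[" + re.escape(chars) + "]")
    return lambda path: pattern.sub("_", path)


clean_windows = build_cleanfunc("windows")
clean_unix = build_cleanfunc("unix")
print(clean_windows("score:>100"))
print(clean_unix("score:>100"))
```

With the "windows" set, both ":" and ">" in the tag from the original report are replaced, while the "unix" set leaves the name untouched.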

@Hrxn
Contributor

Hrxn commented Jul 24, 2019

Ah yes, that is basically what I meant. 😄

The old text.clean_path() functions have been repurposed, but the functionality is still the same here.

@Hrxn
Contributor

Hrxn commented Jul 25, 2019

@mikf The example for restrict-filenames in docs/configuration.rst leads me to another question:

Considering I use this as my string value for restrict-filenames globally: "<>:\"\\|/?* "

(That is the same sequence as in the "windows" special value + a whitespace appended at the end.)

And since 95b1e4c there is the R<old>/<new>/ string formatting option:

- "R<old>/<new>/":
Replaces all occurrences of <old> with <new>
Example: {f:R /_/} -> "f_o_o_b_a_r" (if "f" is "f o o b a r")

If I now use a string formatting option somewhere, like this: {f:R_//}

What would happen? Whatever gets applied last?

@mikf
Owner

mikf commented Jul 25, 2019

All characters in restrict-filenames that are found in an already formatted directory or filename, however they came to be, will be replaced with underscores in the last step. In your case it would first replace underscores with spaces and then back to underscores.

In detail:

  • {f:R_/ /} would first replace all underscores in f with spaces
    (I assume you meant that instead of replacing _ with empty strings)
    obj = obj.replace(old, new)
  • The resulting string, together with the rest of the whole format string, would then be joined together to form a path segment.
    for index, func in self.fields:
        self.result[index] = func(kwargs)
    return "".join(self.result)
  • And finally this string is put through a clean_path() function to make sure it doesn't contain any "illegal" characters
    self.clean_path(
        Formatter(segment, self.kwdefault)
        .format_map(keywords).strip())
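
The three steps above can be sketched end to end as follows. This is a simplified illustration, not gallery-dl's implementation: replace_spec and format_segment are hypothetical names, the spec parser only handles the plain "R<old>/<new>/" form, and the restricted character set is the one from the question (the "windows" set plus a space).

```python
import re


def replace_spec(value, spec):
    """Apply an 'R<old>/<new>/' format option to a string.

    Hypothetical, simplified parser: assumes neither <old> nor <new>
    contains a '/'.
    """
    old, new, _ = spec[1:].split("/")
    return value.replace(old, new)


def format_segment(parts, restrict_chars):
    """Join formatted parts, then replace restricted characters with '_'."""
    # Step 2: join all formatted fields into one path segment.
    segment = "".join(parts).strip()
    # Step 3: clean the finished segment as the very last step.
    pattern = re.compile("[" + re.escape(restrict_chars) + "]")
    return pattern.sub("_", segment)


restrict = '<>:"\\|/?* '  # "windows" characters plus a space

# Step 1: the format option turns underscores into spaces ...
formatted = replace_spec("f_o_o", "R_/ /")
# ... and the final cleaning step turns those spaces back into underscores.
result = format_segment([formatted, ".jpg"], restrict)
print(repr(formatted), "->", repr(result))
```

Because the cleaning runs last, it always wins over whatever a format option produced earlier.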

@Hrxn
Contributor

Hrxn commented Jul 25, 2019

> All characters in restrict-filenames that are found in an already formatted directory or filename, however they came to be, will be replaced with underscores in the last step. In your case it would first replace underscores with spaces and then back to underscores.
>
> In detail:
>
>   • {f:R_/ /} would first replace all underscores in f with spaces
>     (I assume you meant that instead of replacing _ with empty strings)

Yeah, the contradiction is what I meant. 😄
Okay, the order makes sense, thanks for the clarification.

@mikf mikf closed this as completed Aug 1, 2019
mikf added a commit that referenced this issue Aug 17, 2019
- add 'path-remove' option to specify the set of characters that
 should be removed
- rename 'restrict-filenames' to 'path-restrict'
- #348, #380