screenshot: long filename handling improvements #10052
rtldg wants to merge 1 commit into mpv-player:master from
That's done.
Force-pushed from b8eb5bb to 0738e2e.
Force-pushed from 0738e2e to 8d62bb6.
Force-pushed from 8d62bb6 to a89f2dd.
I'd take the UTF-8 fix without the win32-specific changes. There's #12119 for that now.
Force-pushed from a89f2dd to ca0c65a.
Force-pushed to remove the Win32 changes and to make a couple of the screenshot file writing functions use
player/screenshot.c (outdated)
    talloc_free(append);
}

static void trim_invalid_utf8(char *s, size_t len)
Why would you want to "fix" invalid names? If the name is invalid then mpv should fail when it tries to use it.
There's no end to trying to "fix" bad names, and mpv should not try that, except if there's a very good reason.
truncate_long_base_filename can leave the basename ending in a cut-off (invalid) UTF-8 codepoint, so trim_invalid_utf8 is used to remove it. (This is described in the PR text above.)
Force-pushed from ca0c65a to 25c1a4c.
Force-pushed from 25c1a4c to a7b5097.
For screenshot filenames, it was possible for the basename to be
longer than what filesystems generally support.
On Linux, this is 255 bytes. On Windows, this is 255 wchar_t units.
Thus `truncate_long_base_filename` truncates the basename to fewer
than 255 bytes, so that basename + '.' + extension is <= 255 bytes.
It also makes sure not to leave a cut-off (invalid) UTF-8 codepoint in the filename.
For testing, filling `screenshot-template=` with 3-byte or 4-byte
UTF-8 codepoints works best, for example "ウ" (3 bytes) or "🌂" (4 bytes).
Example: 84 * strlen("ウ") + strlen(".jpg") == 256
The last "ウ" is removed, leaving a basename of 83 "ウ" characters
(249 bytes), which together with ".jpg" totals 253 bytes.
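A minimal sketch of the UTF-8-safe truncation described in the commit message, assuming the input is a NUL-terminated, valid UTF-8 basename and the caller has already computed the byte budget. The name `truncate_utf8_basename` is hypothetical and this is not the PR's `truncate_long_base_filename`; it simply cuts at the budget and then backs up over UTF-8 continuation bytes (`10xxxxxx`) so no partial codepoint remains.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not mpv code): truncates the NUL-terminated UTF-8
// string `s` in place so that strlen(s) <= max_bytes, without leaving a
// partial multi-byte codepoint at the end. Assumes `s` is valid UTF-8.
static void truncate_utf8_basename(char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    if (len <= max_bytes)
        return; // already short enough, nothing to do

    size_t cut = max_bytes;
    // s[cut] is the first byte that would be dropped. If it is a
    // continuation byte (10xxxxxx), the cut lands inside a codepoint,
    // so move back to that codepoint's lead byte and drop it entirely.
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    s[cut] = '\0';
}
```

With the commit message's example (84 × "ウ" and ".jpg"), the budget is 255 - (3 + 1) = 251 bytes; the cut falls inside the 84th character, so 83 characters (249 bytes) remain, 253 bytes including ".jpg".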
Force-pushed from a7b5097 to 03e91b6.
// If truncation produces an invalid UTF-8 codepoint, then chop that off.
static void truncate_long_base_filename(char *s, const size_t ext_len)
{
    const size_t max_utf8_bytes = 255 - (ext_len + 1); // ext_len+1 for '.'
255 is a magic number. It should not be hardcoded. It should probably be MAX_PATH.
On Windows, MAX_PATH can be more than 255, because it has enough space to hold the UTF-8 of 260 wchar_t elements.
Also, if ext_len is 255 or more, then max_utf8_bytes will wrap around to a huge number...
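The wrap-around happens because `size_t` arithmetic is unsigned. A hedged sketch of a checked version follows; `basename_budget` and its `name_max` parameter are hypothetical names, and returning `false` so the caller can report an error is one possible policy, not what the PR does.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: computes how many bytes the basename may use so
// that basename + '.' + extension fits in name_max bytes. Returns false
// (instead of wrapping around) when the '.' plus the extension leave no
// room for even a 1-byte basename.
static bool basename_budget(size_t name_max, size_t ext_len, size_t *out)
{
    // Compare before subtracting so the subtraction can never underflow.
    if (name_max == 0 || ext_len >= name_max - 1)
        return false;
    *out = name_max - 1 - ext_len; // the 1 is for the '.'
    return true;
}
```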
It should not be MAX_PATH because MAX_PATH is not the basename length. MAX_PATH (260) is sized to hold a drive letter + ":\" + a 256-character path string + a NUL terminator.
Per Microsoft's "Maximum Path Length Limitation" documentation:
This type of path is composed of components separated by backslashes, each up to the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (this value is commonly 255 characters).
Bothering to check GetVolumeInformation isn't worth it, though.
All relevant filesystems use 255 as the limit for a single path component (including on non-Windows OSes).
If ext_len is 255, let mpv blow up, because that's an absurd case to care about.
It should not be MAX_PATH
Which is why I said "probably", i.e. you should figure out whether it should be MAX_PATH or something else, like MAX_NAME.
The point is that 255 should not be hardcoded. It should be appropriate for the current platform, and if it's not MAX_PATH and not MAX_NAME then you should figure out what it needs to be, without calling APIs.
It should probably be some existing constant of the platform, and not hardcoded inside this function.
If ext_len is 255 let mpv blow up because that's an absurd case to care about.
In your applications maybe. Not in mpv.
You mean that examples like the one you gave, which should be fixed, are not absurd cases, and so we should really care about them? Like this:
screenshot-template="a🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂"
Luckily for you though, it won't blow up, but it will also not work.
It's not rocket science. Please fix it correctly.
Which is why I said "probably", i.e. you should figure out whether it should be MAX_PATH or something else, like MAX_NAME.
Which is what I did, and that's why it's 255.
The point is that 255 should not be hardcoded. It should be appropriate for the current platform, and if it's not MAX_PATH and not MAX_NAME then you should figure out what it needs to be, without calling APIs.
Don't hardcode but also figure it out without calling APIs? Okay...
If ext_len is 255 let mpv blow up because that's an absurd case to care about.
In your applications maybe. Not in mpv.
The extension comes from mpv. If mpv decides to use 255-character extensions, then that is mpv's fault.
screenshot-template="a🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂"
Luckily for you though, it won't blow up, but it will also not work.
It's not rocket science. Please fix it correctly.
You're missing something if you think that won't work, or if you don't see why it was listed as an example for testing the removal of UTF-8 codepoints invalidated by truncation.
It should probably be some existing constant of the platform
I don't know if it's available.
We have this:
Lines 539 to 547 in ef4c6df
But it's only used privately in this C file, when enumerating files in a directory.
Not sure how to solve this in general. I don't think we should change the global MAX_PATH either.
NAME_MAX seems to be available in limits.h on Linux/macOS as 255. It's also in my mingw64/msys2 limits.h, but behind a _POSIX_ ifdef. BSDs have MAXNAMELEN, which is 255 as far as I can tell.
Something like this, with a hardcoded 255 as the last fallback; the alternative is calling out to GetVolumeInformation/pathconf(_PC_NAME_MAX):
#include <limits.h>
#ifndef NAME_MAX
#ifdef MAXNAMELEN
#define NAME_MAX MAXNAMELEN
#else
#define NAME_MAX 255
#endif
#endif
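For the runtime alternative mentioned above, POSIX `pathconf()` with `_PC_NAME_MAX` reports the component limit of the filesystem that holds a given path. A minimal sketch, assuming a POSIX system and that falling back to the compile-time value is acceptable when the query fails; `screenshot_name_max` is a hypothetical name, and Windows would need `GetVolumeInformation` instead.

```c
#include <limits.h>
#include <unistd.h>

// Same fallback chain as the snippet above, for platforms whose
// <limits.h> does not define NAME_MAX.
#ifndef NAME_MAX
#ifdef MAXNAMELEN
#define NAME_MAX MAXNAMELEN
#else
#define NAME_MAX 255
#endif
#endif

// Hypothetical helper: asks the filesystem holding `dir` for its maximum
// filename-component length in bytes. pathconf() returns -1 both on error
// and when there is no limit; in either case fall back to NAME_MAX.
static long screenshot_name_max(const char *dir)
{
    long n = pathconf(dir, _PC_NAME_MAX);
    return n > 0 ? n : (long)NAME_MAX;
}
```

The trade-off discussed in the thread still applies: a runtime query reflects the target filesystem, but it is extra platform-specific code.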
For win32 you can use _MAX_FNAME https://learn.microsoft.com/en-us/cpp/c-runtime-library/path-field-limits
Note that _MAX_FNAME includes space for terminating null, while NAME_MAX does not.
EDIT: As I mentioned in the other PR, shouldn't we enable long path support in the manifest on Windows?
Note that _MAX_FNAME includes space for the terminating null, while NAME_MAX does not.
I didn't include it because of that, but yes, a Windows-specific define could be (_MAX_FNAME-1), which is 255.
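Folding that suggestion into the fallback chain could look like the sketch below. `SCREENSHOT_NAME_MAX` is a hypothetical macro name; `_MAX_FNAME` comes from `<stdlib.h>` in the Microsoft CRT, and the `- 1` drops the NUL terminator it counts. Note that on Windows this limit is in wchar_t units, not UTF-8 bytes, so the macro alone does not settle the byte budget.

```c
#include <limits.h>
#include <stdlib.h>

// Hypothetical macro: maximum filename-component length, excluding the
// terminating NUL.
#if defined(_WIN32) && defined(_MAX_FNAME)
// _MAX_FNAME (256) includes the terminating NUL, so subtract it.
#define SCREENSHOT_NAME_MAX (_MAX_FNAME - 1)
#elif defined(NAME_MAX)
#define SCREENSHOT_NAME_MAX NAME_MAX
#elif defined(MAXNAMELEN)
#define SCREENSHOT_NAME_MAX MAXNAMELEN
#else
#define SCREENSHOT_NAME_MAX 255
#endif
```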
EDIT: And just as I mentioned in the other PR. Shouldn't we set long paths support in manifest on Windows?
In my opinion, yes.
After an IRC discussion, it turned out that choosing the right length to truncate at is hard, because the limit is not simply a byte count on every OS. I see three options:
Subtitle text via
Converting the basename with
edit: doesn't append the basename back to the path or anything, so it would need to be edited to do that.
You have to admit that this is a niche usecase. Users may very well write a script to correctly take screenshots named after subtitle text if they want to do that.
Sure, but this is a good example for platform-specific complicated support code that I'd like to avoid.
Terrible idea IMO.
That's much more work than just throwing
I'd like to avoid it too, especially since it could cause file access issues if you had a filename on NTFS that would be longer than 255 UTF-8 bytes on Linux.
Networked file shares, and also file access issues from Linux -> Windows again.
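To make the bytes vs. wchar_t mismatch concrete: Windows counts UTF-16 code units, Linux counts UTF-8 bytes, so the same name can fit on one and not the other. A minimal sketch that counts UTF-16 units for a valid UTF-8 string (a 4-byte sequence becomes a surrogate pair, every shorter sequence one unit); `utf16_units` is a hypothetical helper, not mpv code.

```c
#include <stddef.h>

// Hypothetical helper: number of UTF-16 code units needed to encode the
// NUL-terminated UTF-8 string `s`. Assumes `s` is valid UTF-8.
static size_t utf16_units(const char *s)
{
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;             // continuation byte: counted with its lead
        units += (*p >= 0xF0) ? 2 // 4-byte sequence -> surrogate pair
                              : 1;
    }
    return units;
}
```

For example, 100 × "ウ" is 100 UTF-16 units, well within NTFS's 255, but 300 UTF-8 bytes, which is too long for a 255-byte Linux filesystem.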
Honestly, I think long/unsupported filenames should be rejected with an error for the user to act on. Truncating them implicitly doesn't really help anyone.
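For comparison, a hedged sketch of the reject-with-an-error approach; `check_screenshot_name`, its message, and passing the limit as `name_max` are illustrative assumptions, not mpv code. It only checks the UTF-8 byte length, which matches the Linux limit but not the Windows wchar_t limit.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical check: returns true if "<base>.<ext>" fits in name_max
// bytes; otherwise prints an error and returns false, so the caller can
// abort the screenshot instead of silently truncating the name.
static bool check_screenshot_name(const char *base, const char *ext,
                                  size_t name_max)
{
    size_t len = strlen(base) + 1 + strlen(ext); // +1 for the '.'
    if (len > name_max) {
        fprintf(stderr, "screenshot filename is %zu bytes, limit is %zu\n",
                len, name_max);
        return false;
    }
    return true;
}
```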
/shrug
I only tested on Windows 10 21H2 x64, and here are also some copy & paste screenshot-templates