Windows: "PDF error: Couldn't open file" with some unicode filenames #111
Comments
Thanks for the bug report. pdf2djvu doesn't itself perform any conversions on the arguments.
Anyway, I wrote a small test program that should show exactly what's going on here. Could you run it with the affected file name as its argument?
Attachment: testencoding.zip
Source of the test program:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <windows.h>

int main(int argc, char **argv)
{
    struct stat st;
    int rc;
    int i;
    /* ANSI codepage and console output codepage currently in effect */
    printf("GetACP() = %u\n", GetACP());
    printf("GetConsoleOutputCP() = %u\n", GetConsoleOutputCP());
    /* Narrow arguments: dump each one byte by byte, then try stat() on it */
    for (i = 1; i < argc; i++) {
        printf("argv[%d] = \"", i);
        const char *p = argv[i];
        while (*p)
            printf("\\x%02X", (unsigned char)*p++);
        printf("\"\n");
        rc = stat(argv[i], &st);
        printf("stat(argv[%d]) = %d", i, rc);
        if (rc != 0)
            printf(" (%s)", strerror(errno));
        printf("\n");
    }
    /* Wide arguments: re-parse the original UTF-16 command line */
    wchar_t **argvw;
    int argcw;
    argvw = CommandLineToArgvW(GetCommandLineW(), &argcw);
    if (argvw == NULL) {
        fprintf(stderr, "CommandLineToArgvW() failed\n");
        return 1;
    }
    /* Dump each wide argument code unit by code unit, then try wstat() on it */
    for (i = 1; i < argcw; i++) {
        printf("argvw[%d] = L\"", i);
        const wchar_t *p = argvw[i];
        while (*p)
            printf("\\u%04X", (unsigned int)*p++);
        printf("\"\n");
        rc = wstat(argvw[i], &st);
        printf("wstat(argvw[%d]) = %d", i, rc);
        if (rc != 0)
            printf(" (%s)", strerror(errno));
        printf("\n");
    }
    LocalFree(argvw);
    return 0;
}
/* vim:set ts=4 sts=4 sw=4 et:*/
Comment submitted by
Thank you. I see. AFAIK none of the Microsoft-defined codepages contain the character "ی". Here is the output:
F:\Downloads>testencoding.exe "E:\ی.pdf"
GetACP() = 1256
GetConsoleOutputCP() = 720
argv[1] = "\x45\x3A\x5C\xED\x2E\x70\x64\x66"
stat(argv[1]) = -1 (No such file or directory)
argvw[1] = L"\u0045\u003A\u005C\u06CC\u002E\u0070\u0064\u0066"
wstat(argvw[1]) = 0
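In this output the wide-character argument still carries U+06CC and wstat() succeeds, while the narrow argv does not. One way a program could carry such a name around without loss is to re-encode the wide argument as UTF-8. The following is only a sketch of that idea: `utf16_to_utf8` is a hypothetical helper, not something pdf2djvu is known to contain, and on Windows the UTF-8 name would presumably still have to be converted back to UTF-16 before being passed to a wide file API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of pdf2djvu): encode a NUL-terminated
 * UTF-16 string as UTF-8.
 * Returns the number of bytes written, not counting the terminating NUL,
 * or (size_t)-1 if the buffer is too small or a surrogate is unpaired. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t n = 0;
    while (*in) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (*in < 0xDC00 || *in > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* stray low surrogate */
        }
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len >= out_size)
            return (size_t)-1; /* keep room for the terminating NUL */
        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (n >= out_size)
        return (size_t)-1; /* no room even for the NUL */
    out[n] = '\0';
    return n;
}
```

Fed the code units printed for argvw[1], this encodes U+06CC as the two bytes 0xDB 0x8C, so the name is preserved instead of being folded into a different letter.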
U+06CC (ARABIC LETTER FARSI YEH) cannot be represented in CP1256, which is your ANSI codepage. Apparently the C runtime converts the character to 0xED, which is U+064A (ARABIC LETTER YEH). That's going to be tough to fix. :-\ But I'll try to at least improve the error message.
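To make the failure mode concrete, here is a toy model of that lossy round trip. It is not the real conversion table: only the code points that appear in this thread are handled, the function names are made up for illustration, and the U+06CC to 0xED substitution is taken from the test output rather than from any documented mapping.

```c
#include <stdint.h>

/* Toy model of the wide -> CP1256 conversion applied to argv.
 * Only the code points seen in this thread are modelled. */
int to_cp1256(uint32_t cp)
{
    if (cp < 0x80)
        return (int)cp;   /* ASCII maps to itself */
    if (cp == 0x064A)
        return 0xED;      /* ARABIC LETTER YEH: exact mapping */
    if (cp == 0x06CC)
        return 0xED;      /* ARABIC LETTER FARSI YEH: substitute seen in the output */
    return -1;            /* not modelled here */
}

/* Toy model of the reverse step, when the narrow name is looked up on disk. */
uint32_t from_cp1256(int byte)
{
    if (byte >= 0 && byte < 0x80)
        return (uint32_t)byte;
    if (byte == 0xED)
        return 0x064A;    /* 0xED always decodes to ARABIC LETTER YEH */
    return 0xFFFD;        /* not modelled here */
}

/* Returns 1 if the code point comes back unchanged after both steps. */
int survives_round_trip(uint32_t cp)
{
    int b = to_cp1256(cp);
    return b >= 0 && from_cp1256(b) == cp;
}
```

In this model U+064A survives the round trip but U+06CC comes back as U+064A, so a narrow-string stat() ends up looking for a file whose name contains a different letter, which matches the "No such file or directory" result.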
Issue reported by 40a at Bitbucket:
I'm using pdf2djvu.exe on Windows 8.1.
I have noticed that for all PDF files that contain the "ی" character (U+06CC) in their names I get the following error:
When running the filepath directly ("E:\ی.pdf") it works fine and causes the file to be opened in Adobe Reader. So I suspect that the issue is caused by the way pdf2djvu decodes its arguments.
I have already tried using the chcp 65001 command to change cmd's codepage to UTF-8, but I still get the same error; only the shape of the mojibake in the error message changes.
So far I have found no way around this other than renaming the file to something else and then doing the conversion.