-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use fgetc/fputc instead of getc/putc #50
Conversation
Hi @anzz1 this is a cgreat PR. I just allowed CI to run. My I'll probably end up accepting your PR (whith many thanks). And I appreciate your detailed description. One thing that could, however, help is if you could make one or two test cases on OS-level that exhibit need for this change. I am desperately trying to get more tests and test coverage for this project. Kind regards, Chris |
Hi @anzz1 before I base a new release on this (and this request is entirely optional for you) could you please try to make a few calls to the xxd program on an OS of your choice that exercise these lines? Then I'll add these to CI. I can also add some UCRT runners and the like as we move forward. Kind regards, Chris |
I would love to have posted a reproducible test-case for the bug, but unfortunately I was unable to recreate the bug at will, as it only happened a few times as part of my regular workflows and I noticed the erroneous output only afterwards, when the original terminal session that caused it and thus any ability to debug it was long gone. Bugs which happen randomly seemingly out of nowhere are hard to debug, especially when the bug is not in your own code nor anywhere else where source code is available. To add to the randomness of it, some times it added the extra '\r' twice, making every 0A a 0D0D0A, which I cannot think of a reason why that should ever happen and is no more simply a newline conversion error. It always affected the whole file though (meaning it would either have the same bug for all the 0As or none of them), and always with the If it weren't for the lucky, for a lack of a better word, fact that I've encountered a similar error before in another project as noted earlier, I probably would've had a hard time figuring out how did it even happen. The bug is very esoteric and I'm fairly certain is dependent on some certain variables on your terminal session, but I don't know exactly what and I'm not up to going down the rabbit hole (again) of trying to follow codepaths inside libraries in the increasingly complex Windows CRT environment armed with only a disassembler. Debugging Windows libraries is probably a job better suited for the Microsoft engineers who, you know, actually have the source code to them.
There is no need for an UCRT runner, as the current release and runner you already have uses it. This is simply due to UCRT being the default and only option since VS2015, so by using a VS runner you are using UCRT. For a while it was possible to circumvent having to use UCRT by targeting the Windows XP toolset, but since VS2019 this is also no longer an option. While today there is no official option to target the earlier CRT's, it is still possible by creating an msvcrt.lib out of the msvcrt.dll file using the More information: MSDN: Universal CRT deployment |
OK. Got it.
Thanks @anzz1 for the detailed descriptions. Probably the best thing I could do is make dedicated tests and setup a code coverage analysis. At least then the changed lines can be tracked. This always helps quality. I have been putting this off for a while (and it will also take a while more until I get around to it). See also #52. |
This fixes a bug where sometimes line feeds 0x0A '\n' get an additional friend carriage return 0x0D '\r' (sometimes two) with them on Windows when reading/writing files.
Even without the bug, it's good form to use the same function family for all the operations,
fopen()
,fread()
,fclose()
; oropen()
,read()
,close()
; etc. and to not mix them even if mixing works fine in most situations.More information at length, not required to read below this point
This is because in the current universal CRT version
getc()
doesn't always honor the mode set in the file handle but uses the one set by the CRT initialization stub which is dependent on your terminal and it's session (what programs were you running before and what the mode was last set to, etc.). This is most likely not intended behaviour and just one of many bugs in the modern Windows UCRT, or in Windows Terminal, or both.The root culprit is probably one of these functions:
api-ms-win-crt-stdio-l1-1-0.dll:_setmode()
which gets it's value from
api-ms-win-crt-stdio-l1-1-0.dll:__acrt_iob_func()
or
api-ms-win-crt-stdio-l1-1-0.dll:_set_fmode()
which sets it to _O_TEXT (0x4000)
or maybe even
api-ms-win-crt-runtime-l1-1-0.dll:_set_app_type()
Of which I'm not certain is the one causing this, since I couldn't get this to reliably fail, only sometimes, dependent on what I was doing before in a certain terminal session.
Those calls are all part of the fluff added by the universal CRT. There are also other character-conversion related problems in it, namely UTF8 <> UTF16, which leads to problems in the std:: family functions and ASCII / WideChar versions of the CRT functions like
getc()
/getwc()
. One such error was investigated in here in case you're interested: ggerganov/llama.cpp#1010 (comment)A later comment implies that this bug was fixed in a later commit to the Windows Terminal, but I wouldn't count on that. Also Windows Terminal is not the only command-line environment, and there are multiple other venues to launch command-line processes ranging from good ol cmd (which isn't Terminal) to PowerShell to telnet to remote administration via WinRM, etc. etc.
This is just one of the many reasons I like to avoid the "modern" universal CRT like the plague. Even though Raymond Chen doesn't like you using it, the tried-and-true MSVCRT.DLL works just fine. That can't be said about the new UCRT. Unfortunately in Microsoft's push for everyone to use the new UCRT, modern MSVC versions make it annoying to link against the old working CRT as there is no option for doing so.
MinGW on the other hand links against msvcrt.lib by default and thus works fine, but GCC in Windows comes with it's own pitfalls.
Ultimately the best way to make sure Windows works without bugs is to throw the CRT out of the window altogether, but this can be annoying when writing portable code since you'd need to write everything twice, once using the Windows API and then again for everything else, adding an ugly web of #ifdefs.
Luckily in this case it seems to be enough to just use
fgetc()
instead ofgetc()
and not have to throw the whole CRT in the trash. At least the bug hasn't occured since I switched the commands.Also the MSDN description of
getc()
is wrongMaybe they once were the same, but now they are different functions in the universal CRT.
I can't be arsed to start digging through the disassembly of those UCRT dlls to figure out how they are different though.
I've already wasted enough energy dealing with the UCRT as-is.