Hello. :) I'm reporting a regression: getEnv doesn't return valid UTF-8 on
Windows.
Why do I expect getEnv to return UTF-8? Because Nim itself expects getEnv to
return a UTF-8 string. This is evident in std/os.findExe, which assumes that getEnv("PATH") is UTF-8 encoded: it calls fileExists, which converts portions of PATH from UTF-8 to UTF-16 and passes them to GetFileAttributesW.
Looking at the code, the Windows implementation accesses environment
variables through the ANSI CRT API, that is, getenv and putenv. As far as I
could find out, these return strings encoded in the currently or previously
used Windows code page, not in UTF-8.
Example
import std/envvars, sequtils

proc c_wputenv(envstring: WideCString): cint {.importc: "_wputenv", header: "<stdlib.h>".}

const
  envName = "test13"
  strUtf8 = "\xc3\x86" # "LATIN CAPITAL LETTER AE" in UTF-8

discard c_wputenv(newWideCString(envName & "=" & strUtf8))
echo getEnv(envName).toSeq
Current Output
@['\xc6']
This is strUtf8 encoded in my code page (windows-1252).
Expected Output
@['\xc3', '\x86']
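The relationship between the two outputs can be checked without touching the environment at all. The following is only a sketch and not part of the reproduction: it assumes that convert from std/encodings recognizes the encoding name "windows-1252" on the platform where it runs, and the variable names ansi and roundTrip are made up for illustration.

```nim
import std/encodings

const strUtf8 = "\xc3\x86" # "LATIN CAPITAL LETTER AE" (U+00C6) in UTF-8

# Re-encode the UTF-8 bytes into windows-1252, which is what the ANSI CRT
# functions hand back on a system using that code page.
let ansi = convert(strUtf8, destEncoding = "windows-1252", srcEncoding = "UTF-8")
doAssert ansi == "\xc6"        # the single byte seen in "Current Output"

# Converting back only works while the code page is known and unchanged.
let roundTrip = convert(ansi, destEncoding = "UTF-8", srcEncoding = "windows-1252")
doAssert roundTrip == strUtf8  # the two bytes seen in "Expected Output"
```

The round trip succeeds here only because the source code page is passed explicitly; getEnv has no such guarantee at run time.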
Possible Solution
It's not reliable to convert from ANSI encoding because the code page can
change over the course of a process' lifetime.
A solution is to use the _wgetenv CRT API, which returns the environment
variable in UTF-16 encoding; Nim can then convert it to UTF-8. I have already
implemented this solution and will make a PR after submitting this issue.
Additional Information
My Windows codepage:
import encodings
echo getCurrentEncoding(true)
returns: windows-1252
getEnv returned UTF-8 on Windows before 6b3c77e (#18575), so this is a
regression. When bisecting, I imported std/os instead of std/envvars, because std/envvars was added only recently.
$ nim -v
Nim Compiler Version 1.7.1 [Linux: amd64]
Compiled at 2022-07-17
Copyright (c) 2006-2022 by Andreas Rumpf
git hash: 0d8bec695606a65c5916d0da7fcb0a976a4e1f7b
active boot switches: -d:release