getEnv doesn't return UTF-8 on Windows #20083

Closed
havardjohn opened this issue Jul 25, 2022 · 1 comment · Fixed by #20084

Hello. :) I'm reporting a regression. The issue is that getEnv doesn't return
valid UTF-8 on Windows.

Why do I expect getEnv to return UTF-8? Nim expects getEnv to return a
UTF-8 string. This is evident in std/os.findExe, where it assumes that
getEnv("PATH") is UTF-8 encoded, because it calls fileExists, which calls
GetFileAttributesW, converting portions of PATH from UTF-8 to UTF-16.

Looking at the code, the Windows implementation retrieves environment
variables through the ANSI CRT API (getenv and putenv). From what I found
while researching, these functions return strings encoded in the current or
a previously used Windows code page.

Example

import std/[envvars, sequtils]
# Bind the wide-character CRT setter so the value is stored as UTF-16.
proc c_wputenv(envstring: WideCString): cint {.importc: "_wputenv", header: "<stdlib.h>".}
const
  envName = "test13"
  strUtf8 = "\xc3\x86" # "LATIN CAPITAL LETTER AE" in UTF-8
discard c_wputenv(newWideCString(envName & "=" & strUtf8))
# Print the raw bytes that getEnv returns.
echo getEnv(envName).toSeq

Current Output

@['\xc6']

This is strUtf8 encoded in my code page (windows-1252).

Expected Output

@['\xc3', '\x86']
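
The two outputs above are the same character in two different encodings. A minimal sketch of that relationship, using `convert` from std/encodings; it assumes the platform's converter recognizes the name "windows-1252" (iconv on POSIX, the Windows code page table on Windows):

```nim
import std/[encodings, sequtils]

const strUtf8 = "\xc3\x86"  # U+00C6 encoded as UTF-8 (two bytes)

# Re-encode the UTF-8 string into windows-1252, the code page from this report.
let ansi = convert(strUtf8, "windows-1252", "UTF-8")

echo strUtf8.toSeq  # the bytes getEnv is expected to return
echo ansi.toSeq     # the bytes getEnv currently returns
```

The second line printed should be the single byte 0xC6, which matches the current output.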

Possible Solution

Converting from the ANSI encoding is not reliable, because the code page can
change over the course of a process's lifetime.

A solution is to use the _wgetenv CRT API, which returns the environment
variable in UTF-16 encoding; Nim can then convert it to UTF-8. I have already
implemented this solution and will open a PR after submitting this issue.
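
A rough sketch of what the _wgetenv approach could look like. This is not the code from the PR: `getEnvUtf8` and `c_wgetenv` are hypothetical names used only for illustration, and the non-Windows branch is there only so the sketch compiles on other platforms.

```nim
import std/envvars

when defined(windows):
  # Wide-character CRT getter; returns nil when the variable is not set.
  proc c_wgetenv(name: WideCString): WideCString {.
    importc: "_wgetenv", header: "<stdlib.h>".}

  proc getEnvUtf8(key: string, default = ""): string =
    ## Hypothetical helper: reads `key` as UTF-16 and converts it to UTF-8.
    let wide = c_wgetenv(newWideCString(key))
    if wide.isNil:
      result = default
    else:
      result = $wide  # `$` on a WideCString converts UTF-16 to UTF-8
else:
  proc getEnvUtf8(key: string, default = ""): string =
    ## Off Windows, fall back to the regular getEnv.
    result = getEnv(key, default)
```

Because the value is read as UTF-16 and converted once, the result should not depend on the active code page.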

Additional Information

My Windows codepage:

import encodings
echo getCurrentEncoding(true)

returns: windows-1252

getEnv returned UTF-8 on Windows before 6b3c77e (#18575), so this is a
regression. When bisecting, I imported std/os instead of std/envvars, because
std/envvars was added only recently.

$ nim -v
Nim Compiler Version 1.7.1 [Linux: amd64]
Compiled at 2022-07-17
Copyright (c) 2006-2022 by Andreas Rumpf

git hash: 0d8bec695606a65c5916d0da7fcb0a976a4e1f7b
active boot switches: -d:release