Encoding store Paths on Windows and Unix #3197
Comments
Is
Per https://www.lua.org/manual/5.3/manual.html#3.1, strings in Lua can contain arbitrary bytes (even nulls), so WTF-8, or even the original UTF-16 with unpaired surrogates, will work fine, at the cost of confusing literals.
But the canonical form which is hashed is part of the interface. I would hope "clean" ASCII / Unicode relative paths (assuming something like #2634 where we don't store the But if you meant all store paths are valid unicode (hashed from utf-8), so UTF-16 is only a concern to the
I marked this as stale due to inactivity.
I closed this issue due to inactivity.
Still interested.
#9205 should help with this
As #2634 points out, we can share derivations and builds between Windows and Unix machines. That means we cannot just be like Rust, new Python, etc., handle both types of path correctly, and be done with it. We also need to figure out how to put a Windows path in a Unix store, and a Unix path in a Windows store.
#2634 handles problems with the path root (/... vs DOS-style C:\... vs UNC \\..\...), so this issue can be just about the encoding. As @conferno points out:
To start tackling this issue, I would recommend https://simonsapin.github.io/wtf-8/. Rust uses it too. It can encode any Windows path such that valid Unicode is meaning-preserved in both directions, and it also round-trips. It cannot, however, represent non-UTF-8, non-WTF-8 Unix paths on Windows. We cannot fix that because as Windows uses a fixed-length encoding, there is no more room to represent anything else. Beyond representing foreign paths, this is a good canonical form to ensure that "normal" Windows paths have the same hash.
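To make the round-trip claim concrete, here is a minimal Python sketch of the WTF-8 conversion as described in the linked spec. It is not Nix code: the function names `utf16_units_to_wtf8` and `wtf8_to_utf16_units` are hypothetical, and a Windows path is assumed to arrive as a list of 16-bit code units.

```python
def utf16_units_to_wtf8(units):
    """Encode 16-bit code units (e.g. a Windows path) as WTF-8 bytes."""
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        # A high surrogate followed by a low surrogate is a valid pair:
        # combine it into one supplementary code point (4-byte form).
        if 0xD800 <= cp <= 0xDBFF and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # "surrogatepass" lets a lone surrogate through in its 3-byte form;
        # everything else is encoded as ordinary UTF-8.
        out += chr(cp).encode("utf-8", "surrogatepass")
    return bytes(out)


def wtf8_to_utf16_units(data):
    """Decode well-formed WTF-8 bytes back into 16-bit code units."""
    units = []
    for ch in data.decode("utf-8", "surrogatepass"):
        cp = ord(ch)
        if cp >= 0x10000:
            # Split supplementary code points back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units


# "C:\" followed by a lone high surrogate and "A": not valid Unicode,
# but still representable and recoverable.
odd = [0x0043, 0x003A, 0x005C, 0xD800, 0x0041]
assert wtf8_to_utf16_units(utf16_units_to_wtf8(odd)) == odd
```

For paths that are valid Unicode, the output is byte-for-byte ordinary UTF-8, which is what would let "normal" Windows and Unix paths agree on a hash.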
We can also normalize path separators, since Windows accepts both.
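As a rough sketch of that normalization, assuming it is applied only to paths that originate on Windows and only to the relative part (roots being the business of #2634), something like the following would do. The function name is hypothetical.

```python
def normalize_windows_separators(relative_path: str) -> str:
    """Rewrite backslashes as forward slashes in a Windows-origin relative path.

    This must not be applied to Unix-origin paths, where a backslash is an
    ordinary filename character rather than a separator.
    """
    return relative_path.replace("\\", "/")


# Both spellings of the same Windows path end up with one canonical form.
assert normalize_windows_separators("share\\doc/readme.txt") == "share/doc/readme.txt"
```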
I suggest we just ban Unix paths that are not well-formed WTF-8. Do they exist already, say in cache.nix.org?
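A check along these lines could be used to scan existing paths or to enforce the ban. This is a sketch with a hypothetical function name, following the WTF-8 spec's definition of well-formedness: generalized UTF-8 in which no encoded high surrogate is immediately followed by an encoded low surrogate.

```python
def is_well_formed_wtf8(data: bytes) -> bool:
    """Return True if `data` is well-formed WTF-8."""
    try:
        # Strict UTF-8 decoding, except that encoded surrogates are allowed.
        text = data.decode("utf-8", "surrogatepass")
    except UnicodeDecodeError:
        return False
    # A high surrogate directly followed by a low surrogate should have been
    # written as a single 4-byte sequence, so that form is rejected.
    for first, second in zip(text, text[1:]):
        if "\ud800" <= first <= "\udbff" and "\udc00" <= second <= "\udfff":
            return False
    return True


assert is_well_formed_wtf8(b"bin/hello")       # plain ASCII
assert is_well_formed_wtf8(b"\xed\xa0\x80")    # lone high surrogate
assert not is_well_formed_wtf8(b"caf\xe9")     # Latin-1 byte, not UTF-8
```

Any Unix path for which this returns False is one that could not be represented on Windows under this scheme.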