New path handling module that uses Nim's type system, using Path instead of string #54

timotheecour · 2018-07-18T23:03:27Z

@Araq in https://github.com/nim-lang/Nim/issues/8268#issuecomment-405817765 suggested:

You know... I think we need type Path = distinct string; type Filename = distinct string; type FileExt = distinct string and a new path handling module that uses Nim's type system.

Let me expand on this a bit with other ideas to open a discussion.

example usage

import ospaths2

let path = "/tmp//foo.txt".Path # cross platform, works on posix, windows, etc (no allocation)
when defined(posix):
  doAssert path.internal == "/tmp//foo.txt" # just returns internal representation (no allocation)
  doAssert $path == "/tmp/foo.txt" # note: `//` got normalized to `/`
when defined(windows):
  doAssert path.internal == "/tmp/foo.txt"
  doAssert $path == r"C:\tmp\foo.txt"
  doAssert path == r"C:\TMP\\foo.txt" # platform specific path compare (note the case insensitivity and the `\\`)

# revamp functions that accept paths to be type-safe, self-documenting, and avoid confusion between string params (eg file contents) and path params
copyFile($"/tmp/foo.txt", $"/tmp/foo2.txt") # ok
copyFile("/tmp/foo.txt", "/tmp/foo2.txt") # ok via implicit conversion

related work

C++ boost: https://theboostcpplibraries.com/boost.filesystem-paths
super relevant discussion from D forum: Path as an object in std.path ; contains lots of pros / cons of such approach in the thread

benefits

more type safety: eg, we can have == overloaded to be platform specific path comparison
self-documenting APIs, better encapsulation
avoid confusion between string params (eg file contents) and path params
cross platform paths, eg this could work on windows: "/tmp/foo.txt".Path
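
To make the first benefit concrete, here is a minimal sketch of a distinct `Path` with an overloaded `==`. The `comparable` helper and its rules (collapse repeated `/`, and on Windows also ignore case and separator style) are assumptions chosen for illustration, not the behaviour of any existing module.

```nim
import std/strutils

type Path = distinct string  # hypothetical stand-in for the proposed type

proc comparable(p: Path): string =
  ## Illustrative comparison key, not a full normalization.
  var s = string(p)
  when defined(windows):
    # Assumption: treat both separators alike and ignore case on Windows.
    s = s.replace('\\', '/').toLowerAscii()
  var prevSlash = false
  for c in s:
    if c == '/' and prevSlash:
      continue  # collapse runs of '/'
    result.add c
    prevSlash = c == '/'

proc `==`(a, b: Path): bool =
  ## Path comparison goes through the key instead of raw bytes.
  comparable(a) == comparable(b)

doAssert Path("/tmp//foo.txt") == Path("/tmp/foo.txt")
doAssert Path("/tmp/foo.txt") != Path("/tmp/bar.txt")
```

Because `Path` is distinct, plain string `==` is not inherited, so the only comparison available on paths is the one defined here.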

cons

see arguments from Walter Bright here: https://forum.dlang.org/post/[email protected]

cannot hope to duplicate the rich interface available for strings
APIs that deal with filenames take strings and return strings, not Path objects. Your code gets littered with path and filename components that are sometimes Paths and sometimes strings and sometimes both
People like writing paths as "/etc/hosts", not Path("/etc/hosts"). People will not stand for a Path constructor that winds up allocating memory so it can rewrite the string in a canonical path representation.
There really isn't any such thing as a portable path representation. It's more than just \ vs /. There are the drive prefixes in Windows that have no analog in Linux. Sometimes case matters in Linux, where it would be ignored under Windows. There are 8.3 issues sometimes. The only thing you can do is come up with a subset of what works across systems, and then of course you have to go back to using strings when you need to access D:\foo\abc.c

question

how easy would it be to migrate code? is automated tooling possible?
estimate code breakage; can breakage be 100% avoided? (as always, problem is third party libs)
efficiency: will that lead to more or less efficient code? eg possibly less conversion in most apps assuming most string handling happens early on

design decisions

type Path = distinct string or type Path = object ... ? the latter allows more efficient operations potentially
should $path be path.internal or path.canonical ?
should Path implicitly convert to string? cf in D we can use alias this to have that; what about Nim?
if it doesn't implicitly convert to string, a ton of code will break. => SUGGESTION: should implicitly convert
which operations allocate?

var myPath = "/etc/hosts".Path # does this allocate? I think it should not (eg in case it's unused in some code path)
let a = $myPath # this should allocate (or: this always allocates)
let ok = myPath == "/ETC//hosts" # does this allocate?

should we allow this on windows: let path ="/tmp/foo.txt".Path
should we allow this on windows: let path =r"C:\tmp\foo.txt".Path
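
On the implicit-conversion question: Nim's closest analogue to D's `alias this` is probably a `converter`. The sketch below assumes a `Path = distinct string` and two made-up procs (`legacyLen`, `takesPath`) standing in for existing string-based and new path-based APIs.

```nim
type Path = distinct string  # hypothetical stand-in for the proposed type

# Implicit Path -> string conversion; this is the part under discussion.
converter toString(p: Path): string = string(p)

# Stand-in for an existing API that only knows about strings.
proc legacyLen(s: string): int = s.len

# Stand-in for a new, path-typed API.
proc takesPath(p: Path): int = string(p).len

let p = Path("/etc/hosts")
doAssert legacyLen(p) == 10                       # accepted via the converter
doAssert takesPath(p) == 10
doAssert not compiles(takesPath("/etc/hosts"))    # string -> Path stays explicit
```

The trade-off is visible here: old string APIs keep compiling, but a `Path` can again be passed anywhere a string is expected, which weakens the type-safety argument.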

links

Relevant: https://irclogs.nim-lang.org/06-09-2018.html#14:17:36
@mratsim

Maybe we should have a Path "distinct string" type that stores validated canonical paths.


Araq · 2018-07-19T05:38:23Z

Here is just a suggestion: Start with the distinct string idea, don't use a converter to implicitly convert it back to string, write a simplistic ospaths2 library and port the compiler and koch.nim to use that instead. This way we can get real-world insights. Btw, Walter Bright's points are excellent, but I think 0-overhead construction via Nim's path"/foo/bar" syntax would be acceptable.
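
The `path"/foo/bar"` form relies on Nim's generalized raw string literals, where `ident"..."` is shorthand for `ident(r"...")`. A minimal sketch, assuming a `path` template of our own (the name is taken from the comment above; nothing here is an existing library API):

```nim
type Path = distinct string  # hypothetical stand-in for the proposed type

# A template expands to a plain type conversion: no normalization is done.
template path(s: string): Path = Path(s)

let p = path"/foo/bar"
doAssert string(p) == "/foo/bar"

# The literal is raw, so backslashes are not treated as escapes.
let w = path"C:\tmp\foo.txt"
doAssert string(w).len == 14
```

Since the conversion does no work on the string's contents, any normalization would have to happen lazily, for example in `$` or `==`.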

awr1 · 2018-07-25T14:11:06Z

vanilla strings of folders, files, etc. spliced against the / proc should return a Path type IMO
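
A rough sketch of that idea, assuming a `Path = distinct string` and a hand-written `/` (the `os` module is deliberately not imported, since it already defines `/` on strings with a `string` result):

```nim
import std/strutils

type Path = distinct string  # hypothetical stand-in for the proposed type

proc `/`(head, tail: string): Path =
  ## Joins two plain strings with exactly one '/' and yields a Path.
  if head.len == 0:
    return Path(tail)
  let h = head.strip(leading = false, trailing = true, chars = {'/'})
  let t = tail.strip(leading = true, trailing = false, chars = {'/'})
  Path(h & "/" & t)

proc `/`(head: Path, tail: string): Path =
  ## Lets joins chain: the left operand is already a Path.
  string(head) / tail

let p = "/tmp/" / "/foo" / "bar.txt"
doAssert string(p) == "/tmp/foo/bar.txt"
```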

barcharcraz · 2022-03-13T04:33:48Z

The "real" reason to use a path type instead of "just strings" is that it makes it somewhat easier to round-trip uint16_t paths (like on windows).

The design options here are:

1. paths are represented as strings, transcoded to utf-8. Simple, and can use fast bulk transcoding, but DOES NOT round trip: you can give the path API a path that exists, it transforms it to some internal representation, and it gives you back a path that doesn't exist
2. wtf-8: this is UTF-8, but unpaired surrogates are encoded as normal, so you can represent all windows paths with no data loss
3. utf-8b (pep383 format): similar to utf-8, but additionally can represent any other non-unicode narrow encoding bytes. I think you can use both (so 383 for narrow<->narrow and wtf-8 for wide<->narrow), but you need to keep in mind http://unicode.org/L2/L2009/09236-pep383-problems.html. Basically, when converting from the storage format (that's supposed to round trip) to any other format, if you encounter invalid utf-8 you must ensure that the result would produce invalid UTF-8 if you were to subsequently transform it back to the storage encoding.

In any event, supporting this for the filesystem isn't needed for nim; if you have a narrow string we should just not touch it.

4. the entire program (basically) is parameterized on the path type. On windows "path" is wide_string and on unix "path" is narrow_string. This makes it pretty easy to write non-portable programs, but is the simplest and arguably fastest. (Ideally this includes the type of the parameters to main, but happily nim hides that from you in a way that makes this insanely easy to do correctly, no need for tmain horribleness.)
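
To illustrate the wtf-8 option listed above, here is a small sketch that encodes 16-bit path units into WTF-8. Valid surrogate pairs are combined as in ordinary UTF-16 decoding; a lone surrogate is simply run through the normal three-byte UTF-8 pattern instead of being rejected or replaced. The proc name `toWtf8` is made up for this example.

```nim
proc toWtf8(units: openArray[uint16]): string =
  ## Encodes 16-bit units as WTF-8 (sketch; no decoding direction shown).
  var i = 0
  while i < units.len:
    var cp = int(units[i])
    inc i
    # Combine a high surrogate with a following low surrogate, if present.
    if cp >= 0xD800 and cp <= 0xDBFF and i < units.len:
      let lo = int(units[i])
      if lo >= 0xDC00 and lo <= 0xDFFF:
        cp = 0x10000 + ((cp - 0xD800) shl 10) + (lo - 0xDC00)
        inc i
    # Generalized UTF-8: unpaired surrogates land in the 3-byte branch.
    if cp < 0x80:
      result.add char(cp)
    elif cp < 0x800:
      result.add char(0xC0 or (cp shr 6))
      result.add char(0x80 or (cp and 0x3F))
    elif cp < 0x10000:
      result.add char(0xE0 or (cp shr 12))
      result.add char(0x80 or ((cp shr 6) and 0x3F))
      result.add char(0x80 or (cp and 0x3F))
    else:
      result.add char(0xF0 or (cp shr 18))
      result.add char(0x80 or ((cp shr 12) and 0x3F))
      result.add char(0x80 or ((cp shr 6) and 0x3F))
      result.add char(0x80 or (cp and 0x3F))

doAssert toWtf8([0x61'u16]) == "a"
doAssert toWtf8([0xD800'u16]) == "\xED\xA0\x80"              # lone surrogate survives
doAssert toWtf8([0xD83D'u16, 0xDE00'u16]) == "\xF0\x9F\x98\x80"  # valid pair
```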

I think (2) and (4) are the best options. As for whether it's literal string types or some path object, I don't think it really matters, except that I think the obvious data representation is either an array of uint8 or an array of uint16 (little endian, always, except it doesn't really matter because only windows does wide paths and windows has never supported any big endian architecture, and probably never will; windows does not support big-endian mode on arm, or arm64, or even itanium). I think representing things as some array of path components is a crummy idea, just store offsets to where they are.
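
A minimal sketch of the "store offsets" idea, using narrow (byte) paths and `/` as the only separator for brevity. `PathView`, `initPathView` and `component` are hypothetical names; the point is that the original bytes are kept untouched and components are only located, never copied into a separate array up front.

```nim
type
  PathView = object      # hypothetical type for illustration
    raw: string          # bytes exactly as received; never rewritten
    starts: seq[int]     # offset where each component begins

proc initPathView(raw: string): PathView =
  ## Records component start offsets without modifying `raw`.
  result.raw = raw
  var i = 0
  while i < raw.len:
    while i < raw.len and raw[i] == '/': inc i      # skip separators
    if i < raw.len:
      result.starts.add i
      while i < raw.len and raw[i] != '/': inc i    # skip the component

proc component(p: PathView, idx: int): string =
  ## Extracts one component on demand from its stored offset.
  let a = p.starts[idx]
  var b = a
  while b < p.raw.len and p.raw[b] != '/': inc b
  p.raw[a ..< b]

let v = initPathView("/tmp//foo.txt")
doAssert v.raw == "/tmp//foo.txt"     # the path handed back is the one received
doAssert v.starts == @[1, 6]
doAssert v.component(1) == "foo.txt"
```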

Basically, I think it matters much more that we can guide users down a path (heh) that never modifies their paths than that we use any particular fancy object-based API. Especially important is that if you read a directory and store the result somewhere, then later (without calling any APIs to modify the path) open that file to do operations on it, that open call should both complete successfully (assuming nobody deleted that file out from under you) and open the same file the OS returned. You would think this would be easy but it's not, and on windows a huge quantity of software gets this wrong; try making a file with the name "U+D800" and running some common tools on it :). If you use such an API for security sensitive things it can even cause vulnerabilities, although, to be honest, it's really Windows's fault for not having an "openat" system call, despite NT supporting that construct.

timotheecour changed the title ~~[WIP] [RFC] new path handling module that uses Nim's type system~~ [WIP] [RFC] new path handling module that uses Nim's type system, using Path instead of string Jul 18, 2018

timotheecour closed this as completed Jul 18, 2018

timotheecour reopened this Sep 9, 2018

narimiran transferred this issue from nim-lang/Nim Jan 2, 2019

ringabout mentioned this issue Jan 21, 2022

Roadmap for Nim #437

Closed

33 tasks

ringabout changed the title ~~[WIP] [RFC] new path handling module that uses Nim's type system, using Path instead of string~~ New path handling module that uses Nim's type system, using Path instead of string Jan 24, 2022

ringabout mentioned this issue Oct 17, 2022

add typesafe std/paths, std/files, std/dirs, std/symlinks nim-lang/Nim#20582

Merged

7 tasks

Araq closed this as completed in nim-lang/Nim#20582 Oct 21, 2022
