-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-6622: [R] Normalize paths for filesystem API on Windows #5445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I think I'm with @pitrou here that introducing automatic path normalization into the C++ filesystem API might be a bit too much scope creep -- we could provide a built-in path normalization function for consumers that don't have it (e.g. Python has pathlib, R has its own function here) |
|
I can accept that, though we should document the C++ API's requirements accordingly. I created https://issues.apache.org/jira/browse/ARROW-6629 because I didn't see any filesystem docs at all, so nowhere to just add a note in this patch. |
|
This is failing, though differently than before: https://ci.appveyor.com/project/nealrichardson/arrow/builds/27536178/job/gu10af0vck7gum81#L4342 The path normalization is making absolute paths with respect to the current working directory in the R process, and that can clash with whatever the FileSystem object has as its path. I'll need to rethink the approach. |
bef7f7f to
a1c1e3f
Compare
Codecov Report
@@ Coverage Diff @@
## master #5445 +/- ##
=========================================
Coverage ? 77.18%
=========================================
Files ? 58
Lines ? 4011
Branches ? 0
=========================================
Hits ? 3096
Misses ? 915
Partials ? 0
Continue to review full report at Codecov.
|
|
This is passing and ready for review @romainfrancois |
romainfrancois
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use fs:: here ?
r/R/filesystem.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like things potentially better fixed with fs:: cc @jimhester. I know fs has been downgraded to Suggests here, but I'm just worried we'd be missing cases with the implementation below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the fs function that does the winslash normalization we need here, but more than happy to use it if it exists.
r/R/filesystem.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably use .Platform$file.sep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not sure about the name of this function, it is not really doing anything to compute a relative path, you could probably just use normalizePath(mustWork = FALSE), assuming path doesn't actually have to exist on the filesystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: .Platform$file.sep, I intended to but then on my Windows VM, .Platform$file.sep reported "/" while tempfile() still created paths with "\" 🤷♂
I originally did normalizePath but unfortunately for this use case, normalizePath does path expansion, even with mustWork=FALSE. We need to keep the path relative (to whatever the FileSystem object holds as its cwd) and just need Windows \ to be converted to /. Regular expressions were my last resort and I'm happy to use something better or more robust if it exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit weird to do a no-op replacement on Unix, though I suppose it doesn't work.
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look correct to me. The simple solution should be to start with /-separated slashes and only use that.
r/R/filesystem.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? Will it work correctly if base_dir is e.g. a path in an S3 filesystem, or will it produce bogus results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question for all such cases below.
r/tests/testthat/test-filesystem.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you should simply /-normalize td above. Should be easy :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At worse you simply string-replace \ with / (but only when on Windows :-)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, then you shouldn't call it fs_relative_path...
|
Also note that |
8de4cbf to
515d710
Compare
|
I renamed the functions to I wouldn't be surprised if we had to revisit this in the future, but for now this seems like a step forward, so I'll merge unless there are objections. |
I fixed this in R, though I wonder if that's not enough. At a minimum, the C++ docs should note this requirement (@pitrou maybe you know the best place in the docs to add this?), and I still think it would be nice if all of this path normalization were handled in C++ (cf. https://issues.apache.org/jira/browse/ARROW-6324).