Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(path): change normalization strategy #7658

Closed
wants to merge 0 commits into from

Conversation

woojiq
Copy link
Contributor

@woojiq woojiq commented Jul 17, 2023

Closes #7458
If the file does not have a language detected but has a symbolic link, try using symlink file extension to set the language. This applies to opening new files as well as previewing them in the file picker.

Demo: https://asciinema.org/a/KISAzgMSh7ToTcZ3yKme5f0hA
I run the patched helix for the first time, where you can see at the bottom of the screen that the canonicalized path is a file with no extension (+ syntax hightlighting).
I am using the master helix for the second time (- syntax hightlighting)

TODO:

  • Add tests with symlinks

helix-core/src/path.rs Outdated Show resolved Hide resolved
helix-view/src/document.rs Outdated Show resolved Hide resolved
helix-view/src/document.rs Outdated Show resolved Hide resolved
@pascalkuthe
Copy link
Member

pascalkuthe commented Jul 17, 2023

I think this is actually an accidental/unintentional side effect introduced by #6290. I think the real crux is that we don't actually want to resolve symbolic link in normalize_paths.

I think we can still keep the cannonicalization (which is the only way to correctly compare two paths) while making that work. We probably want to iterate the ancestors in reverse instead and check:

  • if its a symlink => cannonicalize parent, cannonicalize this path and rember the pair for later
  • at the last ancestor that exists (so usually the full path) and when encountering another symlink cannonicalize the path, strip the last symlink, strip the last symlink we encountered (which could be this path) and then join the result on top of the last saved parent

@woojiq woojiq marked this pull request as draft July 17, 2023 12:02
@woojiq
Copy link
Contributor Author

woojiq commented Jul 18, 2023

I'm not sure I understand you. Should we change canonicalization strategy?
For example we have ~/1/2/a.cpp which is a symlink to /3/4/b what are steps? I can't going step by step with your explanation.
Maybe I overthink.

@pascalkuthe
Copy link
Member

Hmm let me show some exanoles in your case we would cannicolize ~/1/2/ and then must houn the link name.

What I mean twith my description is that we basically want to cannonicalize the parent directory and then just join the link name on top...

Howver the parent directory (and any of its parents) could also be a symlink. So you would need to start at the bottom.

For example /foo/link1/bar/link2/baz/link3.cpp we would cannicolize /foo

Then join link1, then cannonicalize the resukting oarh and remeber the re

Next we cannonicalize /too/link1/bar and strip_prefix the oath we remembered (so we just endup with bar again but cannonicalizd) and join that in too of our existing paths.

We repeat the sane thing for link2 and baz (cannonicalize link parent, join link name, cannonicalize path, caboonicalize next link parent and strip prefix to join again).

Thks way we basically don't resolve symlinks while still normalizing filenames/paths correctly in every case

@woojiq
Copy link
Contributor Author

woojiq commented Jul 29, 2023

If I understand correctly, if we implement this, we should call get_normalize_path when opening files instead of get_canonicalize_path, but use the second one to check if we have already opened same file.

@pascalkuthe
Copy link
Member

I don't think that makes too much of a difference what functions are called when should not change, only get_normalized_path should be changed as I described above

@woojiq
Copy link
Contributor Author

woojiq commented Jul 29, 2023

If we implement this and try to open the original file (after symlink), we will have two buffers. VsCode and Vim behave this way, it seems. Is this the desired result or am I wrong?

@pascalkuthe
Copy link
Member

pascalkuthe commented Jul 29, 2023

If we implement this and try to open the original file (after symlink), we will have two buffers. VsCode and Vim behave this way, it seems. Is this the desired result or am I wrong?

that's at least how helix used to behave before the (unintentional) change to cannoncialize symlinks. Even in the past the cannoncialize function has never resolved symlinks.

A buffer should have a single unique path. We should only ever use that path to determine the language of the file and interact with it in other way. Otherwise, you can symlink (foo.c, foo.rs and foo.sh all to foo and the language would be what? We also need to provide a single path to the LSP and the LSP can't backtrack that the file comes from a link so for symlinks to work with LSP we need to treat them as separate buffers.

The problem of potentially having multiple links open at once exists anyway (hardlinks) so I don't think that's that big of an issue (especially because we hide symlinks that point into the current workspace from the file picker so you have to put in some effort to open the same file twice)

@woojiq
Copy link
Contributor Author

woojiq commented Jul 29, 2023

Yeah, that's exactly what I was thinking. Very clear explanation. I just didn't know what the previous behavior was when opening a symlink and original file, so I assumed that maybe current behavior was desired. Now I got it, ty.

@woojiq woojiq force-pushed the symlink_extension_fallback branch 2 times, most recently from e1e0857 to abdf417 Compare July 31, 2023 11:30
@woojiq
Copy link
Contributor Author

woojiq commented Jul 31, 2023

While changing the implementation, I found one interesting case.
E.g. we have foo/bar which is a symbolic link to baz/. And we have the path smth/foo/bar/... What do we want to see as a result: smth/foo/ or smth/? We don't want to resolve symlinks, so I suspect smth/foo/ is the desired result?

I believe Pr is now ready for reviewing, but in a few days I will add tests with creating files and links to catch more pitfalls.

@woojiq woojiq marked this pull request as ready for review July 31, 2023 11:38
@woojiq woojiq changed the title Symlink extension as a fallback language Change path normalization strategy Jul 31, 2023
@pascalkuthe
Copy link
Member

pascalkuthe commented Jul 31, 2023

While changing the implementation, I found one interesting case. E.g. we have foo/bar which is a symbolic link to baz/. And we have the path smth/foo/bar/... What do we want to see as a result: smth/foo/ or smth/? We don't want to resolve symlinks, so I suspect smth/foo/ is the desired result?

I believe Pr is now ready for reviewing, but in a few days I will add tests with creating files and links to catch more pitfalls.

I think in that case we actually want to keep the colons. smth/ would be wrong (that's not how symlink resolution works on the OS level and the main invariant of normalization is that the new path should always point to the same file) and whatever the parent of baz is would also be incorrect (since that would resolve the link). I think if you encounter colons you should manually handle them (instead of canonicalizing/calling dunce) by calling path.pop. If you reach a symlink that way you just keep the component instead (also you need to check if path.pop returns double colons in that case you had previously decided to keep these colons and need to keep the previous colons and the new colons (so that smth/foo/bar/../... also canonicalizes correctly/stays unchanged).

@woojiq woojiq force-pushed the symlink_extension_fallback branch 4 times, most recently from 340ede9 to 27c054c Compare August 1, 2023 18:41
@woojiq woojiq marked this pull request as draft August 1, 2023 19:51
@woojiq
Copy link
Contributor Author

woojiq commented Aug 2, 2023

Okay, I think I'm done with this. If you have corner cases that would be easy to test, I can add them for testing.
Added tempfile to helix-core dev-deps.

@woojiq woojiq changed the title Change path normalization strategy fix(path): change normalization strategy Aug 2, 2023
@woojiq woojiq marked this pull request as ready for review August 2, 2023 12:03
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this lookgs good to me. Ideally we would reduce the number of cannonivalize calls a bit and try to sk.how do case folding on windows for symlinks too but that would involve pulling in the windows API and make the code more complex.

I think the edgecase of case folding symlinks on windows is rare enough to ignore (symlinks are rare on windows anyway) and the couple extra realpath syscalls shouldnt matter much

helix-core/tests/path.rs Outdated Show resolved Hide resolved
@woojiq
Copy link
Contributor Author

woojiq commented Sep 2, 2023

Rebased on master.

@the-mikedavis the-mikedavis added the A-core Area: Helix core improvements label Dec 21, 2023
the-mikedavis
the-mikedavis previously approved these changes Jan 2, 2024
@the-mikedavis the-mikedavis added the S-waiting-on-review Status: Awaiting review from a maintainer. label Jan 2, 2024
@pascalkuthe
Copy link
Member

One more edgecase for windows: You must use std cannonicalize instead of dunce and then only simplify the final path with https://docs.rs/dunce/latest/dunce/fn.simplified.html otherwise stripping the prefix may fail.

I think for unix I would actually prefer if we just reverted to the old handling (no cannonicalization at all) to avoid the (on slow FS potentially significant) overhead. Other unix programs (like vim, nvim or emacs) don't handle this either. I think on unix the cases where this would be useful are extremely nieche. This is mainly important for windows (casefolding)

@pascalkuthe pascalkuthe added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from a maintainer. labels Jan 8, 2024
@woojiq
Copy link
Contributor Author

woojiq commented Jan 8, 2024

Damn, Windows... I suggest adding #[cfg(windows)] for this block https://github.com/helix-editor/helix/blob/master/helix-core/src/path.rs#L43 and be happy 🙂
This PR looks like it shouldn't be solved like that, there may be something simpler.

@pascalkuthe
Copy link
Member

pascalkuthe commented Jan 8, 2024

I considered that but I do think the points I made to motivate this PR still apply for windows. We want helix to correctly handle symlinks on windows too (or more accurately similarly as unix). I do think what you implemented here is good enough for windows (except the detail I mentioned). You just need to feature gate the new parent and normal match arms and add the old implementation back (plus the use of different function I mentioned above)

@woojiq
Copy link
Contributor Author

woojiq commented Jan 13, 2024

Wait what. How did I do it? I just pulled from master and rebased as usual 😅
My master-branch somehow contained these changes, so I probably messed up rebase or smth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
3 participants