feat: Find diagnostics for both `C:` and `c:` forms of a path #3347

poliorcetics · 2022-08-06T23:32:06Z

Document::url() uses Document::path() which is just a reference to Document.path. Since LSPs
can send either a path with an uppercase or lowercase drive letter, we need to check for the other
option when the first fails.

We do this throughout Helix by introducing NormalizedUrl which takes care of the conversion and
delegates to its base Url when needed. If future conversions are needed, they can be added here
instead of having to repeat them all over the place.

Closes #3267

poliorcetics · 2022-08-06T23:35:33Z

I checked all uses of Document::url(), the main other case is for symbols, and I don't think it will be affected then ? Or at least, it won't lose symbols, only show a path when it could avoid it (because it's for the current file), but I'm not certain.

Frojdholm · 2022-08-07T13:41:36Z

I think this is a more general problem than just this instance. Anytime URLs are compared you risk issues, for example in diag_picker the current_path is compared again with the URLs from the LSP server.

My idea would be to create a newtype NormalizedUrl (naming to be determined) that wraps the URL and does the normalisation. This type will uphold the invariant that the drive letter is upper case, but will otherwise be a pretty thin wrapper around Url. A bonus is that if any new normalisations are discovered they can done here as well.

We would then replace all usages of Url in helix with NormalizedUrl. Code using for example Eq will then just work basically unmodified.

Frojdholm · 2022-08-09T16:43:46Z

@poliorcetics This is a draft of what I've been working on. It's quite clunky since URLs are often used to interface with other types in the lsp_types crate. The normalizing function is also a bit of a hack at the moment, but it works on my machine. Either way we will need a newtype if we want to store the URL in a hash map since the hash and eq methods are used for finding the keys.

23f4a40...Frojdholm:helix:draft-3267-normalized-url

poliorcetics · 2022-08-09T22:26:14Z

@Frojdholm I had already started on your suggestion on my side, I looked at yours in diagonal, it seems like we got similar solutions, except for the C:/c: check.

Edit: thanks anyway for the work and the suggestion, both are appreciated 👍

Frojdholm

Great work, I haven't had the opportunity to try it out yet, but it seems fairly similar to what I've implemented 👍

I have two design related comments:

I don't think this problem is restricted to Windows. Imagine for example accessing files on a Windows server through a Linux host.
I think having the extra field in NormalizedUrl complicates the design somewhat. I guess you did it for performance to avoid having to always calculate the new path? Have you measured the performance differences between the two approaches? I don't know how slow/performance critical the URL handling is, but if it's not critical I would personally prefer a simpler implementation.

helix-core/src/normalized_url.rs

poliorcetics · 2022-08-14T14:06:14Z

I have not yet pushed my changes, I was benchmarking both approach (normalizing at init and storing the result vs normalizing during comparison).

The worst case in both approach is a Windows path URL of the form file://C:/my/neat/path or file:///<same here>, because this is the form that has to be normalized for later comparison. All other forms of comparison, notably those with different schemes for the URLs, are very fast, less than a hundred (100) nanoseconds at worst, in picoseconds for the best case on my machine (when the first letter of the scheme is different, e.g, https and file).

So, worst case: file://C:/my/neat/path. This is still very fast, despite needing to .clone() for the benchmarks because Urls are not Copy.

When normalizing during construction, cloning+normalizing (which needs to generate a new URL with c:, different from the base one), takes ~315ns.
When normalizing during comparison, the same cloning and normalizing takes ~350ns, and that's because my test urls are small, if I add a few levels of depth to my test paths, it gets worse (still in the hundreds of nanoseconds, but worse all the same).

I expect performances to be worse on Windows because the allocator is (reputedly) not as good and most machines are not a Macbook Pro with the biggest M1 Pro, but even at twice the time, this is still good.

I'll continue on the road of normalizing at construction because overall all the comparisons are faster with it and the code is clearer IMO (e.g the comparison function for the normalize-during-compare case needs a little help to be faster, and it still does not reach the pre-normalized case by ~10%). On a macOS install, with local paths (so no drive letter C:), constructing, even with normalizing takes about 40ns and so shouldn't affect helix's performance at all, especially compared to reading the file itself.

Of course my benchmarks are probably not that representative, but they at least show both approach are sufficiently fast and the first one will do much less small and repeated allocations.

Frojdholm

Very thorough benchmarking 👏

helix-core/src/normalized_url.rs

poliorcetics · 2022-09-04T15:29:08Z

Rebased on recent master, it was conflicting

`Document::url()` uses `Document::path()` which is just a reference to `Document.path`. Since LSPs can send either a path with an uppercase or lowercase drive letter, we need to check for the other option when the first fails. We do this throughout Helix by introducing `NormalizedUrl` which takes care of the conversion and delegates to its base `Url` when needed. If future conversions are needed, they can be added here instead of having to repeat them all over the place.

…ndows path too

…se()` where possible. Also, remove one usage point of `NormalizedUrl` because it was constructed only to be immediately destroyed.

archseer · 2023-02-17T06:21:43Z

\cc @pascalkuthe

Sorry for the delay on this PR, I've been hoping to find a way to do this without an additional wrapper type.

pascalkuthe · 2023-02-18T01:44:21Z

The official URI reference implementation for the LSP spec is aware of this problem and even has tests for it.
They seem to simply provide an option to normalize the drive letter when converting a URI to a path.

More importantly when an URI that is encoded to a string (which is done when it's sent server -> client/client -> serve) the driver litter is always converted to lower case:

https://github.com/microsoft/vscode-uri/blob/1af8b2576342c7e6deff4d4a2c2e6cdfda170ff2/src/uri.ts#L637

Sadly the rust URL crate exposed by LSP types doesn't seem to perform any kind of normalization like this at any point.
That is probably because this specific property is specific to LSP URIs but lsp_types just uses the URL crate regardless.
There seem to be mysterious historic reasons for this: microsoft/vscode#42159 (comment)

Given that the driver letters sent by any LSP conformant server (or at-least those that work with VSCode?) should always be lowercase it should hopefully be enough for us to be careful to always lowercase the driver letters of any paths before turning them into an URL (probably enough to have a utility function for that in helix-core.

I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.

Regardless I am not a fan of introducing a new normalized path type that stores two versions of the URI. We should just normalize all our URIs to be lower-cased to match VSCode and the spec here

poliorcetics · 2023-02-21T15:24:18Z

I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.

Yeah that won't happen. Relying on that when the rest of the computing world uses uppercase letters was a mistake from the LSP spec IMO. We can avoid the problem if we lowercase drive letters everywhere but we absolutely shouldn't rely on LSPs doing the right thing

pascalkuthe · 2023-02-22T13:33:09Z

Actually an even easier solution: Do we really need to store an URI instead of just a Path? Rust PathBuf and Path correctly handles this case. Is there any reason to store an URL over just converting everything to PathBuf (we still need to normalize before sending ofcourse)?

poliorcetics · 2023-02-22T15:47:15Z

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

pascalkuthe · 2023-02-22T15:50:01Z

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (#3721 (comment)) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

poliorcetics · 2023-02-22T15:58:34Z

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (#3721 (comment)) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

The arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.

Lowercasing drive letters all the time seems more forward-compatible than using PathBuf here, I can change NormalizedUrl to do that instead (and so not allocate twice)

pascalkuthe · 2023-02-22T16:57:32Z

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (#3721 (comment)) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

The arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.

Lowercasing drive letters all the time seems more forward-compatible than using PathBuf here, I can change NormalizedUrl to do that instead (and so not allocate twice)

I am not convinced that remote editing belongs in the core editor (that issue is labeled with A-plugin) and even if it does supporting that will require much larger changes to the way we represent buffer paths anyway. I don't think it makes sense to introduce a bunch of code for a new wrapper type (#3347 (comment)) for a hypothetical future feature right now.

The future proofing argument doesn't really make sense here as helix fundamentally only deals in Paths anyway and it's (in my opinion) a bug rather than a feature that we compared URIs at all. If you look at how we actually use URIs in the codebase there are currently exactly three operations that we perform on URIs:

send them to the LS server, URIs are always generated from Document::path (which is a PathBuf). This usecase is what Docuemnt::uri() was intended for.
Convert a URI from the server to a Path with to_file_path for local use.
Comparison with another URI. However that other URI is always generated from Document::path and hence a PathBuf/file:// URI

All usecases only work with file::// URIs anyway so we might aswell switch all occurrences of the third case to the second case to be consistent and to fix problems like #3267 (in general I think a simple URL comparison will fail). I think there will actually be more edgecases like this

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from c0c75a0 to 6d7fa93 Compare August 9, 2022 22:24

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 6d7fa93 to 33bf867 Compare August 11, 2022 21:44

Frojdholm reviewed Aug 12, 2022

View reviewed changes

helix-core/src/normalized_url.rs Outdated Show resolved Hide resolved

helix-core/src/normalized_url.rs Outdated Show resolved Hide resolved

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 33bf867 to c6b2b15 Compare August 14, 2022 14:42

Frojdholm previously approved these changes Aug 14, 2022

View reviewed changes

helix-core/src/normalized_url.rs Show resolved Hide resolved

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 400700d to 2f55acb Compare August 16, 2022 06:50

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 2f55acb to bc13b64 Compare September 4, 2022 15:28

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from bc13b64 to d7c4876 Compare September 6, 2022 06:53

kirawi added A-language-server Area: Language server client S-waiting-on-review Status: Awaiting review from a maintainer. labels Sep 13, 2022

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch 2 times, most recently from 3807b7c to d9f565f Compare October 1, 2022 16:28

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch 2 times, most recently from 8d8fc48 to 9d4203d Compare October 13, 2022 06:58

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch 2 times, most recently from 1801051 to 27216c0 Compare October 21, 2022 05:34

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 27216c0 to 286f08e Compare November 3, 2022 08:50

archseer added this to the 22.11 milestone Nov 17, 2022

archseer removed this from the 22.12 milestone Dec 3, 2022

poliorcetics added 3 commits January 9, 2023 23:33

fix: Make normalized urls work correctly on non-windows when using wi…

2ff808e

…ndows path too

fix: Reduce wasteful clones with NormalizedUrls by adding `.into_ba…

f1ceeaa

…se()` where possible. Also, remove one usage point of `NormalizedUrl` because it was constructed only to be immediately destroyed.

poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 286f08e to f1ceeaa Compare January 9, 2023 22:34

the-mikedavis dismissed Frojdholm’s stale review via f1ceeaa February 8, 2023 16:09

archseer requested a review from pascalkuthe February 17, 2023 06:20

pascalkuthe added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from a maintainer. labels Feb 18, 2023

poliorcetics closed this Apr 7, 2023

crtado mentioned this pull request May 27, 2023

Looking up diagnostics by URL is ambigous #3267

Closed

poliorcetics deleted the 3267-fix-local-diagnostic-compare branch February 12, 2024 22:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Find diagnostics for both `C:` and `c:` forms of a path #3347

feat: Find diagnostics for both `C:` and `c:` forms of a path #3347

poliorcetics commented Aug 6, 2022 •

edited

Loading

poliorcetics commented Aug 6, 2022

Frojdholm commented Aug 7, 2022

Frojdholm commented Aug 9, 2022

poliorcetics commented Aug 9, 2022 •

edited

Loading

Frojdholm left a comment

poliorcetics commented Aug 14, 2022

Frojdholm left a comment

poliorcetics commented Sep 4, 2022

archseer commented Feb 17, 2023

pascalkuthe commented Feb 18, 2023 •

edited

Loading

poliorcetics commented Feb 21, 2023

pascalkuthe commented Feb 22, 2023 •

edited

Loading

poliorcetics commented Feb 22, 2023

pascalkuthe commented Feb 22, 2023 •

edited

Loading

poliorcetics commented Feb 22, 2023

pascalkuthe commented Feb 22, 2023 •

edited

Loading

feat: Find diagnostics for both C: and c: forms of a path #3347

feat: Find diagnostics for both C: and c: forms of a path #3347

Conversation

poliorcetics commented Aug 6, 2022 • edited Loading

poliorcetics commented Aug 6, 2022

Frojdholm commented Aug 7, 2022

Frojdholm commented Aug 9, 2022

poliorcetics commented Aug 9, 2022 • edited Loading

Frojdholm left a comment

Choose a reason for hiding this comment

poliorcetics commented Aug 14, 2022

Frojdholm left a comment

Choose a reason for hiding this comment

poliorcetics commented Sep 4, 2022

archseer commented Feb 17, 2023

pascalkuthe commented Feb 18, 2023 • edited Loading

poliorcetics commented Feb 21, 2023

pascalkuthe commented Feb 22, 2023 • edited Loading

poliorcetics commented Feb 22, 2023

pascalkuthe commented Feb 22, 2023 • edited Loading

poliorcetics commented Feb 22, 2023

pascalkuthe commented Feb 22, 2023 • edited Loading

feat: Find diagnostics for both `C:` and `c:` forms of a path #3347

feat: Find diagnostics for both `C:` and `c:` forms of a path #3347

poliorcetics commented Aug 6, 2022 •

edited

Loading

poliorcetics commented Aug 9, 2022 •

edited

Loading

pascalkuthe commented Feb 18, 2023 •

edited

Loading

pascalkuthe commented Feb 22, 2023 •

edited

Loading

pascalkuthe commented Feb 22, 2023 •

edited

Loading

pascalkuthe commented Feb 22, 2023 •

edited

Loading