Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Find diagnostics for both C: and c: forms of a path #3347

Conversation

poliorcetics
Copy link
Contributor

@poliorcetics poliorcetics commented Aug 6, 2022

Document::url() uses Document::path() which is just a reference to Document.path. Since LSPs
can send either a path with an uppercase or lowercase drive letter, we need to check for the other
option when the first fails.

We do this throughout Helix by introducing NormalizedUrl which takes care of the conversion and
delegates to its base Url when needed. If future conversions are needed, they can be added here
instead of having to repeat them all over the place.

Closes #3267

@poliorcetics
Copy link
Contributor Author

I checked all uses of Document::url(), the main other case is for symbols, and I don't think it will be affected then ? Or at least, it won't lose symbols, only show a path when it could avoid it (because it's for the current file), but I'm not certain.

@Frojdholm
Copy link
Contributor

I think this is a more general problem than just this instance. Anytime URLs are compared you risk issues, for example in diag_picker the current_path is compared again with the URLs from the LSP server.

My idea would be to create a newtype NormalizedUrl (naming to be determined) that wraps the URL and does the normalisation. This type will uphold the invariant that the drive letter is upper case, but will otherwise be a pretty thin wrapper around Url. A bonus is that if any new normalisations are discovered they can done here as well.

We would then replace all usages of Url in helix with NormalizedUrl. Code using for example Eq will then just work basically unmodified.

@Frojdholm
Copy link
Contributor

@poliorcetics This is a draft of what I've been working on. It's quite clunky since URLs are often used to interface with other types in the lsp_types crate. The normalizing function is also a bit of a hack at the moment, but it works on my machine. Either way we will need a newtype if we want to store the URL in a hash map since the hash and eq methods are used for finding the keys.

23f4a40...Frojdholm:helix:draft-3267-normalized-url

@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from c0c75a0 to 6d7fa93 Compare August 9, 2022 22:24
@poliorcetics
Copy link
Contributor Author

poliorcetics commented Aug 9, 2022

@Frojdholm I had already started on your suggestion on my side, I looked at yours in diagonal, it seems like we got similar solutions, except for the C:/c: check.

Edit: thanks anyway for the work and the suggestion, both are appreciated 👍

@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 6d7fa93 to 33bf867 Compare August 11, 2022 21:44
Copy link
Contributor

@Frojdholm Frojdholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, I haven't had the opportunity to try it out yet, but it seems fairly similar to what I've implemented 👍

I have two design related comments:

  • I don't think this problem is restricted to Windows. Imagine for example accessing files on a Windows server through a Linux host.
  • I think having the extra field in NormalizedUrl complicates the design somewhat. I guess you did it for performance to avoid having to always calculate the new path? Have you measured the performance differences between the two approaches? I don't know how slow/performance critical the URL handling is, but if it's not critical I would personally prefer a simpler implementation.

helix-core/src/normalized_url.rs Outdated Show resolved Hide resolved
helix-core/src/normalized_url.rs Outdated Show resolved Hide resolved
@poliorcetics
Copy link
Contributor Author

I have not yet pushed my changes, I was benchmarking both approach (normalizing at init and storing the result vs normalizing during comparison).

The worst case in both approach is a Windows path URL of the form file://C:/my/neat/path or file:///<same here>, because this is the form that has to be normalized for later comparison. All other forms of comparison, notably those with different schemes for the URLs, are very fast, less than a hundred (100) nanoseconds at worst, in picoseconds for the best case on my machine (when the first letter of the scheme is different, e.g, https and file).

So, worst case: file://C:/my/neat/path. This is still very fast, despite needing to .clone() for the benchmarks because Urls are not Copy.

  • When normalizing during construction, cloning+normalizing (which needs to generate a new URL with c:, different from the base one), takes ~315ns.
  • When normalizing during comparison, the same cloning and normalizing takes ~350ns, and that's because my test urls are small, if I add a few levels of depth to my test paths, it gets worse (still in the hundreds of nanoseconds, but worse all the same).

I expect performances to be worse on Windows because the allocator is (reputedly) not as good and most machines are not a Macbook Pro with the biggest M1 Pro, but even at twice the time, this is still good.

I'll continue on the road of normalizing at construction because overall all the comparisons are faster with it and the code is clearer IMO (e.g the comparison function for the normalize-during-compare case needs a little help to be faster, and it still does not reach the pre-normalized case by ~10%). On a macOS install, with local paths (so no drive letter C:), constructing, even with normalizing takes about 40ns and so shouldn't affect helix's performance at all, especially compared to reading the file itself.

Of course my benchmarks are probably not that representative, but they at least show both approach are sufficiently fast and the first one will do much less small and repeated allocations.

@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 33bf867 to c6b2b15 Compare August 14, 2022 14:42
Frojdholm
Frojdholm previously approved these changes Aug 14, 2022
Copy link
Contributor

@Frojdholm Frojdholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very thorough benchmarking 👏

helix-core/src/normalized_url.rs Show resolved Hide resolved
@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 400700d to 2f55acb Compare August 16, 2022 06:50
@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 2f55acb to bc13b64 Compare September 4, 2022 15:28
@poliorcetics
Copy link
Contributor Author

Rebased on recent master, it was conflicting

@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from bc13b64 to d7c4876 Compare September 6, 2022 06:53
@kirawi kirawi added A-language-server Area: Language server client S-waiting-on-review Status: Awaiting review from a maintainer. labels Sep 13, 2022
@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch 2 times, most recently from 3807b7c to d9f565f Compare October 1, 2022 16:28
@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch 2 times, most recently from 8d8fc48 to 9d4203d Compare October 13, 2022 06:58
@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch 2 times, most recently from 1801051 to 27216c0 Compare October 21, 2022 05:34
@poliorcetics poliorcetics force-pushed the 3267-fix-local-diagnostic-compare branch from 27216c0 to 286f08e Compare November 3, 2022 08:50
@archseer archseer added this to the 22.11 milestone Nov 17, 2022
@archseer archseer removed this from the 22.12 milestone Dec 3, 2022
`Document::url()` uses `Document::path()` which is just a reference to `Document.path`. Since LSPs
can send either a path with an uppercase or lowercase drive letter, we need to check for the other
option when the first fails.

We do this throughout Helix by introducing `NormalizedUrl` which takes care of the conversion and
delegates to its base `Url` when needed. If future conversions are needed, they can be added here
instead of having to repeat them all over the place.
…se()` where possible.

Also, remove one usage point of `NormalizedUrl` because it was constructed only to be immediately
destroyed.
@archseer
Copy link
Member

\cc @pascalkuthe

Sorry for the delay on this PR, I've been hoping to find a way to do this without an additional wrapper type.

@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 18, 2023

The official URI reference implementation for the LSP spec is aware of this problem and even has tests for it.
They seem to simply provide an option to normalize the drive letter when converting a URI to a path.

More importantly when an URI that is encoded to a string (which is done when it's sent server -> client/client -> serve) the driver litter is always converted to lower case:

https://github.com/microsoft/vscode-uri/blob/1af8b2576342c7e6deff4d4a2c2e6cdfda170ff2/src/uri.ts#L637

Sadly the rust URL crate exposed by LSP types doesn't seem to perform any kind of normalization like this at any point.
That is probably because this specific property is specific to LSP URIs but lsp_types just uses the URL crate regardless.
There seem to be mysterious historic reasons for this: microsoft/vscode#42159 (comment)

Given that the driver letters sent by any LSP conformant server (or at-least those that work with VSCode?) should always be lowercase it should hopefully be enough for us to be careful to always lowercase the driver letters of any paths before turning them into an URL (probably enough to have a utility function for that in helix-core.

I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.

Regardless I am not a fan of introducing a new normalized path type that stores two versions of the URI. We should just normalize all our URIs to be lower-cased to match VSCode and the spec here

@pascalkuthe pascalkuthe added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from a maintainer. labels Feb 18, 2023
@poliorcetics
Copy link
Contributor Author

I sincerely hope that all LS actually stick to this send everything lowercase rule (but at-least all LS that are written in nodejs should as they would be using the official LSP uri library). Otherwise we need to somehow intercept messages somewhere early in the LSP client and lowercase the driver letter.

Yeah that won't happen. Relying on that when the rest of the computing world uses uppercase letters was a mistake from the LSP spec IMO. We can avoid the problem if we lowercase drive letters everywhere but we absolutely shouldn't rely on LSPs doing the right thing

@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 22, 2023

Actually an even easier solution: Do we really need to store an URI instead of just a Path? Rust PathBuf and Path correctly handles this case. Is there any reason to store an URL over just converting everything to PathBuf (we still need to normalize before sending ofcourse)?

@poliorcetics
Copy link
Contributor Author

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 22, 2023

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (#3721 (comment)) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

@poliorcetics
Copy link
Contributor Author

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (#3721 (comment)) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

The arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.

Lowercasing drive letters all the time seems more forward-compatible than using PathBuf here, I can change NormalizedUrl to do that instead (and so not allocate twice)

@pascalkuthe
Copy link
Member

pascalkuthe commented Feb 22, 2023

Does helix intend to support remote editing one day (like hx scp://user@remote[:port]//path/to/file.txt) ? If so, using PathBuf would be detrimental

I think @archseer sentiment on the topic (#3721 (comment)) was mostly to use sshfs or similar so I am inclined to say that is a problem we can putoff until helix actually gains that capability

The arguments on the issue are pretty much what I would have said about sshfs, the idea of it is nice, in practice it's not that useful because of its inherent limitations.

Lowercasing drive letters all the time seems more forward-compatible than using PathBuf here, I can change NormalizedUrl to do that instead (and so not allocate twice)

I am not convinced that remote editing belongs in the core editor (that issue is labeled with A-plugin) and even if it does supporting that will require much larger changes to the way we represent buffer paths anyway. I don't think it makes sense to introduce a bunch of code for a new wrapper type (#3347 (comment)) for a hypothetical future feature right now.

The future proofing argument doesn't really make sense here as helix fundamentally only deals in Paths anyway and it's (in my opinion) a bug rather than a feature that we compared URIs at all. If you look at how we actually use URIs in the codebase there are currently exactly three operations that we perform on URIs:

  • send them to the LS server, URIs are always generated from Document::path (which is a PathBuf). This usecase is what Docuemnt::uri() was intended for.
  • Convert a URI from the server to a Path with to_file_path for local use.
  • Comparison with another URI. However that other URI is always generated from Document::path and hence a PathBuf/file:// URI

All usecases only work with file::// URIs anyway so we might aswell switch all occurrences of the third case to the second case to be consistent and to fix problems like #3267 (in general I think a simple URL comparison will fail). I think there will actually be more edgecases like this

@poliorcetics poliorcetics deleted the 3267-fix-local-diagnostic-compare branch February 12, 2024 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-language-server Area: Language server client S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Looking up diagnostics by URL is ambigous
5 participants