Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve textDocument/definition ordering #2792

Merged
merged 3 commits into from
Aug 15, 2024

Conversation

lewis6991
Copy link
Contributor

@lewis6991 lewis6991 commented Aug 11, 2024

E.g. if the current document has path c/file.lua and the results have paths:

  • c/file.lua
  • a/file.lua

Prefer c/file.lua since that is most similar to the current document.


Real world example used these paths:

From /Users/lewis/projects/neovim/runtime/lua/vim/lsp/handlers.lua with results (in order):

  • /Users/lewis/.cache/lua-language-server/meta/LuaJIT en-us utf8/builtin.lua
  • /Users/lewis/projects/neovim/runtime/lua/vim/lsp/handlers.lua

changelog.md Outdated Show resolved Hide resolved
@lewis6991 lewis6991 force-pushed the feat/deforder branch 2 times, most recently from 5925607 to aace084 Compare August 12, 2024 13:29
changelog.md Show resolved Hide resolved
@sumneko
Copy link
Collaborator

sumneko commented Aug 14, 2024

PleaseUseCamelCaseNamingConvention (<- Translate by ChatGPT, I tried a lot)

@lewis6991 lewis6991 force-pushed the feat/deforder branch 2 times, most recently from d57f537 to 444848c Compare August 14, 2024 11:05
script/core/definition.lua Outdated Show resolved Hide resolved
--- @param a string[]
--- @param b string[]
--- @return number
local function levenshteinDistance(a, b)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, your requirement is to find the length of the common prefix of two strings. Why did you use such a complex algorithm?
Also, considering the application scenario, there's a good chance that two strings might be entirely identical, so a fast handling branch for this case should be implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not perfect, but is a lot of better than u1 < u2.

I'm happy to optimise this more, but keep in mind results <= 2, so there is little benefit in optimizing this too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, your requirement is to find the length of the common prefix of two strings. Why did you use such a complex algorithm?

This is a standard algorithm and implementation for finding the differences between two strings. I adapted it slightly to work on lists.

Copy link
Contributor

@tomlau10 tomlau10 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your requirement is to find the length of the common prefix of two strings

This just wakes me up 😮. Although this is a standard algorithm for finding diff between 2 strings, this is not the same as finding common prefix, and I think this algorithm is not suitable for the current situation. Lets me use some test codes:

local currentFile = "/path/to/some/project/current/file/directroy/test.lua"
local uris = {
    "/path/to/some/project/current/file/directroy/test.lua",
    "/path/to/some/project/current/file/directroy/file.lua",
    "/path/to/some/project/current/file/dir2/test.lua",
    "/path/to/some/project/current/file/other/file.lua",
    "/path2/to/some/project/current/file/directroy/test.lua",
}

for i, uri in ipairs(uris) do
    print(i, ("%.6f"):format(pathSimilarityRatio(currentFile, uri)), uri)
end
  • this prints
1	0.000000	/path/to/some/project/current/file/directroy/test.lua
2	0.250000	/path/to/some/project/current/file/directroy/file.lua
3	0.250000	/path/to/some/project/current/file/dir2/test.lua
4	0.375000	/path/to/some/project/current/file/other/file.lua
5	0.250000	/path2/to/some/project/current/file/directroy/test.lua
  • 1 has a distance of 0.0 => expected ✅
  • but 2, 3, 5 all have the same distance 0.25 => I think this is not we want? ❌
  • the example 5 is the most problematic, it's difference starts with the root path, but still has the same score as 2 / 3 => I don't think this is a desirable behavior either ❌
  • definitely we prefer 2 more, because it has greater length of common prefixes

This is not about whether the algorithm is standard or not, it's just unsuitable for this situation. And if all we want is to find common prefix, we need not to use this algorithm with high complexity (just unnecessary, even though #results is 95% time <= 2).

One more edge case, when the distance score (or the common prefix length) is the same, I guess they should be ordered alphabetically? Otherwise they may be in random orders each time when we check for definitions. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common prefix isn't always suitable.

The aim wasn't to provide something that's perfect. It was to provide something that was an improvement over < which isn't suitable in any way.

Please let's stop trying to find 0.0001% edge cases, it just isn't worth the effort.

Before the ordering for me was always 0% correct as it always favoured the builtin, and now it's 99% correct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you carefully consider whether you want to prioritize files in the nearby directory or files with similar paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 😅 😅

The common prefix isn't always suitable.

Can you provide some examples where common prefix is not suitable?
By sorting with greatest common prefix length, it can solve the examples you provided in your initial description as well.
And if common prefix isn't always suitable, then I think the Levenshtein distance isn't always suitable either. And I think it is worse than common prefix in a few major aspects:

  • the code complexity for future maintenance
  • the time complexity of the algorithm
  • cannot handle the cases that I provided (this can be countered by providing examples that common prefix cannot handle)

As you have said, the aim wasn't to provide something that's perfect. I agree. But then when both common prefix and Levenshtein distance are not perfect, why not use the simpler one instead of a complex one? This is just "using a sledgehammer to crack a nut" (殺雞用牛刀). And even then the nut is not cracked perfectly 😂 .

Please let's stop trying to find 0.0001% edge cases, it just isn't worth the effort.

I don't agree that edge case I mentioned is a 0.0001% edge case. It's very often that, a value's definition is just one folder level above. The is related to the completeness of the logic. For any comparator function that are sorting records, most of the time we want to compare those major columns, and fallback to compare the id field as a last resort. In this case it is the u1 < u2 => this fallback comparison is always deterministic.

simularity_cache[u1] = simularity_cache[u1] or pathSimilarityRatio(uri, u1)
simularity_cache[u2] = simularity_cache[u2] or pathSimilarityRatio(uri, u2)
if simularity_cache[u1] ~= simularity_cache[u2] then
  return simularity_cache[u1] < simularity_cache[u2]
end
return u1 < u2

Of course I'm no maintainer and you may just ignore my suggestion if you disagree with me. 😅

Copy link
Contributor Author

@lewis6991 lewis6991 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the simpler one instead of a complex one

Time. Which I have no more of for this pull request.

Spending more time on this, is unlikely to result in observable better functionality.

I'll just fix this on the client side.

@sumneko
Copy link
Collaborator

sumneko commented Aug 15, 2024

I don't have time to test, but I believe that this pull request has been completed.

EDIT:
Accidental operation.

@sumneko sumneko merged commit 1d515a0 into LuaLS:master Aug 15, 2024
11 checks passed
@sumneko
Copy link
Collaborator

sumneko commented Aug 15, 2024

While working on another PR, I mistakenly merged this PR. I'm now attempting to rebuild this PR.

@sumneko
Copy link
Collaborator

sumneko commented Aug 15, 2024

It looks like there's no need to attempt to rebuild the PR.

Please note that this PR was actually rejected.

@lewis6991 lewis6991 deleted the feat/deforder branch August 15, 2024 07:41
@mfussenegger
Copy link

Would you accept a version of this that simply adds a score+boost for any entry in results that matches uri, to ensure a local definition always comes first while otherwise preserving the current ordering?

I think this would solve the main issue/use case and the implementation would be simpler

@lewis6991
Copy link
Contributor Author

Would you accept a version of this that simply adds a score+boost for any entry in results that matches uri, to ensure a local definition always comes first while otherwise preserving the current ordering?

That wouldn't work for definitions in different files. The main problem is the builtins getting ordered first.

@mfussenegger
Copy link

Would you accept a version of this that simply adds a score+boost for any entry in results that matches uri, to ensure a local definition always comes first while otherwise preserving the current ordering?

That wouldn't work for definitions in different files. The main problem is the builtins getting ordered first.

I see. How about flipping it around and add a penalty for the builtins to boost project local definitions in general?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants