Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to influence sorting of user-supplied completions #4813

Closed
wants to merge 16 commits into from

Commits on Mar 1, 2023

  1. [better-rank-heuristic] Clarify that ranked matches prioritize word b…

    …oundaries only at word start
    
    This implies a bunch of renamings but the following commits will
    replace count_word_boundaries_match() anyway, so I'll do that on-the-fly.
    
    The only functional change in this patch is that something like "/"
    at the start of a string no longer counts as word start. That should
    hardly matter.
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    9e89f71 View commit details
    Browse the repository at this point in the history
  2. [better-rank-heuristic] Deduplicate ranked match tests

    We always use the same query string when comparing two candidate
    strings. Let's remove the redundant query literals before we add
    more tests.
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    ce68c93 View commit details
    Browse the repository at this point in the history
  3. [better-rank-heuristic] Add some tests for ranked matching

    I broke most of these while working on the new implementation.
    Some are redundant though.
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    5c294a2 View commit details
    Browse the repository at this point in the history
  4. [better-rank-heuristic] Use consistent naming for ranked match parame…

    …ters
    
    We use candidate/query in the constructor. The next patch adds the
    same names in the subsequence scoring function.  Let's use the same
    terminology in the subsequence matching function.
    
    While at it, rename the subsequence matching function to prepare
    for adding an optimal subsequence matching.
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    c89eb9d View commit details
    Browse the repository at this point in the history
  5. [better-rank-heuristic] Allow a custom index type for ArrayView

    This allows using CharCount in operator[] instead of casting to size_t
    every time. I'm afraid that some other operations like "end()" are
    not supported yet with types like CharCount; for some reason adding
    a cast there didn't work. I got some error about not being able to
    convert CharCount to size_t.
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    5c7403f View commit details
    Browse the repository at this point in the history
  6. [better-rank-heuristic] Use optimal subsequence match to rank complet…

    …ions
    
    When using fuzzy completion with candidates from "git ls-files" in
    a project with many files I often get unexpected ordering even when
    one candidate is clearly the best.
    This usually happens if both candidates are the same kind of match
    (typically only a subsequence match), which means that the leftmost
    match wins (via max_index), which is often wrong.
    Also there are some other matching heuristics like FirstCharMatch
    that can lead to unexpected ordering by dominating everything else
    (for example with query "remcc" we rank "README.asciidoc" over
    "src/remote.cc").
    
    Fix these two issues by
    1. switching to the Gotoh algorithm[1], a refinement of the
       Smith-Waterman algorithm that is optimized for affine gap
       penalties[2].  This makes us  find an optimal match instead of
       the leftmost one.
    2. dropping old heuristics that are obsoleted by the new one.
    
    Optimality is defined by a new distance heuristic which favors longer
    matching subsequences (as opposed to multiple shorter ones) by adding
    a penalty to each new gap between matches.
    
    For most of our RankedMatch test cases, we match the behavior of
    other popular fuzzy matchers.
    
    The algorithm needs quadratic time/space but that's okay, since
    our input is expected to be small. For example, candidates for
    insert-mode completion are usually below max_word_len=50. Second,
    if there's ever a very large input candidate, only match against the
    first 1000 characters.  Third, a follow-up commit will switch to the
    linear space variant.
    
    Every successful match adds a temporary heap allocation.  This feels
    bad but it's still dominated by the number of string allocations,
    so there's not much reason to optimize that.
    
    Besides, with the following patches, it's fast enough for me.
    
    In src/diff.h we have an implementation of a similar algorithm, but
    that one is optimized for matching strings with small differences.
    It is also not guaranteed to find the optimal match.
    
    Closes mawww#3806
    
    [1]: as described in "An Improved Algorithm for Matching Biological Sequences", see
         https://courses.cs.duke.edu/spring21/compsci260/resources/AlignmentPapers/1982.gotoh.pdf
    [2]: https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm#Affine
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    122db3f View commit details
    Browse the repository at this point in the history
  7. [better-rank-heuristic] Speed up ranked matching by caching iswlower …

    …and towlower
    
    Our fuzzy-matching algorithm uses iswlower and towlower to implement
    smartcase matching.  Sadly, when using glibc these functions are quite
    slow, even for ASCII inputs.  With the new fuzzy-matching algorithm,
    this shows up prominently in a CPU profile. We tirelessly call iswlower
    and towlower on the query chars, so just cache them in a prepared query
    called "RankedMatchQuery", which allows to implement smartcase_eq()
    more efficiently.
    
    When matching query "clang" against 100k files in the LLVM monorepo
    (of which 30k match), this commit makes us go from 1.8 billion cycles
    to just 1.2 (same as the old fuzzy-matching algorithm).
    
    The implementation is a bit ugly because the RankedMatchQuery object
    needs to outlive all uses of RankedMatch::operator<.  I guess we
    could try to create a type that contains both the query and query
    results to make this less ugly.  We could use the same type to get
    rid of the allocations in subsequence_match_scores(), though I don't
    know if that matters.
    
    A previous approach added a fast path for ASCII input to our wrappers
    like is_lower and to_lower. This was not ideal because it doesn't
    solve the problem for non-ASCII input.
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    e71591f View commit details
    Browse the repository at this point in the history
  8. [better-rank-heuristic] Don't allocate full fuzzy scoring matrix when…

    … only ranking matches
    
    We can compute the fuzzy matching score without keeping in memory the
    scores of all prefixes.
    Currently we only use the full score matrix for debugging.  In future
    I want to use it to get the positions of matched characters (which
    could be underlined in the UI).
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    fc10aa8 View commit details
    Browse the repository at this point in the history
  9. [menu-for-doc] rc tools doc: use menu behavior for completion again

    A recent commit moved "doc" completions to use the new
    "complete-command" but forgot to carry over the "-menu" flag. Fix that.
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    adb2a5f View commit details
    Browse the repository at this point in the history
  10. [menu-for-doc] Allow "define-command -menu" only if a completer is pa…

    …ssed as well
    
    The "define-command -menu" flag does not do anything unless there is
    a completer.  Let's forbid it so there is no confusion about whether
    this can be mixed with "complete-command".
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    649c3a5 View commit details
    Browse the repository at this point in the history
  11. Configuration menu
    Copy the full SHA
    be8eaa8 View commit details
    Browse the repository at this point in the history
  12. Support opt-in priority field in shell script candidates

    This is inspired by a recent change in [Helix] that fixes sorting of
    code actions.
    We have the same problem because kak-lsp uses ":prompt
    -shell-script-candidates" to show code actions.
    For example, on this Rust file:
    
    	fn main() {
    	    let f: FnOnce(HashMap<i32, i32>);
    	}
    
    with the cursor on "HashMap", a ":lsp-code-actions" will offer two
    code actions (from rust-analyzer):
    
    	Extract type as type alias"
    	Import `std::collections::HashMap`
    
    The first one is a refactoring and the second one is a quickfix.
    
    If fuzzy match scores are equal, Kakoune sorts completions
    lexicographically, which is suboptimal because the user will almost
    always want to run the quickfix first.
    
    Allow users to influence the order via a new  "-priority" switch.
    When this switch is used, Kakoune expects a second field in
    shell-script-candidates completions, like so:
    
    	Extract type as type alias"|2
    	Import `std::collections::HashMap`|1
    
    The priority field is taken into account when computing fuzzy match
    scores. Due to the lack of test cases, the math to do so does not
    have a solid footing yet. Here's how it works for now.
    
    - "distance" is the fuzzy match score (lower is better)
    - "priority" is the new user-specificed ranking, a positive integer (lower is better)
    - "query_length" is the length of the string that is used to filter completions
    
    	effective_priority = priority ^ (1 / query_length) if query_length != 0 else priority
    	prioritized_distance = distance * (effective_priority ^ sign(distance))
    
    The ideas are that
    1. A priority of 1 is neutral. Higher values increase the distance (making it worse).
    2. The longer the query, the lower the impact of "priority".
    
    ---
    
    Used by kakoune-lsp/kakoune-lsp#657
    
    [Helix]: helix-editor/helix#4134
    Part of mawww#1709
    krobelus committed Mar 1, 2023
    Configuration menu
    Copy the full SHA
    15cdb2b View commit details
    Browse the repository at this point in the history

Commits on Mar 5, 2023

  1. ranges.hh: simplify static_gather

    This function has a loop that suggests it can create a sparsely
    populated array, but if we peek at the array constructor below, we
    see that this is not actually the case.  Given that we always pass a
    std::index_sequence with adjacent indices, we don't need this anyway,
    so let's simplify.
    krobelus committed Mar 5, 2023
    Configuration menu
    Copy the full SHA
    f661c3f View commit details
    Browse the repository at this point in the history
  2. ranges.hh: more specific parameter name

    The static_gather function has a template parameter that decides if
    it shall fail when there are *more* elements than expected.
    
    The next commit wants to add another template parameter that decides
    if static_gather shall succeed when there are *fewer* elements than
    expected.
    
    Let's rename the first template parameter to avoid a perceived overlap.
    krobelus committed Mar 5, 2023
    Configuration menu
    Copy the full SHA
    01a4629 View commit details
    Browse the repository at this point in the history
  3. Support optional priority in completions options

    Closes mawww#1709
    
    This also allows supporting LSP's sortText (to a reasonable extent).
    krobelus committed Mar 5, 2023
    Configuration menu
    Copy the full SHA
    b9c0d83 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    1078e67 View commit details
    Browse the repository at this point in the history