New scoring weights should prefer beginning of line words more #2909

quyykk · 2022-08-06T10:52:17Z

I have read through the manual page (man fzf)
I have the latest version of fzf
I have searched through the existing issues

Problem / Steps to reproduce

As stated in the release notes for 0.32.0, the scoring weights were updated. However, this changed the behavior of previous versions, which preferred words at the beginning:

fzf --query=bar --height=4 << EOF
foo bar.sh
foo/bar.sh
EOF

gives:

foo/bar.sh
foo bar.sh

whereas in previous versions it would be the other way around. I definitely prefer it to match foo/bar.sh first. The current behavior makes it a pain to search for files whose names are also part of other files' names, like in the example posted above where I want to search for bar.sh.

Thanks!

junegunn · 2022-08-06T11:42:35Z

fzf is a general-purpose text filter, so the scoring algorithm isn't specifically designed for file paths. In this particular context, foo/bar.sh may be the better match, but in other contexts, for example with a list of commands (e.g. CTRL-R), it may make sense to prefer git add something over rm foo/add when the query is add.

fzf --query=add --height=4 << EOF
rm doc/addendum/patch1
git add somedir/patch1
EOF

What I presumed was that file paths with whitespace are not very common.

jg@:~/github> ls | wc -l
     164

jg@:~/github> fd --type f | wc -l
  247108

jg@:~/github> fd --type f | grep ' ' | wc -l
    1159

(EDIT: updated not to include .git files)

Do you have many files with spaces in their paths?

quyykk · 2022-08-07T13:59:03Z

Yeah I do have many files with spaces. Makes sense though. So how about this: A special "path" mode for fzf that enables specific path behavior. For example, it would also be useful if files were sorted before folders (since you usually always search for files)

junegunn · 2022-08-08T02:48:32Z

> For example, it would also be useful if files were sorted before folders (since you usually always search for files)

fzf, being a text filter, does not know which entry is a file or a directory; everything is just text. The user should just feed a list of files excluding directories to fzf.

# Only files
fd --type f --strip-cwd-prefix | fzf

> Yeah I do have many files with spaces.

Can you post a few screenshots where the scoring algorithm of fzf isn't working as desired?

quyykk · 2022-08-08T05:41:48Z

> The user should just feed a list of files excluding directories to fzf.

Ah thanks that works.

In each case I'm looking for the source files. While I could filter out the image files, I can't filter the text files because I might want to open them too.

junegunn · 2022-08-08T05:48:46Z

I see. The bonus points for the boundary characters are currently hard-coded.

fzf/src/algo/algo.go, lines 115 to 127 in 779d8e1:

    // We prefer matches at the beginning of a word, but the bonus should not be
    // too great to prevent the longer acronym matches from always winning over
    // shorter fuzzy matches. The bonus point here was specifically chosen that
    // the bonus is cancelled when the gap between the acronyms grows over
    // 8 characters, which is approximately the average length of the words found
    // in web2 dictionary and my file system.
    bonusBoundary = scoreMatch / 2

    // Extra bonus for word boundary after whitespace character or beginning of the string
    bonusBoundaryWhite = bonusBoundary + 2

    // Extra bonus for word boundary after slash, colon, semi-colon, and comma
    bonusBoundaryDelimiter = bonusBoundary + 1

But maybe we could consider adding an option to tweak the scores.

A workaround for now would be to prefix your query with / (e.g. /weap). You can even start fzf with / as the default query. (fzf --query=/)
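To make the effect of the hard-coded bonuses concrete, here is a minimal Go sketch that applies the three constants quoted above to the example lines from this thread. It is not fzf's actual matcher: it only looks at the character preceding the first occurrence of the query and ignores everything else the real algorithm scores (consecutive matches, gaps, case). The value scoreMatch = 16 and the helper names boundaryBonus and startBonus are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Constants follow the quoted snippet. scoreMatch = 16 is an assumption of
// this sketch; the quoted lines do not show its definition.
const (
	scoreMatch             = 16
	bonusBoundary          = scoreMatch / 2
	bonusBoundaryWhite     = bonusBoundary + 2
	bonusBoundaryDelimiter = bonusBoundary + 1
)

// boundaryBonus (hypothetical helper) returns the bonus for a match starting
// at byte offset idx of line, based only on the preceding character.
func boundaryBonus(line string, idx int) int {
	if idx == 0 {
		return bonusBoundaryWhite // beginning of the string
	}
	prev, _ := utf8.DecodeLastRuneInString(line[:idx])
	switch {
	case unicode.IsSpace(prev):
		return bonusBoundaryWhite
	case strings.ContainsRune("/,:;", prev): // slash, comma, colon, semi-colon
		return bonusBoundaryDelimiter
	case !unicode.IsLetter(prev) && !unicode.IsDigit(prev):
		return bonusBoundary // any other non-word character
	}
	return 0 // match starts in the middle of a word
}

// startBonus (hypothetical helper) finds the first occurrence of query in
// line and returns its boundary bonus, or -1 if the query does not occur.
func startBonus(line, query string) int {
	idx := strings.Index(line, query)
	if idx < 0 {
		return -1
	}
	return boundaryBonus(line, idx)
}

func main() {
	cases := []struct{ line, query string }{
		{"foo bar.sh", "bar"},
		{"foo/bar.sh", "bar"},
		{"git add somedir/patch1", "add"},
		{"rm doc/addendum/patch1", "add"},
	}
	for _, c := range cases {
		fmt.Printf("%-26q query=%q bonus=%d\n", c.line, c.query, startBonus(c.line, c.query))
	}
}
```

Under these assumptions, a match after whitespace receives one more point than a match after a slash, which is why foo bar.sh and git add somedir/patch1 come out ahead in the examples above.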

james64 · 2022-08-23T08:04:38Z

I do not consider this a bug. But I would love the option to disable bonus points for boundary characters.

Among other things, I use fzf for bash history. There I sometimes find myself in a situation where I am looking for a command of which I remember only a part. As a simple example, let's say I have forgotten the exact name of tcpdump. I just know it's something with dump in it. Searching for dump gives me a lot of other entries like less dump_file, pr_restore smthg smthg smthg product_dev_dump_date.tar or rm dump.tgz etc. These are relevant, but I need to fall back to less .bash_history for a simpler search to find it. I tried fzf --exact, but the boundary bonus points seem to be used there as well.

If I can dream a bit, then the ability to toggle these boundary bonus points on/off during a search session would be amazing. Then I could switch between "prefer older exact matches" and "prefer newer substring matches" live... but feel free to bring me back down to earth.

junegunn · 2022-08-23T08:34:39Z

@james64 Have you tried (dynamically) disabling sort by pressing CTRL-R again when that happens? Still not satisfied with the result?

james64 · 2022-08-23T08:39:02Z

@junegunn I had not found this feature so far, and oh man, yes it helps. Together with using or not using ' in front of the query, this gives me all the flexibility I want. Thanks a lot!

junegunn mentioned this issue Aug 19, 2022

0.32.1 sorting issue #2930

Closed

junegunn added the discussion label Aug 19, 2022

junegunn closed this as completed in 6fb41a2 Aug 28, 2022

PatrickF1 mentioned this issue Aug 4, 2023

scoring scheme documentation confusing #3387

Closed

junegunn added a commit that referenced this issue Sep 21, 2023

[shell] Use --scheme=path when appropriate

2bed7d3

Without the option, you may get suboptimal results if you have many paths with spaces in their names. e.g. #2909 (comment) Close #3433
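
The thread does not show how a path-oriented scheme re-weights the boundary bonuses. As a rough sketch only, the Go program below assumes that such a scheme drops the extra whitespace bonus and keeps the extra point for a slash, then checks which of the two lines from the original report wins for the query bar. The weights type, the two presets, and the best function are hypothetical names for illustration, bonusBoundary = 8 assumes scoreMatch = 16, and only the character before the first occurrence of the query is considered.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// bonusBoundary is scoreMatch / 2, assuming scoreMatch = 16.
const bonusBoundary = 8

// weights (hypothetical) groups the boundary bonuses a scheme could adjust.
type weights struct {
	white      int    // bonus after whitespace or at the start of the string
	delimiter  int    // bonus after one of the delimiter characters
	delimiters string // characters treated as delimiters
}

// Presets: defaultWeights follows the constants quoted earlier in the thread;
// pathWeights is an assumption about what a path scheme might do.
var (
	defaultWeights = weights{white: bonusBoundary + 2, delimiter: bonusBoundary + 1, delimiters: "/,:;"}
	pathWeights    = weights{white: bonusBoundary, delimiter: bonusBoundary + 1, delimiters: "/"}
)

// bonus returns the boundary bonus for a match starting at byte offset idx.
func (w weights) bonus(line string, idx int) int {
	if idx == 0 {
		return w.white
	}
	prev, _ := utf8.DecodeLastRuneInString(line[:idx])
	switch {
	case unicode.IsSpace(prev):
		return w.white
	case strings.ContainsRune(w.delimiters, prev):
		return w.delimiter
	case !unicode.IsLetter(prev) && !unicode.IsDigit(prev):
		return bonusBoundary
	}
	return 0
}

// best returns the line whose first occurrence of query has the highest
// boundary bonus; earlier lines win ties, and "" means no line matched.
func best(lines []string, query string, w weights) string {
	bestLine, bestBonus := "", -1
	for _, l := range lines {
		idx := strings.Index(l, query)
		if idx < 0 {
			continue
		}
		if b := w.bonus(l, idx); b > bestBonus {
			bestLine, bestBonus = l, b
		}
	}
	return bestLine
}

func main() {
	lines := []string{"foo bar.sh", "foo/bar.sh"}
	fmt.Println("default weights:", best(lines, "bar", defaultWeights))
	fmt.Println("path weights:   ", best(lines, "bar", pathWeights))
}
```

With these assumed presets, the default weights pick foo bar.sh and the path weights pick foo/bar.sh, which is the ordering requested in the original report.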
