New scoring weights should prefer beginning of line words more #2909

Closed
4 of 10 tasks
quyykk opened this issue Aug 6, 2022 · 8 comments

@quyykk

quyykk commented Aug 6, 2022

  • I have read through the manual page (man fzf)
  • I have the latest version of fzf
  • I have searched through the existing issues

Info

  • OS
    • Linux
    • Mac OS X
    • Windows
    • Etc.
  • Shell
    • bash
    • zsh
    • fish

Problem / Steps to reproduce

As stated in the release notes for 0.32.0, the scoring weights were updated. However, this changed the behavior of previous versions, which preferred words at the beginning:

fzf --query=bar --height=4 << EOF
foo bar.sh
foo/bar.sh
EOF

gives:

foo/bar.sh
foo bar.sh

whereas in previous versions it would be the other way around. I definitely prefer it to match foo/bar.sh first. The current behavior makes it a pain to search for files whose names are also part of other files' names, as in the example above, where I want to find bar.sh.

Thanks!

@junegunn
Owner

junegunn commented Aug 6, 2022

fzf is a general-purpose text filter, so the scoring algorithm isn't specifically designed for file paths. In this particular context, foo/bar.sh may be the better match, but in other contexts, for example with a list of commands (e.g. CTRL-R), it may make sense to prefer git add something over rm foo/add for the query add.

fzf --query=add --height=4 << EOF
rm doc/addendum/patch1
git add somedir/patch1
EOF

What I presumed was that file paths with whitespace are not very common.

jg@:~/github> ls | wc -l
     164

jg@:~/github> fd --type f | wc -l
  247108

jg@:~/github> fd --type f | grep ' ' | wc -l
    1159

(EDIT: updated not to include .git files)

Do you have many files with spaces in their paths?

@quyykk
Author

quyykk commented Aug 7, 2022

Yeah I do have many files with spaces. Makes sense though. So how about this: A special "path" mode for fzf that enables specific path behavior. For example, it would also be useful if files were sorted before folders (since you usually always search for files)

@junegunn
Owner

junegunn commented Aug 8, 2022

> For example, it would also be useful if files were sorted before folders (since you usually always search for files)

fzf, being a text filter, does not know which entry is a file or a directory; everything is just text. The user should just feed a list of files excluding directories to fzf.

# Only files
fd --type f --strip-cwd-prefix | fzf

> Yeah I do have many files with spaces.

Can you post a few screenshots where the scoring algorithm of fzf isn't working as desired?

@quyykk
Author

quyykk commented Aug 8, 2022

> The user should just feed a list of files excluding directories to fzf.

Ah thanks that works.

In each case I'm looking for the source files. While I could filter out the image files, I can't filter the text files because I might want to open them too.

(three screenshots attached)

@junegunn
Owner

junegunn commented Aug 8, 2022

I see. The bonus points for the boundary characters are currently hard-coded.

fzf/src/algo/algo.go

Lines 115 to 127 in 779d8e1

// We prefer matches at the beginning of a word, but the bonus should not be
// too great to prevent the longer acronym matches from always winning over
// shorter fuzzy matches. The bonus point here was specifically chosen that
// the bonus is cancelled when the gap between the acronyms grows over
// 8 characters, which is approximately the average length of the words found
// in web2 dictionary and my file system.
bonusBoundary = scoreMatch / 2
// Extra bonus for word boundary after whitespace character or beginning of the string
bonusBoundaryWhite = bonusBoundary + 2
// Extra bonus for word boundary after slash, colon, semi-colon, and comma
bonusBoundaryDelimiter = bonusBoundary + 1

But maybe we could consider adding an option to tweak the scores.
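To make the effect of these constants concrete, here is a minimal sketch that classifies the character before a match and returns the corresponding bonus. It is not fzf's actual matcher: `boundaryBonus` and `isWordChar` are hypothetical helpers, `scoreMatch = 16` is an assumption (the quoted lines only show values derived from it), and the character classes are reduced to ASCII.

```go
package main

import (
	"fmt"
	"strings"
)

// Constants mirroring the quoted snippet. scoreMatch = 16 is an assumption.
const (
	scoreMatch             = 16
	bonusBoundary          = scoreMatch / 2
	bonusBoundaryWhite     = bonusBoundary + 2
	bonusBoundaryDelimiter = bonusBoundary + 1
)

// isWordChar reports whether c is an ASCII letter or digit.
func isWordChar(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// boundaryBonus is a hypothetical helper: it returns the bonus for a match
// starting at byte index i of s, looking only at the preceding character.
func boundaryBonus(s string, i int) int {
	if i == 0 {
		return bonusBoundaryWhite // beginning of the string
	}
	prev := s[i-1]
	switch {
	case prev == ' ' || prev == '\t':
		return bonusBoundaryWhite
	case prev == '/' || prev == ':' || prev == ';' || prev == ',':
		return bonusBoundaryDelimiter
	case !isWordChar(prev):
		return bonusBoundary
	}
	return 0 // match starts in the middle of a word
}

func main() {
	for _, line := range []string{"foo bar.sh", "foo/bar.sh"} {
		i := strings.Index(line, "bar")
		fmt.Printf("%q: bonus %d\n", line, boundaryBonus(line, i))
	}
}
```

Under these assumed values, the b of bar earns 10 after a space and 9 after a slash, so foo bar.sh comes out ahead of foo/bar.sh for the query bar.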

A workaround for now would be to prefix your query with / (e.g. /weap). You can even start fzf with / as the default query. (fzf --query=/)
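The reason the / prefix works can be shown with a plain subsequence test: once the query contains /, a line without a slash in the right place no longer matches at all. This is a simplified sketch, not fzf's algorithm (which also handles case and scoring), and `fuzzyContains` is a hypothetical helper.

```go
package main

import "fmt"

// fuzzyContains is a hypothetical helper reporting whether every character
// of query appears in text in the same order (a plain subsequence test).
func fuzzyContains(query, text string) bool {
	q := []rune(query)
	if len(q) == 0 {
		return true
	}
	i := 0
	for _, r := range text {
		if r == q[i] {
			i++
			if i == len(q) {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, line := range []string{"foo bar.sh", "foo/bar.sh"} {
		fmt.Printf("%q matches /bar: %v\n", line, fuzzyContains("/bar", line))
	}
}
```

With the query /bar, only foo/bar.sh passes the filter, so the competing foo bar.sh entry drops out of the list instead of outranking it.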

@james64

james64 commented Aug 23, 2022

I do not consider this a bug. But I would love the option to disable bonus points for boundary characters.

Among other things, I use fzf for bash history. There I sometimes find myself in a situation where I am looking for a command of which I remember only a part. As a simple example, let's say I have forgotten the exact name of tcpdump. I just know it's something with dump in it. Searching for dump gives me a lot of other entries like less dump_file, pr_restore smthg smthg smthg product_dev_dump_date.tar or rm dump.tgz etc. These are relevant, but I need to fall back to less .bash_history for a simpler search to find it. I tried fzf --exact, but the bonus points for boundaries seem to be used there as well.

If I can dream a bit, then the ability to toggle these boundary bonus points on/off during a search session would be amazing. Then I could switch between "prefer older exact matches" and "prefer newer substring matches" live... but feel free to bring me back down to earth.

@junegunn
Owner

junegunn commented Aug 23, 2022

@james64 Have you tried (dynamically) disabling sort by pressing CTRL-R again when that happens? Still not satisfied with the result?

@james64

james64 commented Aug 23, 2022

@junegunn I had not found this feature until now, and oh man, yes, it helps. Together with using or not using ' in front of the query, this gives me all the flexibility I want. Thanks a lot!

junegunn added a commit that referenced this issue Sep 21, 2023
Without the option, you may get suboptimal results if you have many
paths with spaces in their names.

e.g. #2909 (comment)

Close #3433