Enhanced Regexp Search | Permutations, Fuzzy matching #404
Comments
We store a lot of text, and I know of users with up to 40K bookmarks. I would leave fuzzy search out of the equation. The search is already case-insensitive; please go through the Search section of the operational notes: https://github.com/jarun/Buku/wiki/Operational-notes#search We also support regex searches, and searches are independent of the order of the tokens (try
It may be the case. Please, try adding the following three bookmarks:
Now, say you'd like to extract the bookmarks containing each and every one of the substrings:
You might think this is not a big deal. After all both
Unless I'm missing something, there is currently no way to get only the bookmarks containing all the requested substrings, regardless of both order and case. If so, please consider adding a search option like
Yes, you didn't try regex.
$ buku -r monkey[s]
That was a simple one, but I think a more complex regex pattern could match all the strings and return just what you want. You are probably looking for lookahead: https://www.regular-expressions.info/completelines.html Coming to the deep search results, I see:
So the desired result is at the top because of the ranking algorithm (basically, most matches first). You can also limit the number of results per page, so you can pick the results you need from the topmost ones and discard the later ones. At the current stage of the project, that's the best we can offer. There's literally no ROI in implementing anything too heavy.
I think I missed this in my earlier note - if you really need it, please feel free to implement this and raise a PR. We would be glad to help with anything you need. |
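For reference, the lookahead idea suggested above can be sketched in plain Python. This is only an illustration of the regex technique: the helper name `all_terms_pattern` is made up for this example, and whether Buku's regex search accepts such a pattern against its stored fields has not been verified here.

```python
import re

def all_terms_pattern(terms):
    """Build one case-insensitive regex that requires every term, in any order.

    Hypothetical helper for illustration; it is not part of Buku.
    """
    # One lookahead per term. re.DOTALL lets '.' cross newlines,
    # which mirrors the [\s\S] idiom used elsewhere in this thread.
    lookaheads = "".join(f"(?=.*{re.escape(term)})" for term in terms)
    return re.compile(lookaheads, re.IGNORECASE | re.DOTALL)

pattern = all_terms_pattern(["xxx", "yyy", "zzz", "www"])
print(bool(pattern.match("WWW then zzz, yyy and XXX")))  # True
print(bool(pattern.match("only xxx and yyy here")))      # False
```

The pattern grows linearly with the number of terms, since each term contributes a single lookahead rather than one alternative per ordering.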
It would be great to enhance the search capabilities with features such as:
I'm not big on fuzzy searching, but let me give an example about permutations.
Say the user enters the search terms:
xxx yyy zzz www
That would be 4 different terms, which can be ordered in 4! (that is, 24) different ways:
xxx yyy zzz www
xxx yyy www zzz
xxx www zzz yyy
www yyy zzz xxx
zzz yyy xxx www
(and so on and so forth, up till the 24th possible permutation.)
Each permutation should be converted to a regexp:
/([\s\S]*)xxx([\s\S]*)yyy([\s\S]*)zzz([\s\S]*)www([\s\S]*)/i
/([\s\S]*)xxx([\s\S]*)yyy([\s\S]*)www([\s\S]*)zzz([\s\S]*)/i
/([\s\S]*)xxx([\s\S]*)www([\s\S]*)zzz([\s\S]*)yyy([\s\S]*)/i
/([\s\S]*)www([\s\S]*)yyy([\s\S]*)zzz([\s\S]*)xxx([\s\S]*)/i
/([\s\S]*)zzz([\s\S]*)yyy([\s\S]*)xxx([\s\S]*)www([\s\S]*)/i
Then each bookmark matching (in either title or description/comments) at least one of the 24 regexps should be added to results.
As you can check on RegExr, such a bookmark would be, for instance, one containing the substrings:
foobar--xxXxX----foobar foobar foobar...YyYy...foobar foobar foobar****ZZZzzz**foobar foobar foobar__wWWW____foobar
regardless of both order and case, as long as each and every search term is there.
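The permutation scheme described above can be sketched in Python as follows. This is a minimal illustration, not Buku code: the function names are hypothetical, and the leading and trailing `([\s\S]*)` groups are dropped because `re.search` already scans the whole text.

```python
import itertools
import math
import re

def permutation_patterns(terms):
    """Yield one case-insensitive regex per ordering of the terms (n! in total)."""
    for ordering in itertools.permutations(terms):
        # Join the escaped terms with [\s\S]* so anything may sit between them.
        body = r"[\s\S]*".join(re.escape(term) for term in ordering)
        yield re.compile(body, re.IGNORECASE)

def matches_any_order(terms, text):
    """Return True if at least one ordering of the terms matches the text."""
    return any(p.search(text) for p in permutation_patterns(terms))

terms = ["xxx", "yyy", "zzz", "www"]
sample = (
    "foobar--xxXxX----foobar foobar foobar...YyYy...foobar foobar "
    "foobar****ZZZzzz**foobar foobar foobar__wWWW____foobar"
)
print(math.factorial(len(terms)))               # 24
print(matches_any_order(terms, sample))         # True
print(matches_any_order(terms, "xxx yyy zzz"))  # False
```

Because the number of patterns grows factorially with the number of terms, this approach is practical only for short queries.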
I have already proposed the feature on BukuBrow, but on second thought it would make more sense for it to be built directly into Buku.