Add segments of words with punctuation to index #225
Comments
Addition from #267 — we should ideally apply this to a range of punctuation characters that make sense, so (NB: We want to avoid indexing |
Firstly, amazing project! Found it through Astro's Starlight and love it! Do you have any thoughts around similar situations with |
Hey @lorenzolewis 👋 Ooh, I haven't yet given it thought but it seems fine! I don't think I'm too worried about over-splitting. In that example it isn't a big deal if you can search for "donald" and get a result for "McDonalds" — perhaps having a minimum length on splitting words would help (3+ characters?). I think the main thing I would like to do for segmented words is de-rank the partial matches. Currently results are ranked by how close your search word is, so if you're searching for I think this could tap into the new weighting feature, though, and make these partial matches weaker than they would otherwise be. In which case, over-indexing is a negligible problem. I'll keep you posted — I'll actually look at this next 👀 |
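To make the two ideas in the comment above concrete (a minimum length for split segments, and weaker ranking for partial matches), here is a minimal sketch. It is not Pagefind's implementation: the function names, the 3-character minimum, the 1.0 and 0.5 weights, and the lowercasing of terms are all assumptions chosen for illustration.

```rust
// Hypothetical values for illustration only; not taken from Pagefind.
const FULL_WEIGHT: f32 = 1.0;
const SEGMENT_WEIGHT: f32 = 0.5;
const MIN_SEGMENT_CHARS: usize = 3;

/// Splits a word on non-alphanumeric characters and on
/// lowercase-to-uppercase boundaries (e.g. "McDonalds" -> "Mc", "Donalds").
fn split_segments(word: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for c in word.chars() {
        if !c.is_alphanumeric() {
            // Punctuation closes the current segment and is dropped.
            if !current.is_empty() {
                segments.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        // An uppercase letter directly after a lowercase one starts a new segment.
        if c.is_uppercase() && prev_lower && !current.is_empty() {
            segments.push(std::mem::take(&mut current));
        }
        prev_lower = c.is_lowercase();
        current.push(c);
    }
    if !current.is_empty() {
        segments.push(current);
    }
    segments
}

/// Returns the lowercased full word at full weight, followed by each
/// sufficiently long segment at a reduced weight.
fn weighted_terms(word: &str) -> Vec<(String, f32)> {
    let mut terms = vec![(word.to_lowercase(), FULL_WEIGHT)];
    let segments = split_segments(word);
    if segments.len() > 1 {
        for seg in segments {
            // Skip short fragments such as "Mc" to limit over-indexing.
            if seg.chars().count() >= MIN_SEGMENT_CHARS {
                terms.push((seg.to_lowercase(), SEGMENT_WEIGHT));
            }
        }
    }
    terms
}
```

Under these assumptions, weighted_terms("McDonalds") keeps the full word at full weight and adds only "donalds" at the reduced weight, since "Mc" falls below the minimum length.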
RE: 3+ characters: I know some Apple naming conventions use something like this: But this also opens up another question: Would this be picked up since it's |
I'll likely use directly, or base this handling on, https://github.com/withoutboats/heck as it has pretty robust splitting: t!(test8: "this-contains_ ALLKinds OfWord_Boundaries" => "This Contains All Kinds Of Word Boundaries") Looking at their implementation, RE: RE: 3+ characters: |
Hey all! 👋 Good news: this has landed in Pagefind v1.0.0! ✨ See the full release notes here: https://github.com/CloudCannon/pagefind/releases/tag/v1.0.0 💙 Ping me here if you have any questions about the implementation! |
Discussion in #215
Perhaps Pagefind should automatically index a word like color-accent as [color-accent, color, accent].
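A rough sketch of the proposed behaviour, for illustration only. The helper name index_terms is hypothetical, and treating every non-alphanumeric character as a split point is an assumption; the actual set of punctuation characters is left open in this issue.

```rust
/// Hypothetical helper: returns the whole word first, followed by its
/// segments, splitting on any character that is not alphanumeric.
fn index_terms(word: &str) -> Vec<String> {
    let mut terms = vec![word.to_string()];
    let segments: Vec<&str> = word
        .split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .collect();
    // Only add segments when the word actually split into more than one part,
    // so a plain word is not indexed twice.
    if segments.len() > 1 {
        terms.extend(segments.into_iter().map(String::from));
    }
    terms
}
```

With this sketch, index_terms("color-accent") yields the full word plus "color" and "accent", while a word without punctuation is returned unchanged as a single term.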