Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lunr-gr.js and fix lunr-en.js #1445

Merged
merged 3 commits into from
Jan 3, 2018
Merged

Conversation

nickgarlis
Copy link
Contributor

screenshot from 2018-01-03 18-12-40

@mmistakes
Copy link
Owner

This looks good to me. Question on stemmers. Is it your intention to add ones from that repo you shared before? Or does it make sense to provide some documentation and how to build one, and then others provide PR's or their own as needed?

@nickgarlis
Copy link
Contributor Author

nickgarlis commented Jan 3, 2018

@mmistakes It would make sense but I personally find stemmers kind of complicated to make. I forgot to mention that I got the Greek one from this repository. I am planning on adding the ones from lunr-languages however, I haven't tried using them yet.

@mmistakes mmistakes merged commit 6fe3b9c into mmistakes:master Jan 3, 2018
@mmistakes
Copy link
Owner

I'm going to partially roll this one back. If the user sets their locale to one we don't support (which currently is a lot except en and gr) this logic is going to try and load a JavaScript file that doesn't exist. Breaking search.

<script src="{{ '/assets/js/lunr-' | append: lang | append: '.js' | absolute_url }}"></script>

Example:

locale: es-ES

Will produce and fail.

<script src="{{ '/assets/js/lunr-es.js' }}"></script>

Not sure if there is a clean way of only appending lang for stemmed versions of Lunr that exist in the theme... and falling back to -en if the user has a locale set for one that doesn't.

Might have to go back to drawing board on this one. How bad is Lunr's search without using any of the language/stemm stuff? Can we get away with just going vanilla?

The theme does support overriding the scripts, so in theory we provide a basic English version. And if you're competent enough you build your own language version as you have and load that instead.

I'm most interested in making the theme as simple as possible. It should be something where you install the theme and that's it. Anything involving complicated configs and other nonsense is out. I've already failed at that in a few places... cough cough archives.

mmistakes added a commit that referenced this pull request Jan 4, 2018
@mmistakes
Copy link
Owner

Came up with something by using Liquid's case to control the value of lang. Will just need to add more cases as languages are added. When one doesn't exist it defaults back to en.

{% if site.search == true or page.layout == "search" %}
  {% assign lang = site.locale | slice: 0,2 | default: "en" %}
  {% case lang %}
  {% when "gr" %}
    {% assign lang = "gr" %}
  {% else %}
    {% assign lang = "en" %}
  {% endcase %}
  <script src="{{ '/assets/js/lunr.min.js' | absolute_url }}"></script>
  <script src="{{ '/assets/js/lunr-' | append: lang | append: '.js' | absolute_url }}"></script>
{% endif %}

mmistakes added a commit that referenced this pull request Jan 4, 2018
@nickgarlis
Copy link
Contributor Author

nickgarlis commented Jan 4, 2018

@mmistakes Oops I forgot to set the default value.... Your solution looks great to me I totally missed that scenario.

As far as languages are concerned, Lunr doesn't work at all with Greek or any other non-latin alphabet by default. Languages that use latin characters might work but not well enough since they are most likely to use some extra characters too.

@nickgarlis
Copy link
Contributor Author

@mmistakes Ok so while I was experimenting with Russian, I came to the conclusion that Stemmers, are useful but not as necessary as I presented them to be. So my sincere apologies for that. I will try to explain what is going on.

First of all it was Lunr trimmer that blocked non-latin characters.

While we were using idx.search(query) Stemmers were kind of necessary, but now that we use idx.query things are different. I replaced idx.search(query) because after the upgrade to the newest version of Lunr wildcards were not automatically added. That means that simply typing 'c' would give no results anymore. We would either have to type in a whole word for something to appear e.g. "Cookies", or the root of that word which would be "Cook". Root searching only works thanks to stemming.

Things are a bit different now because we use

idx.query(function (q) {
  query.split(lunr.tokenizer.separator).forEach(function (term) {
    q.term(term, { boost: 100 })
       if(query.lastIndexOf(" ") != query.length-1){
        q.term(term, {  usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING, boost: 10 })
       }
       if (term != ""){
         q.term(term, {  usePipeline: false, editDistance: 1, boost: 1 })
       }
  })
});

Which means that:

  • we give great priority to exact word or root matching q.term(term, { boost: 100 }) .
  • we give some priority to a query that starts with the same characters with a string in our dataset by using a trailing wildcard q.term(term, { usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING, boost: 10 })( Can often be inaccurate that's why it's inside of an if statement)
  • we give a little priority to a query that needs one edit to match with a word in our dataset q.term(term, { usePipeline: false, editDistance: 1, boost: 1 }) e.g "ede" with "edge".

Simply put, stemming is only useful in exact matching q.term(term, { boost: 100 }) because that way stemmed words get greater priority but other than that wildcards work okay for other languages now that Lunr trimmer is removed. I am still willing to work on other languages but I thought this is something you wanted to know.

@mmistakes
Copy link
Owner

That makes sense. I'm cool simplifying things and leaving out the stemmers.

Search is a tricky thing which is why I held off for a long time adding it, knowing there would be a lot of nuance to it and/or users having different needs and expectations with it.

Whatever we can do to give a good vanilla experience that covers the basics and worry less about edge cases I'm all for that. If someone wants to enhance search and tweak the priorities they can do that by overriding the scripts or using one of the other search providers we'll eventually support.

@nickgarlis
Copy link
Contributor Author

@mmistakes I guess we can start messing with Algolia now

@mmistakes
Copy link
Owner

@nickgarlis I'll need a good rainy day for that 😄. I tried quickly to test it out with my own personal site, but it's so big it hit their free tier limits so I gave up.

@nickgarlis
Copy link
Contributor Author

@mmistakes hahahah thankfully I have got plenty of those where I live 😜

snayak04 pushed a commit to snayak04/OLD that referenced this pull request Jan 8, 2019
snayak04 pushed a commit to snayak04/OLD that referenced this pull request Jan 8, 2019
jpogorzelski pushed a commit to jpogorzelski/jpogorzelski.github.io.old that referenced this pull request Feb 11, 2019
kkunapuli pushed a commit to kkunapuli/kkunapuli.github.io that referenced this pull request May 30, 2019
* Remove Lunr trimmer & bring back colons
* Add Greek Stemmer
* Translate `search_placeholder_text` and `results_found` to Greek
kkunapuli pushed a commit to kkunapuli/kkunapuli.github.io that referenced this pull request May 30, 2019
kkunapuli pushed a commit to kkunapuli/kkunapuli.github.io that referenced this pull request May 30, 2019
sumeetmondal pushed a commit to sumeetmondal/sumeetmondal.github.io that referenced this pull request Sep 10, 2019
* Remove Lunr trimmer & bring back colons
* Add Greek Stemmer
* Translate `search_placeholder_text` and `results_found` to Greek
sauleh pushed a commit to sauleh/personal_website_template that referenced this pull request Oct 13, 2019
sauleh pushed a commit to sauleh/personal_website_template that referenced this pull request Oct 13, 2019
npflueger pushed a commit to npflueger/npflueger.github.io that referenced this pull request Dec 30, 2020
npflueger pushed a commit to npflueger/npflueger.github.io that referenced this pull request Dec 30, 2020
jchwenger pushed a commit to jchwenger/jchwenger.github.io that referenced this pull request May 5, 2023
* Remove Lunr trimmer & bring back colons
* Add Greek Stemmer
* Translate `search_placeholder_text` and `results_found` to Greek
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants