The integrated elasticlunr search can't work with language zh #1349

Closed
t3link opened this issue Feb 12, 2021 · 12 comments
Labels
documentation done in pr Already done in a PR

Comments

@t3link

t3link commented Feb 12, 2021

Bug Report

Hello, I'm a Chinese user and new to Zola. Following the documentation, I built Zola from source (tag v0.13.0) locally with Chinese indexing enabled. I'm using the DeepThought theme, which already supports search. However, the JavaScript code throws an error:

Uncaught Error: Cannot load un-registered function: trimmer-zh
    at elasticlunr.min.js:10
    at Array.forEach (<anonymous>)
    at Function.t.Pipeline.load (elasticlunr.min.js:10)
    at Function.t.Index.load (elasticlunr.min.js:10)
    at search (site.js:140)
    at HTMLInputElement.<anonymous> (site.js:212)
    at HTMLInputElement.dispatch (jquery-3.5.1.min.js:2)
    at HTMLInputElement.v.handle (jquery-3.5.1.min.js:2)

There are some problems with elasticlunr:

  1. It doesn't support Chinese and is no longer maintained.
  2. The offline search results seem somewhat inaccurate.
  3. Builds take a long time. I tested 10,000 short Chinese articles locally and the build even hung.

Are there any plans to support other search services such as Algolia? Zola could simply generate a structured JSON file containing each page's title, content, tags, and so on. Users could then import that JSON into a suitable search service, either with a deploy script or manually.

Thanks :d

@Keats
Collaborator

Keats commented Feb 12, 2021

Chinese/Japanese search has been disabled by default in https://github.com/getzola/zola/blob/master/components/search/Cargo.toml#L8 because it inflated the binary size a lot (I guess shipping a dictionary or something? The binary went up to something like 90MB when they were included).

Elasticlunr is definitely not the best solution when you have a lot of data. Does Algolia have some defined data format they need?

@t3link
Author

t3link commented Feb 12, 2021

@Keats https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/
Here is an example. =v=

[
    {
        "objectID":"a unique string id",
        "title":"${page.title}",
        "description":"${page.description}",
        "content":"${page.content}",
        "created":"${page.date}",
        "updated":"${page.updated}",
        "categories":"${page.taxonomies.categories}",
        "tags":"${page.taxonomies.tags}",
        "permalink":"${page.permalink}"
    }
]
  • objectID: used to create or update records in the index. If it is omitted, Algolia generates one automatically; if a record with that ID already exists, Algolia updates it.
  • title, description, content: searchable attributes.
  • created, updated, categories, tags: used for filtering or custom ranking.
  • permalink: used for display.
  • Date attributes should be formatted as Unix timestamps.
  • The JSON array allows all records to be sent in one bulk request.
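The record shape above can be produced from page data with a small script. Below is a minimal sketch, assuming pages are available as plain objects with `title`, `description`, `content`, `date`, `updated`, `permalink` and `taxonomies` fields (these input names are assumptions, not a Zola API). It uses the permalink as `objectID`, converts dates to Unix timestamps in seconds, and emits taxonomy terms as arrays rather than the strings shown in the example. It only builds the JSON; uploading it to Algolia is left out.

```javascript
// Sketch: map page objects (input field names are assumptions) to records in
// the Algolia-style shape proposed above.

// Convert a date string such as "2021-02-12" to a Unix timestamp in seconds.
// Returns null for missing or unparseable dates.
function toUnixTimestamp(dateString) {
  if (!dateString) return null;
  const ms = Date.parse(dateString);
  return Number.isNaN(ms) ? null : Math.floor(ms / 1000);
}

function toAlgoliaRecord(page) {
  const taxonomies = page.taxonomies || {};
  return {
    // The permalink is unique per page and stable across builds, so
    // re-importing updates existing records instead of duplicating them.
    objectID: page.permalink,
    title: page.title,
    description: page.description || "",
    content: page.content,
    created: toUnixTimestamp(page.date),
    updated: toUnixTimestamp(page.updated),
    // Arrays of terms (an assumption) so each term can be used as a facet.
    categories: taxonomies.categories || [],
    tags: taxonomies.tags || [],
    permalink: page.permalink,
  };
}

// Usage: build a JSON array of records, ready to send as one batch.
const pages = [
  {
    title: "你好，Zola",
    content: "第一篇文章",
    date: "2021-02-12",
    permalink: "https://example.com/hello/",
    taxonomies: { tags: ["zola", "search"] },
  },
];
const records = pages.map(toAlgoliaRecord);
console.log(JSON.stringify(records, null, 2));
```

Using the permalink as `objectID` is one choice among several; any stable, unique per-page string works, and a stable ID is what lets repeated imports update rather than duplicate records.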

@gicrisf

gicrisf commented Jun 23, 2021

I have the same issue with Italian:

Uncaught Error: Cannot load un-registered function: trimmer-it
    load http://127.0.0.1:1111/elasticlunr.min.js:10
    load http://127.0.0.1:1111/elasticlunr.min.js:10
    load http://127.0.0.1:1111/elasticlunr.min.js:10
    initSearch http://127.0.0.1:1111/assets/js/search.js:145

I really don't understand why this is happening. I tried es and got the same error for the equivalent missing function, trimmer-es.
I tried with a tiny site and a large one, with truncated content, full content,
and no content at all. It always complains about this unregistered function.

Any ideas?
Thanks a lot for your work

@gicrisf

gicrisf commented Jun 23, 2021

Oh, I got it: it's all explained in the official elasticlunr documentation. Support for languages other than English requires two more JS files, as explained there.

Anyone can add the files this way:

<script src="{{ get_url(path='assets/js/lunr.stemmer.support.js', trailing_slash=false) | safe }}"></script>
<script src="{{ get_url(path='assets/js/lunr.$LANG.js', trailing_slash=false) | safe }}"></script>

Better, with deferred loading:

<script defer src="{{ get_url(path='assets/js/lunr.stemmer.support.js', trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path='assets/js/lunr.$LANG.js', trailing_slash=false) | safe }}"></script>
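Why these scripts fix the error: the generated index records its pipeline by function name (trimmer-it, trimmer-zh, and so on), and loading it fails for any name that no included script has registered, as the Index.load → Pipeline.load frames in the stack traces above suggest. The following is a simplified, self-contained model of that lookup, not the elasticlunr source; the names `registry`, `registerFunction`, `loadPipeline` and `includeLanguageFile` are hypothetical. Real language files typically register more than a trimmer (for example a stemmer and a stop-word filter).

```javascript
// Simplified model of a pipeline-function registry (illustration only).
const registry = new Map();

function registerFunction(fn, label) {
  registry.set(label, fn);
}

// A serialized index stores its pipeline as a list of function labels.
// Loading resolves each label through the registry and throws on unknown ones.
function loadPipeline(labels) {
  return labels.map((label) => {
    const fn = registry.get(label);
    if (!fn) {
      throw new Error("Cannot load un-registered function: " + label);
    }
    return fn;
  });
}

// Hypothetical stand-in for including a language file: it registers its
// language-specific functions under their labels (placeholder body here).
function includeLanguageFile(lang) {
  registerFunction((token) => token, "trimmer-" + lang);
}

// An index built for Italian references "trimmer-it".
const serialisedPipeline = ["trimmer-it"];

let errorMessage = null;
try {
  loadPipeline(serialisedPipeline);
} catch (e) {
  errorMessage = e.message;
}
console.log(errorMessage); // Cannot load un-registered function: trimmer-it

includeLanguageFile("it");
console.log(loadPipeline(serialisedPipeline).length); // 1
```

This is also why script order matters: the language file has to run before the index is loaded, otherwise the label is still missing from the registry at load time.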

Maybe we could add this to the Zola docs.

@Keats
Collaborator

Keats commented Jun 24, 2021

Hmm I'm not using those in the docs? https://github.com/getzola/zola/blob/master/docs/templates/index.html#L105-L107

Ah, it looks like they're required for languages other than English?

@gicrisf

gicrisf commented Jun 25, 2021

Yes, those files are mandatory for languages other than English. I extended my theme like this:

{% if config.build_search_index %}
<script src="{{ get_url(path='assets/js/search.js', trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path='elasticlunr.min.js', trailing_slash=false) | safe }}"></script>
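{# English works with elasticlunr alone; any other language also needs the lunr-languages files #}
{%- if config.default_language and config.default_language != "en" -%}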
{%- set search_index_file = "search_index." ~ config.default_language ~ ".js" %}
{%- set lunr_lang_file = "assets/js/lunr-languages/lunr." ~ config.default_language ~ ".min.js" -%}
<script defer src="{{ get_url(path='assets/js/lunr-languages/lunr.stemmer.support.min.js', trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path=lunr_lang_file, trailing_slash=false) | safe }}"></script>
<script defer src="{{ get_url(path=search_index_file, trailing_slash=false) | safe }}"></script>
{%- else -%}
<script defer src="{{ get_url(path='search_index.en.js', trailing_slash=false) | safe }}"></script>
{%- endif -%}
{% endif %}
{% endmacro script %}

Here is the commit with the new assets for my theme. I can make a pull request with all the useful changes.

@Keats
Collaborator

Keats commented Jul 19, 2021

@mr-chrome can you do a PR for the docs?

@gaxxx

gaxxx commented Nov 5, 2021

I had the same issue and fixed it on my blog.

Use this one, https://blog.gaxxx.me/js/lunr.zh.js, modified from MihaiValentin's version.

Remember to add lunr.stemmer.support.js as well, something like this:

<script src="https://blog.gaxxx.me/js/elasticlunr.min.js"></script>
<script src="https://blog.gaxxx.me/search_index.zh.js"></script>
<script src="https://blog.gaxxx.me/js/lunr.stemmer.support.js"></script>
<script src="https://blog.gaxxx.me/js/lunr.zh.js"></script>
<script src="https://blog.gaxxx.me/js/search.js"></script>

Hope this could help you out.
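One reason a dedicated Chinese language file is needed: assuming elasticlunr's default trimmer works like classic lunr's `\W`-based trimmer, it treats Han characters as non-word characters and trims a purely Chinese token away entirely. A small comparison sketch follows; both function names are hypothetical, and the Unicode-aware version is an illustration, not code taken from lunr.zh.js.

```javascript
// Classic lunr-style trimmer: strip non-word characters from both ends.
// In JavaScript regexes \W means "not [A-Za-z0-9_]", so Han characters count
// as non-word characters and a purely Chinese token is trimmed to "".
function asciiTrimmer(token) {
  return token.replace(/^\W+/, "").replace(/\W+$/, "");
}

// Hypothetical Unicode-aware variant: letters (\p{L}, which includes Han
// ideographs), numbers (\p{N}) and "_" are kept; only surrounding punctuation
// and symbols are removed. Unicode property escapes need the `u` flag.
function unicodeTrimmer(token) {
  return token
    .replace(/^[^\p{L}\p{N}_]+/u, "")
    .replace(/[^\p{L}\p{N}_]+$/u, "");
}

console.log(JSON.stringify(asciiTrimmer("「中文」")));   // ""
console.log(JSON.stringify(unicodeTrimmer("「中文」"))); // "中文"
console.log(unicodeTrimmer("(zola)"));                  // zola
```

Trimming only handles token edges. Chinese text has no spaces between words, so splitting it into tokens is a separate step that a real language file also has to handle.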

@azzamsa

azzamsa commented Dec 1, 2021

I spent an hour deciding between Zola and Docusaurus. The only reason not to choose Zola is the lack of DocSearch support.

I saw that other projects had a similar issue: codewars/docs#248

Will Zola support Algolia DocSearch?

@Keats
Collaborator

Keats commented Dec 1, 2021

I would take a PR to emit the search index data in the Algolia format instead of the elasticlunr one.

@Keats
Collaborator

Keats commented Jan 23, 2022

Can someone interested open either a new issue or a PR for the Algolia support? I'll close that one once the docs are updated to fix the original issue.

Keats added the "done in pr" (Already done in a PR) label Jan 23, 2022
Keats added a commit that referenced this issue Jan 23, 2022
@azzamsa

azzamsa commented Jan 26, 2022

> Can someone interested open either a new issue or a PR for the Algolia support? I'll close that one once the docs are updated to fix the original issue.

#1745
