Search: Highlight exact matches improvements #5270

Closed
humitos opened this issue Feb 12, 2019 · 12 comments
Labels: Accepted (accepted issue on our roadmap), Improvement (minor improvement to code), Needed: design decision (a core team decision is required)


@humitos
Member

humitos commented Feb 12, 2019

In #4292 exact match search was implemented, but I'm not sure it is working as expected or at least it could be improved in some ways.

Following this link, we search for "404 not found"

  • I expect to get the whole phrase highlighted
  • I expect to only get results where the whole text appears in that specific order
  • there are results where "not" is highlighted alone, which confuses me

In this other example, I search for "results not found here"

  • do not fall back to Sphinx search when using exact match

Another search for "custom domains"

humitos added the Improvement and Accepted labels Feb 12, 2019
humitos added this to the Search improvements milestone Feb 12, 2019
ericholscher self-assigned this Feb 27, 2019
@ericholscher
Member

This is using the ES Simple Query String query, but it has almost no documentation on how it works. Quoting the search does result in a smaller set of results than not quoting, but I don't fully understand what is happening.

@ericholscher
Member

ericholscher commented Feb 27, 2019

It seems that search is properly searching for only "custom domains" for example, but once it hits the highlighter, we're getting highlighted results that only contain part of the queried term.
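For reference, a minimal sketch of the kind of request body under discussion, written as a Python dict: a quoted simple_query_string query plus a highlight section. The field names "title" and "content" and the helper name build_search_body are assumptions for illustration, not the project's actual code.

```python
def build_search_body(query, fields=("title", "content")):
    """Build an Elasticsearch request body for a simple_query_string search.

    The field names are hypothetical; adjust them to the real index mapping.
    """
    return {
        "query": {
            "simple_query_string": {
                # Wrapping the text in double quotes asks for a phrase match.
                "query": query,
                "fields": list(fields),
            }
        },
        # Ask for highlighted fragments on the same fields.
        "highlight": {"fields": {name: {} for name in fields}},
    }


body = build_search_body('"custom domains"')
```

The quotes narrow which documents match, but as described above the highlighter may still mark individual terms of the phrase on its own.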

@ericholscher
Member

ericholscher commented Feb 27, 2019

You can see the results better on site search: https://readthedocs.org/search/?q=%22custom+domains%22&type=file&version=latest&project=docs

@ericholscher
Member

Looks like a known issue: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html

Highlighters don’t reflect the boolean logic of a query when extracting terms to highlight. Thus, for some complex boolean queries (e.g nested boolean queries, queries using minimum_should_match etc.), parts of documents may be highlighted that don’t correspond to query matches.
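One possible workaround, sketched here purely as an assumption and not something the thread settled on, is to post-process the returned fragments on the application side: drop the highlighter's own tags, keep only fragments that contain the full phrase, and re-mark the whole phrase. The function name rehighlight_phrase is hypothetical, and the sketch assumes the highlighter's default <em> tags.

```python
import re


def rehighlight_phrase(fragment, phrase, pre="<em>", post="</em>"):
    """Return the fragment with only the full phrase highlighted.

    Returns None when the phrase does not occur in the fragment, so callers
    can discard fragments that only matched a single term.
    """
    # Remove the per-term tags added by the highlighter (default <em> tags).
    plain = re.sub(r"</?em>", "", fragment)
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    if not pattern.search(plain):
        return None
    # Wrap each occurrence of the complete phrase in a single tag pair.
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", plain)


full = rehighlight_phrase("Use <em>custom</em> <em>domains</em> here", "custom domains")
partial = rehighlight_phrase("Page <em>not</em> ready", "404 not found")
```

This only handles a single exact phrase; queries that mix phrases with other operators would need the query to be parsed first.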

@dojutsu-user
Member

@humitos
We have removed the highlight URL param. Is this issue still valid?

@stsewd
Member

stsewd commented May 7, 2020

The highlight is back, but I'm not sure if we can do much about this.

stsewd added the Needed: design decision label May 13, 2020
stsewd changed the title from "Exact matching search improvements" to "Search: Highlight exact matches improvements" May 13, 2020
@stsewd
Member

stsewd commented May 20, 2020

From the docs https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-highlighting.html

fragmenter

    Specifies how text should be broken up in highlight snippets: simple or span. Only valid for the plain highlighter. Defaults to span.

    simple
        Breaks up text into same-sized fragments. 
    span
        Breaks up text into same-sized fragments, but tries to avoid breaking up text between highlighted terms. This is helpful when you’re querying for phrases. Default. 

I haven't tested how it looks, but maybe I'll give it a try later.
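A minimal sketch of how the fragmenter option quoted above could be set in the highlight section of a request body. The field name "content", the helper name highlight_options, and the fragment sizes are assumptions for illustration; only the "type" and "fragmenter" values come from the quoted documentation.

```python
def highlight_options(fragmenter="span", field="content"):
    """Build a highlight section that selects the plain highlighter's fragmenter."""
    if fragmenter not in ("simple", "span"):
        raise ValueError("fragmenter must be 'simple' or 'span'")
    return {
        "fields": {
            field: {
                # The fragmenter setting is only valid for the plain highlighter.
                "type": "plain",
                "fragmenter": fragmenter,
                # Illustrative values, not tuned for any real index.
                "fragment_size": 150,
                "number_of_fragments": 3,
            }
        }
    }


options = highlight_options("span")
```

Since span is already the default, setting it explicitly mostly documents the intent; switching to simple is the variant that would change behavior.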

@stsewd
Member

stsewd commented Nov 4, 2020

I expect to get the whole phrase highlighted

I'll see if I can play around with operators and see if there is any improvement, but this again comes down to trying to parse the query.

do not fall back to Sphinx search when using exact match

We always need to fall back to the original search when there aren't any results; this way we guarantee we don't miss anything. The search extension doesn't fall back.

which contains ?highlight="custom domains" which makes the JS code get confused and not highlight anything

That may be tricky to solve. We could analyze the original query and return the intended text without any operators (but this could drop some of the original content from the query), and we would need to implement a mini-parser for the simple query string syntax (probably not that hard).
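A rough sketch of what such a mini-parser could look like, assuming the goal is only to produce plain text for the highlight URL parameter. The function name strip_operators is hypothetical, and the handling of negated terms (dropping them entirely) is an assumption, not a decision from this thread.

```python
import re


def strip_operators(query):
    """Return the query text with simple query string operators removed."""
    # Drop negated terms or phrases (-term, -"a phrase"): they should not be
    # highlighted. Only a leading "-" counts, so hyphenated words survive.
    text = re.sub(r'(?:^|(?<=\s))-(?:"[^"]*"|\S+)', " ", query)
    # Remove fuzziness / slop suffixes such as ~2.
    text = re.sub(r"~\d+", "", text)
    # Replace the remaining operator characters with spaces.
    text = re.sub(r'[+|"()*]', " ", text)
    # Collapse the whitespace left behind.
    return " ".join(text.split())


plain = strip_operators('"404 not found" +sphinx -mkdocs')
```

As noted above, this loses information: the result no longer says which words formed a phrase, so the JS side would still need to decide whether to highlight the text as a whole or word by word.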

@stsewd
Member

stsewd commented Nov 5, 2020

I'll see if I can play around with operators and see if there is any improvement, but this again comes down to trying to parse the query.

Nope, using just the "and" operator didn't solve this.

@humitos
Member Author

humitos commented Sep 15, 2023

This doesn't seem to be a problem with the highlight only. If you take a look at the example that I shared before, https://docs.readthedocs.io/en/stable/search.html?q=%22results+not+found+here%22&check_keywords=yes&area=default you will notice that "results not found here" does not appear in the resulting page.

@stsewd
Member

stsewd commented Sep 18, 2023

@humitos what you are looking at there is the default Sphinx search; our search returns zero results for that query.

@humitos
Member Author

humitos commented Feb 21, 2024

Closing in favor of readthedocs/addons#36

humitos closed this as completed Feb 21, 2024