Completion Suggester V2 #11740
areek wants to merge 23 commits into elastic:completion_suggester_v2 from
f0c74e1 to 55ecbf6
|
For backcompat, why can't the field type name stay the same, and the logic to decide which actual mapper/fieldtype to create be when parsing (where we have the index created version)? |
|
@rjernst I think that might be confusing to the users, you have the same field type that behaves differently w.r.t. when they were created? |
|
I think that is no more confusing than what I suggested since someone that doesn't know about the new type will get the "new" behavior (they will get a new completion field when they create another with type |
|
And I don't think we should allow creating the old type. |
|
I agree it is somewhat confusing, but it can be solved through documentation (same can be done for your suggestion :)). To me, it is easier to reason about new behaviour tied to a different type name than that of the old? But if we don't allow creating the old type, what you are suggesting makes sense to me. |
|
The bwc decision is hard. I agree with @rjernst that we shouldn't allow users to create new indices with the old suggester, but that existing indices should still work. We can choose which suggester to use at query time based on index creation date/version. The behaviour will be different (e.g. with the old you get back payloads, with the new you get back documents). I think that almost everybody would use a suggester on a single index, rather than across multiple indices, so mixed behaviour probably isn't a problem (although perhaps we should not allow combining results from old and new indices?) If we leave the field type the same (ie |
|
@areek what is |
|
@clintongormley when instead of: Maybe we could have a better name for this option? thoughts? |
@areek maybe we use |
|
@areek I tested these changes and it feels wonderful to have such a great deal of flexibility. But I faced a few issues. As @clintongormley said
I couldn't find a way to get documents currently. If there are multiple inputs having multiple contexts and weights per doc for a |
|
@abhijitiitr Thanks for testing the new suggester. It would be awesome if you could share any details about the data set you tested it with :)
This feature is not implemented in this PR yet, but will be done as a subsequent PR
This is a known issue and will be fixed. The suggested inputs that correspond to the same doc will be deduplicated.
Once the document fetch phase is implemented, the associated document will be returned with the suggestion. |
|
I'm not entirely sure from reading this PR whether it will be possible but it seems that payloads are no longer supported in the new suggester. This would make me quite sad since I discovered a bug regarding payloads in the old suggester and was hoping it would be fixed with the new one. On the other hand I read the following comment: "(e.g. with the old you get back payloads, with the new you get back documents)". This would indeed solve my problem. Is it true that the new suggest API returns the complete document? |
@Lenniboy Yes, so instead of storing a value as a payload, you can store it as a field in the document. You will get back the document with the suggestions. |
|
That sounds like a horrible idea. Those two names are far too close to each other: very confusing. Why is it a problem to force the new field type on 2.0+ indexes? |
|
@rjernst more confusing than having the same name return different results, and no easy way to distinguish them, and having an exception thrown if you try to do a completion request across old and new indices? And what about if you try to add a completion field to an old index - what happens then? |
Yes.
The same thing that happens for other mapper changes we've done. The behavior stays the same on indexes before 2.0 (for new fields as well) and the new behavior happens with indexes created on or after 2.0. |
Today you can run a completion suggester across multiple indices. If you try this with what you are proposing, then you'll either (a) not be able to reduce the results or (b) get a mixed result set, some with payloads and some with docs. Plus the new suggester supports things that the old suggester doesn't, which could lead to more confusing exceptions. What I don't like about keeping the same name is that it is hard for the user to know which index is going to behave in what way. Changing the name resolves that: "no field of type |
74602bd to ac77028
85e946b to d4b4f91
|
Updated PR:
@clintongormley would be awesome to get some feedback on the API docs (linked above). |
7a44cae to 08a2a88
…suggestion is indexed
48494b1 to 2a75eed
bf66cca to 01f2196
|
This PR has been merged to a feature branch https://github.com/elastic/elasticsearch/tree/completion_suggester_v2 |
|
When will this be available in v2.0? |
Overview
The motivation behind the new completion field is to support auto-complete/search-as-you-type functionality. This is a navigational feature to guide users to relevant results as they are typing, improving search precision. Ideally, auto-complete functionality should be as fast as a user types to provide instant feedback relevant to what a user has already typed in.
This PR introduces a new completion field (#10746) using Lucene’s new suggest API. The new field has a superset of functionalities provided by the existing completion field (CompletionSuggester and ContextSuggester), namely near-real time search, document retrieval, support for multiple contexts and flexible scoring through index and query-time boosts.
Use Case
Guide users to relevant documents by suggesting song titles as they type. In this example, the suggestions to serve the user will be indexed as a completion field named title_suggest.
The mapping snippet below adds a completion field named title_suggest.
To add title suggestions, index documents with a title_suggest field.
{ "title_suggest" : "Californication" }
You can also specify an array or an object to provide multiple values and/or configure index-time weights for completion values. Index-time weights (maybe call this index-time boost?) are used to sort suggestions that match a prefix. For example, is "Californication" or "Can't Stop" more relevant for a prefix query of "ca"?
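To make the role of index-time weights concrete, here is a minimal in-memory sketch in Python. It is not the suggester's implementation or API: the `ENTRIES` list, the `suggest` function and the weight values are all hypothetical, chosen only to show how a weight can decide the order of suggestions that share a prefix.

```python
from typing import List, Tuple

# Hypothetical stand-in for an indexed completion field:
# each entry is (suggestion text, index-time weight). Weights are made up.
ENTRIES: List[Tuple[str, int]] = [
    ("Californication", 10),
    ("Can't Stop", 7),
    ("Otherside", 9),
]


def suggest(prefix: str, entries: List[Tuple[str, int]] = ENTRIES, size: int = 5) -> List[str]:
    """Return up to `size` entries that start with `prefix`, highest weight first."""
    p = prefix.lower()
    matches = [(text, weight) for text, weight in entries if text.lower().startswith(p)]
    # Sort by descending weight; break ties alphabetically for a stable order.
    matches.sort(key=lambda m: (-m[1], m[0]))
    return [text for text, _ in matches[:size]]


print(suggest("ca"))
```

With these illustrative weights, "Californication" is suggested ahead of "Can't Stop" for the prefix "ca"; swapping the weights would swap the order.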
Querying
To query against a CompletionField, use the _suggest API (see #10746 for details).
Apart from simple Prefix Query, you can use Fuzzy Prefix Query to get typo-tolerant suggestions. This queries a set of similar prefixes (based on edit distance) and scores the suggestions based on how similar they are to what the user has typed in. For example, a prefix query of "cale" would suggest "Californication" and "Can't Stop" in order.
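The "cale" example can be reproduced with a small Python sketch that ranks titles by the edit distance between the typed text and the closest prefix of each title. This only illustrates the idea; it is not how the suggester evaluates fuzzy queries. The function names and the `max_edits=2` cutoff are assumptions made for this example.

```python
from typing import List


def prefix_edit_distance(query: str, candidate: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `candidate`."""
    q, c = query.lower(), candidate.lower()
    # prev[j] holds the distance between the first i chars of q and c[:j].
    prev = list(range(len(c) + 1))
    for i in range(1, len(q) + 1):
        cur = [i] + [0] * len(c)
        for j in range(1, len(c) + 1):
            cost = 0 if q[i - 1] == c[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a char from the query
                         cur[j - 1] + 1,      # insert a char into the query
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    # The last row compares the whole query with every prefix of the candidate.
    return min(prev)


def fuzzy_suggest(query: str, titles: List[str], max_edits: int = 2) -> List[str]:
    """Titles within `max_edits` of the query prefix, most similar first."""
    scored = sorted((prefix_edit_distance(query, t), t) for t in titles)
    return [t for d, t in scored if d <= max_edits]


print(fuzzy_suggest("cale", ["Can't Stop", "Otherside", "Californication"]))
```

Here "Californication" is one edit away from "cale" (via the prefix "cali") and "Can't Stop" is two edits away, which gives the order described above; "Otherside" is three edits away and is dropped.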
There is support for expressing a query prefix as a Regular Expression. For example, ca[l-n] will suggest titles starting with "cal", "cam" or "can".
Scoring and Filtering
It is often desirable to serve suggestions filtered and/or boosted by some criteria. For example, you want to suggest song titles filtered by certain artists or you want to boost song titles based on their genre.
To achieve suggestion filtering and/or boosting, you can add contexts while configuring a completion field. You can define multiple contexts for a completion field (NOTE: adding contexts increases the index size). Every context has a unique name and a type. Currently, there are two types: category and geo.
To use genre as a context for song suggestions, a completion field is configured with a 'genre_context' context, which indexes values from the field 'genre', as shown below
Suggestions with contexts can be added by indexing a document as follows:
{ "title_suggest" : "Californication", "genre" : ["funk rock", "alternative rock", "funk metal"] }
The default behaviour for context-enabled fields is to suggest from all contexts. You can filter on defined context(s) at query time by adding query contexts. For example, you can restrict song suggestions to only the 'funk rock' and 'indie' genres.
You can boost suggestions for some genres more than others. For example, you can restrict song suggestions to all genres that start with "funk" but boost suggestions in the "funk rock" genre.
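The filter-and-boost behaviour described above can be sketched in plain Python. Everything here is hypothetical: `Entry`, `QueryContext`, the sample songs and their weights are invented for the example, and multiplying the index-time weight by the boost is an assumption about how the two combine, not something this PR text states.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    text: str
    weight: int          # index-time weight (made-up values below)
    genres: List[str]    # context values indexed with the suggestion


class QueryContext(NamedTuple):
    value: str
    boost: int = 1
    prefix: bool = False  # match genres starting with `value` instead of exact match


def context_boost(genres: List[str], query_contexts: List[QueryContext]) -> int:
    """Largest boost among matching query contexts, or 0 when nothing matches."""
    best = 0
    for qc in query_contexts:
        for genre in genres:
            hit = genre.startswith(qc.value) if qc.prefix else genre == qc.value
            if hit:
                best = max(best, qc.boost)
    return best


def suggest(prefix: str, entries: List[Entry], query_contexts: List[QueryContext]) -> List[str]:
    scored = []
    for e in entries:
        if not e.text.lower().startswith(prefix.lower()):
            continue
        boost = context_boost(e.genres, query_contexts)
        if boost:  # entries with no matching context are filtered out
            scored.append((e.weight * boost, e.text))  # assumed scoring rule
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [text for _, text in scored]


SONGS = [
    Entry("Californication", 5, ["funk rock", "alternative rock"]),
    Entry("Can't Stop", 8, ["funk metal"]),
    Entry("Cabron", 9, ["latin"]),
]
# Any genre starting with "funk" passes the filter; "funk rock" gets a boost of 2.
QUERY = [QueryContext("funk", prefix=True), QueryContext("funk rock", boost=2)]

print(suggest("ca", SONGS, QUERY))
```

With this data the boost lifts "Californication" (5 × 2) above "Can't Stop" (8 × 1), and "Cabron" is excluded because none of its genres start with "funk".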
Configuring a completion field with multiple contexts allows filtering/boosting suggestions by multiple criteria. For example, adding artist and genre contexts as follows:
You can restrict song suggestions to the "funk rock" and "alternative rock" genres and boost songs by "rhcp" by adding the following query contexts.
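As a rough illustration of combining two contexts, the Python sketch below treats the genre context as a filter and the artist context as a boost. How the suggester actually combines multiple query contexts is not spelled out here, so this split is an assumption of the sketch, as are the function names, the sample songs and their weights.

```python
from typing import Dict, List, Optional


def score(entry: dict, genre_filter: List[str], artist_boosts: Dict[str, int]) -> Optional[int]:
    """Boosted weight of `entry`, or None when its genres do not pass the filter."""
    if not set(entry["genre"]) & set(genre_filter):
        return None
    # Assumed rule: use the largest boost of any matching artist, default 1.
    boost = max([artist_boosts.get(a, 1) for a in entry["artist"]], default=1)
    return entry["weight"] * boost


def suggest(prefix: str, entries: List[dict], genre_filter: List[str],
            artist_boosts: Dict[str, int]) -> List[str]:
    scored = []
    for e in entries:
        if e["text"].lower().startswith(prefix.lower()):
            s = score(e, genre_filter, artist_boosts)
            if s is not None:
                scored.append((s, e["text"]))
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [text for _, text in scored]


# Hypothetical documents; weights are invented for the example.
SONGS = [
    {"text": "Californication", "weight": 5, "genre": ["funk rock"], "artist": ["rhcp"]},
    {"text": "Can't Stop", "weight": 6, "genre": ["funk metal"], "artist": ["rhcp"]},
    {"text": "Cannonball", "weight": 8, "genre": ["alternative rock"], "artist": ["the breeders"]},
]

print(suggest("ca", SONGS, ["funk rock", "alternative rock"], {"rhcp": 2}))
```

Under these assumptions "Can't Stop" is filtered out by genre, and the "rhcp" boost moves "Californication" (5 × 2) ahead of "Cannonball" (8 × 1).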
The query response for any context-enabled field will include all the associated context values for that entry, along with the suggestion and weight.
NOTE: Adding more contexts to a completion field will result in search performance degradation (when using match all context) and increase the index size.
PR status