Completion Suggester V2#11740

Closed
areek wants to merge 23 commits into elastic:completion_suggester_v2 from areek:comp_suggester_v2

@areek
Contributor

@areek areek commented Jun 18, 2015

Overview

The motivation behind the new completion field is to support auto-complete/search-as-you-type functionality. This is a navigational feature that guides users to relevant results as they type, improving search precision. Ideally, suggestions should arrive as fast as the user types, giving instant feedback relevant to what has already been typed.

This PR introduces a new completion field (#10746) using Lucene’s new suggest API. The new field offers a superset of the functionality provided by the existing completion field (CompletionSuggester and ContextSuggester), namely near-real-time search, document retrieval, support for multiple contexts, and flexible scoring through index-time and query-time boosts.

Use Case

Guide users to relevant documents by suggesting song titles as they type. In this example, the suggestions to serve the user will be indexed as a completion field named title_suggest.

The mapping snippet below adds a completion field named title_suggest:

...
"properties": {
 "title_suggest": {
  "type": "completion"
 }
}

To add title suggestions, index documents with a title_suggest field.

{
 "title_suggest" : "Californication"
}

You can also specify an array or an object to supply multiple values and/or configure index-time weights for completion values. Index-time weights (maybe call this index-time boost?) determine the order of suggestions that match a prefix. For example, is "Californication" or "Can't Stop" more relevant for a prefix query of "ca"?
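To make the role of index-time weights concrete, here is a minimal Python sketch of weighted prefix lookup. It is an in-memory stand-in, not the suggester's implementation; the `suggest` helper, the `ENTRIES` list and the weight values are all assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical in-memory stand-in for a completion field:
# each entry is (suggestion text, index-time weight). Weights are made up.
ENTRIES: List[Tuple[str, int]] = [
    ("Californication", 10),
    ("Can't Stop", 5),
    ("Otherside", 8),
]


def suggest(prefix: str, entries: List[Tuple[str, int]], size: int = 5) -> List[str]:
    """Return up to `size` suggestions starting with `prefix`, heaviest first."""
    needle = prefix.lower()
    matches = [(text, weight) for text, weight in entries
               if text.lower().startswith(needle)]
    # Higher weight ranks first; ties fall back to alphabetical order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [text for text, _ in matches[:size]]


print(suggest("ca", ENTRIES))
```

With these assumed weights, "Californication" outranks "Can't Stop" for the prefix "ca"; swapping the weights would swap the order.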

Querying

To query against a completion field, use the _suggest API (see #10746 for details).

Apart from a simple prefix query, you can use a fuzzy prefix query to get typo-tolerant suggestions. This queries a set of similar prefixes (based on edit distance) and scores the suggestions by how similar they are to what the user has typed. For example, a fuzzy prefix query of "cale" would suggest "Californication" and "Can't Stop", in that order.
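The ordering in the "cale" example can be reproduced with a brute-force sketch that scores each title by the smallest edit distance between the query and any prefix of the title. This is a swapped-in illustration of the idea rather than the way the suggester computes it; the helper names and the `max_edits=2` default are assumptions.

```python
from typing import List


def prefix_edit_distance(query: str, candidate: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `candidate`."""
    # prev[j] holds the distance between the query consumed so far and candidate[:j].
    prev = list(range(len(candidate) + 1))
    for i, q_char in enumerate(query, start=1):
        cur = [i]
        for j, c_char in enumerate(candidate, start=1):
            substitution = prev[j - 1] + (0 if q_char == c_char else 1)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    # The whole query is consumed; take the best-matching candidate prefix.
    return min(prev)


def fuzzy_suggest(query: str, titles: List[str], max_edits: int = 2) -> List[str]:
    """Keep titles within `max_edits` of the query prefix, closest first."""
    scored = [(prefix_edit_distance(query.lower(), t.lower()), t) for t in titles]
    kept = [(dist, t) for dist, t in scored if dist <= max_edits]
    kept.sort()
    return [t for _, t in kept]


print(fuzzy_suggest("cale", ["Can't Stop", "Otherside", "Californication"]))
```

Here "Californication" is one edit away from "cale" (via the prefix "cal") and "Can't Stop" is two edits away, which gives the order described above.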

A query prefix can also be expressed as a regular expression. For example, ca[l-n] will suggest titles starting with "cal", "cam" or "can".
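A rough sketch of the regular-expression prefix behaviour, using Python's `re` module as a stand-in. Python's regex syntax is not the same as Lucene's, so this only illustrates the ca[l-n] example; the `regex_suggest` helper and the title list are assumptions.

```python
import re
from typing import List

# Illustrative titles only.
TITLES = ["Californication", "Can't Stop", "Cabron", "Otherside"]


def regex_suggest(prefix_pattern: str, titles: List[str]) -> List[str]:
    """Return titles whose beginning matches `prefix_pattern`."""
    compiled = re.compile(prefix_pattern, re.IGNORECASE)
    # re.match anchors at the start of the string, so the pattern acts as a prefix.
    return [title for title in titles if compiled.match(title)]


print(regex_suggest("ca[l-n]", TITLES))
```

"Cabron" is skipped because its third letter falls outside the l-n range.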

Scoring and Filtering

It is often desirable to serve suggestions filtered and/or boosted by some criteria. For example, you want to suggest song titles filtered by certain artists or you want to boost song titles based on their genre.

To achieve suggestion filtering and/or boosting, you can add contexts while configuring a completion field. You can define multiple contexts for a completion field (NOTE: adding contexts increases the index size). Every context has a unique name and a type. Currently, there are two types: category and geo.

To use genre as a context for song suggestions, a completion field is configured with a 'genre_context' context, which indexes values from the 'genre' field as shown below:

...
"properties": {
 "title_suggest": {
  "type": "completion",
  "contexts" : [
   { "name": "genre_context", "type" : "category", "path": "genre" }
  ]
 }
}

Suggestions with contexts can be added by indexing a document as follows:

{
 "title_suggest" : "Californication",
 "genre" : ["funk rock", "alternative rock", "funk metal"]
}

The default behaviour for context-enabled fields is to suggest from all contexts. You can filter on defined context(s) at query time by adding query contexts. For example, the following restricts song suggestions to the 'funk rock' and 'indie' genres only:

"suggest-namespace" : {
 ...
 "completion" : {
  "field" : "title_suggest",
  "contexts": {           
   "genre_context": ["funk rock", "indie"]
  }
 }
}
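The filtering behaviour described above can be modelled in a few lines of Python. This is a sketch under assumptions: the documents, their genre assignments and the `suggest` helper are hypothetical, and a suggestion is kept if it carries at least one of the requested genres.

```python
from typing import Dict, List, Optional

# Hypothetical documents; the genre assignments are illustrative only.
DOCS: List[Dict] = [
    {"title_suggest": "Californication", "genre": ["funk rock", "alternative rock"]},
    {"title_suggest": "Can't Stop", "genre": ["funk rock"]},
    {"title_suggest": "Creep", "genre": ["alternative rock"]},
]


def suggest(prefix: str, docs: List[Dict],
            genre_context: Optional[List[str]] = None) -> List[str]:
    """Prefix-match titles, optionally keeping only the requested genres."""
    results = []
    for doc in docs:
        title = doc["title_suggest"]
        if not title.lower().startswith(prefix.lower()):
            continue
        # No query contexts: suggest from all contexts (the stated default).
        if genre_context is not None and not set(doc["genre"]) & set(genre_context):
            continue
        results.append(title)
    return results


print(suggest("c", DOCS))                          # no filtering
print(suggest("c", DOCS, ["funk rock", "indie"]))  # filtered by genre
```

Without query contexts all three titles come back; with the 'funk rock'/'indie' filter only the two funk rock titles remain.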

You can boost suggestions for some genres more than others. For example, the following restricts song suggestions to all genres that start with "funk" but boosts suggestions in the "funk rock" genre:

...
"genre_context": [ 
 { "context": "funk", "prefix": true }, 
 { "context": "funk rock", "boost": 4 } 
]
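One way to read the prefix and boost options together is sketched below. Two points are assumptions, not something this description states: a suggestion takes the largest boost among the query contexts it matches (defaulting to 1), and the final score is the index-time weight multiplied by that boost. The entries and weights are made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entries with made-up weights.
ENTRIES: List[Dict] = [
    {"text": "Song A", "weight": 10, "genre": ["funk metal"]},
    {"text": "Song B", "weight": 5, "genre": ["funk rock"]},
    {"text": "Song C", "weight": 20, "genre": ["indie"]},
]

QUERY_CONTEXTS: List[Dict] = [
    {"context": "funk", "prefix": True},
    {"context": "funk rock", "boost": 4},
]


def context_boost(genres: List[str], query_contexts: List[Dict]) -> Optional[float]:
    """Largest boost among matching query contexts, or None if nothing matches."""
    best = None
    for qc in query_contexts:
        value = qc["context"]
        is_prefix = qc.get("prefix", False)
        hit = any(g.startswith(value) if is_prefix else g == value for g in genres)
        if hit:
            boost = qc.get("boost", 1)
            best = boost if best is None else max(best, boost)
    return best


def suggest(entries: List[Dict], query_contexts: List[Dict]) -> List[Tuple[float, str]]:
    """Filter by context, then rank by weight * boost (an assumed scoring rule)."""
    scored = []
    for entry in entries:
        boost = context_boost(entry["genre"], query_contexts)
        if boost is not None:
            scored.append((entry["weight"] * boost, entry["text"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


print(suggest(ENTRIES, QUERY_CONTEXTS))
```

Under these assumptions the "funk rock" entry overtakes the heavier "funk metal" entry, and the "indie" entry is filtered out entirely.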

Configuring a completion field with multiple contexts allows filtering/boosting suggestions by multiple criteria. For example, adding artist and genre contexts as follows:

...
"title_suggest": {
 "type": "completion",
 "contexts" : [
  { "name": "genre_context", "type" : "category", "path": "genre" },
  { "name": "artist_context", "type" : "category", "path": "artist" }
 ]
}

You can restrict song suggestions to the "funk rock" and "alternative rock" genres and boost songs by "rhcp" by adding the following query contexts:

"contexts": {           
 "genre_context": [ "funk rock", "alternative rock" ],
 "artist_context" : { "context" : "rhcp", "boost": 4 }
}

The query response for any context-enabled field will include all the associated context values for that entry, along with the suggestion and weight.

NOTE: Adding more contexts to a completion field degrades search performance (when matching all contexts) and increases the index size.

PR status

  • This PR excludes the Lucene-related changes (LUCENE-6459 and smaller changes) to keep the PR ES-specific.
  • Retrieving documents with suggestions is not implemented in this PR and will be done in a subsequent PR.
  • Todo: Benchmark context suggester

@areek areek added the v2.0.0-beta1 and :Search Relevance/Suggesters labels Jun 18, 2015
@areek areek force-pushed the comp_suggester_v2 branch 2 times, most recently from f0c74e1 to 55ecbf6 Compare June 18, 2015 04:39
@rjernst
Member

rjernst commented Jun 18, 2015

For backcompat, why can't the field type name stay the same, with the logic that decides which actual mapper/fieldtype to create living in the parsing step (where we have the index created version)?

@areek
Contributor Author

areek commented Jun 18, 2015

@rjernst I think that might be confusing to users: the same field type would behave differently depending on when it was created.
My thoughts: we use the index created version to map the old field type from 'completion' -> 'completion_old' for older versions, and make the expected behaviour clear. If you want to use the new type, you explicitly create a 'completion' field type, or if you still rely on features such as attaching payloads (not supported for the new field type) you continue using 'completion_old'. Thoughts?

@rjernst
Member

rjernst commented Jun 18, 2015

I think that is no more confusing than what I suggested, since someone who doesn't know about the new type will get the "new" behavior anyway (they will get a new completion field when they create another with type completion).

@rjernst
Member

rjernst commented Jun 18, 2015

And I don't think we should allow creating the old type.

@areek
Contributor Author

areek commented Jun 18, 2015

I agree it is somewhat confusing, but that can be solved through documentation (the same can be done for your suggestion :)). To me, it is easier to reason about new behaviour tied to a different type name than to the old one. But if we don't allow creating the old type, what you are suggesting makes sense to me.
So, we will check the index creation version and choose which field type/suggester to use. I will explore this, thanks for the suggestion, @rjernst!

@clintongormley
Contributor

The bwc decision is hard. I agree with @rjernst that we shouldn't allow users to create new indices with the old suggester, but that existing indices should still work.

We can choose which suggester to use at query time based on index creation date/version. The behaviour will be different (e.g. with the old you get back payloads, with the new you get back documents). I think that almost everybody would use a suggester on a single index, rather than across multiple indices, so mixed behaviour probably isn't a problem (although perhaps we should not allow combining results from old and new indices?)

If we leave the field type the same (ie completion) there is no visual indication to the user that queries will behave differently, but I don't much like the completion_old name either...

@clintongormley
Contributor

@areek what is exact? does it mean "not fuzzy"?

@areek
Contributor Author

areek commented Jun 23, 2015

@clintongormley when exact is false for a query context, the context in question is treated as a prefix of the indexed contexts.
For example, if you have indexed contexts type1, type2 and type3, you could boost/filter by all three contexts by using a query context of

{ "value": "type", "exact": false, "boost": .. }

instead of:

{ "value": "type1", "boost": .. }
{ "value": "type2", "boost": .. }
{ "value": "type3", "boost": .. }
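The equivalence between the single prefix-style query context and the three exact ones can be sketched as follows. The `expand` helper is hypothetical, and treating a missing exact flag as true is an assumption.

```python
from typing import Dict, List


def expand(query_context: Dict, indexed_contexts: List[str]) -> List[str]:
    """Return the indexed contexts that a query context selects."""
    value = query_context["value"]
    # Assumption: a query context is exact unless "exact" is explicitly false.
    if query_context.get("exact", True):
        return [c for c in indexed_contexts if c == value]
    return [c for c in indexed_contexts if c.startswith(value)]


indexed = ["type1", "type2", "type3", "other"]
print(expand({"value": "type", "exact": False}, indexed))
print(expand({"value": "type1"}, indexed))
```

The non-exact context selects type1, type2 and type3 in one go, while each exact context selects only its own value.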

Maybe we could have a better name for this option? Thoughts?

@clintongormley
Contributor

Maybe we could have a better name for this option? thoughts?

@areek maybe we use context instead of value, and use prefix when it isn't exact?

@abhijitiitr

@areek I tested these changes and it feels wonderful to have such a great deal of flexibility. But I faced a few issues. As @clintongormley said

(e.g. with the old you get back payloads, with the new you get back documents).

I couldn't find a way to get documents currently. If there are multiple inputs having multiple contexts and weights per doc for a completion type field, the suggester currently doesn't differentiate between inputs of same and different docs.
Can you suggest ways of differentiating output on the basis of their docs with these changes?

@areek
Contributor Author

areek commented Jul 2, 2015

@abhijitiitr Thanks for testing the new suggester. It would be awesome if you could share any details about the data set you tested it with :)

I couldn't find a way to get documents currently.

This feature is not implemented in this PR yet, but will be done in a subsequent PR.

If there are multiple inputs having multiple contexts and weights per doc for a completion type field, the suggester currently doesn't differentiate between inputs of same and different docs.

This is a known issue and will be fixed. The suggested inputs that correspond to the same doc will be deduplicated.

Can you suggest ways of differentiating output on the basis of their docs with these changes?

Once the document fetch phase is implemented, the associated document will be returned with the suggestion.

@leonardehrenfried

I'm not entirely sure from reading this PR, but it seems that payloads are no longer supported in the new suggester. This would make me quite sad, since I discovered a bug regarding payloads in the old suggester and was hoping it would be fixed with the new one.

On the other hand I read the following comment: "(e.g. with the old you get back payloads, with the new you get back documents)". This would indeed solve my problem. Is it true that the new suggest API returns the complete document?

@areek
Contributor Author

areek commented Jul 2, 2015

Is it true that the new suggest API returns the complete document?

@Lenniboy Yes, so instead of storing a value as a payload, you can store it as a field in the document. You will get back the document with the suggestions.

@clintongormley
Contributor

@areek @jpountz came up with a simple solution to the bwc issue. Call the new field type and suggester complete instead of completion. Problem solved

@rjernst
Member

rjernst commented Jul 3, 2015

That sounds like a horrible idea. Those two names are far too close to each other: very confusing. Why is it a problem to force the new field type on 2.0+ indexes?

@clintongormley
Contributor

@rjernst more confusing than having the same name return different results, and no easy way to distinguish them, and having an exception thrown if you try to do a completion request across old and new indices? And what about if you try to add a completion field to an old index - what happens then?

@rjernst
Member

rjernst commented Jul 3, 2015

@rjernst more confusing than having the same name return different results, and no easy way to distinguish them, and having an exception thrown if you try to do a completion request across old and new indices?

Yes.

And what about if you try to add a completion field to an old index - what happens then?

The same thing that happens for other mapper changes we've done. The behavior stays the same on indexes before 2.0 (for new fields as well) and the new behavior happens with indexes created on or after 2.0.

@clintongormley
Contributor

The same thing that happens for other mapper changes we've done. The behavior stays the same on indexes before 2.0 (for new fields as well) and the new behavior happens with indexes created on or after 2.0.

Today you can run a completion suggester across multiple indices. If you try this with what you are proposing, then you'll either (a) not be able to reduce the results or (b) get a mixed result set, some with payloads and some with docs.

Plus the new suggester supports things that the old suggester doesn't, which could lead to more confusing exceptions. What I don't like about keeping the same name is that it is hard for the user to know which index is going to behave in what way. Changing the name resolves that: "no field of type complete found in index foo"

@areek areek force-pushed the completion_suggester_v2 branch from 74602bd to ac77028 Compare July 21, 2015 22:15
@areek areek force-pushed the comp_suggester_v2 branch 2 times, most recently from 85e946b to d4b4f91 Compare July 22, 2015 06:48
@areek
Contributor Author

areek commented Jul 22, 2015

Updated PR:

  • Backwards compatibility
    • completion field created on a pre 2.0 index will use the old implementation
    • completion field created on a post 2.0 index will use the new implementation
    • if new and old completion fields are queried at the same time (e.g. same field name across different indices), then an IAE is thrown at the reduce phase
  • Old Completion suggester now resides in org.elasticsearch.search.suggest.completion.old
  • First draft of API docs for new suggester: Completion Suggester, Context Suggester
  • Increased test coverage
  • geo context mapping work
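The backwards-compatibility rules in the list above can be sketched roughly as follows. This is an illustrative Python model, not the PR's Java code: the function names are hypothetical, versions are plain tuples, and `ValueError` stands in for the IAE (IllegalArgumentException) mentioned above.

```python
from typing import Iterable, List, Tuple

V_2_0_0 = (2, 0, 0)


def completion_impl(index_created: Tuple[int, int, int]) -> str:
    """Pick the implementation from the version the index was created with."""
    return "new" if index_created >= V_2_0_0 else "old"


def reduce_suggestions(shard_results: Iterable[Tuple[str, List[str]]]) -> List[str]:
    """Merge per-index results; refuse to mix old and new implementations.

    Each item is (implementation, suggestions).
    """
    merged: List[str] = []
    seen = set()
    for impl, suggestions in shard_results:
        seen.add(impl)
        if len(seen) > 1:
            # Stand-in for the IAE thrown at the reduce phase.
            raise ValueError("cannot combine old and new completion results")
        merged.extend(suggestions)
    return merged


print(completion_impl((1, 7, 0)), completion_impl((2, 0, 0)))
print(reduce_suggestions([("new", ["Californication"]), ("new", ["Can't Stop"])]))
```

Querying one pre-2.0 index and one 2.0+ index together would hit the error branch, which matches the behaviour described for the reduce phase.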

@clintongormley would be awesome to get some feedback on the API docs (linked above).
@rjernst @jpountz would be great to get some feedback on the new completion field

@areek areek force-pushed the comp_suggester_v2 branch 3 times, most recently from 7a44cae to 08a2a88 Compare July 22, 2015 07:03
@areek areek force-pushed the comp_suggester_v2 branch from 48494b1 to 2a75eed Compare August 6, 2015 17:00
@areek areek force-pushed the completion_suggester_v2 branch 2 times, most recently from bf66cca to 01f2196 Compare August 7, 2015 05:04
@areek
Contributor Author

areek commented Aug 7, 2015

@jrots

jrots commented Sep 21, 2015

When will this be available in v2.0?

Labels

>feature, :Search Relevance/Suggesters

7 participants