Completion Suggester V2 by areek · Pull Request #11740 · elastic/elasticsearch

areek · 2015-06-18T03:52:00Z

Overview

The motivation behind the new completion field is to support auto-complete/search-as-you-type functionality. This is a navigational feature that guides users to relevant results as they type, improving search precision. Ideally, auto-complete should respond as fast as the user types, giving instant feedback relevant to what has been typed so far.

This PR introduces a new completion field (#10746) using Lucene’s new suggest API. The new field provides a superset of the functionality of the existing completion field (CompletionSuggester and ContextSuggester), including near-real-time search, document retrieval, support for multiple contexts, and flexible scoring through index-time and query-time boosts.

Use Case

Guide users to relevant documents by suggesting song titles as they type. In this example, the suggestions to serve the user will be indexed as a completion field named title_suggest.

The mapping snippet below adds a completion field named title_suggest:

...
"properties": {
 "title_suggest": {
  "type": "completion"
 }
}

To add title suggestions, index documents with a title_suggest field:

{
 "title_suggest" : "Californication"
}

You can also provide an array or an object to index multiple values and/or configure index-time weights for completion values. Index-time weights (maybe call this index-time boost?) are used to sort the suggestions that match a prefix. For example, is "Californication" or "Can't Stop" more relevant for a prefix query of "ca"?
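As a rough illustration of how index-time weights order prefix matches, the Python sketch below uses a plain list and a linear scan. It is a toy model, not how Lucene or Elasticsearch store completions; the `Entry` type, the `suggest` helper, and the weight values are all hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One completion input and its index-time weight (toy model)."""
    text: str
    weight: int


def suggest(entries: list[Entry], prefix: str, size: int = 5) -> list[str]:
    """Return up to `size` inputs that start with `prefix`, highest weight first."""
    p = prefix.lower()
    hits = [e for e in entries if e.text.lower().startswith(p)]
    # Higher weight ranks first; ties fall back to alphabetical order.
    hits.sort(key=lambda e: (-e.weight, e.text))
    return [e.text for e in hits[:size]]


# Hypothetical weights, chosen only for illustration.
songs = [
    Entry("Californication", 10),
    Entry("Can't Stop", 34),
    Entry("Otherside", 50),
]
print(suggest(songs, "ca"))  # ["Can't Stop", 'Californication']
```

With these made-up weights, "Can't Stop" outranks "Californication" for the prefix "ca"; swapping the weights would swap the order.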

Querying

To query a completion field, use the _suggest API (see #10746 for details).

Apart from a simple prefix query, you can use a fuzzy prefix query to get typo-tolerant suggestions. This queries a set of similar prefixes (based on edit distance) and scores suggestions by how similar they are to what the user has typed. For example, a fuzzy prefix query of "cale" would suggest "Californication" and then "Can't Stop".
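The fuzzy behaviour can be approximated with the Python sketch below: it computes the smallest Levenshtein distance between the typed text and any prefix of a title, then ranks titles by that distance. This is a simplification for illustration, not Lucene's automaton-based fuzzy completion query. It ignores index-time weights, and `max_edits=2` is an assumed setting chosen so that both example titles match.

```python
def prefix_edit_distance(query: str, text: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `text`."""
    q, t = query.lower(), text.lower()
    # prev[j] = edit distance between the first i chars of q and the first j chars of t
    prev = list(range(len(t) + 1))
    for i in range(1, len(q) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if q[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete q[i-1]
                         cur[j - 1] + 1,      # insert t[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    # The match may end after any prefix of t, so take the best column.
    return min(prev)


def fuzzy_suggest(titles: list[str], query: str, max_edits: int = 2) -> list[str]:
    """Titles within `max_edits` of the query, closest prefix first."""
    scored = sorted((prefix_edit_distance(query, t), t) for t in titles)
    return [t for d, t in scored if d <= max_edits]


titles = ["Otherside", "Can't Stop", "Californication"]
print(fuzzy_suggest(titles, "cale"))  # ['Californication', "Can't Stop"]
```

"cale" is one substitution away from the prefix "cali" but two edits away from the prefix "ca", which is why "Californication" ranks first.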

A query prefix can also be expressed as a regular expression. For example, ca[l-n] will suggest titles starting with "cal", "cam" or "can".
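In a regex prefix query the expression only has to match the beginning of a suggestion. A toy Python version can lean on `re.match`, which anchors at the start of the string but not at the end. Case-insensitive matching and the list of titles are assumptions made for illustration.

```python
import re


def regex_suggest(titles: list[str], pattern: str) -> list[str]:
    """Return titles whose beginning matches `pattern` (case-insensitive).

    re.match anchors at the start of the string but not at the end,
    so the pattern behaves as a prefix rather than a full match.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in titles if rx.match(t)]


titles = ["Californication", "Camel", "Can't Stop", "Cabron", "Otherside"]  # hypothetical
print(regex_suggest(titles, "ca[l-n]"))
# ['Californication', 'Camel', "Can't Stop"]
```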

Scoring and Filtering

It is often desirable to serve suggestions filtered and/or boosted by some criteria. For example, you may want to suggest song titles filtered by certain artists, or boost song titles based on their genre.

To filter and/or boost suggestions, you can add contexts when configuring a completion field. You can define multiple contexts for a completion field (NOTE: adding contexts increases the index size). Every context has a unique name and a type. Currently, there are two types: category and geo.

To use genre as a context for song suggestions, configure the completion field with a 'genre_context' context that indexes values from the 'genre' field, as shown below:

...
"properties": {
 "title_suggest": {
  "type": "completion",
  "contexts" : [
   { "name": "genre_context", "type" : "category", "path": "genre" }
  ]
 }
}

Suggestions with contexts can be added by indexing a document as follows:

{
 "title_suggest" : "Californication",
 "genre" : ["funk rock", "alternative rock", "funk metal"]
}

The default behaviour for context-enabled fields is to suggest from all contexts. You can filter on defined context(s) at query time by adding query contexts. For example, to restrict song suggestions to the 'funk rock' and 'indie' genres:

"suggest-namespace" : {
 ...
 "completion" : {
  "field" : "title_suggest",
  "contexts": {           
   "genre_context": ["funk rock", "indie"]
  }
 }
}

You can also boost suggestions for some genres more than others. For example, the following restricts song suggestions to all genres that start with "funk" but boosts suggestions in the "funk rock" genre:

...
"genre_context": [ 
 { "context": "funk", "prefix": true }, 
 { "context": "funk rock", "boost": 4 } 
]
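To pin down the filtering and boosting semantics described above, here is a toy Python model. Plain strings act as exact filters, `"prefix": true` (Python `True`) matches any genre starting with the given value, and a suggestion with no matching query context is dropped. Two choices are assumptions of this sketch rather than documented behaviour: when several query contexts match, the highest boost wins, and the final score is the index-time weight multiplied by that boost. The songs and weights are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Song:
    title: str
    weight: int  # index-time weight
    genres: list[str] = field(default_factory=list)  # values of the genre context


def normalize(query_contexts):
    """Accept plain strings ("funk rock") or {"context", "prefix", "boost"} objects."""
    return [qc if isinstance(qc, dict) else {"context": qc} for qc in query_contexts]


def context_boost(query_contexts, genres):
    """Highest boost of any query context matching one of `genres`,
    or None when nothing matches (the suggestion is filtered out)."""
    best = None
    for qc in normalize(query_contexts):
        wanted, as_prefix = qc["context"], qc.get("prefix", False)
        for genre in genres:
            hit = genre.startswith(wanted) if as_prefix else genre == wanted
            if hit:
                boost = qc.get("boost", 1)
                best = boost if best is None else max(best, boost)
    return best


def context_suggest(songs, prefix, query_contexts=None):
    """Prefix-match titles; with query contexts, filter and boost by genre."""
    results = []
    for song in songs:
        if not song.title.lower().startswith(prefix.lower()):
            continue
        # No query contexts means "suggest from all contexts" with no boost.
        boost = 1 if query_contexts is None else context_boost(query_contexts, song.genres)
        if boost is None:
            continue
        results.append((song.weight * boost, song.title))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [title for _, title in results]


songs = [  # hypothetical data
    Song("Californication", 10, ["funk rock", "alternative rock", "funk metal"]),
    Song("Can't Stop", 20, ["funk metal"]),
    Song("Cannonball", 30, ["indie"]),
]
print(context_suggest(songs, "ca", ["funk rock", "indie"]))
# ['Cannonball', 'Californication']
print(context_suggest(songs, "ca", [{"context": "funk", "prefix": True},
                                    {"context": "funk rock", "boost": 4}]))
# ['Californication', "Can't Stop"]
```

In the second query, the "funk" prefix context keeps both funk songs and drops "Cannonball", while the boost of 4 lifts "Californication" (10 × 4) above "Can't Stop" (20 × 1).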

Configuring a completion field with multiple contexts allows filtering/boosting suggestions by multiple criteria. For example, add artist and genre contexts as follows:

...
"title_suggest": {
 "type": "completion",
 "contexts" : [
  { "name": "genre_context", "type" : "category", "path": "genre" },
  { "name": "artist_context", "type" : "category", "path": "artist" }
 ]
}

You can restrict song suggestions to the "funk rock" and "alternative rock" genres and boost songs by "rhcp" by adding the following query contexts:

"contexts": {           
 "genre_context": [ "funk rock", "alternative rock" ],
 "artist_context" : { "context" : "rhcp", "boost": 4 }
}

The query response for any context-enabled field will include all the context values associated with each entry, along with the suggestion and its weight.

NOTE: Adding more contexts to a completion field degrades search performance (when matching all contexts) and increases the index size.

PR status

- This PR excludes the Lucene-related changes (LUCENE-6459 and smaller changes) to keep the PR ES-specific.
- Retrieving documents with suggestions is not implemented in this PR and will be done in a subsequent PR.
- Todo: benchmark the context suggester.

rjernst · 2015-06-18T17:06:34Z

For backcompat, why can't the field type name stay the same, with the decision about which actual mapper/field type to create made at parse time (where we have the index created version)?

areek · 2015-06-18T17:17:54Z

@rjernst I think that might be confusing to users: the same field type would behave differently depending on when the index was created.
My thoughts: we use the index created version to map the old field type from 'completion' -> 'completion_old' for older versions, and make the expected behaviour clear. If you want to use the new type, you explicitly create a 'completion' field; if you still rely on features such as attaching payloads (not supported by the new field type), you continue using 'completion_old'. Thoughts?

rjernst · 2015-06-18T17:20:31Z

I think that is no more confusing than what I suggested, since someone who doesn't know about the new type will get the "new" behavior (they will get a new completion field when they create another with type completion).

rjernst · 2015-06-18T17:21:01Z

And I don't think we should allow creating the old type.

areek · 2015-06-18T17:47:13Z

I agree it is somewhat confusing, but it can be solved through documentation (the same can be done for your suggestion :)). To me, it is easier to reason about new behaviour tied to a different type name than to the old one. But if we don't allow creating the old type, what you are suggesting makes sense to me.
So, we will check the index creation version and choose which field type/suggester to use. I will explore this, thanks for the suggestion, @rjernst!

clintongormley · 2015-06-23T16:28:25Z

The bwc decision is hard. I agree with @rjernst that we shouldn't allow users to create new indices with the old suggester, but that existing indices should still work.

We can choose which suggester to use at query time based on index creation date/version. The behaviour will be different (e.g. with the old you get back payloads, with the new you get back documents). I think that almost everybody would use a suggester on a single index, rather than across multiple indices, so mixed behaviour probably isn't a problem (although perhaps we should not allow combining results from old and new indices?)

If we leave the field type the same (ie completion) there is no visual indication to the user that queries will behave differently, but I don't much like the completion_old name either...

clintongormley · 2015-06-23T16:28:46Z

@areek what is exact? does it mean "not fuzzy"?

areek · 2015-06-23T16:35:01Z

@clintongormley when exact is false for a query context, the context in question is treated as a prefix of the indexed contexts.
For example, if you have indexed contexts type1, type2 and type3, you could boost/filter by all three contexts by using a query context of

{ "value": "type", "exact": false, "boost": .. }

instead of:

{ "value": "type1", "boost": .. }
{ "value": "type2", "boost": .. }
{ "value": "type3", "boost": .. }

Maybe we could have a better name for this option? thoughts?

clintongormley · 2015-06-23T19:41:14Z

> Maybe we could have a better name for this option? thoughts?

@areek maybe we use context instead of value, and use prefix when it isn't exact?

abhijitiitr · 2015-06-30T06:35:54Z

@areek I tested these changes and it feels wonderful to have such a great deal of flexibility. But I faced a few issues. As @clintongormley said

> (e.g. with the old you get back payloads, with the new you get back documents).

I couldn't find a way to get documents currently. If there are multiple inputs having multiple contexts and weights per doc for a completion type field, the suggester currently doesn't differentiate between inputs of same and different docs.
Can you suggest ways of differentiating output on the basis of their docs with these changes?

areek · 2015-07-02T01:22:02Z

@abhijitiitr Thanks for testing the new suggester. It would be awesome if you could share any details about the data set you tested it with :)

> I couldn't find a way to get documents currently.

This feature is not implemented in this PR yet, but will be done in a subsequent PR.

> If there are multiple inputs having multiple contexts and weights per doc for a completion type field, the suggester currently doesn't differentiate between inputs of same and different docs.

This is a known issue and will be fixed. The suggested inputs that correspond to the same doc will be deduplicated.

> Can you suggest ways of differentiating output on the basis of their docs with these changes?

Once the document fetch phase is implemented, the associated document will be returned with the suggestion.

leonardehrenfried · 2015-07-02T16:04:50Z

I'm not entirely sure from reading this PR whether it will be possible, but it seems that payloads are no longer supported in the new suggester. This would make me quite sad, since I discovered a bug regarding payloads in the old suggester and was hoping it would be fixed with the new one.

On the other hand I read the following comment: "(e.g. with the old you get back payloads, with the new you get back documents)". This would indeed solve my problem. Is it true that the new suggest API returns the complete document?

areek · 2015-07-02T16:37:03Z

> Is it true that the new suggest API returns the complete document?

@Lenniboy Yes, so instead of storing a value as a payload, you can store it as a field in the document. You will get back the document with the suggestions.

clintongormley · 2015-07-03T09:56:45Z

@areek @jpountz came up with a simple solution to the bwc issue. Call the new field type and suggester complete instead of completion. Problem solved

rjernst · 2015-07-03T10:16:25Z

That sounds like a horrible idea. Those two names are far too close to each other: very confusing. Why is it a problem to force the new field type on 2.0+ indexes?

clintongormley · 2015-07-03T10:19:08Z

@rjernst more confusing than having the same name return different results, and no easy way to distinguish them, and having an exception thrown if you try to do a completion request across old and new indices? And what about if you try to add a completion field to an old index - what happens then?

rjernst · 2015-07-03T10:21:43Z

> @rjernst more confusing than having the same name return different results, and no easy way to distinguish them, and having an exception thrown if you try to do a completion request across old and new indices?

Yes.

> And what about if you try to add a completion field to an old index - what happens then?

The same thing that happens for other mapper changes we've done. The behavior stays the same on indexes before 2.0 (for new fields as well) and the new behavior happens with indexes created on or after 2.0.

clintongormley · 2015-07-03T10:28:47Z

> The same thing that happens for other mapper changes we've done. The behavior stays the same on indexes before 2.0 (for new fields as well) and the new behavior happens with indexes created on or after 2.0.

Today you can run a completion suggester across multiple indices. If you try this with what you are proposing, then you'll either (a) not be able to reduce the results or (b) get a mixed result set, some with payloads and some with docs.

Plus the new suggester supports things that the old suggester doesn't, which could lead to more confusing exceptions. What I don't like about keeping the same name is that it is hard for the user to know which index is going to behave in what way. Changing the name resolves that: "no field of type complete found in index foo"

areek · 2015-07-22T06:51:26Z

Updated PR:

- Backwards compatibility
  - a completion field created on a pre-2.0 index will use the old implementation
  - a completion field created on a post-2.0 index will use the new implementation
  - if new and old completion fields are queried at the same time (e.g. the same field name across different indices), an IAE (IllegalArgumentException) is thrown at the reduce phase
- Old completion suggester now resides in org.elasticsearch.search.suggest.completion.old
- First draft of API docs for the new suggester: Completion Suggester, Context Suggester
- Increased test coverage
- Geo context mapping work
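The reduce-phase rule above can be sketched as a merge step that refuses to combine results from the two implementations. This Python toy is hypothetical throughout (`ShardSuggestion`, its `new_completion` flag, and the index names are made up), and it raises `ValueError` where Elasticsearch, being Java, throws an IllegalArgumentException:

```python
from dataclasses import dataclass


@dataclass
class ShardSuggestion:
    index: str
    new_completion: bool  # hypothetical flag: True for an index created on 2.0+
    options: list[tuple[str, float]]  # (suggestion text, score)


def reduce_suggestions(shard_results, size=5):
    """Merge per-shard suggestions, refusing to mix old and new implementations."""
    kinds = {r.new_completion for r in shard_results}
    if len(kinds) > 1:
        # ValueError stands in for Java's IllegalArgumentException here.
        raise ValueError("cannot merge suggestions from old and new completion fields")
    merged = [opt for r in shard_results for opt in r.options]
    merged.sort(key=lambda opt: (-opt[1], opt[0]))
    return merged[:size]


new_a = ShardSuggestion("songs-2015", True, [("Californication", 10.0)])
new_b = ShardSuggestion("songs-2016", True, [("Can't Stop", 34.0)])
old = ShardSuggestion("songs-2014", False, [("Californication", 5.0)])
print(reduce_suggestions([new_a, new_b]))
# [("Can't Stop", 34.0), ('Californication', 10.0)]
try:
    reduce_suggestions([new_a, old])
except ValueError as err:
    print("rejected:", err)
```

Failing loudly at reduce time avoids handing back a mixed result set in which some entries carry payloads and others carry documents.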

@clintongormley would be awesome to get some feedback on the API docs (linked above).
@rjernst @jpountz would be great to get some feedback on the new completion field


areek · 2015-08-07T05:10:50Z

This PR has been merged to a feature branch https://github.com/elastic/elasticsearch/tree/completion_suggester_v2

jrots · 2015-09-21T12:41:44Z

When will this be available in v2.0?

areek added the v2.0.0-beta1 and :Search Relevance/Suggesters labels Jun 18, 2015

areek force-pushed the comp_suggester_v2 branch 2 times, most recently from f0c74e1 to 55ecbf6, June 18, 2015 04:39

clintongormley mentioned this pull request Jul 2, 2015: Completion suggester payload is not being updated #11991 (closed)

areek force-pushed the completion_suggester_v2 branch from 74602bd to ac77028, July 21, 2015 22:15

areek force-pushed the comp_suggester_v2 branch 2 times, most recently from 85e946b to d4b4f91, July 22, 2015 06:48

areek force-pushed the comp_suggester_v2 branch 3 times, most recently from 7a44cae to 08a2a88, July 22, 2015 07:03

areek added 17 commits August 6, 2015 12:31:

- minor query context fixes (10d3bb8)
- added context mapping unit tests (916bf23)
- throw error on querying mixed completion suggester version (fac999c)
- add context suggest doc (4dcd7c8)
- add api docs (cce58cf)
- re-structure tests & fold in upstream changes (a1151b4)
- fix test issues; cleanup (1ac57d2)
- added more docs (d1a6a6f)
- change index_analyzer to analyzer (76d0921)
- incorporate docs feedback (db70e34)
- add back query-time precision setting for geo contexts (d4de260)
- sync index & query time precision setting for geo context (27b85c3)
- cut-over to using typed contexts; now internally a context value per suggestion is indexed (68c7e4e)
- incorporate feedback (41abf8b)
- change limit of unique context types from 10 to 255 (78edaa4)
- update to master (e6173ff)
- add rest tests (2a75eed)

areek force-pushed the comp_suggester_v2 branch from 48494b1 to 2a75eed, August 6, 2015 17:00

areek force-pushed the completion_suggester_v2 branch 2 times, most recently from bf66cca to 01f2196, August 7, 2015 05:04

areek closed this Aug 7, 2015

clintongormley added >feature and removed v2.0.0-beta1 labels Aug 7, 2015

areek mentioned this pull request Sep 15, 2015: Completion Suggester: Support returning documents with completions #13576 (closed)

areek mentioned this pull request Sep 29, 2015: Completion Suggester V2 #10746 (closed)

areek mentioned this pull request Oct 31, 2015: Add document-oriented completion suggester #14410 (merged)

This was referenced Nov 21, 2015:
- Context Suggester: Context does not work with non strings in mapping #6512 (closed)
- Context Suggester: Need better error message if you omit category when indexing. #7388 (closed)


7 participants