Description
Elasticsearch version: 5.2.0
Plugins installed: []
JVM version (java -version): 1.8.0_131
OS version (uname -a if on a Unix-like system):
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Action: Highlight multiple fields separately using the Fast Vector Highlighter, with each field having its own highlight_query definition.
Expected Behaviour: Each highlight_query is used for, and only for, the field it is specified on.
Actual Behaviour: Only the first field's highlight_query is used, and it's used for all fields.
Steps to reproduce:
- Create an index with two fields, each mapped with "term_vector" : "with_positions_offsets" so that the Fast Vector Highlighter is used
PUT highlighttest
{
"settings" : {
"index" : {
"number_of_shards" : 3,
"number_of_replicas" : 2
}
},
"mappings": {
"doc" : {
"properties": {
"title" : {
"type": "text",
"term_vector" : "with_positions_offsets"
},
"body" : {
"type": "text",
"term_vector" : "with_positions_offsets"
}
}
}
}
}
- Add a document that shares content between fields
PUT highlighttest/doc/1
{
"title": "I love cake",
"body": "I love cake because it's amazing"
}
- Search both fields for a term that occurs in both, and highlight both fields, giving each field its own highlight_query that matches on the field being highlighted
GET highlighttest/doc/_search
{
"query": {
"bool": {
"should" :[
{
"match": {
"title":{
"query":"cake"
}
}
},
{
"match": {
"body":{
"query":"cake"
}
}
}
]
}
},
"highlight": {
"fields": {
"title":{
"highlight_query": {
"match": {
"title":{
"query":"cake"
}
}
}
},
"body":{
"highlight_query": {
"match": {
"body":{
"query":"cake"
}
}
}
}
}
}
}
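For anyone scripting the reproduction, the request body above can be generated rather than typed by hand. This is only a minimal sketch: the helper name `build_search_body` is hypothetical and not part of any Elasticsearch client, and it simply prints the JSON to send.

```python
import json


def build_search_body(fields, term):
    """Build a search body with one match clause and one highlight_query per field.

    Hypothetical helper for the reproduction; it only assembles a dict.
    """
    def match(field):
        return {"match": {field: {"query": term}}}

    return {
        "query": {"bool": {"should": [match(f) for f in fields]}},
        "highlight": {
            # Each field gets a highlight_query that matches on that same field.
            "fields": {f: {"highlight_query": match(f)} for f in fields}
        },
    }


body = build_search_body(["title", "body"], "cake")
print(json.dumps(body, indent=2))
```

The printed JSON should be equivalent to the request body shown above.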
- The query returns highlights, but the first highlight_query (the match on title) is applied to every field. It matches nothing in the second field, so body is not highlighted:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.52058303,
"hits": [
{
"_index": "highlighttest",
"_type": "doc",
"_id": "1",
"_score": 0.52058303,
"_source": {
"title": "I love cake",
"body": "I love cake because it's amazing"
},
"highlight": {
"title": [
"I love <em>cake</em>"
]
}
}
]
}
}
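The missing highlight can also be detected programmatically. The sketch below is an illustration only: `missing_highlights` is a hypothetical helper, and the `response` dict is a trimmed copy of the response above that keeps just the keys the check needs.

```python
def missing_highlights(response, requested_fields):
    """Map each hit _id to the requested fields that came back without highlight fragments."""
    missing = {}
    for hit in response["hits"]["hits"]:
        got = hit.get("highlight", {})
        absent = [f for f in requested_fields if not got.get(f)]
        if absent:
            missing[hit["_id"]] = absent
    return missing


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "highlight": {"title": ["I love <em>cake</em>"]},
            }
        ]
    }
}

print(missing_highlights(response, ["title", "body"]))  # {'1': ['body']}
```

With the expected behaviour, both fields would carry fragments and the function would return an empty dict.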
- This becomes even more obvious if you swap the highlight_query definitions so that each one matches on the other field:
"highlight": {
"fields": {
"title":{
"highlight_query": {
"match": {
"body":{
"query":"cake"
}
}
}
},
"body":{
"highlight_query": {
"match": {
"title":{
"query":"cake"
}
}
}
}
}
}
In this case the result is the following, which shows that only the first highlight_query (the match on body) is used and the other highlight_query definition is ignored.
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.52058303,
"hits": [
{
"_index": "highlighttest",
"_type": "doc",
"_id": "1",
"_score": 0.52058303,
"_source": {
"title": "I love cake",
"body": "I love cake because it's amazing"
},
"highlight": {
"body": [
"I love <em>cake</em> because it's amazing"
]
}
}
]
}
}
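The reasoning behind both experiments can be written down as a small model. This is a sketch under two stated assumptions, not a description of the highlighter's internals: a field is highlighted only when the applied highlight_query matches on that same field, and "first" means the first entry in request order. The function name `predict` is hypothetical.

```python
def predict(highlight_fields, first_query_for_all):
    """Return the fields expected to come back highlighted.

    highlight_fields maps each highlighted field to the field its
    highlight_query matches on, in request order.
    Assumption: a field is highlighted only if the applied query targets it.
    """
    first_target = next(iter(highlight_fields.values()))
    highlighted = []
    for field, own_target in highlight_fields.items():
        # Under the suspected bug, every field is highlighted with the first query.
        applied = first_target if first_query_for_all else own_target
        if applied == field:
            highlighted.append(field)
    return highlighted


straight = {"title": "title", "body": "body"}  # first request
swapped = {"title": "body", "body": "title"}   # swapped request

# Fields that were actually highlighted in the two responses above.
observed = {"straight": ["title"], "swapped": ["body"]}

print(predict(straight, first_query_for_all=True) == observed["straight"])   # True
print(predict(swapped, first_query_for_all=True) == observed["swapped"])     # True
print(predict(straight, first_query_for_all=False))  # ['title', 'body']
print(predict(swapped, first_query_for_all=False))   # []
```

Under these assumptions, only the "first highlight_query is used for all fields" hypothesis reproduces both observed responses. Per-field behaviour would highlight both fields in the first request and none in the swapped one.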
Please note that this is a pared-down, fabricated example. In our company we have a valid use case with a complex highlight_query for each field, and we need each highlight_query to apply to, and only to, the field it is defined for.