Promote Elasticsearch scan performance with index sorting#13987

Merged
zhenxiao merged 2 commits into prestodb:master from wuyunfeng:master
Jan 22, 2020

@wuyunfeng
Contributor

@wuyunfeng wuyunfeng commented Jan 19, 2020

Scroll requests have optimizations that make them faster when the sort order is _doc. If you want to iterate over all documents regardless of the order, this is the most efficient option.

[scroll request](https://www.elastic.co/guide/en/elasticsearch/reference/7.5/search-request-body.html#request-body-search-scroll)
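For readers unfamiliar with the scroll API, here is a minimal sketch of the request body that asks for `_doc` order. It only assembles the JSON that would be sent to `POST /<index>/_search?scroll=<keep-alive>`; the class name `DocOrderScrollBody` and the `build` helper are hypothetical and are not part of this PR or of the Elasticsearch client.

```java
// Hypothetical helper, for illustration only: assembles a scroll search body
// that sorts by _doc so Elasticsearch can return documents in index order.
final class DocOrderScrollBody {
    private DocOrderScrollBody() {}

    // size: number of hits per scroll page.
    // queryJson: an already-serialized Elasticsearch query object.
    static String build(int size, String queryJson) {
        return "{\"size\":" + size
                + ",\"query\":" + queryJson
                + ",\"sort\":[\"_doc\"]}";
    }

    public static void main(String[] args) {
        // Body for a full scan of an index, 1000 hits per page.
        System.out.println(build(1000, "{\"match_all\":{}}"));
    }
}
```

The only difference from an ordinary scroll request is the `"sort":["_doc"]` entry; the query and page size are unchanged.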

When Elasticsearch processes a scroll request sorted by _doc, it combines a MinDocQuery with the user's original query, as shown below:

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/query/QueryPhase.java#L175-L182
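To make the idea concrete, here is a simplified model of why this helps, assuming the matching doc IDs of a shard are available in ascending (index) order. It is a sketch of the concept, not Elasticsearch's implementation: `MinDocScrollSketch` and `nextBatch` are hypothetical names. Each scroll page resumes directly after the last doc ID returned, so earlier documents are never revisited.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of scrolling in _doc order: every page starts right
// after the last doc ID of the previous page instead of rescanning.
final class MinDocScrollSketch {
    private MinDocScrollSketch() {}

    // matches: doc IDs matching the user's query, sorted ascending.
    // afterDoc: last doc ID already returned (-1 before the first page).
    // batchSize: maximum number of hits per scroll page.
    static List<Integer> nextBatch(int[] matches, int afterDoc, int batchSize) {
        // Binary search for the first doc ID strictly greater than afterDoc,
        // which plays the role of the "minimum doc" filter.
        int lo = 0;
        int hi = matches.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (matches[mid] <= afterDoc) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Collect at most batchSize hits, then stop early.
        List<Integer> batch = new ArrayList<>();
        for (int i = lo; i < matches.length && batch.size() < batchSize; i++) {
            batch.add(matches[i]);
        }
        return batch;
    }

    public static void main(String[] args) {
        int[] matches = {2, 5, 7, 11, 13};
        int after = -1;
        List<Integer> page;
        // Drain the "scroll" page by page until no hits remain.
        while (!(page = nextBatch(matches, after, 2)).isEmpty()) {
            System.out.println(page);
            after = page.get(page.size() - 1);
        }
    }
}
```

With any other sort order, each page would have to consider all matching documents again to find the next ones, which is the cost the `_doc` ordering avoids.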

We also use this scan-scroll request in Apache incubator-doris:

https://github.com/apache/incubator-doris/blob/master/be/src/exec/es/es_scroll_query.cpp#L118-L120

This PR improves scan-filter performance of the presto-elasticsearch connector by about 30% or more.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 19, 2020

CLA Check
The committers are authorized under a signed CLA.

Collaborator

@zhenxiao zhenxiao left a comment

Nice work, @wuyunfeng.
Looks good to me, just one style note about the comments.

.setFetchSource(fields.toArray(new String[0]), null)
.setQuery(buildSearchQuery())
.setPreference("_shards:" + shard)
// Scroll requests have optimizations that make them faster when the sort order is _doc.
Collaborator

How about moving the comment above line 106, so that comments do not interleave with the code?

Contributor Author

done

@wuyunfeng wuyunfeng requested a review from zhenxiao January 22, 2020 13:43
@wuyunfeng
Contributor Author

@zhenxiao thanks for your review. Can you give me another review?

Collaborator

@zhenxiao zhenxiao left a comment

looks good

@zhenxiao zhenxiao merged commit cd41fa5 into prestodb:master Jan 22, 2020
@wuyunfeng
Contributor Author

@zhenxiao Thanks for your review.

@caithagoras caithagoras mentioned this pull request Feb 20, 2020
8 tasks