New terms_enum API for discovering terms in the index #66452
markharwood merged 31 commits into elastic:master from
Pinging @elastic/es-search (Team:Search)
mayya-sharipova
left a comment
@markharwood Exciting API, I envision it being very useful. I did an initial review pass, mostly questions.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermCount.java
...plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermEnumResponse.java
...k/plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermEnumAction.java
When it comes to the HLRC I'm unsure where best to place the logic:
jimczi
left a comment
I did a first pass to review the main API and the options.
I think we're mixing multiple use cases and allowing too many options. I left some comments to simplify the API. I'll do a second pass to review the concrete actions in a follow-up.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TermCount.java
jimczi
left a comment
Sorry for the delay.
I left another round of comments regarding the _terms action.
...in/core/src/main/java/org/elasticsearch/xpack/core/termenum/action/ShardTermEnumRequest.java
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
@elastic-jb I just updated this PR, in case you want to A/B test the sort-by-popularity option against the current Kibana terms-agg approach. This PR only considers up to a maximum of 10k matching terms on a node and returns
Note: in the above example of music artists sorted by popularity, the results are incomplete because there are more than 10k bands starting with
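A minimal Python sketch of why a per-node cap can make popularity-sorted results incomplete. It assumes the node walks matching terms in sorted (alphabetical) order, stops once the cap is reached, and only then ranks the gathered terms by doc count; the function name `top_terms_by_popularity` and the tiny `max_candidates` value in the usage lines are hypothetical, chosen only to make the effect visible. This is an illustration, not the PR's implementation.

```python
def top_terms_by_popularity(term_doc_counts, prefix, max_candidates=10_000, size=10):
    """Hypothetical per-node step: gather at most `max_candidates` terms that
    start with `prefix`, in sorted term order, then rank them by doc count."""
    candidates = []
    for term in sorted(term_doc_counts):
        if not term.startswith(prefix):
            continue
        candidates.append(term)
        if len(candidates) >= max_candidates:
            break  # cap reached: any later matching term is never considered
    # Most popular first; ties broken alphabetically.
    candidates.sort(key=lambda t: (-term_doc_counts[t], t))
    return candidates[:size]


# Tiny cap to make the effect visible: "the who" is the most popular match,
# but it sorts after the first three matches and is never looked at.
counts = {"the band": 1, "the beatles": 5, "the byrds": 2, "the who": 50}
print(top_terms_by_popularity(counts, "the ", max_candidates=3))
# ['the beatles', 'the byrds', 'the band']
print(top_terms_by_popularity(counts, "the ", max_candidates=10))
# ['the who', 'the beatles', 'the byrds', 'the band']
```

Raising the cap (or narrowing the prefix so fewer terms match) is what lets the popular term back into the candidate set.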
@jimczi @elastic-jb @lizozom @giladgal I tried this PR out, comparing it with the existing Kibana impl (terms agg with
I had plans to automate the comparisons, but there were some interesting findings just from manually playing with some datasets.

1) The existing Kibana impl is case sensitive
Unless the user has chosen to add normalisation to the mapping at index time, searches are not case insensitive:

2) The existing Kibana impl misses stuff
Sampling the first 100k docs gives no guarantee of finding the term the user is looking for, as in this example:

3) The existing Kibana impl is slow in ways we didn't realise
One of my datasets has many values per doc: person profiles have a list of band "likes" related to the user. This acts as a big multiplier on the number of regex tests the

4) The new impl can be inaccurate (but there's a simple solution)
For a high-cardinality field and a short search string, the results from the new impl can be inaccurate. In this example search for
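To make findings 1 to 3 concrete, here is a hypothetical Python comparison of the two strategies on an in-memory list of multi-valued docs. The sampling function runs a case-sensitive regex over every value of the first `sample_size` docs and counts the regex tests; the terms-dictionary function prefix-matches all distinct values, optionally case-insensitively. Both functions and the band data are illustrative assumptions, not the Kibana or Elasticsearch code.

```python
import re


def sampled_regex_suggest(docs, pattern, sample_size):
    """Sampling strategy: test a case-sensitive regex against every value of the
    first `sample_size` docs. Returns (matches, number_of_regex_tests)."""
    regex = re.compile(pattern)
    found, tests = set(), 0
    for values in docs[:sample_size]:
        for value in values:  # multi-valued docs multiply the regex work
            tests += 1
            if regex.match(value):
                found.add(value)
    return sorted(found), tests


def term_dictionary_suggest(docs, prefix, case_insensitive=True):
    """Terms-dictionary strategy: prefix-match the distinct values of the whole
    field (built from the docs here only to keep the sketch self-contained)."""
    terms = {value for values in docs for value in values}
    if case_insensitive:
        prefix = prefix.lower()
        return sorted(t for t in terms if t.lower().startswith(prefix))
    return sorted(t for t in terms if t.startswith(prefix))


docs = [["Muse", "Madonna"], ["Metallica"], ["Nirvana"], ["Mogwai"]]
print(sampled_regex_suggest(docs, "mo", sample_size=2))  # ([], 3): wrong case
print(sampled_regex_suggest(docs, "Mo", sample_size=2))  # ([], 3): "Mogwai" not sampled
print(term_dictionary_suggest(docs, "mo"))               # ['Mogwai']
```

The sampled approach fails in two independent ways here (case and sample coverage), while its cost grows with the total number of values rather than the number of distinct terms.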
...core/src/main/java/org/elasticsearch/xpack/core/termenum/action/TransportTermEnumAction.java
Two things came out of discussions today
… core/src/yamlRestTest
…Added security tests
…to case insensitive search
…mRequest constructor to TransportTermsEnumAction#asyncNodeOperation

A search string is supplied, which is used as a prefix to match terms found in a given field of the index.
A timeout can limit the amount of time spent looking for matches.
This is designed for Kibana auto-complete use cases.
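The prefix-plus-timeout behaviour can be sketched in Python over a sorted list that stands in for a field's term dictionary. This is a hypothetical illustration, not the PR's Lucene-based code: it seeks to the first term that is >= the prefix with `bisect`, stops at the first term that no longer shares the prefix, and reports `complete=False` if the deadline passes mid-scan. `prefix_terms`, `size` and `timeout_s` are illustrative names.

```python
import bisect
import time


def prefix_terms(sorted_terms, prefix, size=10, timeout_s=1.0):
    """Return (matching_terms, complete) for terms starting with `prefix`.

    `sorted_terms` must be sorted; it stands in for a field's term dictionary.
    """
    deadline = time.monotonic() + timeout_s
    hits = []
    # Seek straight to the first term >= prefix instead of scanning from the start.
    i = bisect.bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and len(hits) < size:
        term = sorted_terms[i]
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        if time.monotonic() > deadline:
            return hits, False  # out of time: partial results
        hits.append(term)
        i += 1
    return hits, True


terms = sorted(["kibana", "kibble", "kick", "kilo", "logstash"])
print(prefix_terms(terms, "kib"))          # (['kibana', 'kibble'], True)
print(prefix_terms(terms, "ki", size=2))   # (['kibana', 'kibble'], True)
```

Because matching terms are contiguous in sorted order, the scan touches only the matching range plus one extra term, which is what makes a term-dictionary lookup cheap compared with scanning documents.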
Kibana requests for this API would typically look like this:
The time range would skip any indices that fall outside the range, but it would not filter any doc values within overlapping indices. The tier clause would avoid hitting frozen/cold indices.
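The index-level (not document-level) effect of the time range and tier clause can be sketched in Python with hypothetical per-index metadata. `IndexInfo`, its fields and `candidate_terms` are assumptions made for illustration; the tier names mirror Elasticsearch's data tier names but are used here only as labels. The key point is that a surviving index contributes all of its terms, even ones that occur only in docs outside the requested range.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    """Hypothetical per-index metadata used only for this sketch."""
    name: str
    min_ts: int       # earliest timestamp in the index
    max_ts: int       # latest timestamp in the index
    tier: str         # e.g. "data_hot", "data_frozen"
    terms: frozenset  # distinct values of the field in this index


def candidate_terms(indices, range_from, range_to,
                    allowed_tiers=("data_hot", "data_warm", "data_content")):
    """Index-level filtering only: an index is skipped when its tier is not
    allowed or its [min_ts, max_ts] span misses the range entirely. Every term
    of a surviving index is returned, even terms whose docs lie outside the range."""
    result = set()
    for idx in indices:
        if idx.tier not in allowed_tiers:
            continue  # tier clause: keep cold/frozen indices out of the request
        if idx.max_ts < range_from or idx.min_ts > range_to:
            continue  # no overlap with the time range
        result |= idx.terms
    return sorted(result)


jan = IndexInfo("logs-jan", 0, 31, "data_hot", frozenset({"alpha", "beta"}))
feb = IndexInfo("logs-feb", 32, 59, "data_hot", frozenset({"gamma"}))
old = IndexInfo("logs-old", 0, 59, "data_frozen", frozenset({"delta"}))
print(candidate_terms([jan, feb, old], 20, 40))  # ['alpha', 'beta', 'gamma']
print(candidate_terms([jan, feb, old], 40, 50))  # ['gamma']
```

In the first call "alpha" and "beta" are returned even if every doc containing them predates timestamp 20, because `logs-jan` as a whole overlaps the range.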
An optional `timeout` time value can also be passed (default is "1s", one second).
The response looks like this:
Any requests that hit the `timeout` setting will return `"complete": false`.
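Assuming each shard reports its own matches together with whether it finished before the timeout, the coordinating side could assemble the final response roughly as below. `merge_shard_results`, the `terms` key and the alphabetical ordering are hypothetical choices for this sketch; only the `complete` flag comes from the description above.

```python
def merge_shard_results(shard_results, size=10):
    """Combine per-shard (terms, complete) pairs into one response.

    Duplicates across shards are collapsed; alphabetical order and the
    response keys other than "complete" are assumptions of this sketch.
    """
    merged = set()
    complete = True
    for terms, shard_complete in shard_results:
        merged.update(terms)
        # One timed-out shard is enough to mark the whole response incomplete.
        complete = complete and shard_complete
    return {"terms": sorted(merged)[:size], "complete": complete}


print(merge_shard_results([(["apple", "apricot"], True),
                           (["apple", "aptitude"], False)]))
# {'terms': ['apple', 'apricot', 'aptitude'], 'complete': False}
```

A client such as an auto-complete box can still show the partial list and simply treat `complete: false` as a hint that more matches may exist.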