ESQL - KNN function uses prefilters when pushed down to Lucene#131004
ESQL - KNN function uses prefilters when pushed down to Lucene#131004carlosdelest merged 31 commits intoelastic:mainfrom
Conversation
… to be generated with prefilters
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/querydsl/query/KnnQuery.java
🔍 Preview links for changed docs |
| # TODO We need kNN prefiltering here so we get more candidates that pass the filter | ||
| from colors metadata _score | ||
| | where (knn(rgb_vector, [0,255,255], 140) or knn(rgb_vector, [128, 0, 255], 140)) and primary == true | ||
| | where (knn(rgb_vector, [0,255,255], 140) or knn(rgb_vector, [128, 0, 255], 10)) and primary == true |
There was a problem hiding this comment.
Using prefilter means that we no longer need to retrieve all results, but can use k as inteneded as conjunction is pushed down as a prefilter now
| @Override | ||
| public Query asQuery(LucenePushdownPredicates pushdownPredicates, TranslatorHandler handler) { | ||
| return queryBuilder != null ? new TranslationAwareExpressionQuery(source(), queryBuilder) : translate(handler); | ||
| return queryBuilder != null ? new TranslationAwareExpressionQuery(source(), queryBuilder) : translate(pushdownPredicates, handler); |
There was a problem hiding this comment.
Added pushdownPredicates to translate method, as we need to evaluate the prefilters with them in order to decide
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Show resolved
Hide resolved
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
...gin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/KnnFunctionIT.java
Show resolved
Hide resolved
| this.field = field; | ||
| this.k = k; | ||
| this.options = options; | ||
| this.filterExpressions = filterExpressions; |
There was a problem hiding this comment.
we need to add both filterExpressions and queryBuilder to the nodeInfo method:
my understanding is that if we don't add them, we risk losing both filterExpressions and queryBuilder during transforms.
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/querydsl/query/KnnQuery.java
Outdated
Show resolved
Hide resolved
…sql-knn-prefilter
...rg/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownConjunctionsToKnnPrefilters.java
Outdated
Show resolved
Hide resolved
...rg/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownConjunctionsToKnnPrefilters.java
Show resolved
Hide resolved
|
|
||
| String query = """ | ||
| from test | ||
| | where knn(dense_vector, [0, 1, 2], 10) and integer > 10 |
There was a problem hiding this comment.
could we modify one of these tests such that we have 2 WHERE conditions, e.g. here:
| where knn(dense_vector, [0, 1, 2], 10)
| integer > 10
it would help validate that combining the filters is done before we push down the prefilters to the KNN functions
| */ | ||
| var query = """ | ||
| from test | ||
| | where ((knn(dense_vector, [0, 1, 2], 10) or integer > 10) and ((keyword == "test") or knn(dense_vector, [4, 5, 6], 10))) |
There was a problem hiding this comment.
can we get some tests where we use NOT.
for example:
((knn(dense_vector, [0, 1, 2], 10) or integer > 10) and NOT ((keyword == "test") or knn(dense_vector, [4, 5, 6], 10)))
…sql-knn-prefilter # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalPhysicalPlanOptimizerTests.java
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/knn-function.csv-spec
| } | ||
|
|
||
| protected abstract Query translate(TranslatorHandler handler); | ||
| protected abstract Query translate(LucenePushdownPredicates pushdownPredicates, TranslatorHandler handler); |
There was a problem hiding this comment.
Would it be easier to have two signatures for this, for a smaller file change count?
There was a problem hiding this comment.
I think it's more confusing. LucenePushdownPredicates are part of the translation process even if they were not used until now.
| * Support knn function | ||
| */ | ||
| KNN_FUNCTION_V2(Build.current().isSnapshot()), | ||
| KNN_FUNCTION_V3(Build.current().isSnapshot()), |
There was a problem hiding this comment.
Super nitpick - I wonder if we could create an alias that this could reference (so when we go to V4 we only need to change one variable when testing for test compatibility)
There was a problem hiding this comment.
Can you provide an example of what it would look like?
There was a problem hiding this comment.
Something like KNN_FUNCTION_CURRENT = EsqlCapabilities.KNN_FUNCTION_V3?
There was a problem hiding this comment.
Not sue if that works with CSV tests 🤔 . I'd say let's keep this as it's an established pattern in ES|QL.
Maybe I'll give it a try for the next one and see what it looks like from a changes perspective 👍
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/ExpressionWritables.java
…sql-knn-prefilter
…king * upstream/main: (91 commits) Mute org.elasticsearch.packaging.test.DockerTests test130JavaHasCorrectOwnership elastic#131369 Add exception logging when interrupted (elastic#131153) Mute org.elasticsearch.packaging.test.DockerTests test140CgroupOsStatsAreAvailable elastic#131372 Mute org.elasticsearch.packaging.test.DockerTests test070BindMountCustomPathConfAndJvmOptions elastic#131366 Mute org.elasticsearch.xpack.test.rest.XPackRestIT test {p0=ml/delete_expired_data/Test delete expired data with body parameters} elastic#131364 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testSimilarityBetweenConstantVectorAndField {functionName=v_cosine similarityFunction=COSINE} elastic#131363 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testDifferentDimensions {functionName=v_cosine similarityFunction=COSINE} elastic#131362 Mute org.elasticsearch.xpack.esql.vector.VectorSimilarityFunctionsIT testSimilarityBetweenConstantVectors {functionName=v_cosine similarityFunction=COSINE} elastic#131361 Check SCORE_FUNCTION capability in VerifierTests (elastic#131352) Replace deprecated routingTable table call in tests (elastic#131005) [DOCS] Remove misused applies_to tag (elastic#131349) Adj ivf postings list building (elastic#130843) [Transform] Read metadata from Project State (elastic#131205) Add note on o11y to architecture guide (elastic#131291) Upgrade AWS Java SDK to 2.31.78 (elastic#131050) Support Fields API in conditional ingest processors (elastic#121914) ESQL - KNN function uses prefilters when pushed down to Lucene (elastic#131004) Add docs for ES|QL query logs (elastic#131287) Simplify `expectedFinalRegisterValue` computation (elastic#131274) Mute org.elasticsearch.test.rest.yaml.RcsCcsCommonYamlTestSuiteIT test {p0=search/110_field_collapsing/field collapsing, inner_hits and maxConcurrentGroupRequests} elastic#131348 ...
KNN function returned less results than expected when using conjunctions:
Before this change, knn retrieved the top k results, but then those would be post-filtered by the conjunction. That potentially retrieves in less than k results retrieved (see filters in knn).
This PR pushes down conjunctions as prefilters to KNN queries:
PushDownConjunctionsToKnnPrefilterslogical optimizer rule examines filters, and pushes conjunctions to knn functions as prefilters. It works by identifying knn functions on one side of a conjunction, and pushing the other side of the conjunction(s) that are parent to the knn function to it as prefilters.QueryBuilder.This PR still doesn't handle the case when KNN cannot be pushed down and has prefilters - that will come as a follow up PR.
knn functions won't have knn queries as prefilters to avoid circular dependencies.
Whit this change, a query with:
WHERE ((knn(A) AND B) OR C) AND (knn(D) OR (E AND F))will result in: