ES|QL: Optimize MMR by reducing cache size and lookup by ioanatia · Pull Request #145014 · elastic/elasticsearch

ioanatia · 2026-03-26T16:33:35Z

For the full explanation, see #140710.

elasticsearchmachine · 2026-03-30T13:01:14Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

markjhoy · 2026-03-30T19:48:35Z

server/src/main/java/org/elasticsearch/search/diversification/mmr/MMRResultDiversification.java

-                // compute MMR scores for remaining searchHits
-                float highestSimilarityScoreToSelected = getHighestSimilarityScoreToSelectedVectors(
-                    selectedDocRanks,
+                float similarityToLastSelected = getVectorComparisonScore(


I think I get what the optimization is here: you're only comparing the current document to the last selected document, correct? (The original implementation, and the implementation in the paper, compute MMR with respect to all the previously selected documents.)

I think this will work, but I'm still not sure whether it will produce the most correct results...

We still compute MMR with respect to all previously selected documents.

We keep an array holding, for each doc, the maximum similarity computed so far between that doc and the selected set.
Then, each time we select a new diversified doc and iterate through the remaining docs to find the next candidate:

1. We compute the similarity between the current doc (the one we are calculating MMR for) and the doc that was added to the selected docs in the previous iteration.
2. We update maxSimilarityToSelected[docRank - 1] for the current doc.
3. We compute the MMR score using the maxSimilarityToSelected value for the current doc.
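The steps above rely on the identity max over (S plus the new doc) = max(max over S, similarity to the new doc), so one comparison per remaining doc per round is enough. Below is a minimal, self-contained sketch of that idea, not the code from this PR. The class name, `select`, `selectNaive`, the `cosine` helper, and the plain `float[]` inputs are all assumptions made for illustration. The real implementation works on doc ranks and a diversification context. `selectNaive` recomputes the maximum against every selected doc and is included only so the two can be compared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and signatures are illustrative, not Elasticsearch APIs.
class IncrementalMmrSketch {

    // Cosine similarity, standing in for whatever vector comparison the real code uses.
    static float cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0f : (float) (dot / Math.sqrt(na * nb));
    }

    // Greedy MMR: each round compares every remaining doc only to the last selected doc.
    static List<Integer> select(float[][] vectors, float[] relevance, float lambda, int k) {
        int n = vectors.length;
        float[] maxSimilarityToSelected = new float[n];
        java.util.Arrays.fill(maxSimilarityToSelected, Float.NEGATIVE_INFINITY);
        boolean[] selected = new boolean[n];
        List<Integer> result = new ArrayList<>();
        int lastSelected = -1;
        while (result.size() < Math.min(k, n)) {
            int best = -1;
            float bestScore = Float.NEGATIVE_INFINITY;
            for (int doc = 0; doc < n; doc++) {
                if (selected[doc]) continue;
                if (lastSelected >= 0) {
                    // Steps 1 and 2: one new comparison, folded into the running maximum.
                    float sim = cosine(vectors[doc], vectors[lastSelected]);
                    maxSimilarityToSelected[doc] = Math.max(maxSimilarityToSelected[doc], sim);
                }
                // Step 3: no penalty while nothing has been selected yet.
                float penalty = lastSelected < 0 ? 0f : maxSimilarityToSelected[doc];
                float mmr = lambda * relevance[doc] - (1 - lambda) * penalty;
                if (mmr > bestScore) { bestScore = mmr; best = doc; }
            }
            selected[best] = true;
            result.add(best);
            lastSelected = best;
        }
        return result;
    }

    // Reference version: recomputes the max against all selected docs every time.
    static List<Integer> selectNaive(float[][] vectors, float[] relevance, float lambda, int k) {
        int n = vectors.length;
        boolean[] selected = new boolean[n];
        List<Integer> result = new ArrayList<>();
        while (result.size() < Math.min(k, n)) {
            int best = -1;
            float bestScore = Float.NEGATIVE_INFINITY;
            for (int doc = 0; doc < n; doc++) {
                if (selected[doc]) continue;
                float penalty = 0f;
                if (!result.isEmpty()) {
                    penalty = Float.NEGATIVE_INFINITY;
                    for (int s : result) penalty = Math.max(penalty, cosine(vectors[doc], vectors[s]));
                }
                float mmr = lambda * relevance[doc] - (1 - lambda) * penalty;
                if (mmr > bestScore) { bestScore = mmr; best = doc; }
            }
            selected[best] = true;
            result.add(best);
        }
        return result;
    }
}
```

Both methods run the same similarity computations and take the same maxima, so under these assumptions they should pick the same docs in the same order. The incremental version does one comparison per remaining doc per round instead of one per selected doc.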

mromaios · 2026-03-31T09:12:55Z

server/src/main/java/org/elasticsearch/search/diversification/mmr/MMRResultDiversification.java

-                highestScore = similarityScore;
-            }
-        }
-        return highestScore == Float.NEGATIVE_INFINITY ? 0.0f : highestScore;


💭 Minor edge case, just from comparing implementations (not sure if it's a valid one): in the previous code, if there was no valid similarity score we would get 0.0f, but now we will get Float.NEGATIVE_INFINITY. That happens if context.getFieldVector() returns null for every selected document.

We don't actually reach this path because, as we iterate through the candidates, we skip those that don't have a vector value:

elasticsearch/server/src/main/java/org/elasticsearch/search/diversification/mmr/MMRResultDiversification.java, lines 67 to 70 in dc43d5c:

    var thisDocVector = context.getFieldVector(docRank);
    if (thisDocVector == null || thisDocVector.size() == 0) {
        continue;
    }
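To illustrate why the guard above matters: a doc that is skipped can never be selected, so every selected doc has a vector to compare against. The sketch below is hypothetical and standalone. It uses a plain `Map<Integer, float[]>` in place of `context.getFieldVector(docRank)`, and `scorableRanks` is an invented helper name, not something from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the Map stands in for the real diversification context.
class VectorGuardSketch {

    // Returns the doc ranks that can take part in MMR scoring.
    static List<Integer> scorableRanks(int[] docRanks, Map<Integer, float[]> fieldVectors) {
        List<Integer> scorable = new ArrayList<>();
        for (int docRank : docRanks) {
            float[] thisDocVector = fieldVectors.get(docRank);
            if (thisDocVector == null || thisDocVector.length == 0) {
                continue; // same shape of guard as in the quoted loop
            }
            scorable.add(docRank);
        }
        return scorable;
    }
}
```

Docs with a missing or empty vector never become candidates, so under this assumption a sentinel such as Float.NEGATIVE_INFINITY is never read for a selected doc.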

Oh right 🤦
Thanks!

mromaios

LGTM ✅


Optimize MMR by reducing cache size and lookup

58a1be0

ioanatia added the :Search Relevance/Ranking, >refactoring, and Team:Search Relevance labels Mar 26, 2026

elasticsearchmachine added the v9.4.0 label Mar 26, 2026

ioanatia and others added 3 commits March 26, 2026 22:01

Use min instead of max

a04be0b

Merge branch 'main' into mmr_optimization

e75f379

Refactor

7e34522

ioanatia marked this pull request as ready for review March 30, 2026 13:00

ioanatia requested a review from markjhoy March 30, 2026 15:21

markjhoy reviewed Mar 30, 2026


mromaios reviewed Mar 31, 2026


mromaios approved these changes Mar 31, 2026


Merge branch 'main' into mmr_optimization

3bed78a

ioanatia merged commit 7e16024 into elastic:main Mar 31, 2026
35 checks passed

sachinnn99 pushed a commit to sachinnn99/elasticsearch that referenced this pull request Mar 31, 2026

ES|QL: Optimize MMR by reducing cache size and lookup (elastic#145014)

8f9c555

ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Apr 1, 2026

ES|QL: Optimize MMR by reducing cache size and lookup (elastic#145014)

a419559

ioanatia merged 5 commits into elastic:main from ioanatia:mmr_optimization


4 participants
