Use high speed strategy for LuceneTopNSourceOperator by carlosdelest · Pull Request #142128 · elastic/elasticsearch

carlosdelest · 2026-02-09T12:18:03Z

Currently, LuceneTopNSourceOperator uses SHARD as its auto strategy. This makes it slower than the Query DSL when a shard has multiple segments, because SHARD creates one partition per shard and so does not parallelize the query within it.

This change uses LuceneSourceOperator::highSpeedAutoStrategy to allow parallelism based on the rules for high speed strategy described there:

Use SHARD for match none
Use DOC for match all
Use SEGMENT for other queries
For boolean queries, examine the clauses: use SHARD if any clause calls for it; otherwise use SEGMENT if any clause calls for it; use DOC only if every clause calls for it.
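
To make the rules above concrete, here is a minimal sketch of the decision logic. It is not the actual highSpeedAutoStrategy implementation: the `Query` records and the `Partitioning` enum are hypothetical stand-ins for the Lucene and ES|QL types, and returning DOC for an empty boolean query is an assumption made only for this illustration.

```java
import java.util.List;

// Hypothetical, simplified model of the rules listed above.
// None of these types are the real Elasticsearch or Lucene classes.
final class HighSpeedStrategySketch {
    enum Partitioning { SHARD, SEGMENT, DOC }

    // Minimal stand-in for a query tree.
    interface Query {}
    record MatchNone() implements Query {}
    record MatchAll() implements Query {}
    record Other(String description) implements Query {}
    record Bool(List<Query> clauses) implements Query {}

    static Partitioning choose(Query query) {
        if (query instanceof MatchNone) {
            return Partitioning.SHARD;   // nothing to collect, avoid any overhead
        }
        if (query instanceof MatchAll) {
            return Partitioning.DOC;     // every doc matches, split as finely as possible
        }
        if (query instanceof Bool bool) {
            // DOC only survives if every clause asks for it
            // (assumption: an empty boolean query also ends up as DOC).
            Partitioning result = Partitioning.DOC;
            for (Query clause : bool.clauses()) {
                Partitioning p = choose(clause);
                if (p == Partitioning.SHARD) {
                    return Partitioning.SHARD;      // SHARD wins over everything
                }
                if (p == Partitioning.SEGMENT) {
                    result = Partitioning.SEGMENT;  // SEGMENT wins over DOC
                }
            }
            return result;
        }
        return Partitioning.SEGMENT;     // any other query
    }

    public static void main(String[] args) {
        Query q = new Bool(List.of(new MatchAll(), new Other("term query")));
        System.out.println(choose(q)); // SEGMENT
    }
}
```

The precedence SHARD > SEGMENT > DOC means a single restrictive clause pulls the whole boolean query down to the cheaper strategy.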

Closes #141770

elasticsearchmachine · 2026-02-09T17:37:20Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2026-02-09T17:37:20Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent · 2026-02-09T19:35:27Z

Drive by,

I don't know how the per-segment collection parallelism works, but the query path will combine very tiny segments together, as the cost of threading for those tiny chunks of work isn't worth it.

Is something like that at work here? If not, expect the 1 worker == 1 segment logic to be harmful in indices with tiny segments (e.g. <100s of documents).

carlosdelest · 2026-02-10T06:45:15Z

> I don't know how the per-segment collection parallelism works, but the query path will combine very tiny segments together, as the cost of threading for those tiny chunks of work isn't worth it.

@benwtrent there is indeed a grouping of smaller segments, please check DataPartitioning:

    /**
     * Make one partition per shard. This is generally the slowest option, but it
     * has the lowest CPU overhead.
     */
    SHARD,

    /**
     * Partition on segment boundaries, this doesn't allow forking to as many CPUs
     * as {@link #DOC} but it has much lower overhead.
     * <p>
     * It packs segments smaller than {@link LuceneSliceQueue#MAX_DOCS_PER_SLICE}
     * docs together into a partition. Larger segments get their own partition.
     * Each slice contains no more than {@link LuceneSliceQueue#MAX_SEGMENTS_PER_SLICE}.
     */
    SEGMENT,

    /**
     * Partitions into dynamic-sized slices to improve CPU utilization while keeping overhead low.
     * This approach is more flexible than {@link #SEGMENT} and works as follows:
     *
     * <ol>
     *   <li>The slice size starts from a desired size based on {@code task_concurrency} but is capped
     *       at around {@link LuceneSliceQueue#MAX_DOCS_PER_SLICE}. This prevents poor CPU usage when
     *       matching documents are clustered together.</li>
     *   <li>For small and medium segments (less than five times the desired slice size), it uses a
     *       slightly different {@link #SEGMENT} strategy, which also splits segments that are larger
     *       than the desired size. See {@link org.apache.lucene.search.IndexSearcher#slices(List, int, int, boolean)}.</li>
     *   <li>For very large segments, multiple segments are not combined into a single slice. This allows
     *       one driver to process an entire large segment until other drivers steal the work after finishing
     *       their own tasks. See {@link LuceneSliceQueue#nextSlice(LuceneSlice)}.</li>
     * </ol>
     */
    DOC;
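
As a rough illustration of the SEGMENT packing described in that Javadoc, here is a simplified greedy sketch. It is not the LuceneSliceQueue code: the class and method names are hypothetical, the limits are passed in as plain parameters rather than read from MAX_DOCS_PER_SLICE and MAX_SEGMENTS_PER_SLICE, and the numbers in `main` are made up. Segments are represented only by their doc counts and are packed in the order given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups small segments into shared partitions and
// gives each large segment its own partition.
final class SegmentPackingSketch {

    static List<List<Integer>> pack(List<Integer> segmentDocCounts, int maxDocsPerSlice, int maxSegmentsPerSlice) {
        List<List<Integer>> partitions = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentDocs = 0;
        for (int docs : segmentDocCounts) {
            if (docs >= maxDocsPerSlice) {
                // Large segment: it gets a partition of its own.
                partitions.add(List.of(docs));
                continue;
            }
            boolean tooManyDocs = currentDocs + docs > maxDocsPerSlice;
            boolean tooManySegments = current.size() >= maxSegmentsPerSlice;
            if (!current.isEmpty() && (tooManyDocs || tooManySegments)) {
                // Close the open partition before it exceeds either limit.
                partitions.add(current);
                current = new ArrayList<>();
                currentDocs = 0;
            }
            current.add(docs);
            currentDocs += docs;
        }
        if (!current.isEmpty()) {
            partitions.add(current);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up doc counts and limits, for illustration only.
        System.out.println(pack(List.of(50, 1000, 30, 40, 10, 20), 100, 3));
    }
}
```

With packing like this, an index made of many tiny segments ends up with far fewer partitions than segments, so it does not degenerate into one worker per tiny segment.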

nik9000

If this is faster for y'all I'm all for it.

It's probably worth explaining why you never use the lowOverheadAutoStrategy. You don't want it because you have to scan all the documents with topn.

nik9000 · 2026-02-10T16:54:57Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/planner/EsPhysicalOperationProviders.java

    private static final Logger logger = LogManager.getLogger(EsPhysicalOperationProviders.class);

+    // LuceneTopNSourceOperator auto strategy
+    private static final DataPartitioning.AutoStrategy TOP_N_AUTO_STRATEGY = unusedLimit -> {


I'd probably make this a static method and reference it.

Done and documented using high speed in 45f70c3

carlosdelest added 2 commits February 9, 2026 12:47

First version, use highSpeedAutoStrategy

fa5c598

Use highSpeedAutoStrategy

55e0b9b

elasticsearchmachine added the v9.4.0 label Feb 9, 2026

Fix tests to take into account new partitioning

0ebd70b

carlosdelest added the >enhancement, :Analytics/ES|QL (AKA ESQL), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), and :Search Relevance/ES|QL (Search functionality in ES|QL) labels Feb 9, 2026

Fix tests

398c225

carlosdelest requested review from ioanatia and nik9000 February 9, 2026 17:36

carlosdelest marked this pull request as ready for review February 9, 2026 17:36

elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team: ESQL/Aggs/Geo) label Feb 9, 2026

[CI] Auto commit changes from spotless

7dad5db

nik9000 approved these changes Feb 10, 2026

carlosdelest added 3 commits February 11, 2026 08:06

Use a static method for strategy and document it

45f70c3

Merge remote-tracking branch 'origin/main' into enhancement/esql-lucene-top-n-data-partition-strategy

6c819de

Merge remote-tracking branch 'carlosdelest/enhancement/esql-lucene-top-n-data-partition-strategy' into enhancement/esql-lucene-top-n-data-partition-strategy

7b94cdd

carlosdelest merged commit f1ed358 into elastic:main Feb 11, 2026
35 checks passed

carlosdelest added a commit to carlosdelest/elasticsearch that referenced this pull request Feb 13, 2026

Revert "ESQL - Use high speed strategy for LuceneTopNSourceOperator (elastic#142128)"

This reverts commit f1ed358.

49d1f6d

This was referenced Feb 13, 2026

ES|QL - Improve data partition strategy for search use cases #141770

Open

ESQL - Revert Use high speed strategy for LuceneTopNSourceOperator #142453

Merged

carlosdelest added a commit that referenced this pull request Feb 13, 2026

Revert "ESQL - Use high speed strategy for LuceneTopNSourceOperator (#142128)" (#142453)

This reverts commit f1ed358.

10edcad

sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Feb 13, 2026

Revert "ESQL - Use high speed strategy for LuceneTopNSourceOperator (elastic#142128)" (elastic#142453)

This reverts commit f1ed358.

27e1e87

carlosdelest mentioned this pull request Feb 25, 2026

Add wikipedia data partitioning for ES|QL profiled queries elastic/rally-tracks#989

Closed

This was referenced Feb 25, 2026

ES|QL - LuceneTopNSourceOperator data partition strategy uses ContextIndexSearcher slicing #143133

Merged

carlosdelest merged 8 commits into elastic:main from carlosdelest:enhancement/esql-lucene-top-n-data-partition-strategy
