Skip to content

Reduce LuceneOperator.Status memory consumption with large QueryDSL queries#143175

Merged
craigtaverner merged 5 commits intoelastic:mainfrom
craigtaverner:reduce_luceneoperator_status_memory
Mar 2, 2026
Merged

Reduce LuceneOperator.Status memory consumption with large QueryDSL queries#143175
craigtaverner merged 5 commits intoelastic:mainfrom
craigtaverner:reduce_luceneoperator_status_memory

Conversation

@craigtaverner
Copy link
Copy Markdown
Contributor

As reported in #143164 we've seen users writing extremely large QueryDSL queries (of the order of many megabytes), and the LuceneOperator.Status keeps a HashSet of the toString of these queries, which is very large. This fix just truncates the string to 200 characters.

Fixes #143164

@craigtaverner craigtaverner added >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Feb 26, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right to me. I think you can test this with the profile tests.

protected Status(LuceneOperator operator) {
processedSlices = operator.processedSlices;
processedQueries = operator.processedQueries.stream().map(Query::toString).collect(Collectors.toCollection(TreeSet::new));
processedQueries = operator.processedQueries.stream().map(Status::queryString).collect(Collectors.toCollection(TreeSet::new));
Copy link
Copy Markdown
Member

@dnhatn dnhatn Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the queries themselves are consuming a significant amount of memory, based on the heap dump screenshot. Should we convert processQueries to use String instead and apply a limit there?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So:

-final Set<Query> processedQueries = new HashSet<>();
+final Set<String> processedQueries = new HashSet<>();

Hey, if we do that and we limit the size, could we make the output:

queryString.substring(0, QUERY_STRING_TRUNCATION)
  + "...("
  + (queryString.length() - QUERY_STRING_TRUNCATION)
  + "more characters)["
  + queryString.hashcode()
  + "]

Just some extra paranoia about the hash of the query. If the queries are

Bool[SomeBigShape, SomethingImportant]
Bool[SomeBigShape, SomethingElseImportant]

Then we'll at least see that there were two.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on the hash getting big.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your proposal is good.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking at the heap dump myself, I cannot find any excessive memory usage by the processedQueries inside the operator, only the string version in the status. But since it only exists to pass to the status, we might as well truncate early, as suggested.

@craigtaverner
Copy link
Copy Markdown
Contributor Author

OK. I've made the changes. Let me know what you think.

  • Convert and truncate earlier (in operator, not status)
  • Increased size because I imagine sometimes the information is useful (200 seems to small for many of the cases I've seen, and the issue we're mitigating had 80M strings here)
  • There were existing tests that needed fixing to cope with truncation

@craigtaverner craigtaverner marked this pull request as ready for review March 2, 2026 13:46
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@craigtaverner
Copy link
Copy Markdown
Contributor Author

I checked that PushQueriesIT.testEqualityTooBigToPush covers this case (ie. the assertion needed to be changed to verify the truncated string). Hopefully this is sufficient testing.

@craigtaverner craigtaverner added auto-backport Automatically create backport pull requests when merged branch:9.2 branch:9.3 labels Mar 2, 2026
@craigtaverner craigtaverner merged commit c47480d into elastic:main Mar 2, 2026
34 of 35 checks passed
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
9.3
9.2

craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Mar 2, 2026
craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Mar 2, 2026
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 2, 2026
…locations

* upstream/main: (94 commits)
  Mute org.elasticsearch.xpack.esql.qa.mixed.EsqlClientYamlIT test {p0=esql/40_tsdb/TS Command grouping on text field} elastic#142544
  Mute org.elasticsearch.index.store.StoreDirectoryMetricsIT testDirectoryMetrics elastic#143419
  Mute org.elasticsearch.xpack.esql.qa.multi_node.GenerativeIT test elastic#143023
  TS_INFO information retrieval command (elastic#142721)
  ESQL: External source parallel execution and distribution (elastic#143349)
  Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=DOC_VALUES]} elastic#143414
  Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=NONE]} elastic#143413
  Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=STORED]} elastic#143412
  Removing ingest random sampling (elastic#143289)
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#143023
  [Transform] Clean up internal tests (elastic#143246)
  Skip time series field type merge for non-TS agg queries (elastic#143262)
  Enable zero-copy SIMD vector scoring on searchable snapshots (frozen tier) (elastic#141718)
  Mute org.elasticsearch.xpack.search.CrossClusterAsyncSearchIT testCancelViaExpirationOnRemoteResultsWithMinimizeRoundtrips elastic#143407
  Fix MemorySegmentUtilsTests (elastic#143391)
  Unmute testWorkflowsRestrictionAllowsAccess (elastic#143308)
  Cancel async query on expiry (elastic#143016)
  ESQL: Finish migrating error testing (elastic#143322)
  Reduce LuceneOperator.Status memory consumption with large QueryDSL queries (elastic#143175)
  ESQL: Generative testing with full text functions (elastic#142961)
  ...
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2026
…yDSL queries (#143175) (#143403)

* Reduce LuceneOperator.Status memory consumption with large QueryDSL queries (#143175)

* Backporting to 9.3 required inlining the constant
elasticsearchmachine pushed a commit that referenced this pull request Mar 18, 2026
…yDSL queries (#143175) (#143401)

* Reduce LuceneOperator.Status memory consumption with large QueryDSL queries (#143175)

* Backporting to 9.3 required inlining the constant
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.7 v9.3.2 v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ES|QL with large QueryDSL filters can OOM

4 participants