Reduce LuceneOperator.Status memory consumption with large QueryDSL queries #143175
Hi @craigtaverner, I've created a changelog YAML for you.
nik9000
left a comment
Looks right to me. I think you can test this with the profile tests.
 protected Status(LuceneOperator operator) {
     processedSlices = operator.processedSlices;
-    processedQueries = operator.processedQueries.stream().map(Query::toString).collect(Collectors.toCollection(TreeSet::new));
+    processedQueries = operator.processedQueries.stream().map(Status::queryString).collect(Collectors.toCollection(TreeSet::new));
I think the queries themselves are consuming a significant amount of memory, based on the heap dump screenshot. Should we convert processedQueries to use String instead and apply a limit there?
So:
-final Set<Query> processedQueries = new HashSet<>();
+final Set<String> processedQueries = new HashSet<>();
Hey, if we do that and we limit the size, could we make the output:
queryString.substring(0, QUERY_STRING_TRUNCATION)
+ "...("
+ (queryString.length() - QUERY_STRING_TRUNCATION)
+ "more characters)["
+ queryString.hashCode()
+ "]"
Just some extra paranoia about the hash of the query. If the queries are
Bool[SomeBigShape, SomethingImportant]
Bool[SomeBigShape, SomethingElseImportant]
Then we'll at least see that there were two.
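A minimal sketch of the output format proposed above, assuming a limit of 200 characters (the figure given in the PR description). The class name QueryStringTruncation and the method name truncate are hypothetical and are not taken from the PR; the string layout follows the suggestion verbatim.

```java
import java.util.Set;
import java.util.TreeSet;

public class QueryStringTruncation {
    // Assumed limit: the PR description mentions truncating to 200 characters.
    static final int QUERY_STRING_TRUNCATION = 200;

    /** Hypothetical helper: shortens long query strings but keeps them distinguishable. */
    static String truncate(String queryString) {
        if (queryString.length() <= QUERY_STRING_TRUNCATION) {
            return queryString; // short strings pass through unchanged
        }
        return queryString.substring(0, QUERY_STRING_TRUNCATION)
            + "...("
            + (queryString.length() - QUERY_STRING_TRUNCATION)
            + "more characters)["
            + queryString.hashCode() // hash of the full string, not of the prefix
            + "]";
    }

    public static void main(String[] args) {
        // Two queries that share a long prefix and differ only at the end.
        String bigShape = "x".repeat(1000);
        String first = "Bool[" + bigShape + ", SomethingImportant]";
        String second = "Bool[" + bigShape + ", SomethingElseImportant]";

        Set<String> processed = new TreeSet<>();
        processed.add(truncate(first));
        processed.add(truncate(second));

        // The shared 200-character prefix alone would collapse these into one entry;
        // the remaining-length and hash suffix keep them apart.
        System.out.println(processed.size()); // prints 2
    }
}
```

In this particular example the two strings also differ in length, so the character count alone separates them; the hash matters for queries of equal length that differ only after the cut-off. Hash collisions are still possible, so the suffix is a hint rather than a guarantee.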
Good call on the hash getting big.
While looking at the heap dump myself, I cannot find any excessive memory usage by the processedQueries inside the operator, only the string version in the status. But since it only exists to pass to the status, we might as well truncate early, as suggested.
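A rough sketch of what "truncate early" could look like, assuming the operator's set exists only to feed the status. The Operator and Status classes here are simplified stand-ins, recordQuery and LIMIT are hypothetical names, and Object replaces Lucene's Query so the example has no dependencies.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class EarlyTruncationSketch {
    // Assumed limit, taken from the 200 characters mentioned in the PR description.
    static final int LIMIT = 200;

    /** Hypothetical stand-in for the operator: retains only bounded strings. */
    static class Operator {
        final Set<String> processedQueries = new HashSet<>();

        // Object stands in for Lucene's Query in this sketch.
        void recordQuery(Object query) {
            String s = query.toString(); // the full string exists only transiently
            processedQueries.add(s.length() <= LIMIT ? s : s.substring(0, LIMIT) + "...");
        }
    }

    /** Hypothetical stand-in for the status: copies the already-bounded strings. */
    static class Status {
        final Set<String> processedQueries;

        Status(Operator operator) {
            processedQueries = new TreeSet<>(operator.processedQueries);
        }
    }

    public static void main(String[] args) {
        Operator operator = new Operator();
        operator.recordQuery("q".repeat(5_000_000)); // a multi-megabyte query string
        Status status = new Status(operator);
        for (String q : status.processedQueries) {
            System.out.println(q.length()); // prints 203
        }
    }
}
```

With this shape, neither the operator nor any status built from it holds more than LIMIT plus a few characters per query. This sketch uses a plain prefix, so queries that share their first 200 characters collapse into a single entry.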
…gtaverner/elasticsearch into reduce_luceneoperator_status_memory
OK. I've made the changes. Let me know what you think.
Pinging @elastic/es-analytical-engine (Team:Analytics)
I checked that
…locations
* upstream/main: (94 commits)
Mute org.elasticsearch.xpack.esql.qa.mixed.EsqlClientYamlIT test {p0=esql/40_tsdb/TS Command grouping on text field} elastic#142544
Mute org.elasticsearch.index.store.StoreDirectoryMetricsIT testDirectoryMetrics elastic#143419
Mute org.elasticsearch.xpack.esql.qa.multi_node.GenerativeIT test elastic#143023
TS_INFO information retrieval command (elastic#142721)
ESQL: External source parallel execution and distribution (elastic#143349)
Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=DOC_VALUES]} elastic#143414
Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=NONE]} elastic#143413
Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=STORED]} elastic#143412
Removing ingest random sampling (elastic#143289)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#143023
[Transform] Clean up internal tests (elastic#143246)
Skip time series field type merge for non-TS agg queries (elastic#143262)
Enable zero-copy SIMD vector scoring on searchable snapshots (frozen tier) (elastic#141718)
Mute org.elasticsearch.xpack.search.CrossClusterAsyncSearchIT testCancelViaExpirationOnRemoteResultsWithMinimizeRoundtrips elastic#143407
Fix MemorySegmentUtilsTests (elastic#143391)
Unmute testWorkflowsRestrictionAllowsAccess (elastic#143308)
Cancel async query on expiry (elastic#143016)
ESQL: Finish migrating error testing (elastic#143322)
Reduce LuceneOperator.Status memory consumption with large QueryDSL queries (elastic#143175)
ESQL: Generative testing with full text functions (elastic#142961)
...
As reported in #143164, we've seen users writing extremely large QueryDSL queries (on the order of many megabytes). LuceneOperator.Status keeps a set of the toString() output of these queries, so the status itself becomes very large. This fix truncates each query string to 200 characters.
Fixes #143164