ES|QL: fix sentinel values leaking from SearchContextStats min/max#142726

Closed
salvatore-campagna wants to merge 4 commits intoelastic:mainfrom
salvatore-campagna:fix/sentinel-values-in-search-context-stats
@salvatore-campagna
Contributor

@salvatore-campagna salvatore-campagna commented Feb 19, 2026

SearchContextStats.min()/max() can return sentinel values (Long.MIN_VALUE/Long.MAX_VALUE) instead of real date/timestamp values. This causes Rounding.prepare(min, max) to overflow, failing ES|QL queries that access date fields on wide index patterns mixing TSDB and non-TSDB indices (e.g. apm-*, logs-*,...).

Two code paths are affected:

PointValues path: when a date field is mapped but no documents contain date values, PointValues.getMinPackedValue() returns null and minValue stays at its Long.MAX_VALUE initializer. The old comparison minValue <= min[0] evaluates to Long.MAX_VALUE <= Long.MAX_VALUE which is true, so the sentinel leaks as a real value and is potentially used by the rounding logic.
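A minimal sketch of how the sentinel can slip through on this path. It is a simplified model, not the actual SearchContextStats code: the class name, the `oldStyleMin` helper, and the use of a `Long[]` (where `null` stands in for `getMinPackedValue()` returning null) are all assumptions made for illustration.

```java
// Simplified stand-in for the PointValues path; not the Elasticsearch implementation.
class PointValuesSentinelDemo {

    // Hypothetical helper: each entry is one segment's decoded minimum,
    // or null when the segment holds no values for the field.
    static long oldStyleMin(Long[] segmentMins) {
        long min = Long.MAX_VALUE; // accumulator across segments
        for (Long segmentMin : segmentMins) {
            long minValue = Long.MAX_VALUE; // initializer doubles as "no value"
            if (segmentMin != null) {
                minValue = segmentMin;
            }
            // With no values, this compares MAX_VALUE <= MAX_VALUE, which is true,
            // so the accumulator is "updated" with the sentinel.
            if (minValue <= min) {
                min = minValue;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // No segment has values: the caller receives the sentinel as if it were data.
        System.out.println(oldStyleMin(new Long[] { null, null }) == Long.MAX_VALUE); // prints "true"
        // One segment has a value: the real minimum wins.
        System.out.println(oldStyleMin(new Long[] { null, 1700000000000L })); // prints "1700000000000"
    }
}
```

In this model the caller has no way to tell "minimum is Long.MAX_VALUE" apart from "no minimum was found", which is the ambiguity described above.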

DocValuesSkipper path (mixed TSDB): when the first matched index is TSDB, hasDocValueSkipper is set to true regardless of whether every segment actually has a skipper. Segment readers from non-TSDB indices have the date field indexed (via PointValues) but no skipper, so DocValuesSkipper.globalMinValue() returns Long.MIN_VALUE. As a result, a sentinel value leaks as a real value and is potentially used by the rounding logic.

The fix adds a hasMin/hasMax boolean that tracks whether each code path produces a real value (not Long.MIN_VALUE or Long.MAX_VALUE) before updating the accumulator. In the PointValues path, it is set to true only when getMinPackedValue() returns non-null. In the DocValuesSkipper path, sentinel values are filtered explicitly.
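The idea can be sketched roughly as follows. This is an illustrative model under stated assumptions rather than the code in this PR: `MinTracker`, `acceptPointMin`, `acceptSkipperMin`, and returning `null` for "no real value" are hypothetical names and choices. Only the minimum side is shown; the maximum would mirror it.

```java
// Illustrative accumulator that only accepts real values; names are hypothetical.
class MinTracker {
    private long min = Long.MAX_VALUE;
    private boolean hasMin = false; // true once any path produced a real value

    // PointValues-like path: null models getMinPackedValue() returning null.
    void acceptPointMin(Long segmentMin) {
        if (segmentMin != null) {
            hasMin = true;
            min = Math.min(min, segmentMin);
        }
    }

    // DocValuesSkipper-like path: sentinel values are filtered explicitly.
    void acceptSkipperMin(long globalMin) {
        if (globalMin != Long.MIN_VALUE && globalMin != Long.MAX_VALUE) {
            hasMin = true;
            min = Math.min(min, globalMin);
        }
    }

    // Returns null when no real value was seen, so a sentinel never reaches the caller.
    Long result() {
        return hasMin ? Long.valueOf(min) : null;
    }

    public static void main(String[] args) {
        MinTracker tracker = new MinTracker();
        tracker.acceptPointMin(null);            // field mapped, no values
        tracker.acceptSkipperMin(Long.MIN_VALUE); // segment without a skipper
        System.out.println(tracker.result());     // prints "null"
        tracker.acceptPointMin(1700000000000L);
        System.out.println(tracker.result());     // prints "1700000000000"
    }
}
```

One trade-off of filtering by value in the skipper path is that a field whose true minimum happens to equal a sentinel would also be ignored; for date fields that is unlikely to matter in practice.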

Closes #142725

@salvatore-campagna salvatore-campagna self-assigned this Feb 19, 2026
@salvatore-campagna salvatore-campagna added the :StorageEngine/TSDB You know, for Metrics label Feb 19, 2026
Filter out Lucene sentinel values (Long.MIN_VALUE/Long.MAX_VALUE) in
SearchContextStats.min() and max() to prevent Rounding.prepare() from
overflowing on wide index patterns mixing TSDB and non-TSDB indices.
@salvatore-campagna salvatore-campagna force-pushed the fix/sentinel-values-in-search-context-stats branch from fb733ce to 0e9800f Compare February 19, 2026 18:52
@salvatore-campagna salvatore-campagna marked this pull request as ready for review February 19, 2026 18:53
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@salvatore-campagna
Contributor Author

@fang-xing-esql FYI. I targeted release versions 8.19.12, 9.2.6 and 9.3.1 even though the corresponding release branches have been created already. Let me know if you prefer me targeting the next release instead.

@fang-xing-esql
Member

> @fang-xing-esql FYI. I targeted release versions 8.19.12, 9.2.6 and 9.3.1 even though the corresponding release branches have been created already. Let me know if you prefer me targeting the next release instead.

Thank you for taking care of this @salvatore-campagna. If this is related to replacing date_trunc/bucket with round_to, that change was introduced in 9.2 by #128639; 8.19 does not have it.

Fixing it in 9.4 is good, but I'm not quite sure about the urgency of a backport. Perhaps it is OK to wait for the next branch if the current ones on 9.2 and 9.3 are frozen.

@salvatore-campagna
Contributor Author

salvatore-campagna commented Feb 20, 2026

@fang-xing-esql @romseygeek

Note: today I investigated this a bit more and I think this fix is more of a workaround. There is a better way to fix this.

The root cause is that hasDocValueSkipper is derived once from the first shard's DateFieldType (as a global true / false value) and then applied globally to all shards in doWithContexts. When a query spans indices with different modes (e.g. TSDB + standard), the physical storage of date fields (especially for timestamp) might not reflect the hasDocValueSkipper boolean:

  • TSDB/LogsDB shards may use doc values skippers or BKD trees depending on settings (index.mapping.use_doc_values_skipper, index version). When using skippers: doc values with a skip index, no BKD tree.
  • Standard shards use LongField with BKD tree and doc values, no skip index.

As a result, these require different Lucene APIs to read min/max (DocValuesSkipper.globalMinValue vs PointValues.getMinPackedValue). Calling the wrong one doesn't error because it silently returns sentinel values (Long.MIN_VALUE / Long.MAX_VALUE) or null.

The hasMin/hasMax filtering in this PR prevents the sentinels from leaking into Rounding.prepare(), which fixes the immediate overflow. But the proper fix should check, per shard:

  1. Whether the field exists in that shard
  2. Whether the field has a doc values skipper, in which case use DocValuesSkipper
  3. Whether the field has point values, in which case use PointValues

This would require either passing the SearchExecutionContext into the doWithContexts consumer, or having min()/max() iterate contexts directly instead of delegating to doWithContexts. The mixedFieldType guard in makeFieldConfig doesn't help here because all DateFieldType instances return "date" as their typeName() regardless of the underlying IndexType (skippers vs points).
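The per-shard check could look roughly like the sketch below. It is a hedged illustration only: `ShardField` and its fields are hypothetical stand-ins for whatever the SearchExecutionContext and the Lucene readers expose, and the follow-up PR may structure this differently.

```java
// Hypothetical per-shard model; not the Elasticsearch or Lucene API.
class PerShardMinSketch {

    static final class ShardField {
        final boolean exists;   // is the field mapped in this shard?
        final Long skipperMin;  // non-null only if the shard has a doc values skipper
        final Long pointsMin;   // non-null only if the shard has point values with data

        ShardField(boolean exists, Long skipperMin, Long pointsMin) {
            this.exists = exists;
            this.skipperMin = skipperMin;
            this.pointsMin = pointsMin;
        }
    }

    // Picks the storage-appropriate source per shard instead of one global flag.
    static Long min(ShardField[] shards) {
        Long min = null;
        for (ShardField shard : shards) {
            if (!shard.exists) {
                continue; // step 1: field absent in this shard
            }
            // steps 2 and 3: prefer the skipper when present, else point values
            Long shardMin = shard.skipperMin != null ? shard.skipperMin : shard.pointsMin;
            if (shardMin != null && (min == null || shardMin < min)) {
                min = shardMin;
            }
        }
        return min; // null when no shard produced a real value
    }

    public static void main(String[] args) {
        ShardField tsdb = new ShardField(true, 100L, null);    // skipper, no BKD tree
        ShardField standard = new ShardField(true, null, 50L); // BKD tree, no skipper
        ShardField unmapped = new ShardField(false, null, null);
        System.out.println(min(new ShardField[] { tsdb, standard, unmapped })); // prints "50"
    }
}
```

Because each shard reports through the API that matches its own storage, no sentinel is ever produced, and value-based filtering becomes unnecessary in this model.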

I am working on another fix to do this properly.

@salvatore-campagna
Contributor Author

Closing this in favor of: #142752

Development

Successfully merging this pull request may close these issues.

SearchContextStats.min()/max() leak sentinel values causing overflow in Rounding