Skip to content

[ES|QL] Run aggregations on aggregate metric double with default metric#138647

Merged
limotova merged 14 commits intoelastic:mainfrom
limotova:agg-metric-fuse-default-values
Dec 17, 2025
Merged

[ES|QL] Run aggregations on aggregate metric double with default metric#138647
limotova merged 14 commits intoelastic:mainfrom
limotova:agg-metric-fuse-default-values

Conversation

@limotova
Copy link
Contributor

@limotova limotova commented Nov 26, 2025

This commit adds support for using the default metric of an aggregate metric double on aggregations.
For example in the query FROM datastream | STATS std_dev(metric_field), the std_dev aggregation does not have native support for aggregate metric double, so we use the default metric instead (which is determined by the field mapping).

Closes #136297

Known limitations/bugs:

  • Count as a default metric is not supported (mostly because for numerics, with the exception of calling Avg and Count directly, we always return the original numeric, converted to a double)
  • While the above example query works, the following will fail saying std_dev does not support aggregate_metric_double:
FROM datastream
| EVAL agg_metric = TO_AGGREGATE_METRIC_DOUBLE(metric_field)
| STATS std_dev(agg_metric)

@limotova limotova force-pushed the agg-metric-fuse-default-values branch 2 times, most recently from 81ff8bb to 39e5e69 Compare December 4, 2025 10:27
@limotova limotova force-pushed the agg-metric-fuse-default-values branch from 39e5e69 to 1af0a96 Compare December 4, 2025 10:31
@dnhatn dnhatn self-requested a review December 5, 2025 00:41
;

DefaultMetricWithFrom
required_capability: ts_command_v0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't use the TS command but tests with this capability only get run as EsqlSpecIT, and not as CSV tests, and I was running into issues running these tests through the CSV framework and did not think it was worth it to debug, so I added this capability.

}

// break down Avg and AvgOverTime so we grab the correct submetrics
if (aggFunc instanceof Avg avg) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, AvgOverTime, and the Count/CountOverTime replacements below are essentially the surrogate path if the datatype is AggregateMetricDouble. However, because we are trying to fuse the function into the field, what happens is the datatype goes from Invalid (because multi-type) to Double/Integer (because Avg/Count), and therefore we skip the path that would normally correctly replace Avg with Div(Sum, Count) and Count with Sum(count), so I had to manually make that substitution here.
I'm not sure what the alternative might be

}

private static AggregateMetricDoubleBlockBuilder.Metric getMetric(AggregateFunction aggFunc) {
private Expression countConvert(UnaryScalarFunction convert, Source source, DataType type, InvalidMappedField imf) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is practically identical to ResolveUnionType's typeSpecificConvert(...), except we can't use that method because we're trying to pass in MvCount as convert, and MvCount is not a ConvertFunction. I wasn't sure if it made sense to modify the original method.

@limotova limotova requested review from kkrik-es and nik9000 December 16, 2025 09:02
Block resultBlock = ((AggregateMetricDoubleBlock) block).getMetricBlock(subFieldIndex);
int actualSubfieldIndex = subFieldIndex;
if (subFieldIndex == AggregateMetricDoubleBlockBuilder.Metric.DEFAULT.getIndex()) {
actualSubfieldIndex = AggregateMetricDoubleBlockBuilder.Metric.MAX.getIndex();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we look this up in the mapper and retrieve the configured default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to pass that information here because I think the only place it could be stored is inside the AggregateMetricDoubleBlock (and we don't do that)
But also because the block here could be a numeric that was converted into an aggregate metric double block and that doesn't have a default value

All that being said, I believe this chunk of code shouldn't be getting hit (anymore) and I forgot to remove this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right it'd be tricky.. Remind me, is there a place where we look up what's the default value? Where does this currently happen? Or it's hardcoded to max?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do i read this right that it happens in AggregateMetricDoubleFieldMapper, with the change in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the way the code works right now is we never run the original FromAggregateMetricDouble with default value; we always fuse the function into the blockloader. So we are reading it from AggregateMetricDoubleFieldMapper yes (and I think that's the only place that knows about it).

It might be possible to add this info somewhere in the EsRelation, but I don't know how we would forward that information from the EsRelation to functions that require it (furthermore, I don't really know how we would distinguish between aggregate metric doubles vs numerics if we're handling both at the same time, like in implicit casting)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is actually correct - we should be retrieving this info from the mapper.

Maybe follow up with a test where the default is set to min - and verify that it's used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test (and manually verified that it's the right number)

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great if @nik9000 can also take a look, at least at the Analyzer changes.

@limotova limotova added >feature :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL labels Dec 16, 2025
@limotova limotova marked this pull request as ready for review December 16, 2025 19:24
@limotova limotova added the test-release Trigger CI checks against release build label Dec 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @limotova, I've created a changelog YAML for you.

@limotova limotova changed the title [ES|QL] Run aggregations on agg_metric with default metric [ES|QL] Run aggregations on aggregate metric double with default metric Dec 16, 2025
@limotova
Copy link
Contributor Author

It looks like Nik is going to be unavailable for a few days so I will try to merge this and hopefully we can iterate on it in the following days if necessary.

@limotova limotova removed the test-release Trigger CI checks against release build label Dec 16, 2025
@limotova limotova merged commit f50da31 into elastic:main Dec 17, 2025
35 checks passed
@limotova limotova deleted the agg-metric-fuse-default-values branch December 17, 2025 00:28
szybia added a commit to szybia/elasticsearch that referenced this pull request Dec 17, 2025
…-err-message

* upstream/main: (45 commits)
  Add sort field usage to telemetry (elastic#139530)
  ES|QL: Release CCS support for FORK (elastic#139630)
  Parameterize VectorSimilarityFunctionsTests on the similarity function (elastic#139516)
  fix broken links from ccs file move (elastic#139655)
  Update docs for v9.2.3 release (elastic#139479)
  Move tsdb bwc tests to x-pack/logsdb (elastic#139671)
  Fix FirstDocIdGroupingAggregatorFunction (elastic#139619)
  Fix release test for node_reduction profiling (elastic#139515)
  Unmute RestClientSingleHostIntegTests.testRequestResetAndAbort (elastic#139656)
  Unmute VerifyVersionConstantsIT.testLuceneVersionConstant (elastic#139644)
  Mute org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStoreRepositoryTests testReadNonExistingPath elastic#139665
  Mute org.elasticsearch.smoketest.WatcherYamlRestIT test {p0=mustache/10_webhook/Test webhook action with mustache integration} elastic#139663
  [ES|QL] Run aggregations on aggregate metric double with default metric (elastic#138647)
  Enable TDigest field mapper and ES|QL type (elastic#139607)
  Mute org.elasticsearch.index.mapper.HalfFloatSyntheticSourceNativeArrayIntegrationTests testSynthesizeArrayRandom elastic#139658
  Explicitly cleanup test index with shared data path (elastic#136048)
  Mute org.elasticsearch.xpack.core.action.XPackUsageResponseTests testVersionDependentSerializationWriteToOldStream elastic#139576
  Make XPackUsageResponseTests a wire serializing test case (elastic#139643)
  Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT test {p0=mixed_cluster/90_ml_data_frame_analytics_crud/Start and stop old regression job} elastic#139654
  Relax error bounds for RandomizedTimeSeriesIT (elastic#139641)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>feature :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL Team:StorageEngine v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use aggregate_metric_double.default_value in ES|QL

3 participants

Comments