[ES|QL] Run aggregations on aggregate metric double with default metric#138647
[ES|QL] Run aggregations on aggregate metric double with default metric#138647limotova merged 14 commits intoelastic:mainfrom
Conversation
81ff8bb to
39e5e69
Compare
39e5e69 to
1af0a96
Compare
| ; | ||
|
|
||
| DefaultMetricWithFrom | ||
| required_capability: ts_command_v0 |
There was a problem hiding this comment.
This test doesn't use the TS command but tests with this capability only get run as EsqlSpecIT, and not as CSV tests, and I was running into issues running these tests through the CSV framework and did not think it was worth it to debug, so I added this capability.
| } | ||
|
|
||
| // break down Avg and AvgOverTime so we grab the correct submetrics | ||
| if (aggFunc instanceof Avg avg) { |
There was a problem hiding this comment.
This, AvgOverTime, and the Count/CountOverTime replacements below are essentially the surrogate path if the datatype is AggregateMetricDouble. However, because we are trying to fuse the function into the field, what happens is the datatype goes from Invalid (because multi-type) to Double/Integer (because Avg/Count), and therefore we skip the path that would normally correctly replace Avg with Div(Sum, Count) and Count with Sum(count), so I had to manually make that substitution here.
I'm not sure what the alternative might be
| } | ||
|
|
||
| private static AggregateMetricDoubleBlockBuilder.Metric getMetric(AggregateFunction aggFunc) { | ||
| private Expression countConvert(UnaryScalarFunction convert, Source source, DataType type, InvalidMappedField imf) { |
There was a problem hiding this comment.
This is practically identical to ResolveUnionType's typeSpecificConvert(...), except we can't use that method because we're trying to pass in MvCount as convert, and MvCount is not a ConvertFunction. I wasn't sure if it made sense to modify the original method.
| Block resultBlock = ((AggregateMetricDoubleBlock) block).getMetricBlock(subFieldIndex); | ||
| int actualSubfieldIndex = subFieldIndex; | ||
| if (subFieldIndex == AggregateMetricDoubleBlockBuilder.Metric.DEFAULT.getIndex()) { | ||
| actualSubfieldIndex = AggregateMetricDoubleBlockBuilder.Metric.MAX.getIndex(); |
There was a problem hiding this comment.
Can we look this up in the mapper and retrieve the configured default value?
There was a problem hiding this comment.
I'm not sure how to pass that information here because I think the only place it could be stored is inside the AggregateMetricDoubleBlock (and we don't do that)
But also because the block here could be a numeric that was converted into an aggregate metric double block and that doesn't have a default value
All that being said, I believe this chunk of code shouldn't be getting hit (anymore) and I forgot to remove this
There was a problem hiding this comment.
Right it'd be tricky.. Remind me, is there a place where we look up what's the default value? Where does this currently happen? Or it's hardcoded to max?
There was a problem hiding this comment.
Do i read this right that it happens in AggregateMetricDoubleFieldMapper, with the change in this PR?
There was a problem hiding this comment.
So the way the code works right now is we never run the original FromAggregateMetricDouble with default value; we always fuse the function into the blockloader. So we are reading it from AggregateMetricDoubleFieldMapper yes (and I think that's the only place that knows about it).
It might be possible to add this info somewhere in the EsRelation, but I don't know how we would forward that information from the EsRelation to functions that require it (furthermore, I don't really know how we would distinguish between aggregate metric doubles vs numerics if we're handling both at the same time, like in implicit casting)
There was a problem hiding this comment.
No this is actually correct - we should be retrieving this info from the mapper.
Maybe follow up with a test where the default is set to min - and verify that it's used.
There was a problem hiding this comment.
Added a test (and manually verified that it's the right number)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Show resolved
Hide resolved
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Hi @limotova, I've created a changelog YAML for you. |
|
It looks like Nik is going to be unavailable for a few days so I will try to merge this and hopefully we can iterate on it in the following days if necessary. |
…-err-message * upstream/main: (45 commits) Add sort field usage to telemetry (elastic#139530) ES|QL: Release CCS support for FORK (elastic#139630) Parameterize VectorSimilarityFunctionsTests on the similarity function (elastic#139516) fix broken links from ccs file move (elastic#139655) Update docs for v9.2.3 release (elastic#139479) Move tsdb bwc tests to x-pack/logsdb (elastic#139671) Fix FirstDocIdGroupingAggregatorFunction (elastic#139619) Fix release test for node_reduction profiling (elastic#139515) Unmute RestClientSingleHostIntegTests.testRequestResetAndAbort (elastic#139656) Unmute VerifyVersionConstantsIT.testLuceneVersionConstant (elastic#139644) Mute org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStoreRepositoryTests testReadNonExistingPath elastic#139665 Mute org.elasticsearch.smoketest.WatcherYamlRestIT test {p0=mustache/10_webhook/Test webhook action with mustache integration} elastic#139663 [ES|QL] Run aggregations on aggregate metric double with default metric (elastic#138647) Enable TDigest field mapper and ES|QL type (elastic#139607) Mute org.elasticsearch.index.mapper.HalfFloatSyntheticSourceNativeArrayIntegrationTests testSynthesizeArrayRandom elastic#139658 Explicitly cleanup test index with shared data path (elastic#136048) Mute org.elasticsearch.xpack.core.action.XPackUsageResponseTests testVersionDependentSerializationWriteToOldStream elastic#139576 Make XPackUsageResponseTests a wire serializing test case (elastic#139643) Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT test {p0=mixed_cluster/90_ml_data_frame_analytics_crud/Start and stop old regression job} elastic#139654 Relax error bounds for RandomizedTimeSeriesIT (elastic#139641) ...
This commit adds support for using the default metric of an aggregate metric double on aggregations.
For example in the query
FROM datastream | STATS std_dev(metric_field), thestd_devaggregation does not have native support for aggregate metric double, so we use the default metric instead (which is determined by the field mapping).Closes #136297
Known limitations/bugs: