Scale doubles to floats when necessary to match the field by not-napoleon · Pull Request #78344 · elastic/elasticsearch

not-napoleon · 2021-09-27T18:37:56Z

This fixes a bug where the range aggregation always treats the range end points as doubles, even if the field value doesn't have enough resolution to fill a double. This was creating issues where the range would have a "more precise" approximation of an unrepresentable number than the field, causing the value to fall in the wrong bucket.
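To make the failure mode concrete, here is a minimal standalone sketch in plain Java. It does not use any Elasticsearch classes; `FloatBoundDemo` and `inRange` are hypothetical names, and the half-open `[from, to)` check is an assumption meant to mirror how range buckets include `from` and exclude `to`.

```java
final class FloatBoundDemo {
    /** Half-open bucket check: from is inclusive, to is exclusive. */
    static boolean inRange(double value, double from, double to) {
        return value >= from && value < to;
    }

    public static void main(String[] args) {
        float stored = 0.04f;      // what a float field can actually hold
        double docValue = stored;  // widened to double for comparison
        double bound = 0.04;       // range bound kept at double precision

        // The float approximation of 0.04 is slightly below the double one.
        System.out.println(docValue < bound);                // true

        // So a document indexed as 0.04 misses the bucket starting at 0.04.
        System.out.println(inRange(docValue, bound, 1.0));   // false

        // Rounding the bound through float first makes the two agree.
        double rounded = (double) (float) bound;
        System.out.println(inRange(docValue, rounded, 1.0)); // true
    }
}
```

Rounding the bound down to the field's precision, rather than widening the field value, matches the direction described above: the bound is the side carrying precision the field never had.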

Note 1: This does not resolve the case where we have a long value that is not precisely representable as a double. Since the wire format sends the range bounds as doubles, by the time we get to where this fix is operating, we've already lost the precision to act on a big long. Fixing that problem will require a wire format change, and I'm not convinced it's worth it right now.
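The limitation in Note 1 can be illustrated with a small sketch, again in plain Java with hypothetical names (`LongPrecisionDemo`, `isExactAsDouble`). It only shows that two distinct longs can collapse to the same double; it says nothing about how the wire format is implemented.

```java
import java.math.BigDecimal;

final class LongPrecisionDemo {
    /** True if the long is exactly representable as a double. */
    static boolean isExactAsDouble(long value) {
        // new BigDecimal(double) converts the double's exact binary value.
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        long exact = 1L << 53;     // 2^53 fits in a double's 53-bit significand
        long inexact = exact + 1;  // 2^53 + 1 does not

        System.out.println(isExactAsDouble(exact));             // true
        System.out.println(isExactAsDouble(inexact));           // false

        // Once both are doubles they are indistinguishable, so no later
        // rounding step can recover which long was originally requested.
        System.out.println((double) exact == (double) inexact); // true
    }
}
```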

Note 2: This is probably still broken for ScaledFloats, since they don't implement NumberFieldType.

Resolves #77033

elasticmachine · 2021-09-27T18:37:59Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000 · 2021-09-27T18:41:53Z

rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.aggregation/40_range.yml

         - {}

+---
+# Regression test for 77033


When I see this comment in 9 months I'm super unlikely to look up the issue. I'll either git-blame the code or puzzle it out from the title of the test or the body of it. I don't think the link adds anything here.

yeah, okay. Do you also think it's not worth having a similar comment in the unit tests I added?

I don't. Honestly if you like the comments you can keep 'em. But I tend to skip those sorts of comments when reading.

nik9000 · 2021-09-27T18:49:51Z

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java

+     * If this is a numeric field backed values source type, return the type of the numeric field backing it.
+     * Otherwise return null.
+     */
+    public NumberFieldMapper.NumberType getNumericType() {


Should this be getNumberType instead? I'm always confused by the difference between NumericType and NumberType.

Or, should this be double round(double)? I think, maybe, NumberType is sort of a mapping concept. Or, at least, a NumberFieldMapper concept. Also, that way maybe it'd be possible to apply the fix to scaled_float without forcing it to implement some of the internals of NumberFieldMapper.

Yeah, it should be getNumberType, if we keep it. You raise good points about why we might not want to keep it though. NumberType is a mapping concept, and I don't feel great exposing it here. Especially since we're weirdly abusing its parse function to make this work. But I do think we need to delegate this to the Field Type somehow, and this was the most likely candidate I saw for that.

We sort of already have a method for this - rangeQuery and termQuery. They do the rounding we want. They just return queries we can't do anything with. I'd be ok with a new method on MappedFieldType that could round numeric bounds.

It looks to me that rangeQuery and termQuery just call parse, same as I'm doing here. I'm hesitant to add this to mapped field type, because it is a fundamentally number-oriented concept. So you'd have one more method that throws UnsupportedOperationException all over the place, except for a few special cases. I hate throwing UnsupportedOperationException; I feel like it breaks the whole idea of polymorphism. In this case especially, it feels like we're going through a lot of effort to support keeping ScaledFloatFieldType from having to extend NumberFieldType, which maybe feels like not a great goal?

Having said that, I do think exposing the NumberType is heavy handed. How would you feel about a method on NumberFieldType called double coercePrecision(Object) or something like that? It'd just call parse, but the meaning is a little clearer, I think (or at least it gives me a spot to drop javadoc and make the meaning clearer). It doesn't solve the scaled float case, but I'm not sure how big a deal that is?

Anyway, if you really feel like this belongs on MappedFieldType, I can do that too. WDYT?

not-napoleon · 2021-10-05T21:07:02Z

@elasticmachine update branch

not-napoleon · 2021-10-06T18:25:41Z

@elasticmachine update branch

not-napoleon · 2021-10-06T19:44:56Z

@elasticmachine run elasticsearch-ci/1

not-napoleon · 2021-10-07T13:45:26Z

@elasticmachine run elasticsearch-ci/part-1

nik9000 · 2021-10-11T13:26:20Z

server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

+         * on the double in the case that the backing number type would have parsed the value differently.  This is to address
+         * the problem where (e.g.) 0.04F < 0.04D, which causes problems for range aggregations.
+         */
+        public double coerceToDouble(Double value) {


Maybe reduceToStoredPrecision or something? That gives us a hint about why it's important.

Also, it's kind of nice to explain the why in the first sentence of the javadoc because it's what I get when I mouse over. I get the sentiment that it's weird though.

Does it make sense for this to take a double? I know parse wants an Object, but when I see Double in the signature I tend to think it explicitly handles null somehow, and this really doesn't.

nik9000 · 2021-10-11T13:30:02Z

...rc/main/java/org/elasticsearch/search/aggregations/bucket/range/RangeAggregationBuilder.java

-            Double from = range.from;
-            Double to = range.to;
+            // Trying to parse infinite values into ints/longs throws. Understandably.
+            Double from = Double.isFinite(range.from) ? fixPrecision.applyAsDouble(range.from) : range.from;


Would it be clearer to move isFinite "up"? Like into the field mapper or into the config or something? Or just into the ternary above?

I think the field mapper makes the most sense, looking at it now. I'll do that.
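A rough sketch of what moving the finiteness check into the rounding method could look like. Everything here is hypothetical: `Kind` is a stand-in for the mapper's number type, `parse` is a toy that only imitates a strict parser rejecting infinities, and the truncation used for `LONG` is an assumption of the sketch rather than a statement about how Elasticsearch coerces values.

```java
final class StoredPrecisionSketch {
    /** Hypothetical stand-in for the mapper's number type; not the real enum. */
    enum Kind { FLOAT, LONG, DOUBLE }

    /** Toy parser: the integer branch rejects non-finite input, as a strict parser would. */
    static double parse(Kind kind, double value) {
        switch (kind) {
            case FLOAT:
                return (double) (float) value;
            case LONG:
                if (!Double.isFinite(value)) {
                    throw new IllegalArgumentException("not a long: " + value);
                }
                return (double) (long) value; // truncation is an assumption of this sketch
            default:
                return value;
        }
    }

    /** The guard lives here, so callers never need their own isFinite ternary. */
    static double reduceToStoredPrecision(Kind kind, double value) {
        return Double.isFinite(value) ? parse(kind, value) : value;
    }

    public static void main(String[] args) {
        System.out.println(reduceToStoredPrecision(Kind.FLOAT, 0.04) == (double) 0.04f);  // true
        System.out.println(reduceToStoredPrecision(Kind.LONG, Double.NEGATIVE_INFINITY)); // -Infinity
    }
}
```

With the check inside the method, open-ended ranges (infinite `from` or `to`) pass through unchanged for every field type, and the call site in the aggregation builder reduces to a single function application.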

nik9000 · 2021-10-11T13:31:35Z

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java

+        if (fieldContext() != null && fieldType() instanceof NumberFieldMapper.NumberFieldType) {
+            return ((NumberFieldMapper.NumberFieldType) fieldType())::coerceToDouble;
+        }
+        return null;


I think it'd be cleaner if this returned the identity function. That way you don't have to deal with null on the caller. I don't think you ever need to know its identity?
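A minimal sketch of the identity-function suggestion, using the JDK's `DoubleUnaryOperator` (which matches the `applyAsDouble` call seen in the diff above). The class name, the `roundingFor` method, and its boolean parameter are hypothetical simplifications of the real field-type check.

```java
import java.util.function.DoubleUnaryOperator;

final class IdentityFallbackSketch {
    /**
     * Returns a rounding function for the field, or the identity function when
     * no rounding applies. The boolean stands in for the instanceof check.
     */
    static DoubleUnaryOperator roundingFor(boolean floatBackedField) {
        if (floatBackedField) {
            return v -> (double) (float) v;
        }
        return DoubleUnaryOperator.identity(); // never null, so callers need no null check
    }

    public static void main(String[] args) {
        DoubleUnaryOperator forFloat = roundingFor(true);
        DoubleUnaryOperator forOther = roundingFor(false);

        System.out.println(forFloat.applyAsDouble(0.04) == (double) 0.04f); // true
        System.out.println(forOther.applyAsDouble(0.04) == 0.04);           // true
    }
}
```

The caller can apply the returned function unconditionally; the trade-off is that it can no longer tell whether any rounding took place, which the comment above suggests is not needed.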

nik9000 · 2021-10-11T13:32:17Z

...r/src/test/java/org/elasticsearch/search/aggregations/bucket/range/RangeAggregatorTests.java

        }, verify, fieldType);
    }

+    /*


Delete this now?

Thanks. I need to get into your NOCOMMIT habit.

not-napoleon · 2021-10-11T14:43:22Z

@elasticmachine update branch

…) (#78932) * Scale doubles to floats when necessary to match the field (#78344)

not-napoleon added 5 commits September 20, 2021 16:18

Remove duplicate testcase method

3dc8009

Some tests for rounding errors

29a48e6

fix test data so it actually hits the bug

ae7b99b

fix test data so it actually hits the bug

68f06bc

fix parsed precision to match field when possible

fb1d2b7

not-napoleon added >bug :Analytics/Aggregations Aggregations v8.0.0 v7.16.0 auto-backport-and-merge labels Sep 27, 2021

elasticmachine added the Team:Analytics label Sep 27, 2021

not-napoleon changed the title ~~77033 range float rounding bug~~ Scale doubles to floats when necessary to match the field Sep 27, 2021

nik9000 reviewed Sep 27, 2021

not-napoleon added 7 commits September 28, 2021 15:02

Add tests for half float

f0473e4

Merge branch 'master' into 77033-range-float-rounding-bug

eddb120

Don't expose all of NumberType

ada9859

BWC dance

8da9d94

fix number range agg test

a4d9d10

fix half float values

d984761

BWC dance, this time with correct syntax

2dc5102

elasticmachine and others added 4 commits October 6, 2021 08:07

Merge branch 'master' into 77033-range-float-rounding-bug

70d9b48

fix spotless

739a57f

Merge branch '77033-range-float-rounding-bug' of github.com:not-napol…

a1b7273

…eon/elasticsearch into 77033-range-float-rounding-bug

Disable a test for now, for BWC

65d0369

Merge branch 'master' into 77033-range-float-rounding-bug

a781522

not-napoleon removed the auto-backport-and-merge label Oct 6, 2021

nik9000 reviewed Oct 11, 2021

response to PR feedback

b8d24c1

nik9000 approved these changes Oct 11, 2021

Merge branch 'master' into 77033-range-float-rounding-bug

2e11b89

not-napoleon merged commit b968fcb into elastic:master Oct 11, 2021

not-napoleon deleted the 77033-range-float-rounding-bug branch October 11, 2021 15:53

walterra mentioned this pull request Oct 13, 2021

[ML] APM Correlations: Round duration values to be used in range aggregations. elastic/kibana#114833

Merged

2 tasks

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021

walterra mentioned this pull request Oct 28, 2021

[ML] APM Correlations: Fix percentiles values. elastic/kibana#116639

Merged

2 tasks

not-napoleon mentioned this pull request Dec 14, 2021

Long.MIN_VALUE considered out of range for type long in range aggregations #81529

Closed
