Scale doubles to floats when necessary to match the field#78344
Scale doubles to floats when necessary to match the field#78344not-napoleon merged 19 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.aggregation/40_range.yml
Show resolved
Hide resolved
| - {} | ||
|
|
||
| --- | ||
| # Regression test for 77033 |
There was a problem hiding this comment.
When I see this comment in 9 months I'm super unlikely to look up the issue- I'll either git-blame the code or puzzle it out from the title of the test or body of it. I don't think the link adds anything here.
There was a problem hiding this comment.
yeah, okay. Do you also think it's not worth having a similar comment in the unit tests I added?
There was a problem hiding this comment.
I don't. Honestly if you like the comments you can keep 'em. But I tend to skip those sorts of comments when reading.
| * If this is a numeric field backed values source type, return the type of the numeric field backing it. | ||
| * Otherwise return null. | ||
| */ | ||
| public NumberFieldMapper.NumberType getNumericType() { |
There was a problem hiding this comment.
Should this be getNumberType instead? I'm always confused by the difference between NumericType and NumberType.
Or, should this be double round(double)? I think, maybe, NumberType is sort of a mapping concept. Or, at least, a NumberFieldMapper concept. Also, that way maybe it'd be possible to apply the fix to scaled_float without forcing it to implement some of the internals of NumberFieldMapper.
There was a problem hiding this comment.
Yeah, it should be getNumberType, if we keep it. You raise good points about why we might not want to keep it though. NumberType is a mapping concept, and I don't feel great exposing it here. Especially since we're weirdly abusing its parse function to make this work. But I do think we need to delegate this to the Field Type somehow, and this was the most likely candidate I saw for that.
There was a problem hiding this comment.
We sort of already have a method for this - rangeQuery and termQuery. They do the rounding we want. They just return queries we can't do anything with. I'd be ok with a new method on MappedFieldType that could round numeric bounds.
There was a problem hiding this comment.
It looks to me that rangeQuery and termQuery just call parse, same as I'm doing here. I'm hesitant to add this to mapped field type, because it is a fundamentally number-oriented concept. So you'd have one more method that throws UnsupportedOperationException all over the place, except for a few special cases. I hate throwing UnsupportedOperationException; I feel like it breaks the whole idea of polymorphism. In this case especially, it feels like we're going through a lot of effort to support keeping ScaledFloatFieldType from having to extend NumberFieldType, which maybe feels like not a great goal?
Having said that, I do think exposing the NumberType is heavy handed. How would you feel about a method on NumberFieldType called double coercePrecision(Object) or something like that? It'd just call parse, but the meaning is a little clearer, I think (or at least it gives me a spot to drop javadoc and make the meaning clearer). It doesn't solve the scaled float case, but I'm not sure how big a deal that is?
Anyway, if you really feel like this belongs on MappedFieldType, I can do that too. WDYT?
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.aggregation/40_range.yml
Show resolved
Hide resolved
|
@elasticmachine update branch |
…eon/elasticsearch into 77033-range-float-rounding-bug
|
@elasticmachine update branch |
|
@elasticmachine run elasticsearch-ci/1 |
|
@elasticmachine run elasticsearch-ci/part-1 |
| * on the double in the case that the backing number type would have parsed the value differently. This is to address | ||
| * the problem where (e.g.) 0.04F < 0.04D, which causes problems for range aggregations. | ||
| */ | ||
| public double coerceToDouble(Double value) { |
There was a problem hiding this comment.
Maybe reduceToStoredPrecision or something? That gives us a hint about why it's important.
Also, it's kind of nice to explain the why in the first sentence of the javadoc because it's what I get when I mouse over. I get the sentiment that it's weird though.
Does it make sense for this to take a double? I know parse wants an Object when I see Double in the signature I tend to think it explicitly handles null somehow but this really doesn't.
| Double from = range.from; | ||
| Double to = range.to; | ||
| // Trying to parse infinite values into ints/longs throws. Understandably. | ||
| Double from = Double.isFinite(range.from) ? fixPrecision.applyAsDouble(range.from) : range.from; |
There was a problem hiding this comment.
Would it be clearer to move isFinite "up"? Like into the field mapper or into the config or something? Or just into the ternary above?
There was a problem hiding this comment.
I think the field mapper makes the most sense, looking at it now. I'll do that.
| if (fieldContext() != null && fieldType() instanceof NumberFieldMapper.NumberFieldType) { | ||
| return ((NumberFieldMapper.NumberFieldType) fieldType())::coerceToDouble; | ||
| } | ||
| return null; |
There was a problem hiding this comment.
I think it'd be cleaner if this returned the identity function. That way you don't have to deal with null on the caller. I don't think you ever need to know its identity?
| }, verify, fieldType); | ||
| } | ||
|
|
||
| /* |
There was a problem hiding this comment.
Thanks. I need to get into your NOCOMMIT habit.
|
@elasticmachine update branch |
) This fixes a bug where the range aggregation always treats the range end points as doubles, even if the field value doesn't have enough resolution to fill a double. This was creating issues where the range would have a "more precise" approximation of an unrepresentable number than the field, causing the value to fall in the wrong bucket. Note 1: This does not resolve the case where we have a long value that is not precisely representable as a double. Since the wire format sends the range bounds as doubles, by the time we get to where this fix is operating, we've already lost the precision to act on a big long. Fixing that problem will require a wire format change, and I'm not convinced it's worth it right now. Note 2: This is probably still broken for ScaledFloats, since they don't implement NumberFieldType. Resolves elastic#77033
…) (#78932) * Scale doubles to floats when necessary to match the field (#78344) This fixes a bug where the range aggregation always treats the range end points as doubles, even if the field value doesn't have enough resolution to fill a double. This was creating issues where the range would have a "more precise" approximation of an unrepresentable number than the field, causing the value to fall in the wrong bucket. Note 1: This does not resolve the case where we have a long value that is not precisely representable as a double. Since the wire format sends the range bounds as doubles, by the time we get to where this fix is operating, we've already lost the precision to act on a big long. Fixing that problem will require a wire format change, and I'm not convinced it's worth it right now. Note 2: This is probably still broken for ScaledFloats, since they don't implement NumberFieldType. Resolves #77033
This fixes a bug where the range aggregation always treats the range end points as doubles, even if the field value doesn't have enough resolution to fill a double. This was creating issues where the range would have a "more precise" approximation of an unrepresentable number than the field, causing the value to fall in the wrong bucket.
Note 1: This does not resolve the case where we have a long value that is not precisely representable as a double. Since the wire format sends the range bounds as doubles, by the time we get to where this fix is operating, we've already lost the precision to act on a big long. Fixing that problem will require a wire format change, and I'm not convinced it's worth it right now.
Note 2: This is probably still broken for
ScaledFloats, since they don't implementNumberFieldType.Resolves #77033