ESQL: Correctly manage NULL data type for SUM#144942
Conversation
Double data type to the actual NULL data type.
| null | null | ||
| ; | ||
|
|
||
| multipleAggsOverNullExpressions |
There was a problem hiding this comment.
Tests starting with this one were inspired by my previous unmerged PR #112392.
| public DataType dataType() { | ||
| DataType dt = field().dataType(); | ||
| if (dt == DataType.NULL) { | ||
| return DataType.NULL; |
There was a problem hiding this comment.
IMHO, a NULL data type should produce a NULL result. If it provides a LONG, it could very well provide a DOUBLE, I see no difference. Plus that a NULL result is consistent with other (most) of the functions we have now.
There was a problem hiding this comment.
I thought about this in #142657.
Generally, I think a NULL input should produce an output that is compatible with ("narrower" than) any of the outputs created if the NULL input was replaced by an input with a proper data type. The NULL type is (or should be) a bottom type in ESQL, so it should always be acceptable as output for a NULL input.
For SUM specifically, the output being either long or double, having a long output may also be acceptable. I think NULL is better (and more consistent with SQL!), but long is also acceptable because it is "narrower" (can be used in more places) than double; for instance, CASE(predicate, 1::long, x) requires x to be long, whereas for CASE(predicate, 1::double, x) both a double and a long are okay because CASE implicitly does a little bit of auto-casting.
There was a problem hiding this comment.
The practical implication is that, for SUM(foo) used in a nullify query, some expressions depending on SUM(foo) start breaking if foo goes mapped->unmapped as long as SUM(null) is double. They shouldn't start breaking if SUM(null) is long or NULL.
|
Hi @astefan, I've created a changelog YAML for you. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
alex-spies
left a comment
There was a problem hiding this comment.
LGTM, thanks @astefan !
| public DataType dataType() { | ||
| DataType dt = field().dataType(); | ||
| if (dt == DataType.NULL) { | ||
| return DataType.NULL; |
There was a problem hiding this comment.
I thought about this in #142657.
Generally, I think a NULL input should produce an output that is compatible with ("narrower" than) any of the outputs created if the NULL input was replaced by an input with a proper data type. The NULL type is (or should be) a bottom type in ESQL, so it should always be acceptable as output for a NULL input.
For SUM specifically, the output being either long or double, having a long output may also be acceptable. I think NULL is better (and more consistent with SQL!), but long is also acceptable because it is "narrower" (can be used in more places) than double; for instance, CASE(predicate, 1::long, x) requires x to be long, whereas for CASE(predicate, 1::double, x) both a double and a long are okay because CASE implicitly does a little bit of auto-casting.
| ; | ||
|
|
||
| cost:double | time_bucket:datetime | ||
| cost:null | time_bucket:datetime |
| s:long | r:long | ||
| null | null |
There was a problem hiding this comment.
Subtle, but looks right!
* upstream/main: (146 commits) Revert "[Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)" Fix ArrayIndexOutOfBoundsException in fetch phase with partial results (elastic#144385) ESQL: Correctly manage NULL data type for SUM (elastic#144942) [ESQL] Fixes GroupedTopNBenchmark not executing (elastic#144944) Fix reader context leak when query response serialization fails (elastic#144708) Validate individual offset values in BULK_OFFSETS bounds checks (elastic#144643) Merge main21 source set into main in simdvec (elastic#144921) [TEST] Unmute TsidExtractingIdFieldMapperTests (elastic#144848) [Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539) Fix `ThreadedActionListenerTests#testRejectionHandling` (elastic#144795) Add new DLM Frozen Tier Transition execution plugin and service (elastic#144595) Prometheus: execute query_range via parsed EsqlStatement plan (elastic#144416) Investigate `testBulkIndexingRequestSplitting` failure (elastic#144766) Add test utility for wrapping directories in FilterDirectory layer (elastic#143563) Fix ES|QL decay tests with negative scale (elastic#144657) Fix circuit breaker leak in percolator query construction (elastic#144827) Use XPerFieldDocValuesFormat in AbstractTSDBSyntheticIdCodec (elastic#144744) [DOCS] Document how reindex work in CPS (elastic#144016) Fix Int4 vector library tests failing on Java 21 (elastic#144830) [DiskBBQ] Fix index sorting on flush (elastic#144938) ...
|
Nice. |
* Correctly manage NULL data type for SUM by switching from returning a Double data type to the actual NULL data type.
* Correctly manage NULL data type for SUM by switching from returning a Double data type to the actual NULL data type.
* Correctly manage NULL data type for SUM by switching from returning a Double data type to the actual NULL data type.
Right now SUM returns a DOUBLE when NULL is provided which is wrong, it should be NULL or LONG. This PR considers
NULLoutput as the right and consistent with the rest of the functions that don't have a special behavior (COUNTis never returningnullfor example).Fixes #144914
AI-assisted PR.