ESQL: Correctly manage NULL data type for SUM #144942

Merged
astefan merged 8 commits into elastic:main from astefan:144914_fix
Mar 26, 2026

Conversation

@astefan
Contributor

@astefan astefan commented Mar 25, 2026

Right now, SUM returns a DOUBLE when given a NULL input, which is wrong; it should return NULL or LONG. This PR treats NULL as the right output, consistent with the rest of the functions that don't have special behavior (COUNT, for example, never returns null).

Fixes #144914
AI-assisted PR.

* Correctly manage NULL data type for SUM by switching from returning a
Double data type to the actual NULL data type.
null | null
;

multipleAggsOverNullExpressions
Contributor Author

Tests starting with this one were inspired by my previous unmerged PR #112392.

public DataType dataType() {
    DataType dt = field().dataType();
    if (dt == DataType.NULL) {
        return DataType.NULL;
Contributor Author

IMHO, a NULL data type should produce a NULL result. If it can produce a LONG, it could just as well produce a DOUBLE; I see no difference. Plus, a NULL result is consistent with most of the other functions we have now.
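
The rule being argued for here can be sketched in isolation. The following is a minimal, self-contained model and not the actual Elasticsearch code: the `Type` enum and both helper methods are hypothetical names, and the non-NULL branches (whole numbers sum to LONG, everything else to DOUBLE) are an assumption based on the discussion stating that SUM's output is either long or double.

```java
// Hypothetical, simplified model of the typing rule discussed in this PR.
// It is NOT the Elasticsearch implementation; all names are illustrative.
final class SumTypeSketch {
    enum Type { NULL, INTEGER, LONG, DOUBLE }

    // Behaviour described as wrong in the PR: a NULL input falls through to DOUBLE.
    // Assumption: whole-number inputs sum to LONG, anything else to DOUBLE.
    static Type oldSumOutputType(Type input) {
        return (input == Type.INTEGER || input == Type.LONG) ? Type.LONG : Type.DOUBLE;
    }

    // Behaviour after the fix: a NULL input type yields a NULL output type.
    static Type sumOutputType(Type input) {
        if (input == Type.NULL) {
            return Type.NULL;
        }
        return oldSumOutputType(input);
    }

    public static void main(String[] args) {
        for (Type t : Type.values()) {
            System.out.println("SUM(" + t + "): old=" + oldSumOutputType(t) + ", new=" + sumOutputType(t));
        }
    }
}
```

In this model the only input whose output type changes is NULL; every other input keeps the type it had before.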

Contributor

I thought about this in #142657.

Generally, I think a NULL input should produce an output that is compatible with ("narrower" than) any of the outputs created if the NULL input was replaced by an input with a proper data type. The NULL type is (or should be) a bottom type in ESQL, so it should always be acceptable as output for a NULL input.

For SUM specifically, the output being either long or double, having a long output may also be acceptable. I think NULL is better (and more consistent with SQL!), but long is also acceptable because it is "narrower" (can be used in more places) than double; for instance, CASE(predicate, 1::long, x) requires x to be long, whereas for CASE(predicate, 1::double, x) both a double and a long are okay because CASE implicitly does a little bit of auto-casting.
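
The "narrower" relation described above can be made concrete with a small model. This is only a sketch under stated assumptions, not ESQL's type system: `accepts` and `isNarrowerOrEqual` are hypothetical helpers, NULL is assumed to be accepted everywhere (the bottom-type reading), and a long is assumed to be accepted where a double is required (the auto-casting attributed to CASE in the comment).

```java
// Hypothetical sketch of the "narrower than" idea; not ESQL's real type rules.
final class NarrowerSketch {
    enum Type { NULL, LONG, DOUBLE }

    // True if a value of type `actual` can be used where `expected` is required.
    // Assumptions: NULL is a bottom type, and LONG is accepted where DOUBLE is required.
    static boolean accepts(Type expected, Type actual) {
        if (actual == Type.NULL || actual == expected) {
            return true;
        }
        return expected == Type.DOUBLE && actual == Type.LONG;
    }

    // `candidate` is narrower than (or equal to) `other` if it is accepted
    // in every position where `other` is accepted.
    static boolean isNarrowerOrEqual(Type candidate, Type other) {
        for (Type expected : Type.values()) {
            if (accepts(expected, other) && !accepts(expected, candidate)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("NULL vs LONG:   " + isNarrowerOrEqual(Type.NULL, Type.LONG));
        System.out.println("LONG vs DOUBLE: " + isNarrowerOrEqual(Type.LONG, Type.DOUBLE));
        System.out.println("DOUBLE vs LONG: " + isNarrowerOrEqual(Type.DOUBLE, Type.LONG));
    }
}
```

Under these assumptions NULL is narrower than both long and double, and long is narrower than double, which is the ordering the comment relies on when it calls both NULL and long acceptable outputs for SUM(null).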

Contributor

The practical implication is that, for SUM(foo) used in a nullify query, some expressions depending on SUM(foo) start breaking when foo goes from mapped to unmapped, as long as SUM(null) is double. They shouldn't break if SUM(null) is long or NULL.
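
A rough way to see the breakage is to type-check one dependent expression, CASE(predicate, 1::long, SUM(foo)), before and after foo becomes unmapped. The sketch below is a hypothetical model, not Elasticsearch code: `sumType` takes the type to use for a NULL input as a parameter so the three candidate rules can be compared, and `caseWithLongBranchAccepts` assumes that CASE with a long branch requires a long and also accepts NULL as a bottom type.

```java
// Hypothetical model of the mapped -> unmapped scenario; not Elasticsearch code.
final class NullifyScenarioSketch {
    enum Type { NULL, LONG, DOUBLE }

    // Output type of SUM. `typeForNullInput` selects which rule is being compared.
    // Assumption: a LONG input sums to LONG and a DOUBLE input sums to DOUBLE.
    static Type sumType(Type input, Type typeForNullInput) {
        if (input == Type.NULL) {
            return typeForNullInput;
        }
        return input;
    }

    // Type check for x in CASE(predicate, 1::long, x): x must be long.
    // Assumption: NULL is accepted too, because it is treated as a bottom type.
    static boolean caseWithLongBranchAccepts(Type x) {
        return x == Type.LONG || x == Type.NULL;
    }

    public static void main(String[] args) {
        Type mapped = Type.LONG;   // foo is mapped as long
        Type unmapped = Type.NULL; // foo is unmapped and treated as null
        System.out.println("mapped: " + caseWithLongBranchAccepts(sumType(mapped, Type.DOUBLE)));
        for (Type rule : Type.values()) {
            System.out.println("unmapped, SUM(null) is " + rule + ": "
                + caseWithLongBranchAccepts(sumType(unmapped, rule)));
        }
    }
}
```

In this model the expression type-checks while foo is mapped, fails once foo is unmapped if SUM(null) is double, and keeps type-checking if SUM(null) is long or NULL.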

@astefan astefan requested review from alex-spies and bpintea March 26, 2026 05:55
@astefan astefan marked this pull request as ready for review March 26, 2026 05:55
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Mar 26, 2026
@elasticsearchmachine
Collaborator

Hi @astefan, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Contributor

@alex-spies alex-spies left a comment

LGTM, thanks @astefan !

public DataType dataType() {
    DataType dt = field().dataType();
    if (dt == DataType.NULL) {
        return DataType.NULL;
Contributor

Fix looks right to me.

;

-cost:double | time_bucket:datetime
+cost:null | time_bucket:datetime
Contributor

++!

Contributor

++, thank you!

Comment on lines +619 to +620
s:long | r:long
null | null
Contributor

Subtle, but looks right!

@astefan astefan added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)) Mar 26, 2026
@astefan astefan merged commit e5aed38 into elastic:main Mar 26, 2026
37 checks passed
@astefan astefan deleted the 144914_fix branch March 26, 2026 11:45
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 26, 2026
* upstream/main: (146 commits)
  Revert "[Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)"
  Fix ArrayIndexOutOfBoundsException in fetch phase with partial results (elastic#144385)
  ESQL: Correctly manage NULL data type for SUM (elastic#144942)
  [ESQL] Fixes GroupedTopNBenchmark not executing (elastic#144944)
  Fix reader context leak when query response serialization fails (elastic#144708)
  Validate individual offset values in BULK_OFFSETS bounds checks (elastic#144643)
  Merge main21 source set into main in simdvec (elastic#144921)
  [TEST] Unmute TsidExtractingIdFieldMapperTests (elastic#144848)
  [Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)
  Fix `ThreadedActionListenerTests#testRejectionHandling` (elastic#144795)
  Add new DLM Frozen Tier Transition execution plugin and service (elastic#144595)
  Prometheus: execute query_range via parsed EsqlStatement plan (elastic#144416)
  Investigate `testBulkIndexingRequestSplitting` failure (elastic#144766)
  Add test utility for wrapping directories in FilterDirectory layer (elastic#143563)
  Fix ES|QL decay tests with negative scale (elastic#144657)
  Fix circuit breaker leak in percolator query construction (elastic#144827)
  Use XPerFieldDocValuesFormat in AbstractTSDBSyntheticIdCodec (elastic#144744)
  [DOCS] Document how reindex work in CPS (elastic#144016)
  Fix Int4 vector library tests failing on Java 21 (elastic#144830)
  [DiskBBQ] Fix index sorting on flush (elastic#144938)
  ...
@bpintea
Contributor

bpintea commented Mar 26, 2026

Nice.

seanzatzdev pushed a commit to seanzatzdev/elasticsearch that referenced this pull request Mar 26, 2026
* Correctly manage NULL data type for SUM by switching from returning a
Double data type to the actual NULL data type.
seanzatzdev pushed a commit to seanzatzdev/elasticsearch that referenced this pull request Mar 27, 2026
* Correctly manage NULL data type for SUM by switching from returning a
Double data type to the actual NULL data type.
mamazzol pushed a commit to mamazzol/elasticsearch that referenced this pull request Mar 30, 2026
* Correctly manage NULL data type for SUM by switching from returning a
Double data type to the actual NULL data type.

Labels

:Analytics/ES|QL (AKA ESQL), auto-merge-without-approval (Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)), >bug, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ESQL] SUM(null) optimized incorrectly

4 participants