ESQL: Make functions behave well under NULL and narrower types #142657
alex-spies wants to merge 32 commits into elastic:main from
Conversation
702e53e to 224846b
Ok, a bunch of functions don't satisfy co-/contra-variance yet:

CASE, GREATEST, LEAST need to have matching output branches. Narrowing only needs to be supported to NULL, not in general.
Some functions want consistent types across several input args. We should still have co- and contra-variance when narrowing all inputs that have to have the same type. CASE and several others have an additional constraint: even in case of the required uniformity, KEYWORD arguments can still be narrowed to TEXT.
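To make the uniformity constraint concrete, here is a minimal, hypothetical sketch (the `Type` enum and `allowedUniformNarrowing` helper are illustrative names, not the actual Elasticsearch API): a narrowing of linked argument positions is accepted only if all narrowed positions agree on one target type, with the KEYWORD -> TEXT case exempt from the uniformity check, as described above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in for the real DataType enum.
enum Type { NULL, KEYWORD, TEXT, INT, LONG }

class UniformNarrowing {
    // Returns true if the proposed narrowing keeps all linked positions consistent.
    // Simplification: TEXT is treated as the exempt KEYWORD -> TEXT case and
    // never breaks uniformity; positions absent from the map are not narrowed.
    static boolean allowedUniformNarrowing(Map<Integer, Type> narrowing, Set<Integer> linkedPositions) {
        Set<Type> targets = new HashSet<>();
        for (int pos : linkedPositions) {
            Type t = narrowing.get(pos);
            if (t == null || t == Type.TEXT) {
                continue; // not narrowed, or the exempt KEYWORD -> TEXT case
            }
            targets.add(t);
        }
        // All remaining narrowed positions must agree on a single target type.
        return targets.size() <= 1;
    }
}
```

Under this sketch, narrowing both linked positions to NULL is fine, narrowing one to TEXT and one to NULL is fine, but narrowing one to NULL and the other to LONG is rejected.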
```yaml
  issue: https://github.com/elastic/elasticsearch/issues/141234
- class: org.elasticsearch.xpack.esql.expression.function.aggregate.SumTests
  method: testCoAndContraVariance*
  issue: https://github.com/elastic/elasticsearch/issues/142537
```
Before we merge this, I plan to open more specific issues and point the mutes to them.
Pinging @elastic/es-analytical-engine (Team:Analytics)
```java
Set<DataType> nonTrivial = NARROWER_TYPES_MAP.getOrDefault(this, Set.of());
return Stream.concat(nonTrivial.stream(), Stream.of(NULL)).collect(Collectors.toUnmodifiableSet());
```
Imho, the use of Streams here is overkill:

```java
Set<DataType> nonTrivial = new HashSet<>(NARROWER_TYPES_MAP.getOrDefault(this, Set.of()));
nonTrivial.add(NULL);
return Collections.unmodifiableSet(nonTrivial);
```

(Copying into a fresh HashSet first, since the sets stored in the map may be immutable.)
Also, this Set is created every time the method is called. Can we get away with some static Sets for each type instead?
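One way to do that, sketched with illustrative stand-in names (`Kind`, `NarrowerCache`; the real code would use `DataType` and `NARROWER_TYPES_MAP`), is to precompute one unmodifiable set per type at class-load time so the method just returns a cached reference:

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in for DataType.
enum Kind { NULL, INT, LONG, DOUBLE }

class NarrowerCache {
    // Example narrowing rules; the real ones live in NARROWER_TYPES_MAP.
    private static final Map<Kind, Set<Kind>> NARROWER_TYPES_MAP =
        Map.of(Kind.DOUBLE, EnumSet.of(Kind.INT, Kind.LONG), Kind.LONG, EnumSet.of(Kind.INT));

    // Computed once per type, so callers never allocate a new set.
    private static final Map<Kind, Set<Kind>> WITH_NULL = new EnumMap<>(Kind.class);
    static {
        for (Kind k : Kind.values()) {
            EnumSet<Kind> s = EnumSet.of(Kind.NULL); // NULL is always a narrower type
            s.addAll(NARROWER_TYPES_MAP.getOrDefault(k, EnumSet.noneOf(Kind.class)));
            WITH_NULL.put(k, Collections.unmodifiableSet(s));
        }
    }

    static Set<Kind> narrowerTypes(Kind k) {
        return WITH_NULL.get(k);
    }
}
```

Repeated calls then return the same immutable instance instead of rebuilding the set each time.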
```java
@Override
protected void filterCoAndContraVarianceNarrowing(Map<Integer, DataType> positionNarrowing, List<TestCaseSupplier.TypedData> data) {
    positionNarrowing.entrySet().removeIf(e -> e.getKey() > 0 && e.getValue() == DataType.NULL);
```
Extract the content of removeIf into a separate method and also reuse it in PercentileTests.
```java
@Override
protected void filterCoAndContraVarianceNarrowing(Map<Integer, DataType> positionNarrowing, List<TestCaseSupplier.TypedData> data) {
    positionNarrowing.entrySet().removeIf(e -> e.getKey() == 1 && e.getValue() == DataType.NULL);
```
Same here about reusing the common code inside removeIf (common with what's in Percentile, that is).
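The two removeIf calls differ only in which positions they target, so the shared part could be factored out roughly like this (a sketch with illustrative names; `DT` stands in for `DataType` and the callers shown in the comment are assumed usage, not actual code):

```java
import java.util.Map;
import java.util.function.IntPredicate;

// Illustrative stand-in for DataType.
enum DT { NULL, INT, DOUBLE }

class NarrowingFilters {
    // Shared helper: drop NULL-narrowings at the given argument positions.
    // A Percentile-style test would pass p -> p > 0, the other test p -> p == 1.
    static void removeNullNarrowingAt(Map<Integer, DT> positionNarrowing, IntPredicate positions) {
        positionNarrowing.entrySet().removeIf(e -> positions.test(e.getKey()) && e.getValue() == DT.NULL);
    }
}
```

Each test then only supplies the position predicate, and the NULL check lives in one place.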
```java
NAME_OR_ALIAS_TO_TYPE = Collections.unmodifiableMap(map);
}
```
```java
/**
 * Randomly narrows one or more input types using the candidates returned by {@code narrowerTypes},
 * choosing independently for each argument position.
 */
private void checkCoAndContraVariance(java.util.function.Function<DataType, Set<DataType>> narrowerTypes) {
```

I think it would help to document here somewhere why some other numerics are not considered in WIDENING_TO, like unsigned_long -> double, float -> double, byte, or the geo data types (if the conceptual widening even makes sense for them).
I am wondering why a random approach is used here. Are there so many combinations that we can't test all of them?
I think it's indeed quite a few tests. The function tests have many cases and take 1-2min each on my machine, I didn't dare to try and go through all combinations.
For a function that takes 2 double arguments, being exhaustive would mean that we need to test an additional 15 cases (double->long->int->NULL in 2 positions). Since there are so many test cases for most functions, the randomized approach seems sufficient; and when a function didn't satisfy the condition, I saw test failures quite consistently.
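The "choose independently per position" idea can be sketched like this (illustrative names only; `Ty` stands in for `DataType`, and the real test draws its candidates from `narrowerTypes`): each argument either keeps its original type or is replaced by a randomly picked narrower one, so with 3 narrower candidates per `double` argument there are 4 choices per position and 4^2 - 1 = 15 non-trivial combinations for a two-argument function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Illustrative stand-in for DataType.
enum Ty { NULL, INT, LONG, DOUBLE }

class RandomNarrowing {
    // For each argument position, independently keep the original type or
    // pick one of its narrower candidates at random.
    static List<Ty> narrowRandomly(List<Ty> argTypes, Map<Ty, Set<Ty>> narrower, Random rng) {
        List<Ty> out = new ArrayList<>();
        for (Ty t : argTypes) {
            List<Ty> candidates = new ArrayList<>(narrower.getOrDefault(t, Set.of()));
            candidates.add(t); // keeping the original type is always an option
            out.add(candidates.get(rng.nextInt(candidates.size())));
        }
        return out;
    }
}
```

Run across many randomized test cases, this samples the combination space instead of enumerating it, which is the trade-off discussed above.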
```java
// We allow TEXT to be used where KEYWORD is expected because we load text fields without analysis, so they behave like keywords.
// This is only expected to be relevant for fields that are mapped as both text and keyword,
// but it is simpler to allow this in general than to special case it just for those fields.
TEXT,
```
It's surprising to me to see the "widening" term used for non-numerics as well. I don't know enough about how TEXT and KEYWORD are handled these days and what their particularities are in the context of ESQL (a field can be indexed and not stored and vice versa, source not stored, doc_values enabled, text analyzed, keyword normalized, etc.). If you are really certain there are no fishy scenarios about using these two interchangeably, it's ok with me, but I am not confident enough in my knowledge about these two not to add this comment here :-)
```java
 * org.elasticsearch.xpack.esql.core.expression.FieldAttribute}, which is more appropriate when the value doesn't
 * necessarily correspond to an Elasticsearch field.
 */
public Expression asReference() {
```
Why does it matter if this is a ReferenceAttribute or a FieldAttribute? Actually, why do our tests (now) make a distinction between the two? Is this relevant to the test itself?

Closes #142537 if we decide this is the right way to go.
SET unmapped_fields="nullify" makes it so that any function/operator that can take an ES field as input can now also be passed a NULL-typed reference attribute. Practically, that happens by injecting an ... | EVAL field = NULL | ... at the beginning of the query. The intent is that queries will still be valid and run fine even if a field is nullified because it's missing from the mapping.

Well, that only works if our functions/operators respect that. Most of them implicitly do, but others deviate in weird ways.
The crux here is that we implicitly have a widening hierarchy between types, and NULL is the bottom type that can be widened to any other type.

So, we should expect that we can take any valid function/operator expression and replace any* of its arguments by NULL or a narrower type and still have a valid expression (contra-variance). To not break other expressions that consume the function/operator, the new output type must at most have become narrower or stayed the same, but never widened (co-variance).

Most of our functions already respect that. Let's enforce this with an added test.
*there are 2 exceptions:

- PERCENTILE
- NULL, like in COALESCE

TODO: Expression#dataType().

This was created with help from Cursor (Opus 4.6 (Thinking)).
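The widening hierarchy with NULL at the bottom can be sketched as a tiny relation; this is a toy illustration (the `W` enum and `widensTo` helper are assumptions, not the actual DataType API). Contra-variance then says an argument of type T may be replaced by any S with widensTo(S, T), and co-variance says widensTo(newOutput, oldOutput) must hold afterwards.

```java
// Toy widening relation with NULL as the bottom type.
enum W { NULL, INT, LONG, DOUBLE }

class Widening {
    // true if `narrow` may be used wherever `wide` is expected
    static boolean widensTo(W narrow, W wide) {
        if (narrow == W.NULL || narrow == wide) {
            return true; // NULL widens to everything; every type widens to itself
        }
        return switch (narrow) {
            case INT -> wide == W.LONG || wide == W.DOUBLE;
            case LONG -> wide == W.DOUBLE;
            default -> false;
        };
    }
}
```

Note the relation is deliberately one-directional: DOUBLE never widens to INT, which is exactly why a function's output type is not allowed to widen after narrowing an input.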