Use distinct semantics for distincting array functions#22938
Use distinct semantics for distincting array functions#22938rschlussel merged 6 commits intoprestodb:masterfrom
Conversation
48599aa to
6f300e5
Compare
mbasmanova
left a comment
There was a problem hiding this comment.
It would be nice to update documentation to clarify these semantics.
6f300e5 to
f616b1f
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for the doc! Only a few nits, looks good.
f616b1f to
8ee28ba
Compare
|
Thanks for the review @steveburnett. I've addressed your comments. |
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for the doc update! Two, I think final, nits.
8ee28ba to
ae1f483
Compare
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull updated branch, new local docs build, everything looks good. Thanks!
| this.blockBuilders[channelIndex] = type.createBlockBuilder(null, EXPECTED_BLOCK_BUILDER_SIZE); | ||
| this.valueSets[channelIndex] = new TypedSet( | ||
| type, | ||
| Optional.empty(), |
There was a problem hiding this comment.
Do we ever collect dynamic filters for complex types (e.g.: rows?)
There was a problem hiding this comment.
I'm not sure, but dynamic filter needs to match join semantics, and joins fail on indeterminate arrays/rows, so dynamic filter should do the same.
examples:
presto> with input as (SELECT * FROM (VALUES (ARRAY[null], 2), (ARRAY[null], 3))t(x, y)) SELECT * from input i1 join input i2 on i1.x = i2.x;
Query 20240614_131341_75681_9w2fv failed: ARRAY comparison not supported for arrays with null elements
presto> with input as (SELECT * FROM (VALUES (ROW(null), 2), (ROW(null), 3))t(x, y)) SELECT * from input i1 join input i2 on i1.x = i2.x;
Query 20240614_131414_75748_9w2fv failed: ROW comparison not supported for fields with null elements
| private static final SelectedPositions EMPTY_SELECTED_POSITIONS = positionsList(new int[0], 0, 0); | ||
|
|
||
| private final Type elementType; | ||
| private final Optional<MethodHandle> elementIsDistinctFrom; |
There was a problem hiding this comment.
How difficult would it be to make it required and always use isDistinctFrom?
There was a problem hiding this comment.
I didn't do it because this is also used for MapConcatFunction, and maps don't support indeterminate types, but it's pretty easy to change, and the maps should fail later anyway. I can make this change.
There was a problem hiding this comment.
actually, map_concat doesn't otherwise check that map keys are not indeterminate or null (caused failing tests because of this), and it's a bit challenging to get it to do that because it's a varargs function. I'm going to put this back to the way it was, and look into improving it as a follow up.
| MethodHandle elementIsDistinctFrom) | ||
| { | ||
| this.arrayType = new ArrayType(elementType); | ||
| this.elementIsDistinctFrom = elementIsDistinctFrom; |
| private final MethodHandle elementIsDistinctFrom; | ||
|
|
||
| public SetAggregationStateSerializer(@TypeParameter("T") Type elementType) | ||
| public SetAggregationStateSerializer(Type elementType, |
ae1f483 to
4d967ca
Compare
4d967ca to
5d8c61d
Compare
5d8c61d to
d09549d
Compare
|
We need a session param - this change is extensive and subtle NULL semantics that people might be depending on. So we need a way to use old semantics. |
Makes sense. I'll add a property The previous semantics were pretty weird, so I'll make the property true by default. (Previously, regular null values in the input were fine and would be deduplicated, but nulls nested inside an array or row would give an error) |
|
After further discussion with Sreeni, we agreed that a session property should not be necessary here, as the previous behavior was pretty strange and unlikely to be desired (the difference is previously we were failing for some cases of nulls in the input, and now we will handle those according to expected distinct semantics). there is nothing that was succeeding that will have different results. |
Use distinct semantics rather than equal to semantics for array_distinct. This fixes an issue where arrays of rows or arrays that had null fields/elements would fail with a message that comparison of rows with null elements was not allowed. Based on: trinodb/trino@d1a50dc co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Consolidate array_intersect tests into TestArrayIntersectFunction. Split up the giant test into smaller tests. There are no changes to the test cases themselves.
Makes it align with array_intersect and array_except so it's easy to find where the tests are.
Use distinct semantics rather than equal to semantics for array_intersect, array_except, and array_union. This fixes failures where Arrays of rows that had null fields would fail with errors that comparison of rows with null elements were not allowed
Use distinct semantics rather than equal to semantics for set_agg and set_union. This fixes an issue where arrays of rows or arrays that had null fields/elements would fail with a message that comparison of rows with null elements was not allowed. This change required generating the set_agg and set_union function directly rather than using annotation based generation because we needed to add an argument to SetAggregationStateSerializer
Add information about how some functions handle inputs from rows with null fields. Also fix a couple formatting issues.
d09549d to
600d474
Compare
Description
change the following functions to use distinct semantics for comparison rather than equal to
I did not change arrays_overlap as part of this PR because it has its own semantics where it returns null for any null input (but gives an error for null inside of rows), and so we probably need to think more about what behavior we want from that function.
Motivation and Context
Previously queries would fail with an error " ROW comparison not supported for fields with null elements" when there was an array of rows where any field was null or an array of arrays where any element was null. However, there would be no failure if nulls were included more directly (e.g. an array with nulls rather than an array of arrays with nulls). These functions are distincting functions, and so should be using full distinct semantics.
Fixes #22900
Impact
The above functions will no longer throw an error for nulls anywhere in the input. The functions will be using distinct semantics for comparison
Test Plan
unit tests
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.