Skip to content

Make map and array functions use IS DISTINCT semantics as appropriate#9841

Merged
losipiuk merged 5 commits intotrinodb:masterfrom
jirassimok:fix-array-functions-distinctness
Apr 7, 2022
Merged

Make map and array functions use IS DISTINCT semantics as appropriate#9841
losipiuk merged 5 commits intotrinodb:masterfrom
jirassimok:fix-array-functions-distinctness

Conversation

@jirassimok
Copy link
Member

@jirassimok jirassimok commented Nov 3, 2021

This addresses #560.

Release note:

# General
* Compare with "is distinct" semantics in `array_except`, `array_union`, `map_concat`, `map_from_entries`, `multimap_from_entries`, and `multimap_agg`. ({issue}`560`)

@cla-bot cla-bot bot added the cla-signed label Nov 3, 2021
@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch 4 times, most recently from efb047a to 8d227cc Compare November 7, 2021 22:17
@jirassimok jirassimok marked this pull request as ready for review November 8, 2021 14:38
@jirassimok
Copy link
Member Author

jirassimok commented Nov 8, 2021

I've split the refactor commits into #9901; only the last four commits here are part of this PR.

(The failing test is related to that PR, not this one.)

@jirassimok jirassimok mentioned this pull request Nov 8, 2021
@jirassimok jirassimok requested a review from findepi November 8, 2021 14:46
@findepi
Copy link
Member

findepi commented Nov 8, 2021

I've split the refactor commits into #9901; only the last four commits here are part of this PR.

thanks

@jirassimok
Copy link
Member Author

jirassimok commented Nov 8, 2021

I'm not actually sure we should make this change for some of the Map functions (unless we also define Maps as using distinctness instead of equality for key lookups, which I don't think is an especially good idea).

Alternatively, we could make the map functions check for not-equal-but-not-distinct values and fail on them, which I think might be better (though it would make the functions a bit slower to maintain a second TypedSet for that).

(Right now, the only not-equal-but-not-distinct value I know of is NaN, and floating-point types shouldn't really be used as map keys anyway because they don't do equality comparison well.)

@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch 2 times, most recently from 9a3407b to 09b2683 Compare November 16, 2021 19:15
@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch from 09b2683 to cc1722b Compare November 18, 2021 16:50
@findepi findepi force-pushed the fix-array-functions-distinctness branch from cc1722b to 4c61099 Compare November 23, 2021 09:09
@findepi
Copy link
Member

findepi commented Nov 23, 2021

rebased after #9905 merged.

@findepi findepi requested review from dain and martint November 23, 2021 09:09
@jirassimok
Copy link
Member Author

@martint @dain What do you think?

@dain
Copy link
Member

dain commented Dec 7, 2021

This looks directionally correct and nicely simple :)

@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch from 4c61099 to 87014bf Compare December 8, 2021 21:55
@jirassimok
Copy link
Member Author

Rebased and added a test for multimap_agg.

There still is no test for MapToMapCast because I can't think of a good way to test it—I don't know of any values that change distinctness when cast to another type.

@findepi
Copy link
Member

findepi commented Dec 9, 2021

I don't know of any values that change distinctness when cast to another type.

double to long ( input 1.1 and 1.2) or vice versa (input: certain consecutive values over 2^53)

@jirassimok
Copy link
Member Author

Casting to duplicate keys is already tested, so I guess I don't need to add a new test for that here.

@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch from 87014bf to 4da89e5 Compare December 16, 2021 18:35
@jirassimok
Copy link
Member Author

Rebased onto master.

@jirassimok
Copy link
Member Author

@martint Look good to you?

@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch from 4da89e5 to d234d77 Compare December 21, 2021 14:32
@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch from d234d77 to 16b1c21 Compare January 11, 2022 15:28
@joshthoward
Copy link
Member

@jirassimok I had an offline discussion with @dain and he said that while he thought it looked directionally correct he wouldn't have time to complete a full review.

@findepi or @losipiuk could either of you provide a review to close this issue?

CC: @martint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can somewhat test it with:

select cast(map(array['NaN', 'NaN '], array[1, 2]) as map<double, bigint>);

It will fail now. It used to not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test.

@losipiuk
Copy link
Member

Code looks good for me. I would love to learn more about motivation. It is not obvious to me what semantics should be actually, hence why is this a fix for "correctness" issue.
Also: #560 (comment)

@findepi please comment.

@findepi
Copy link
Member

findepi commented Jan 27, 2022

Code looks good for me. I would love to learn more about motivation. It is not obvious to me what semantics should be actually, hence why is this a fix for "correctness" issue. Also: #560 (comment)

@findepi please comment.

that's an old conversion... this is @dain 's comment prestodb/presto#11979 (comment)
couple other references you can find @martint involvement there as well
prestodb/presto#11963 prestodb/presto#11978 prestodb/presto#11979 prestodb/presto#11977 prestodb/presto#12080

@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch 3 times, most recently from 7806271 to c4b5c06 Compare February 2, 2022 16:47
@jirassimok
Copy link
Member Author

Rebased onto master.

@jirassimok jirassimok requested a review from losipiuk February 7, 2022 15:14
@losipiuk
Copy link
Member

please do rebase

@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch from c4b5c06 to d805c3e Compare February 11, 2022 20:49
@jirassimok jirassimok force-pushed the fix-array-functions-distinctness branch from d805c3e to 73eb50b Compare March 23, 2022 15:47
@jirassimok
Copy link
Member Author

@dain @losipiuk I believe this is ready to merge.

@losipiuk
Copy link
Member

@dain @losipiuk I believe this is ready to merge.

Yeah I think. @jirassimok can you update PR description to use tample (specifically putting release notes entry there) :)

@jirassimok
Copy link
Member Author

Added release note to PR description.

@losipiuk losipiuk force-pushed the fix-array-functions-distinctness branch from 73eb50b to 8ae6fe0 Compare April 7, 2022 10:16
@losipiuk
Copy link
Member

losipiuk commented Apr 7, 2022

rebased

@losipiuk losipiuk merged commit c3959f8 into trinodb:master Apr 7, 2022
@github-actions github-actions bot added this to the 376 milestone Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants