Make array_distinct compare elements with "is distinct" #11979
findepi wants to merge 1 commit into prestodb:master from findepi:findepi/array-distinct
@findepi, I'd like to see @haozhun review this, specifically the method handle bits and how they relate to the new calling conventions. Related to that, I am wondering if we can adapt the TypedSet code to always work on an "equalitator" function based on block and position. I think we can adapt both versions to that style.
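A minimal sketch of the "equalitator" idea, under simplifying assumptions: PositionEqualitator is a hypothetical callback that compares two elements by (block, position), plain Object[] arrays stand in for Presto's Block, and SimpleTypedSet is a hypothetical linear-scan stand-in for TypedSet. None of these names come from the Presto codebase. The point it illustrates is that the set only ever asks the callback whether two positions match, so the same set code can run with either "equals" or "distinctness" semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical callback: compares element leftPosition of left with
// element rightPosition of right. Object[] stands in for a Presto Block.
@FunctionalInterface
interface PositionEqualitator {
    boolean matches(Object[] left, int leftPosition, Object[] right, int rightPosition);
}

// Hypothetical, simplified stand-in for TypedSet: linear scan instead of hashing.
final class SimpleTypedSet {
    // "Not distinct" semantics: two NULLs match each other.
    static final PositionEqualitator NOT_DISTINCT =
            (left, lp, right, rp) -> Objects.equals(left[lp], right[rp]);

    // SQL "=" collapsed to a boolean: a comparison involving NULL is
    // indeterminate, which this sketch treats as "no match".
    static final PositionEqualitator EQUAL =
            (left, lp, right, rp) -> left[lp] != null && right[rp] != null && left[lp].equals(right[rp]);

    private final PositionEqualitator equalitator;
    private final List<Object> elements = new ArrayList<>();

    SimpleTypedSet(PositionEqualitator equalitator) {
        this.equalitator = equalitator;
    }

    // Adds block[position] unless an element already in the set matches it.
    // Returns true if the element was added.
    boolean add(Object[] block, int position) {
        Object[] stored = elements.toArray();
        for (int i = 0; i < stored.length; i++) {
            if (equalitator.matches(stored, i, block, position)) {
                return false;
            }
        }
        elements.add(block[position]);
        return true;
    }

    List<Object> elements() {
        return elements;
    }
}
```

Feeding {1, null, 1, null} through NOT_DISTINCT keeps [1, null], while EQUAL keeps [1, null, null], because each NULL fails to match the NULL already stored.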
I think, but I may be wrong, that
That should be possible. Currently the problem is that equality is like
Do you know which of these should use "equals" and which should use "distinctness"?
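To make the two options concrete: SQL "=" uses three-valued logic, so a comparison involving NULL yields NULL (unknown), whereas IS DISTINCT FROM always yields a boolean and treats two NULLs as not distinct. The sketch below models both for integers and for arrays of integers, using Java null for SQL NULL and a null Boolean for "unknown". The class and method names are hypothetical, and the array rule (false on any definite mismatch, otherwise unknown if any element comparison was unknown) is a simplified model, not Presto's operator implementation.

```java
import java.util.List;
import java.util.Objects;

final class SqlComparison {
    // SQL "=": returns null (unknown) when either side is NULL.
    static Boolean sqlEquals(Integer left, Integer right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    // SQL "IS DISTINCT FROM": never unknown; NULL is not distinct from NULL.
    static boolean isDistinctFrom(Integer left, Integer right) {
        return !Objects.equals(left, right);
    }

    // Simplified model of "=" on arrays: false on any definite mismatch,
    // otherwise null if some element comparison was unknown, otherwise true.
    static Boolean sqlArrayEquals(List<Integer> left, List<Integer> right) {
        if (left.size() != right.size()) {
            return false;
        }
        boolean sawUnknown = false;
        for (int i = 0; i < left.size(); i++) {
            Boolean result = sqlEquals(left.get(i), right.get(i));
            if (result == null) {
                sawUnknown = true;
            }
            else if (!result) {
                return false;
            }
        }
        if (sawUnknown) {
            return null;
        }
        return true;
    }

    // "IS DISTINCT FROM" on arrays: element-wise, with NULL matching NULL.
    static boolean arrayIsDistinctFrom(List<Integer> left, List<Integer> right) {
        if (left.size() != right.size()) {
            return true;
        }
        for (int i = 0; i < left.size(); i++) {
            if (isDistinctFrom(left.get(i), right.get(i))) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, ARRAY[1, NULL] = ARRAY[1, NULL] is unknown, while ARRAY[1, NULL] IS DISTINCT FROM ARRAY[1, NULL] is false. A function that removes duplicates needs the second answer to decide that the two arrays are duplicates.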
@findepi I am no expert on these functions, but I would guess all of them should be using
@gerlou did the majority of the work on supporting elasticity around calling conventions for … I'll take over #11263 and wrap it up. With that,
```java
assertFunction("ARRAY_DISTINCT(ARRAY [NULL, NULL])", new ArrayType(UNKNOWN), asList((Object) null));
assertFunction("ARRAY_DISTINCT(ARRAY [NULL, NULL, NULL])", new ArrayType(UNKNOWN), asList((Object) null));

// Indeterminate values
```
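The expected results in these tests follow "is distinct" semantics: duplicate NULLs collapse into a single NULL, and the first occurrence of each element is kept in order. As a rough model (not Presto's implementation), java.util.LinkedHashSet already behaves this way: it permits one null element and compares elements with equals, which for List values treats nested nulls as equal. The ArrayDistinctModel class and its arrayDistinct helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class ArrayDistinctModel {
    // Keeps the first occurrence of each element, in order. NULL is treated
    // as not distinct from NULL, and nested lists are equal when all their
    // elements are, NULLs included.
    static <T> List<T> arrayDistinct(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }
}
```

For example, arrayDistinct(Arrays.asList(null, null, null)) returns a one-element list containing null, mirroring the expected asList((Object) null) in the tests.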
Can you add the test cases mentioned in #11977 as well?
@dain @haozhun @findepi, correct me if I'm wrong. I don't know the full history of TypedSet and the isDistinctFrom operator, but it looks to me like TypedSet was meant to support "distinctness" semantics. You can see this in how it treats NULL: it has a special slot in the hash table for NULL and a flag indicating whether a NULL is present, and positionEqualsPosition() checks for NULL values before calling type.equalTo() (this guard follows distinctness semantics rather than equal-to semantics). So I think replacing type.equalTo() with isDistinctFrom (and removing the equalTo() call) is sufficient, unless a future use case requires "equalTo" semantics.

Another thought: the wrong-results issue (#11977) was caused only by the if (array.getPositionCount() == 2) optimization, which called type.equalTo() without checking for NULL first. The call site of positionEqualsPosition() (which calls type.equalTo()) inside TypedSet.getHashPositionOfElement() shouldn't cause correctness issues: it throws a "not supported" error if it sees NULL in a composite data type, and for primitive types there is a NULL guard in front.

So if we want to fix the correctness issue before #11983 is merged, we can either add a NULL check to that special case or simply delete the if (array.getPositionCount() == 2) block. I can send a PR for it if people want a quicker fix for the correctness issue.
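A minimal sketch of the two pieces described in the comment, under simplifying assumptions: NullAwareSet keeps NULL out of the hash table and records it with a flag (playing the role of the dedicated NULL slot), and TwoElementShortcut checks for NULL before calling a stand-in for type.equalTo(), which is the guard proposed for the array.getPositionCount() == 2 path. Both class names are hypothetical, java.util.HashSet stands in for TypedSet's own table, and only top-level NULL elements are modeled, not NULLs nested inside composite values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of a set that keeps NULL out of the hash table:
// the containsNull flag plays the role of the dedicated NULL slot.
final class NullAwareSet {
    private final Set<Object> nonNullElements = new HashSet<>();
    private boolean containsNull;

    // Returns true if the element was not already present.
    boolean add(Object element) {
        if (element == null) {
            boolean added = !containsNull;
            containsNull = true;
            return added;
        }
        return nonNullElements.add(element);
    }

    int size() {
        return nonNullElements.size() + (containsNull ? 1 : 0);
    }
}

// Hypothetical model of the two-element shortcut with a NULL guard in front.
final class TwoElementShortcut {
    // Stand-in for type.equalTo(): only defined for non-null values.
    private static boolean equalTo(Object left, Object right) {
        Objects.requireNonNull(left, "left is null");
        Objects.requireNonNull(right, "right is null");
        return left.equals(right);
    }

    static List<Object> distinctOfTwo(Object first, Object second) {
        // Settle NULLs with distinctness semantics before calling equalTo().
        if (first == null && second == null) {
            return Collections.singletonList(null);
        }
        if (first == null || second == null) {
            return Arrays.asList(first, second);
        }
        if (equalTo(first, second)) {
            return Collections.singletonList(first);
        }
        return Arrays.asList(first, second);
    }
}
```

Dropping the shortcut entirely and always going through the set is the other option mentioned above; the guard just keeps the fast path consistent with the set's NULL handling.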
@yingsu00 it looks that way. I listed all usages above (#11979 (comment)), and the plan is to make the class always use "distinctness" semantics.
As you observed, this solves one known issue but not the other. (And would probably fix
Superseded by trinodb/trino#559
Fixes #11963, fixes #11977