-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: inconsistent scalar types in DistinctArrayAggAccumulator
state
#7385
Conversation
@@ -1280,6 +1280,10 @@ NULL NULL 781 7.81 125 -117 100 | |||
# ---- | |||
# [4, 2, 3, 5, 1] | |||
|
|||
# additional count(1) forces array_agg_distinct instead of array_agg over aggregated by c2 data | |||
statement ok | |||
SELECT array_agg(distinct c2), count(1) FROM aggregate_test_100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you verify the output as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we don't have array_sort / list_sort function yet, and output of array_agg(distinct) is non-deterministic, so absence of failures is the best option of SQL-level check (as I can see it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about unnesting and then using order by
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done using cross-join and CTE for array indices
Co-authored-by: Metehan Yıldırım <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me -- thank you @korowa
(0..array.len()).try_for_each(|i| { | ||
let scalar = ScalarValue::try_from_array(array, i)?; | ||
if let ScalarValue::List(Some(values), _) = scalar { | ||
self.values.extend(values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 extend should do the deduplication. 👍
# ---- | ||
# [4, 2, 3, 5, 1] | ||
query III | ||
WITH indices AS ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make an input that has at least one non distinct value too -- not just 1,2,3,4,5 but maybe repeat the value 1
a few times:
SELECT 1 AS idx UNION ALL
SELECT 1 AS idx UNION ALL
SELECT 1 AS idx UNION ALL
SELECT 2 AS idx UNION ALL
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @korowa and @metesynnada |
Which issue does this PR close?
Closes #6743.
Rationale for this change
Currently
DistinctArrayAggAccumulator
stores its state as a set of scalar values and returns it as a single scalar list. The problem is that while merging state, accumulator, instead of adding list elements, simply adds the whole list to the state as a single value, which causes errors while executingevaluate
on accumulator with merged state.What changes are included in this PR?
merge_state
now unpacks scalar list input and updates state with its elements.Are these changes tested?
DistinctArrayAggAccumulator
s have been added,array_agg(distinct col)
has been added to sqllogictestsAre there any user-facing changes?
array_agg(distinct)
should not fail in case query execution requires merging accumulator states (e. g. aggregation over multiple partitions)