Fix bugs in duplicate items in UNNEST expression #18909

Merged
feilong-liu merged 1 commit into prestodb:master from feilong-liu:error_for_duplicate_unnest_item
Jan 12, 2023
Conversation

@feilong-liu (Contributor) commented Jan 9, 2023

What's the change?

Currently, an UNNEST node fails when it contains duplicate expressions, for example the query SELECT * from (select * FROM (values 1) as t(k)) CROSS JOIN unnest(array[2, 3], ARRAY[2, 3]) AS r(r1, r2). This is because we use a map to store the mapping between expressions in UNNEST and output symbols, and duplicate expressions produce the same map key.

In this PR, I skip the duplicate items in the UNNEST node and add a projection node over the output of the UNNEST node to produce the outputs for the skipped items.

This is essentially a rewrite from
SELECT * from (select * FROM (values 1) as t(k)) CROSS JOIN unnest(array[2, 3], ARRAY[2, 3]) AS r(r1, r2)
to
SELECT *, r1 as r2 from (select * FROM (values 1) as t(k)) CROSS JOIN unnest(array[2, 3]) AS r(r1)
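The rewrite described above can be sketched as a deduplication pass that keeps the first occurrence of each unnest expression and records, for every skipped duplicate, a projection back to the first occurrence's output symbol. A minimal, self-contained sketch in Java (hypothetical names; not the actual Presto planner code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnnestDedup {
    /**
     * Given the unnest expressions and their declared output symbols,
     * returns the projection map (output symbol -> source symbol) that
     * would sit above an UNNEST node containing only the first
     * occurrence of each expression. Unique expressions pass through;
     * duplicates alias the first occurrence's output symbol.
     */
    public static Map<String, String> planProjection(List<String> unnestExprs, List<String> outputSymbols) {
        Map<String, String> firstOutputForExpr = new LinkedHashMap<>();
        Map<String, String> projection = new LinkedHashMap<>();
        for (int i = 0; i < unnestExprs.size(); i++) {
            String expr = unnestExprs.get(i);
            String out = outputSymbols.get(i);
            // putIfAbsent returns null on first insertion, or the
            // previously stored output symbol for a duplicate expression.
            String first = firstOutputForExpr.putIfAbsent(expr, out);
            projection.put(out, first == null ? out : first);
        }
        return projection;
    }
}
```

For the query above, UNNEST(ARRAY[2, 3], ARRAY[2, 3]) AS r(r1, r2) would yield the projection {r1=r1, r2=r1}, i.e. the UNNEST node keeps only one copy of the array and r2 is re-emitted as an alias of r1.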

Test plan

Added unit test.

== RELEASE NOTES ==

General Changes
* Fix a bug where duplicate items in UNNEST lead to query compilation failure

@feilong-liu feilong-liu requested a review from a team as a code owner January 9, 2023 22:53
@feilong-liu feilong-liu requested a review from presto-oss January 9, 2023 22:53
@feilong-liu feilong-liu force-pushed the error_for_duplicate_unnest_item branch 3 times, most recently from 22be41d to ddc749b on January 10, 2023 00:58
@feilong-liu feilong-liu marked this pull request as draft January 10, 2023 01:28
@feilong-liu feilong-liu force-pushed the error_for_duplicate_unnest_item branch 4 times, most recently from 46ea0ea to 3a2e818 on January 11, 2023 00:20
@feilong-liu feilong-liu requested a review from kaikalur January 11, 2023 00:50
@feilong-liu feilong-liu marked this pull request as ready for review January 11, 2023 00:50
@kaikalur (Contributor) left a comment

How about the simple UNNEST case? It seems to work OK, but it would be good to add a test:

SELECT * from UNNEST(ARRAY[1,2], ARRAY[1,2])

@feilong-liu feilong-liu force-pushed the error_for_duplicate_unnest_item branch 3 times, most recently from d687145 to f264b69 on January 12, 2023 00:02
@feilong-liu feilong-liu requested a review from kaikalur January 12, 2023 00:55
@feilong-liu feilong-liu force-pushed the error_for_duplicate_unnest_item branch 2 times, most recently from f18ad4f to a1f3193 on January 12, 2023 01:45
@kaikalur (Contributor) left a comment

LGTM

@kaikalur kaikalur requested a review from rschlussel January 12, 2023 02:51
@feilong-liu feilong-liu force-pushed the error_for_duplicate_unnest_item branch from a1f3193 to 3171dad on January 12, 2023 06:07
@rschlussel (Contributor) left a comment

Why can't we just use a multimap instead?

@kaikalur (Contributor)

> Why can't we just use a multimap instead?

That was my first suggestion too :) except there is a canonicalize method that removes them further down.

@rschlussel (Contributor) commented Jan 12, 2023

> Why can't we just use a multimap instead?
>
> That was my first suggestion too :) except there is a canonicalize method that removes them further down.

Can you clarify this more? Would it make more sense to fix that method so it can support duplicate entries? It just seems weird to make plan choices based on the data structure representations we've chosen.

@kaikalur (Contributor)

> Why can't we just use a multimap instead?
>
> That was my first suggestion too :) except there is a canonicalize method that removes them further down.
>
> Can you clarify this more? Would it make more sense to fix that to be able to support the duplicate entries? Just seems weird to make plan choices based on the data structure representations we've chosen.

The main thing is that it's not done in a principled way: the outputs are just the map keys. They rely on the fact that the Builder adds entries in order, which I'm not a big fan of. So yes, we could make a bigger change to represent the lists of inputs and outputs explicitly, but this change is also a small optimization, since we unnest only one array :), and it's self-contained (a corner case, too: only 2 failures in the last year+).

@rschlussel (Contributor) left a comment

Looks good, just the one nit.
I'll note that this means that if you have any non-deterministic functions in the duplicates, they will be executed only once; but since we already do that in other cases, I don't think it's something to worry about.

Current UNNEST node will fail if there are duplicate expressions in UNNEST, for example, query "SELECT * from (select * FROM (values 1) as t(k)) CROSS JOIN unnest(array[2, 3], ARRAY[2, 3]) AS r(r1, r2)". This PR fixes this bug.
@feilong-liu feilong-liu force-pushed the error_for_duplicate_unnest_item branch from 3171dad to 099bd96 on January 12, 2023 18:58
@feilong-liu feilong-liu merged commit a7827c9 into prestodb:master Jan 12, 2023
@feilong-liu feilong-liu deleted the error_for_duplicate_unnest_item branch January 24, 2023 19:10
@wanglinsong wanglinsong mentioned this pull request Feb 25, 2023
3 participants