Skip to content

Utilized column analyzer unnest fix map case#24789

Merged
kevintang2022 merged 1 commit intoprestodb:masterfrom
kevintang2022:utilized-column-analyzer-unnest-fix-map-case
Mar 27, 2025
Merged

Utilized column analyzer unnest fix map case#24789
kevintang2022 merged 1 commit intoprestodb:masterfrom
kevintang2022:utilized-column-analyzer-unnest-fix-map-case

Conversation

@kevintang2022
Copy link
Contributor

@kevintang2022 kevintang2022 commented Mar 24, 2025

Description

Currently, unnest cannot handle input expresssions that are the Map type. This is because 2 output expressions are created when unnest is used on a Map (one for key and one for value). This leads to index out of range errors because the current code only expects that Array types are passed into unnest.

Here is an example of building a mapping from output column to its corresponding input expression in the unnest.

unnest(m1, a1, m2, a2, m3,m4) t (m1k, m1v, a1v, m2k,m2v, a2v, m3k, m3v, m4k, m4v)
The correct field to expression index mapping should be
{
m1k: 0
m1v: 0
a1v: 1
m2k: 2
m2v: 2
a2v: 3
...
}

but it is currently

{
m1k: 0
m1v: 1
a1v: 2
m2k: 3
m2v: 4
...
}

Motivation and Context

T217212061
Currently, for utilized column analyzer, unnest will throw an index out of range exception when a map column is passed into the unnest function. This is important because if an index error is thrown, all columns in utilized tables will be checked for ACL.

Impact

With this change, utilized column analyzer will check all expressions passed to unnest instead of checking all columns in the entire query. This will unblock cases like unnest(m) t (c1, c2)

Test Plan

Unit tests in utilized column analyzer covers the cases mentioned, which were failing before.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix index error when a map column is passed into an unnest function by using the column analyzer to correctly map key and value output fields back to correct input expression

@kevintang2022 kevintang2022 force-pushed the utilized-column-analyzer-unnest-fix-map-case branch from 4c68541 to 4ab5101 Compare March 24, 2025 16:05
@kevintang2022 kevintang2022 marked this pull request as ready for review March 24, 2025 20:08
@kevintang2022 kevintang2022 force-pushed the utilized-column-analyzer-unnest-fix-map-case branch from 7eec9ee to 02a68a9 Compare March 26, 2025 00:48
ImmutableMap.of(QualifiedObjectName.valueOf("tpch.s1.t7"), ImmutableSet.of("a", "c", "d")));

// Unused unnest columns are pruned, but used ones are kept
// This test unnecessarily includes column c. This will be fixed once unnest for Map types are supported
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the type of cases that will experience some regression, but it should easy for the user to unblock themselves if they can just clean out the unused expressions passed into unnest

@steveburnett
Copy link
Contributor

Thanks for the release note entry! Suggest rephrasing to follow the Order of Changes in the Release Notes Guidelines. I adapted phrasing from your Description in the PR, but if you have a better way to phrase it, let me know.

== RELEASE NOTES ==

General Changes
* Fix index error when a map column is passed into an unnest function by using the column analyzer to check ACL on all input expressions for an unnest function.


if (!transformedFields.isEmpty()) {
// TODO(kevintang2022): Update this fallback case to check if expressions
// inside the unnest are map type or array type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do this now by checking analysis.getType(expression)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can check for Map type

@kevintang2022 kevintang2022 force-pushed the utilized-column-analyzer-unnest-fix-map-case branch from 95eeded to 22e6e97 Compare March 26, 2025 20:55
rschlussel
rschlussel previously approved these changes Mar 26, 2025
Copy link
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this fix!

List<Expression> expandedExpressions = new ArrayList<>();
for (Expression expression : unnestExpressions) {
if (analysis.getType(expression) instanceof MapType) {
// Map produces two output columns, so input expression gets added twice
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically it might be possible to do even better by adding keys and values expressions separately in case only one is used, but that seems complicated to do, so probably not worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think the case you're referring to is
Unnest(MAP(array_agg(a), array_agg(b))) t (c1, c2)

If only c2 is used, then we should only count b as utilized, but this change would count both a and b as utilized

@rschlussel
Copy link
Contributor

there's a merge conflict now. you'll need to rebase.

Remove import

Always check every expression inside unnest until MAP case is supported

Handle map case
@kevintang2022 kevintang2022 force-pushed the utilized-column-analyzer-unnest-fix-map-case branch from 22e6e97 to 06af8bf Compare March 27, 2025 16:28
@kevintang2022 kevintang2022 merged commit 1db8f12 into prestodb:master Mar 27, 2025
94 checks passed
@prestodb-ci prestodb-ci mentioned this pull request Mar 28, 2025
30 tasks
pradeepvaka pushed a commit to pradeepvaka/presto that referenced this pull request Apr 11, 2025
Remove import

Always check every expression inside unnest until MAP case is supported

Handle map case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants