Utilized column analyzer unnest fix map case #24789
Branch updated from 4c68541 to 4ab5101.
Branch updated from 7eec9ee to 02a68a9.
    ImmutableMap.of(QualifiedObjectName.valueOf("tpch.s1.t7"), ImmutableSet.of("a", "c", "d")));

    // Unused unnest columns are pruned, but used ones are kept
    // This test unnecessarily includes column c. This will be fixed once unnest for Map types are supported
This is the type of case that will see some regression, but it should be easy for users to unblock themselves by removing the unused expressions passed into unnest.
Thanks for the release note entry! Suggest rephrasing to follow the Order of Changes in the Release Notes Guidelines. I adapted phrasing from your Description in the PR, but if you have a better way to phrase it, let me know.
    if (!transformedFields.isEmpty()) {
        // TODO(kevintang2022): Update this fallback case to check if expressions
        // inside the unnest are map type or array type
Can we do this now by checking analysis.getType(expression)?
Yes, I can check for the Map type.
Branch updated from 95eeded to 22e6e97.
    List<Expression> expandedExpressions = new ArrayList<>();
    for (Expression expression : unnestExpressions) {
        if (analysis.getType(expression) instanceof MapType) {
            // Map produces two output columns, so input expression gets added twice
Technically, it might be possible to do even better by adding the key and value expressions separately in case only one of them is used, but that seems complicated, so it is probably not worth it.
Yes, I think the case you're referring to is:
Unnest(MAP(array_agg(a), array_agg(b))) t (c1, c2)
If only c2 is used, only b should count as utilized, but this change counts both a and b as utilized.
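The coarse-grained behavior described above can be sketched in isolation. This is a minimal, hypothetical model, not the analyzer's actual code: `UnnestInput` and `utilizedColumns` are illustrative names, and each input's referenced base columns are supplied by hand instead of being derived from the expression tree. It shows that once both map output fields point back at the same input expression, reading only `c2` still marks every column referenced by `MAP(array_agg(a), array_agg(b))` as utilized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class UnnestUtilizedColumns {
    // Hypothetical model of an unnest input: its text, whether it is map-typed,
    // and the base-table columns it references (supplied by hand here).
    public record UnnestInput(String text, boolean isMap, Set<String> referencedColumns) {}

    // Returns the base columns counted as utilized, given which output fields
    // (by position in the alias list) the rest of the query actually reads.
    public static Set<String> utilizedColumns(List<UnnestInput> inputs, Set<Integer> usedFieldPositions) {
        // Expand the inputs so that entry i is the input behind output field i.
        List<UnnestInput> expanded = new ArrayList<>();
        for (UnnestInput input : inputs) {
            expanded.add(input);
            if (input.isMap()) {
                // Map: the key and value fields share one input expression.
                expanded.add(input);
            }
        }
        // Every column referenced by a used field's input expression counts,
        // so key-only or value-only usage cannot be distinguished.
        Set<String> utilized = new TreeSet<>();
        for (int position : usedFieldPositions) {
            utilized.addAll(expanded.get(position).referencedColumns());
        }
        return utilized;
    }

    public static void main(String[] args) {
        // Unnest(MAP(array_agg(a), array_agg(b))) t (c1, c2), where only c2 (position 1) is used
        UnnestInput map = new UnnestInput("MAP(array_agg(a), array_agg(b))", true, Set.of("a", "b"));
        System.out.println(utilizedColumns(List.of(map), Set.of(1))); // prints [a, b]
    }
}
```

Tracking keys and values separately would require splitting the `MAP(...)` expression into its key and value arguments, which is the extra complexity the review judged not worth it.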
There's a merge conflict now; you'll need to rebase.
Remove import
Always check every expression inside unnest until MAP case is supported
Handle map case
Branch updated from 22e6e97 to 06af8bf.
Description
Currently, unnest cannot handle input expressions of Map type. Unnesting a Map creates two output columns (one for the key and one for the value), but the current code assumes that only Array types are passed into unnest, each producing a single output column. This mismatch between output columns and input expressions leads to index-out-of-range errors.
Here is an example of the mapping from each output column to the index of its corresponding input expression in the unnest:
unnest(m1, a1, m2, a2, m3, m4) t (m1k, m1v, a1v, m2k, m2v, a2v, m3k, m3v, m4k, m4v)
The correct field-to-expression-index mapping is:
{
m1k: 0
m1v: 0
a1v: 1
m2k: 2
m2v: 2
a2v: 3
...
}
but the current code assigns one index per output column, so the indexes run past the six input expressions:
{
m1k: 0
m1v: 1
a1v: 2
m2k: 3
m2v: 4
...
}
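The two mappings above can be reproduced with a small standalone sketch. It is a hypothetical model rather than the analyzer's code: `UnnestInput` stands in for an unnest expression, and its `isMap` flag stands in for the `analysis.getType(expression) instanceof MapType` check used in the PR. The corrected mapping gives a map input two consecutive entries that share one index; the old mapping simply uses each field's own position.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnnestFieldMapping {
    // Hypothetical stand-in for an unnest input expression; isMap plays the role
    // of analysis.getType(expression) instanceof MapType.
    public record UnnestInput(String name, boolean isMap) {}

    // Corrected mapping: a map input yields two output fields (key and value) and an
    // array input yields one; every output field points at its input's index.
    public static Map<String, Integer> fieldToExpressionIndex(List<UnnestInput> inputs, List<String> fields) {
        List<Integer> expandedIndexes = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            expandedIndexes.add(i);
            if (inputs.get(i).isMap()) {
                expandedIndexes.add(i); // second entry for the value column
            }
        }
        if (expandedIndexes.size() != fields.size()) {
            throw new IllegalArgumentException("alias count does not match unnest output columns");
        }
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int f = 0; f < fields.size(); f++) {
            mapping.put(fields.get(f), expandedIndexes.get(f));
        }
        return mapping;
    }

    // Old behavior: one expression index per output field, as if every input were an array.
    public static Map<String, Integer> naiveMapping(List<String> fields) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int f = 0; f < fields.size(); f++) {
            mapping.put(fields.get(f), f);
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<UnnestInput> inputs = List.of(
                new UnnestInput("m1", true), new UnnestInput("a1", false),
                new UnnestInput("m2", true), new UnnestInput("a2", false),
                new UnnestInput("m3", true), new UnnestInput("m4", true));
        List<String> fields = List.of("m1k", "m1v", "a1v", "m2k", "m2v", "a2v", "m3k", "m3v", "m4k", "m4v");
        System.out.println(fieldToExpressionIndex(inputs, fields));
        // The naive index for the last field exceeds the last valid input index.
        System.out.println(naiveMapping(fields).get("m4v") + " vs " + inputs.size() + " inputs");
    }
}
```

With six inputs the only valid indexes are 0 through 5, so any naive index of 6 or more is exactly the out-of-range lookup described above.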
Motivation and Context
T217212061
Currently, the utilized column analyzer throws an index-out-of-range exception when a map column is passed to unnest. This matters because when this error is thrown, all columns in the utilized tables are checked for ACL.
Impact
With this change, the utilized column analyzer checks all expressions passed to unnest instead of all columns in the entire query. This unblocks cases like:
unnest(m) t (c1, c2)
Test Plan
Unit tests in the utilized column analyzer cover the cases mentioned above, which failed before this change.
Contributor checklist
Release Notes
Please follow the release notes guidelines and fill in the release notes below.