Extend remove map cast rule to cover map_subset function #25395
feilong-liu merged 1 commit into prestodb:master
```java
        .setSystemProperty(REMOVE_MAP_CAST, "false")
        .build();
assertQueryWithSameQueryRunner(enableOptimization, "select map_subset(feature, array[key]) from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 4)) t(feature, key)", disableOptimization);
assertQueryWithSameQueryRunner(enableOptimization, "select map_subset(feature, array[1, 3]) from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 4)) t(feature, key)", disableOptimization);
```
As far as I can see, this query wouldn't trigger the new optimization logic: array[1, 3] is typed as array<integer>, so no CAST is applied to feature. Is this intentional, or should we use map_subset(feature, array[1, cast(3 as bigint)]) to trigger the optimization logic?
Yeah, you're right. Thanks for pointing that out; I just fixed it.
By the way, there is another PR, #25394, that optimizes queries with map_subset functions. I'd greatly appreciate it if you could take a look when you have time. Thanks!
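The reviewer's point about literal typing can be illustrated with a toy model. The `IntType` enum and the helpers below are hypothetical and greatly simplified (real type coercion covers many more types); the sketch only assumes that an array literal takes the common supertype of its elements, and that map_subset needs the map key type and the keys array element type to agree, so a wider keys array forces a cast on the map.

```java
import java.util.List;

/**
 * Toy model (hypothetical, not the engine's analyzer) of why array[1, 3]
 * does not produce a map cast while array[1, cast(3 as bigint)] does.
 */
public class MapSubsetCoercionSketch {
    // Only the two integral types relevant here; BIGINT is the wider one.
    enum IntType { INTEGER, BIGINT }

    // Common supertype of two integral types: the wider of the two.
    static IntType commonSuperType(IntType a, IntType b) {
        return (a == IntType.BIGINT || b == IntType.BIGINT) ? IntType.BIGINT : IntType.INTEGER;
    }

    // Element type of a (non-empty) array literal: common supertype of all elements.
    static IntType arrayElementType(List<IntType> elementTypes) {
        IntType result = elementTypes.get(0);
        for (IntType type : elementTypes) {
            result = commonSuperType(result, type);
        }
        return result;
    }

    // If the keys array is wider than the map key type, the map side must be
    // cast to the wider key type, which is the pattern the rewrite targets.
    static boolean mapNeedsCast(IntType mapKeyType, List<IntType> keyElementTypes) {
        IntType keysType = arrayElementType(keyElementTypes);
        return commonSuperType(mapKeyType, keysType) != mapKeyType;
    }

    public static void main(String[] args) {
        // array[1, 3]: both literals are INTEGER, so the map is not cast.
        System.out.println(mapNeedsCast(IntType.INTEGER, List.of(IntType.INTEGER, IntType.INTEGER)));
        // array[1, cast(3 as bigint)]: the array becomes array<bigint>, so the map is cast.
        System.out.println(mapNeedsCast(IntType.INTEGER, List.of(IntType.INTEGER, IntType.BIGINT)));
    }
}
```

This is why the test needs an explicit bigint element in the keys array before the optimized and unoptimized plans actually differ.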
Thanks for the release note! I suggest adding a link to the documentation in the release note, like this:
rschlussel left a comment

Looks good. I'd just recommend changing the description and variable names from "index" to "key" to make them clearer.
```java
return call(functionAndTypeManager, "element_at", node.getType(), castInput, newIndex);
}
else if (functionResolution.isMapSubSetFunction(node.getFunctionHandle())) {
RowExpression newIndex = null;
```

```java
Type arrayElementType = ((ArrayType) constantArray.getType()).getElementType();
ImmutableList.Builder<RowExpression> arguments = ImmutableList.builder();
for (int i = 0; i < arrayValue.getPositionCount(); ++i) {
ConstantExpression mapIndex = constant(readNativeValue(arrayElementType, arrayValue, i), arrayElementType);
```

```java
return null;
}
```
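The map_subset branch above reads each element of a constant keys array individually. As a hedged illustration of what try_cast(k as integer) evaluates to for such constant bigint keys, the hypothetical helper below narrows a list of bigint constants in plain Java. It does not use the engine's RowExpression API, and the excerpt does not show whether the rule folds these values at planning time or emits try_cast calls; the class and method names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: the value of try_cast(k AS integer) for each element
 * of a constant bigint keys array. Not the actual rule implementation.
 */
public class ConstantKeyNarrowingSketch {
    // Mirrors try_cast(bigint AS integer): the value if it fits, otherwise null.
    static Integer tryCastToInteger(Long value) {
        if (value == null || value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            return null;
        }
        return value.intValue();
    }

    // Narrows every element of a constant keys array, preserving order and position.
    static List<Integer> narrowKeys(List<Long> bigintKeys) {
        List<Integer> narrowed = new ArrayList<>(bigintKeys.size());
        for (Long key : bigintKeys) {
            narrowed.add(tryCastToInteger(key));
        }
        return narrowed;
    }

    public static void main(String[] args) {
        // 1 and 3 fit in an integer; 2^40 does not and becomes null.
        List<Long> keys = Arrays.asList(1L, 3L, 1L << 40);
        System.out.println(narrowKeys(keys)); // [1, 3, null]
    }
}
```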
```java
private static boolean canRemoveMapCast(Type fromKeyType, Type fromValueType, Type toKeyType, Type toValueType, Type indexType)
```

indexType -> subsetKeysType?
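To make the role of each parameter concrete, here is a hypothetical sketch of the kind of check a method with this signature could perform. The `SimpleType` enum and the exact conditions (value type unchanged, key cast is an integral widening, keys array already has the target key type) are assumptions for illustration, not the PR's implementation; the last parameter uses the reviewer's suggested name, subsetKeysType.

```java
/**
 * Hypothetical sketch of a canRemoveMapCast-style check. The SimpleType model
 * and the conditions below are assumptions, not the PR's implementation.
 */
public class CanRemoveMapCastSketch {
    // Integral types are declared narrowest to widest, so ordinal() encodes width.
    enum SimpleType { TINYINT, SMALLINT, INTEGER, BIGINT, DOUBLE, VARCHAR }

    static boolean isIntegral(SimpleType type) {
        return type.ordinal() <= SimpleType.BIGINT.ordinal();
    }

    static boolean canRemoveMapCast(SimpleType fromKeyType, SimpleType fromValueType,
                                    SimpleType toKeyType, SimpleType toValueType,
                                    SimpleType subsetKeysType) {
        // Assumed: the cast may only change the key type; values pass through unchanged.
        if (fromValueType != toValueType) {
            return false;
        }
        // Assumed: the key cast must widen one integral type to a strictly wider one,
        // so every cast key can be narrowed back to the original key type.
        if (!isIntegral(fromKeyType) || !isIntegral(toKeyType)
                || fromKeyType.ordinal() >= toKeyType.ordinal()) {
            return false;
        }
        // The subset keys must have the cast map's key type, e.g. array<bigint>.
        return subsetKeysType == toKeyType;
    }

    public static void main(String[] args) {
        // map<integer, double> cast to map<bigint, double>, keys array<bigint>: eligible.
        System.out.println(canRemoveMapCast(SimpleType.INTEGER, SimpleType.DOUBLE,
                SimpleType.BIGINT, SimpleType.DOUBLE, SimpleType.BIGINT)); // true
        // Casting bigint keys down to integer can fail for out-of-range keys: not eligible.
        System.out.println(canRemoveMapCast(SimpleType.BIGINT, SimpleType.DOUBLE,
                SimpleType.INTEGER, SimpleType.DOUBLE, SimpleType.INTEGER)); // false
    }
}
```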
Description
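Rewrite map_subset calls so that the cast on the input map is removed. An expression of the form

```sql
map_subset(cast(feature as map<bigint, double>), array[k1, k2])
```

where feature is of type map<integer, double> and the keys array is of type array<bigint>, is rewritten to

```sql
cast(map_subset(feature, array[try_cast(k1 as integer), try_cast(k2 as integer)]) as map<bigint, double>)
```

When k1 or k2 is out of integer range, try_cast returns NULL, and map_subset returns no value for that key. The original expression returns nothing for such a key either, because every key of the cast map comes from an integer value and therefore lies within integer range. Both forms produce the same result.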
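The equivalence argument above can be checked on concrete data with a plain-Java model. This is a sketch, not engine code: `mapSubset`, `castToBigintKeys` and `tryCastToInteger` are hypothetical stand-ins for map_subset, the map cast and try_cast, and the NULL-key handling follows the description's statement that map_subset returns no value for a NULL key. The sample map is the one used in the PR's unit test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model (not engine code) of the map_subset rewrite for a
 * map<integer, double> feature and array<bigint> keys.
 */
public class MapSubsetRewriteSketch {
    // Simplified map_subset: keep entries whose key is in keys; a null key returns nothing.
    static <K, V> Map<K, V> mapSubset(Map<K, V> map, List<K> keys) {
        Map<K, V> result = new HashMap<>();
        for (K key : keys) {
            if (key != null && map.containsKey(key)) {
                result.put(key, map.get(key));
            }
        }
        return result;
    }

    // cast(map<integer, double> as map<bigint, double>): widen every key.
    static Map<Long, Double> castToBigintKeys(Map<Integer, Double> map) {
        Map<Long, Double> result = new HashMap<>();
        for (Map.Entry<Integer, Double> entry : map.entrySet()) {
            result.put(entry.getKey().longValue(), entry.getValue());
        }
        return result;
    }

    // try_cast(bigint as integer): null when the value is out of integer range.
    static Integer tryCastToInteger(Long value) {
        if (value == null || value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            return null;
        }
        return value.intValue();
    }

    // Original: map_subset(cast(feature as map<bigint, double>), keys)
    static Map<Long, Double> original(Map<Integer, Double> feature, List<Long> keys) {
        return mapSubset(castToBigintKeys(feature), keys);
    }

    // Rewritten: cast(map_subset(feature, array[try_cast(k as integer), ...]) as map<bigint, double>)
    static Map<Long, Double> rewritten(Map<Integer, Double> feature, List<Long> keys) {
        List<Integer> narrowedKeys = new ArrayList<>();
        for (Long key : keys) {
            narrowedKeys.add(tryCastToInteger(key));
        }
        return castToBigintKeys(mapSubset(feature, narrowedKeys));
    }

    public static void main(String[] args) {
        // Same map as the unit test: map(array[1, 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]).
        Map<Integer, Double> feature = new HashMap<>();
        feature.put(1, 0.3);
        feature.put(2, 0.5);
        feature.put(3, 0.9);
        feature.put(4, 0.1);
        // 2 and 4 are present; 2^40 is outside integer range.
        List<Long> keys = Arrays.asList(2L, 4L, 1L << 40);
        System.out.println(original(feature, keys).equals(rewritten(feature, keys))); // true
    }
}
```

Note that in `rewritten` only the entries that survive the subset pass through `castToBigintKeys`, while `original` converts all four entries before filtering.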
Together with #25394, this enables subfield pushdown for the feature map accessed in `map_subset(cast(feature as map<bigint, double>), array[k1, k2])`. Also, casting a subset of the map is cheaper than casting the whole map.
Motivation and Context
See the description above.
Impact
Optimizes the map_subset function, which is often used for feature extraction in ML workloads.
Test Plan
Add unit tests
Contributor checklist
Release Notes
Please follow the release notes guidelines and fill in the release notes below.