Skip to content

Extend remove map cast rule to cover map_subset function#25395

Merged
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:map_cast_mapsubset
Jun 26, 2025
Merged

Extend remove map cast rule to cover map_subset function#25395
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:map_cast_mapsubset

Conversation

@feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Jun 20, 2025

Description

Rewrite to remove map cast in map_subset function.

  • Input: map_subset(cast(feature as map<bigint, double>), array[k1, k2]) where feature is of type map<integer, double> and key is of type array
  • Output: cast(map_subset(feature, array[try_cast(k1 as integer), try_cast(k2 as integer)]) as map<bigint, double>)
    When k1, or k2 is out of integer range, try_cast will return NULL, where map_subset will not return values for this key, which is the same behavior for both input and output

This together with #25394 will enable subfield pushdown for feature map accessed in map_subset(cast(feature as map<bigint, double>), array[k1, k2]).
Also cast a subset of the map is cheaper than casting the whole map.

Motivation and Context

As in description

Impact

Optimization of map_subset function which is often used in feature extraction in ML workload

Test Plan

Add unit tests

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Improve remove map cast rule to cover map_subset function

@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jun 20, 2025
@feilong-liu feilong-liu marked this pull request as draft June 20, 2025 19:40
@feilong-liu feilong-liu marked this pull request as ready for review June 21, 2025 03:46
@feilong-liu feilong-liu requested a review from hantangwangd June 23, 2025 16:35
.setSystemProperty(REMOVE_MAP_CAST, "false")
.build();
assertQueryWithSameQueryRunner(enableOptimization, "select map_subset(feature, array[key]) from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 4)) t(feature, key)", disableOptimization);
assertQueryWithSameQueryRunner(enableOptimization, "select map_subset(feature, array[1, 3]) from (values (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), cast(2 as bigint)), (map(array[cast(1 as integer), 2, 3, 4], array[0.3, 0.5, 0.9, 0.1]), 4)) t(feature, key)", disableOptimization);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see, this query wouldn't trigger the new optimization logic, since array[1, 3] would be set as type array<int>, so there is no CAST for feature. Is this intentional, or should we use map_subset(feature, array[1, cast(3 as bigint)]) in order to trigger the optimization logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right. Thanks for pointing out, just fixed it.
By the way, there is another PR #25394 which optimizes queries with map_subset functions, greatly appreciate if you could take a look when you have time. Thanks!

@steveburnett
Copy link
Contributor

Thanks for the release note! Suggest adding a link to the doc in the release note, like this:

== RELEASE NOTES ==

General Changes
* Improve remove map cast rule to cover :func:`map_subset`.

@feilong-liu feilong-liu force-pushed the map_cast_mapsubset branch 2 times, most recently from 97ea4c5 to b657961 Compare June 26, 2025 04:42
rschlussel
rschlussel previously approved these changes Jun 26, 2025
Copy link
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. just recommend changing the description and variable names from using "index" to using "key" to be clearer.

return call(functionAndTypeManager, "element_at", node.getType(), castInput, newIndex);
}
else if (functionResolution.isMapSubSetFunction(node.getFunctionHandle())) {
RowExpression newIndex = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newKeys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Type arrayElementType = ((ArrayType) constantArray.getType()).getElementType();
ImmutableList.Builder<RowExpression> arguments = ImmutableList.builder();
for (int i = 0; i < arrayValue.getPositionCount(); ++i) {
ConstantExpression mapIndex = constant(readNativeValue(arrayElementType, arrayValue, i), arrayElementType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapIndex -> mapKey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return null;
}

private static boolean canRemoveMapCast(Type fromKeyType, Type fromValueType, Type toKeyType, Type toValueType, Type indexType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexType -> subsetKeysType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@feilong-liu feilong-liu force-pushed the map_cast_mapsubset branch 2 times, most recently from 6a1fb85 to 087427e Compare June 26, 2025 16:40
@feilong-liu feilong-liu requested a review from rschlussel June 26, 2025 16:42
@feilong-liu feilong-liu merged commit 3edc0ad into prestodb:master Jun 26, 2025
230 of 238 checks passed
@feilong-liu feilong-liu deleted the map_cast_mapsubset branch June 26, 2025 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants