Add NativeExecutionShuffleManager for presto spark native execution#18623

Merged

arunthirupathi merged 1 commit into prestodb:master from miaoever:add_presto_spark_native_shuffle_manager

Nov 18, 2022

Contributor

miaoever commented Nov 3, 2022

PrestoSparkNativeExecutionShuffleManager is the shuffle manager implementing the Spark shuffle manager interface specifically for native execution. The reasons we have this new shuffle manager are:

1. To bypass the Spark Java shuffle writer/reader, since the actual shuffle read/write happens on the C++ side. PrestoSparkNativeExecutionShuffleManager registers a pair of no-op shuffle reader/writer implementations so that it still hooks into the regular Spark shuffle workflow.
2. To capture the shuffle metadata (e.g. ShuffleHandle) for later use. This metadata is only available internally during shuffle writer creation, which is outside the Presto-Spark native execution flow. PrestoSparkNativeExecutionShuffleManager captures and stores the metadata inside the shuffle manager and provides APIs that let the native execution runtime access it.
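The two points above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR and does not use Spark's real interfaces: `Handle`, `Writer`, `CapturingShuffleManager` and `getShuffleHandle` are hypothetical stand-ins chosen for illustration (only the `EmptyShuffleWriter` name appears in the review comments). It only shows the pattern of returning a no-op writer while recording the handle seen at writer-creation time.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Dependency-free sketch; every type here is a hypothetical stand-in, not Spark's API.
class ShuffleManagerSketch {
    /** Stand-in for the per-shuffle metadata (assumed fields). */
    static final class Handle {
        final int shuffleId;
        final int numPartitions;

        Handle(int shuffleId, int numPartitions) {
            this.shuffleId = shuffleId;
            this.numPartitions = numPartitions;
        }
    }

    /** Stand-in for a shuffle writer interface. */
    interface Writer<K, V> {
        void write(Iterator<Map.Entry<K, V>> records);
    }

    /** No-op writer: the real write is assumed to happen in native code. */
    static final class EmptyShuffleWriter<K, V> implements Writer<K, V> {
        @Override
        public void write(Iterator<Map.Entry<K, V>> records) {
            // Intentionally empty.
        }
    }

    /** Records each handle at writer-creation time and exposes it for later lookup. */
    static final class CapturingShuffleManager {
        private final Map<Integer, Handle> handles = new ConcurrentHashMap<>();

        <K, V> Writer<K, V> getWriter(Handle handle) {
            handles.put(handle.shuffleId, handle); // capture metadata for later use
            return new EmptyShuffleWriter<>();     // nothing is written on the Java side
        }

        Optional<Handle> getShuffleHandle(int shuffleId) {
            return Optional.ofNullable(handles.get(shuffleId));
        }
    }

    public static void main(String[] args) {
        CapturingShuffleManager manager = new CapturingShuffleManager();
        Writer<String, Integer> writer = manager.getWriter(new Handle(7, 16));
        writer.write(Collections.<Map.Entry<String, Integer>>emptyIterator());
        System.out.println(manager.getShuffleHandle(7).isPresent());
    }
}
```

A concurrent map is used here on the assumption that writers may be created from several task threads; the actual class may store its metadata differently.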

miaoever requested a review from a team as a code owner

November 3, 2022 22:35

miaoever requested review from chenyangfb, pgupta2, presto-oss and sameeragarwal

November 3, 2022 22:35

miaoever force-pushed the add_presto_spark_native_shuffle_manager branch 6 times, most recently from 8cab7cb to ef95594 Compare

November 4, 2022 00:13

Contributor

tdcmeehan commented Nov 4, 2022

@bot kick off test

Contributor

tdcmeehan commented Nov 4, 2022

@prestodbbot kick off test

Contributor

tdcmeehan commented Nov 4, 2022

@presto-release-bot kick off test

miaoever requested review from highker and tanjialiang

November 4, 2022 19:38

Contributor

tanjialiang commented Nov 4, 2022

@presto-release-bot kick off test

What is this arcane trick about?

highker reviewed


highker left a comment

nit comments only; will leave to others to review first

...om/facebook/presto/spark/classloader_interface/PrestoSparkNativeExecutionShuffleManager.java Outdated

highker Nov 4, 2022

unused clazz

...om/facebook/presto/spark/classloader_interface/PrestoSparkNativeExecutionShuffleManager.java Outdated

highker Nov 4, 2022

int shuffleId

...om/facebook/presto/spark/classloader_interface/PrestoSparkNativeExecutionShuffleManager.java Outdated

highker Nov 4, 2022

new EmptyShuffleWriter<>

...om/facebook/presto/spark/classloader_interface/PrestoSparkNativeExecutionShuffleManager.java Outdated

highker Nov 4, 2022

same, no "K, C"

...om/facebook/presto/spark/classloader_interface/PrestoSparkNativeExecutionShuffleManager.java Outdated

highker Nov 4, 2022

you don't need this

...om/facebook/presto/spark/classloader_interface/PrestoSparkNativeExecutionShuffleManager.java Outdated

highker Nov 4, 2022

remove this line

presto-spark-base/src/test/java/com/facebook/presto/spark/TestPrestoSparkNativeExecution.java Outdated

highker Nov 4, 2022

Can you specialize the type for BypassMergeSortShuffleHandle: BypassMergeSortShuffleHandle<.., ..>

Contributor

v-jizhang commented Nov 8, 2022

We are deploying a bot that can restart failed tests. It's not ready yet.

chenyangfb reviewed


presto-spark-base/src/test/java/com/facebook/presto/spark/PrestoSparkQueryRunner.java Outdated

Contributor

chenyangfb Nov 16, 2022

Could you explain a bit more about why we need
resetSparkContext(Map<String, String> additionalSparkConfigs)
get(Map<String, String> additionalSparkConfigs)
My understanding is that you want to add additional configs to the existing sparkContext, and that this is only needed for testing.
Is this correct?

Contributor

chenyangfb Nov 16, 2022

looks good overall, have one question about changes in PrestoSparkQueryRunner

Contributor Author

miaoever Nov 16, 2022

The reason we need the reset: the Spark shuffle manager is created and initialized only once, at Spark context/env creation time, and is then used throughout the lifetime of that Spark context. On the other hand, in the PrestoSpark test suite, all test cases in one suite share a single Spark context (held by the PrestoSparkQueryRunner) by default. Given these two constraints, we have to 'reset' the Spark context, i.e. create a new one, in order to register our new shuffle manager for testing.
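A rough sketch of the constraint described above, assuming a holder that lazily creates one shared context and ignores extra configs once it exists. `FakeContext` and `ContextHolder` are hypothetical stand-ins rather than Spark or PrestoSparkQueryRunner classes, and the shuffle manager class name in `main` is a placeholder; only the method names `get` and `resetSparkContext` are taken from the question above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Dependency-free sketch; FakeContext and ContextHolder are hypothetical stand-ins.
class SharedContextSketch {
    /** Stand-in for a Spark context whose configuration is fixed at creation time. */
    static final class FakeContext {
        final Map<String, String> conf;

        FakeContext(Map<String, String> conf) {
            this.conf = Collections.unmodifiableMap(new HashMap<>(conf));
        }
    }

    /** Shares one context across callers until it is explicitly reset. */
    static final class ContextHolder {
        private FakeContext context;

        synchronized FakeContext get(Map<String, String> additionalSparkConfigs) {
            if (context == null) {
                // Configs only take effect here, when the context is first created.
                context = new FakeContext(additionalSparkConfigs);
            }
            return context;
        }

        synchronized FakeContext resetSparkContext(Map<String, String> additionalSparkConfigs) {
            context = null; // drop the shared context so a new one is built
            return get(additionalSparkConfigs);
        }
    }

    public static void main(String[] args) {
        ContextHolder holder = new ContextHolder();
        FakeContext shared = holder.get(new HashMap<String, String>());

        Map<String, String> extra = new HashMap<>();
        // Placeholder class name, not the real fully qualified name.
        extra.put("spark.shuffle.manager", "example.HypotheticalShuffleManager");

        // The existing context is reused, so the new setting is not picked up.
        System.out.println(holder.get(extra) == shared);
        // After a reset, a fresh context carries the setting.
        System.out.println(holder.resetSparkContext(extra).conf);
    }
}
```

In this sketch, a test that needs a different shuffle manager has to call `resetSparkContext` before running, because `get` alone keeps returning the already-created context.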

miaoever force-pushed the add_presto_spark_native_shuffle_manager branch from ef95594 to 75cbfe8 Compare

November 16, 2022 18:37

miaoever requested a review from highker

November 16, 2022 18:37

highker approved these changes


presto-spark-base/src/test/java/com/facebook/presto/spark/TestPrestoSparkNativeExecution.java Outdated

highker Nov 16, 2022

This is a private method. Also, move it to the end of the class

Contributor Author

miaoever Nov 16, 2022

Done.

miaoever force-pushed the add_presto_spark_native_shuffle_manager branch from 75cbfe8 to 904f04c Compare

November 16, 2022 19:49

Contributor

v-jizhang commented Nov 17, 2022

@bot kick off tests


Add NativeExecutionShuffleManager for presto spark native execution

3897f13

miaoever force-pushed the add_presto_spark_native_shuffle_manager branch from 904f04c to 3897f13 Compare

November 17, 2022 18:28

arunthirupathi merged commit 5a7e10d into prestodb:master

wanglinsong mentioned this pull request

Add release notes for 0.279 #18920

Merged

30 tasks


Reviewers

chenyangfb left review comments

highker approved these changes

presto-oss: awaiting requested review (code owner automatically assigned from prestodb/committers)

sameeragarwal: awaiting requested review

pgupta2: awaiting requested review

tanjialiang: awaiting requested review
