Skip to content

Add NativeExecutionShuffleManager for presto spark native execution#18623

Merged
arunthirupathi merged 1 commit intoprestodb:masterfrom
miaoever:add_presto_spark_native_shuffle_manager
Nov 18, 2022
Merged

Add NativeExecutionShuffleManager for presto spark native execution#18623
arunthirupathi merged 1 commit intoprestodb:masterfrom
miaoever:add_presto_spark_native_shuffle_manager

Conversation

@miaoever
Copy link
Contributor

@miaoever miaoever commented Nov 3, 2022

PrestoSparkNativeExecutionShuffleManager is the shuffle manager implementing the Spark shuffle manager interface specifically for native execution. The reasons we have this new shuffle manager are:

  1. To bypass calling into Spark java shuffle writer/reader since the actual shuffle read/write will happen in C++ side. In PrestoSparkNativeExecutionShuffleManager, we registered a pair of no-op shuffle reader/writer to hook-up with regular Spark shuffle workflow.
  2. To capture the shuffle metadata (eg. {@link ShuffleHandle}) for later use. These metadata are only available during shuffle writer creation internally which is beyond the whole Presto-Spark native execution flow. By using the {@link PrestoSparkNativeExecutionShuffleManager}, we capture and store these metadata inside the shuffle manager and provide the APIs to allow native execution runtime access.

@miaoever miaoever requested a review from a team as a code owner November 3, 2022 22:35
@miaoever miaoever force-pushed the add_presto_spark_native_shuffle_manager branch 6 times, most recently from 8cab7cb to ef95594 Compare November 4, 2022 00:13
@tdcmeehan
Copy link
Contributor

@bot kick off test

@tdcmeehan
Copy link
Contributor

@prestodbbot kick off test

@tdcmeehan
Copy link
Contributor

@presto-release-bot kick off test

@tanjialiang
Copy link
Contributor

@presto-release-bot kick off test

What is this arcane trick about?

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit comments only; will leave to others to review first

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused clazz

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int shuffleId

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new EmptyShuffleWriter<>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, no "K, C"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use specialize the type for BypassMergeSortShuffleHandle: BypassMergeSortShuffleHandle<.., ..>

@v-jizhang
Copy link
Contributor

We are deploying a bot that can restart failed tests. It's not ready yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a bit more about why we need
resetSparkContext(Map<String, String> additionalSparkConfigs)
get(Map<String, String> additionalSparkConfigs)
My understanding is you want add additional configs to existing sparkContext? And this is only needed for testing
Is this correct or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall, have one question about changes in PrestoSparkQueryRunner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we need reset: the Spark shuffle manager will be only created/initialized once at Spark context/Env creation time, and it'll be used throughout the life time of that Spark context. On other side, In PrestoSpark test suite, all the test cases in one suite will be sharing one Spark context (held by the PrestoSparkQueryRunner) by default. With these two constraints, we have to 'reset' the Spark Context to create a new one to register our new shuffle manager for our testing purpose.

@miaoever miaoever force-pushed the add_presto_spark_native_shuffle_manager branch from ef95594 to 75cbfe8 Compare November 16, 2022 18:37
@miaoever miaoever requested a review from highker November 16, 2022 18:37
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a private method. Also, move it to the end of the class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@miaoever miaoever force-pushed the add_presto_spark_native_shuffle_manager branch from 75cbfe8 to 904f04c Compare November 16, 2022 19:49
@v-jizhang
Copy link
Contributor

@bot kick off tests

@miaoever miaoever force-pushed the add_presto_spark_native_shuffle_manager branch from 904f04c to 3897f13 Compare November 17, 2022 18:28
@arunthirupathi arunthirupathi merged commit 5a7e10d into prestodb:master Nov 18, 2022
@wanglinsong wanglinsong mentioned this pull request Jan 12, 2023
30 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants