[native pos] Add end to end functionality for file based broadcast #19917

Merged
singcha merged 1 commit into prestodb:master from singcha:add_broadcast_join_support
Jul 21, 2023

@singcha
Contributor

@singcha singcha commented Jun 21, 2023

Add support for file-based broadcast by:

  • Adding plan node conversion to use BroadcastWriteNode for broadcast output and BroadcastExchangeSource for broadcast read
  • Adding code changes to pass baseBroadcastPath and the list of broadcast files

Overall flow after this change:

  1. The executor submits a plan fragment with broadcast output to the native process
  2. The native process converts the plan fragment by adding a BroadcastWriteNode, which writes to the file system
  3. File details (currently the file path only) are collected and returned to the Spark driver
  4. The driver broadcasts the file details to all executors in the next stage
  5. Executors send a plan fragment with the file paths as its remote source to the native process
  6. The native process uses BroadcastExchangeSource to read the broadcast table back
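
The flow above can be sketched as a small round trip through the file system. This is only an illustration under stated assumptions: `write_broadcast_file`, `read_broadcast_table`, and `demo` are hypothetical names, and JSON stands in for whatever serialized format the native process actually writes.

```python
import json
import tempfile
import uuid
from pathlib import Path


def write_broadcast_file(base_path: Path, rows: list) -> str:
    """Write side (steps 2-3): persist one task's output, return its file path."""
    base_path.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme; a unique name avoids collisions between tasks.
    file_path = base_path / f"broadcast-{uuid.uuid4().hex}.json"
    file_path.write_text(json.dumps(rows))
    return str(file_path)


def read_broadcast_table(file_paths: list) -> list:
    """Read side (steps 5-6): rebuild the broadcast table from every listed file."""
    table = []
    for path in file_paths:
        table.extend(json.loads(Path(path).read_text()))
    return table


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "broadcast"  # plays the role of baseBroadcastPath
        # Two upstream tasks each write one file; the "driver" collects the paths.
        paths = [
            write_broadcast_file(base, [[1, "a"], [2, "b"]]),
            write_broadcast_file(base, [[3, "c"]]),
        ]
        # Step 4: every executor in the next stage receives the same path list.
        return read_broadcast_table(paths)


print(demo())
```

Only the file paths travel through the driver in this sketch; the table data itself is exchanged through shared storage, which is the point of the file-based approach.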

Test plan -
Tested by enabling the previously disabled tests that use broadcast.

== RELEASE NOTES ==

General Changes
* Add support for broadcast join in the Presto-on-Spark/Velox execution path
* Add new property `native-execution-broadcast-base-path`, which specifies the base path for temporary storage of broadcast data in Presto-on-Spark native execution

@singcha singcha requested a review from mbasmanova June 21, 2023 05:25
@singcha singcha force-pushed the add_broadcast_join_support branch 7 times, most recently from f63c83f to 1cc5db3 Compare June 23, 2023 17:37
@singcha singcha requested a review from xiaoxmeng June 23, 2023 17:37
@singcha singcha force-pushed the add_broadcast_join_support branch 2 times, most recently from 3a72832 to 66f9434 Compare June 23, 2023 17:46
@singcha singcha marked this pull request as ready for review June 23, 2023 17:46
@singcha singcha requested review from a team as code owners June 23, 2023 17:46
@singcha singcha requested review from miaoever and presto-oss June 23, 2023 17:46
@mbasmanova
Contributor

@pgupta2 Arjun pointed out that we may need to implement a cache of broadcast data to avoid reading the same data multiple times on the same executor when multiple tasks from the same stage end up on that executor. Here is the Java implementation: https://github.com/prestodb/presto/blob/master/presto-spark-base/src/main/java/com/facebook/presto/spark/execution/PrestoSparkBroadcastTableCacheManager.java
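
To make the caching idea concrete, here is a minimal sketch of a per-executor cache that loads a broadcast table once and shares it across tasks. It is an assumption-laden illustration, not a port of the linked Java class: `BroadcastTableCache`, `get_or_load`, and the choice of the sorted file paths as the cache key are all hypothetical.

```python
import threading
from typing import Callable, Dict, List, Tuple


class BroadcastTableCache:
    """Hypothetical per-executor cache keyed by the set of broadcast file paths."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tables: Dict[Tuple[str, ...], list] = {}

    def get_or_load(
        self,
        file_paths: List[str],
        loader: Callable[[Tuple[str, ...]], list],
    ) -> list:
        # Sort so that the same files in a different order hit the same entry.
        key = tuple(sorted(file_paths))
        # Holding the lock while loading keeps the sketch simple: concurrent
        # tasks asking for the same table wait instead of re-reading the files.
        with self._lock:
            if key not in self._tables:
                self._tables[key] = loader(key)
            return self._tables[key]


calls = []


def fake_loader(paths: Tuple[str, ...]) -> list:
    """Stand-in for reading the files; records each invocation."""
    calls.append(paths)
    return [f"rows from {p}" for p in paths]


cache = BroadcastTableCache()
first = cache.get_or_load(["/tmp/b/file1", "/tmp/b/file2"], fake_loader)
second = cache.get_or_load(["/tmp/b/file2", "/tmp/b/file1"], fake_loader)
print(len(calls), first is second)
```

A real cache would also need eviction and memory accounting, which this sketch leaves out.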

@singcha singcha force-pushed the add_broadcast_join_support branch 2 times, most recently from 35cb1ec to 1bd52c2 Compare July 5, 2023 19:43
@singcha singcha force-pushed the add_broadcast_join_support branch 2 times, most recently from 68a5ebe to 432b32e Compare July 11, 2023 23:05
@singcha singcha requested a review from shrinidhijoshi July 11, 2023 23:13
@singcha singcha force-pushed the add_broadcast_join_support branch 3 times, most recently from c67c047 to 16f988e Compare July 12, 2023 17:19
Collaborator

@shrinidhijoshi shrinidhijoshi left a comment

One last comment on the property name.

Also, consider the changes below to the commit message:

  1. Wrap the text rather than using a single long line
  2. Add more details about the changes the commit introduces.

@singcha singcha force-pushed the add_broadcast_join_support branch 4 times, most recently from 2761263 to dc3e0d4 Compare July 18, 2023 23:40
@singcha
Contributor Author

singcha commented Jul 18, 2023

> One last comment on the property name.
>
> Also, consider the changes below to the commit message:
>
>   1. Wrap the text rather than using a single long line
>   2. Add more details about the changes the commit introduces.

Done, thanks for pointing it out.

@singcha
Contributor Author

singcha commented Jul 18, 2023

@mbasmanova @vermapratyush @shrinidhijoshi - Thank you for the review. I have addressed the comments; the items below will be addressed in a follow-up PR:

  1. Refactoring - Using multi-bind for JSON codecs
  2. Refactoring - Splitting Presto & POS server
  3. Re-using and aggregating plan node for stats generation

Can you please review?

@singcha singcha force-pushed the add_broadcast_join_support branch 2 times, most recently from e84954c to 1c075f2 Compare July 19, 2023 16:48
@shrinidhijoshi
Collaborator

@singcha Please update the release notes section with the new property added.

@singcha
Contributor Author

singcha commented Jul 19, 2023

> @singcha Please update the release notes section with the new property added.

Added, thanks!

Collaborator

@shrinidhijoshi shrinidhijoshi left a comment

LGTM @singcha. Thanks for working on this and addressing all the feedback.

Member

@vermapratyush vermapratyush left a comment

Thanks for addressing the feedback @singcha

Contributor

@mbasmanova mbasmanova left a comment

Thanks.

@singcha singcha force-pushed the add_broadcast_join_support branch from 1c075f2 to 78b50c0 Compare July 20, 2023 17:35
Add support for file-based broadcast by:
1. Presto-to-Velox query plan changes to:
 a. add BroadcastWriteNode if the plan fragment has broadcast output
 b. add ExchangeNode for the broadcast read path, which uses BroadcastExchangeSource
2. wiring the end-to-end flow with the Java executor
3. enabling tests that were failing due to the missing broadcast feature
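
The two plan changes in item 1 can be pictured with a toy plan tree. This is a sketch under assumptions, not the converter from this commit: `PlanNode`, `add_broadcast_write`, `broadcast_read_node`, and `describe` are hypothetical names, and the `detail` field is only a placeholder for whatever the real nodes carry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a plan node; not the Velox class."""
    name: str
    sources: List["PlanNode"] = field(default_factory=list)
    detail: str = ""


def add_broadcast_write(root: PlanNode, has_broadcast_output: bool, base_path: str) -> PlanNode:
    """1a: wrap the fragment root in a BroadcastWriteNode only for broadcast output."""
    if not has_broadcast_output:
        return root
    return PlanNode("BroadcastWriteNode", [root], detail=base_path)


def broadcast_read_node(file_paths: List[str]) -> PlanNode:
    """1b: a leaf ExchangeNode that carries the broadcast file paths to read."""
    return PlanNode("ExchangeNode", detail="BroadcastExchangeSource:" + ",".join(file_paths))


def describe(node: PlanNode) -> str:
    """Render a plan as 'Parent(Child, ...)' for quick inspection."""
    if not node.sources:
        return node.name
    return f"{node.name}({', '.join(describe(s) for s in node.sources)})"


build_side = PlanNode("TableScan")
write_plan = add_broadcast_write(build_side, True, "/tmp/broadcast")
plain_plan = add_broadcast_write(build_side, False, "/tmp/broadcast")
read_plan = broadcast_read_node(["/tmp/broadcast/f1"])
print(describe(write_plan))
print(describe(plain_plan))
print(describe(read_plan))
```
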
@singcha singcha force-pushed the add_broadcast_join_support branch from 78b50c0 to b435f7a Compare July 20, 2023 22:25
@singcha singcha merged commit 88b4348 into prestodb:master Jul 21, 2023
@wanglinsong wanglinsong mentioned this pull request Jul 27, 2023
28 tasks

5 participants