
[pos] Support building presto-on-spark for spark2.4 and spark3.4#25552

Merged
singcha merged 1 commit into prestodb:master from wangkewen:pt-spark3 on Aug 21, 2025

@wangkewen wangkewen commented Jul 16, 2025

Description

This initial PR prepares the release of Presto on Spark 3.4.1 by incorporating updates from this pull request by @shrinidhijoshi.

Prior to the stable release of Presto on Spark 3 (e.g., Spark 3.4.1), code compatibility is maintained for both Spark 2.4 (the current version) and Spark 3.4. To support this, a Maven profile named spark3 has been created to facilitate building Presto on Spark 3.

All code changes are made within the presto-spark-classloader-interface module. To support different implementations for Spark versions 2.4 and 3.4, two separate modules have been created: presto-spark-classloader-spark2 and presto-spark-classloader-spark3. These modules contain the version-specific code, including a new PrestoSparkUtils.

                             presto-spark-classloader-spark2  (spark 2.4) 
                                         /                             
                                        /                              
presto-spark-classloader-interface ---                                
                                        \                               
                                         \                              
                             presto-spark-classloader-spark3 (spark 3.4)
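The layout in the diagram amounts to a common contract with one implementation per Spark version. The Java sketch below is illustrative only: `SparkVersionSupport`, `Spark2Support`, `Spark3Support`, and `select` are hypothetical names, not classes from this PR (the real version-specific code, such as `PrestoSparkUtils`, is not reproduced here). The boolean flag only stands in for the Maven `spark3` profile, which is assumed here to decide at build time which version-specific module is on the classpath.

```java
// Hypothetical sketch of the interface/spark2/spark3 module split.
// None of these names come from the PR.
public final class SparkCompatSketch {

    // Would live in presto-spark-classloader-interface (hypothetical name).
    public interface SparkVersionSupport {
        /** Major Spark version this implementation targets. */
        int sparkMajorVersion();

        /** Human-readable label for the implementation. */
        String describe();
    }

    // Would live in presto-spark-classloader-spark2 (hypothetical).
    static final class Spark2Support implements SparkVersionSupport {
        @Override
        public int sparkMajorVersion() { return 2; }

        @Override
        public String describe() { return "Spark 2.4 implementation"; }
    }

    // Would live in presto-spark-classloader-spark3 (hypothetical).
    static final class Spark3Support implements SparkVersionSupport {
        @Override
        public int sparkMajorVersion() { return 3; }

        @Override
        public String describe() { return "Spark 3.4 implementation"; }
    }

    // Assumption: in the real build the Maven profile picks the module, so no
    // runtime switch exists; this flag only simulates that build-time choice.
    static SparkVersionSupport select(boolean spark3ProfileActive) {
        return spark3ProfileActive ? new Spark3Support() : new Spark2Support();
    }

    public static void main(String[] args) {
        System.out.println(select(false).describe());
        System.out.println(select(true).describe());
    }
}
```

In this sketch, shared code depends only on `SparkVersionSupport`, so switching Spark versions swaps the implementation without touching callers.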

This PR also adds support for shuffle deserialization in PrestoSparkShuffleSerializer.java.

Motivation and Context

This change is needed to upgrade Presto on Spark to Spark 3 (e.g., Spark 3.4.1).

Impact

Test Plan

Internal pt verifier test: x1125914587947600

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

linux-foundation-easycla bot commented Jul 16, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: wangkewen / name: Kewen Wang (cf2f3fa)

@unidevel
Contributor

@wangkewen Thanks for your contribution. Can you sign the EasyCLA and resolve the conflicts first? Also, since this is a new feature, it is better to provide release notes and documentation about setup and configuration.

@wangkewen wangkewen force-pushed the pt-spark3 branch 4 times, most recently from 0f08664 to 7a73f47 on July 23, 2025 17:50
@wangkewen wangkewen force-pushed the pt-spark3 branch 3 times, most recently from 79dbe4b to 92d0b76 on July 26, 2025 19:01
@wangkewen wangkewen changed the title from Presto on Spark3.4.1 to [WIP] Presto on Spark3.4.1 on Aug 1, 2025
@wangkewen wangkewen force-pushed the pt-spark3 branch 3 times, most recently from 77c9586 to 06641bd on August 2, 2025 00:27
Collaborator

@shrinidhijoshi shrinidhijoshi left a comment

@wangkewen Thanks for making the changes. I have left some initial comments on the number of classes that are duplicated.

I see quite a few POM issues that CI is flagging; let's work through fixing all of them.

@wangkewen wangkewen force-pushed the pt-spark3 branch 3 times, most recently from 776df45 to 3959e03 on August 12, 2025 01:12
Collaborator

@shrinidhijoshi shrinidhijoshi left a comment

@wangkewen Thanks for addressing the comments. I did another pass and left some more comments.

Also, please update the PR description in line with the Presto contributor guidelines.

@wangkewen
Author

wangkewen commented Aug 15, 2025

Rebased on @shrinidhijoshi's PR-25794.

@wangkewen
Author

Rebased on master to include @shrinidhijoshi's PR-25797.

Collaborator

@shrinidhijoshi shrinidhijoshi left a comment

Thanks for addressing the comments, @wangkewen. Based on the latest changes, I left a few more comments.

@shrinidhijoshi shrinidhijoshi changed the title from [WIP] Presto on Spark3.4.1 to [pos] Enable building presto-on-spark for spark2.4 and spark3.4 on Aug 19, 2025
@shrinidhijoshi
Collaborator

@wangkewen Created #25833 to track adding spark3.4-based CI tests as a follow-up.

shrinidhijoshi previously approved these changes Aug 20, 2025
Collaborator

@shrinidhijoshi shrinidhijoshi left a comment

LGTM! Thanks for working through all the comments @wangkewen

@rschlussel
Contributor

Change looks good. Can you clean up the commits (squash them and make sure the commit title and message follow our guidelines)? https://github.com/prestodb/presto/wiki/Review-and-Commit-guidelines#commit-formatting-and-pull-requests

@wangkewen
Author

@rschlussel I have updated it.

@shrinidhijoshi shrinidhijoshi changed the title from [pos] Enable building presto-on-spark for spark2.4 and spark3.4 to [pos] Support building presto-on-spark for spark2.4 and spark3.4 on Aug 20, 2025
All code changes are made within the presto-spark-classloader-interface module. To support different implementations for spark2.4 and spark3.4, two separate modules have been created: presto-spark-classloader-spark2 and presto-spark-classloader-spark3. These modules contain the version-specific code, including `PrestoSparkUtils`.

It adds support for shuffle deserialization in `PrestoSparkShuffleSerializer`.

@facebook-github-bot
Collaborator

@wangkewen has imported this pull request. If you are a Meta employee, you can view this in D80638570.

@singcha singcha merged commit a40ff0d into prestodb:master Aug 21, 2025
69 checks passed

7 participants