Add support for IPv6 and iceberg with spark >= 3.4 #3206
Signed-off-by: Julian <[email protected]>
Hi @Future-Outlier, I signed the last commit, but one action failed with a timeout on the previous run: integration (ubuntu-latest, 3.9, integration_test_codecov).
Future-Outlier
left a comment
Can you run it on a Flyte cluster to prove it works?
@Future-Outlier, to test this PR I ran Spark tasks on k8s, submitted with the Docker image built from this PR and without the IPv6 hack, and they worked as expected. I also added jars for Iceberg support and gave the spark user access to the jars directory, so jars can be listed in the Spark config and downloaded at runtime. The image is published at https://hub.docker.com/r/juliastreibel/flyte-spark-plugin. The Iceberg tasks now run as expected as well.
@Future-Outlier, @pingsutw could you review this? :)
Can you give us a screenshot or a video to prove your code works?
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files: coverage diff

| Metric | master | #3206 | +/- |
|---|---|---|---|
| Coverage | 81.95% | 46.69% | -35.26% |
| Files | 346 | 214 | -132 |
| Lines | 27852 | 22276 | -5576 |
| Branches | 2920 | 2919 | -1 |
| Hits | 22826 | 10402 | -12424 |
| Misses | 4191 | 11346 | +7155 |
| Partials | 835 | 528 | -307 |
@Future-Outlier, @pingsutw could you review this? :)
@Future-Outlier, @pingsutw any updates? :)
Congrats on merging your first pull request! 🎉
* Add support for ipv6 with spark >= 3.4
  Signed-off-by: Julian <[email protected]>
* Integrade flyte-bot suggestion
  Signed-off-by: Julian <[email protected]>
* Change to spark 3.4 for iceberg support and docker image
  Signed-off-by: Julian <[email protected]>
* Add iceberg jars
  Signed-off-by: Julian <[email protected]>
* Upgrade hadoop deps to match spark version
  Signed-off-by: Julian <[email protected]>
* Add pyspark lower bound instead of match in the spark plugin
  Signed-off-by: Julian <[email protected]>
* Trim trailing whitespace
  Signed-off-by: Julian <[email protected]>
---------
Signed-off-by: Julian <[email protected]>

* Add support for ipv6 with spark >= 3.4
  Signed-off-by: Julian <[email protected]>
* Integrade flyte-bot suggestion
  Signed-off-by: Julian <[email protected]>
* Change to spark 3.4 for iceberg support and docker image
  Signed-off-by: Julian <[email protected]>
* Add iceberg jars
  Signed-off-by: Julian <[email protected]>
* Upgrade hadoop deps to match spark version
  Signed-off-by: Julian <[email protected]>
* Add pyspark lower bound instead of match in the spark plugin
  Signed-off-by: Julian <[email protected]>
* Trim trailing whitespace
  Signed-off-by: Julian <[email protected]>
---------
Signed-off-by: Julian <[email protected]>
Signed-off-by: Atharva <[email protected]>
Why are the changes needed?
The upgrade to Spark >= 3.4 is needed to support IPv6 and Iceberg. This is very useful for k8s deployments, and the lack of it is currently breaking our pipelines. As a workaround, we implemented an ugly fix that overwrites arguments with ImageSpecs.
Without the upgrade, we see issues where the IP address is not wrapped in [], which was fixed in apache/spark#36868.
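To illustrate the bracket-wrapping problem described above, here is a minimal Python sketch. It is not the code from apache/spark#36868. The helper name `format_host_port` and the sample address and port are hypothetical, chosen only to show why an IPv6 literal needs `[]` when it is combined with a port.

```python
import ipaddress


def format_host_port(host: str, port: int) -> str:
    """Join host and port, wrapping IPv6 literals in square brackets.

    Hypothetical helper for illustration only; it is not part of Spark or flytekit.
    """
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not a bare IP literal (for example a DNS name); leave it unchanged.
        is_ipv6 = False
    # Without brackets, the colons inside an IPv6 address are ambiguous with
    # the colon that separates the host from the port.
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


if __name__ == "__main__":
    print(format_host_port("fd00::1", 7078))
    print(format_host_port("10.0.0.1", 7078))
```

IPv4 addresses and hostnames pass through unchanged, so only the IPv6 case is affected.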
What changes were proposed in this pull request?
Upgrade from spark 3.2.1 to 3.5.5
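For the Iceberg side of the upgrade, a Spark configuration along the following lines might be used to pull the Iceberg runtime at startup. This is a sketch under assumptions rather than the configuration shipped in this PR: the helper name, the catalog name, the warehouse path, and the Iceberg version `1.5.2` are all illustrative, and the Scala suffix `_2.12` assumes the default Spark 3.5 build.

```python
from typing import Dict


def iceberg_spark_conf(catalog: str, warehouse: str, iceberg_version: str = "1.5.2") -> Dict[str, str]:
    """Build Spark settings for an Iceberg catalog.

    Hypothetical helper; the version, catalog name and warehouse are assumptions.
    """
    return {
        # Maven coordinate resolved and downloaded when the Spark session starts.
        "spark.jars.packages": f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{iceberg_version}",
        # Enables Iceberg's SQL extensions.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers a Hadoop-type Iceberg catalog under the given name.
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


if __name__ == "__main__":
    for key, value in iceberg_spark_conf("demo", "s3a://my-bucket/warehouse").items():
        print(f"{key}={value}")
```

In a Flyte task, a dictionary like this would typically be passed as `spark_conf` to the `Spark` task config from `flytekitplugins.spark`.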
How was this patch tested?
Ran the Spark plugin tests successfully.
Summary by Bito
This PR upgrades the Flytekit Spark integration to support Spark 3.4+ with IPv6 and Iceberg support for Kubernetes deployments. Updates include a new Spark base image, revised hadoop-aws dependencies, and modified installation scripts. The PR also fixes file permissions for the Spark jars directory, locks the pyspark version to prevent compatibility issues, and resolves pipeline issues in Kubernetes deployments.
Unit tests added: False
Estimated effort to review (1-5, lower is better): 1