@julianStreibel
Contributor

@julianStreibel julianStreibel commented Mar 21, 2025

Why are the changes needed?

Upgrading to Spark >= 3.4 is needed to support IPv6 and Iceberg. IPv6 support is very useful for k8s deployments, and its absence is currently breaking our pipelines. As a workaround, we implemented an ugly fix that overwrites arguments with ImageSpecs.
Without the upgrade, we see issues where the IP address is not wrapped in []. This was fixed in apache/spark#36868.
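
To illustrate the failure mode described above, here is a minimal Python sketch of why an IPv6 literal has to be wrapped in [] before a port is appended. It is not Spark's code; the helper name `format_host_port` and the sample addresses are hypothetical and only serve the illustration.

```python
import ipaddress


def format_host_port(host: str, port: int) -> str:
    """Join host and port, bracketing IPv6 literals so the port stays unambiguous."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not a bare IP literal (for example a DNS name), so no brackets are added.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


print(format_host_port("10.0.0.7", 7078))    # 10.0.0.7:7078
print(format_host_port("fd00::1234", 7078))  # [fd00::1234]:7078
```

Without the brackets, `fd00::1234:7078` cannot be split reliably into an address and a port, which is the kind of problem the linked Spark change addresses.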

What changes were proposed in this pull request?

Upgrade from spark 3.2.1 to 3.5.5

How was this patch tested?

Ran the Spark plugin tests successfully.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR upgrades Flytekit Spark integration to support Spark 3.4+ with IPv6 and Iceberg support for Kubernetes deployments. Updates include a new Spark base image, revised hadoop-aws dependencies, and modified installation scripts. The PR also fixes file permissions for spark jars directory, locks pyspark version to prevent compatibility issues, and resolves pipeline issues in Kubernetes deployments.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 1

@welcome

welcome bot commented Mar 21, 2025

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@flyte-bot
Contributor

flyte-bot commented Mar 21, 2025

Code Review Agent Run #3a1fdf

Actionable Suggestions - 1
  • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh - 1
    • Incorrect SHA-512 checksum verification format · Line 26-26
Review Details
  • Files reviewed - 1 · Commit Range: a0694d5..a0694d5
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful
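
The suggestion above concerns how the install script verifies the SHA-512 checksum of the downloaded Spark archive. As a rough sketch of the general idea, and not the script's actual logic, the following Python computes a file's SHA-512 digest and compares it with an expected hex string. The function names `sha512_of` and `verify_sha512` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected value, ignoring case and whitespace."""
    normalized = "".join(expected_hex.split()).lower()
    return hmac.compare_digest(sha512_of(path), normalized)
```

Normalizing case and whitespace is a design choice made here because published checksum files do not all share one layout; a stricter comparison may be preferable when the expected format is known.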

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

Refer to the documentation for additional commands.

Configuration

This repository uses code_review_bito. You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by Bito Logo

@flyte-bot
Contributor

flyte-bot commented Mar 21, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
New Feature - Spark & Iceberg Support Enhancements

Dockerfile - Updated base image to apache/spark-py:v3.4.0, revised download commands to fetch hadoop-aws 3.4.0 and added Iceberg jars, and inserted a chown command to adjust permissions for the spark jars directory.

flytekit_install_spark3.sh - Replaced the spark distribution URL and checksum to reference spark-3.4.0 and updated download links for hadoop-aws and AWS SDK bundles to the new versions.

setup.py - Modified the plugin dependency to require pyspark>=3.4.0, aligning the package with the upgraded Spark version.
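
To make the effect of a lower bound such as pyspark>=3.4.0 concrete, here is a small Python sketch that checks whether a version string satisfies a minimum release. It is a simplified illustration, not how pip resolves requirements (pre-release and local version rules are ignored), and the helper names `release_tuple` and `meets_lower_bound` are hypothetical.

```python
import re


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '3.5.5' -> (3, 5, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_lower_bound(installed: str, minimum: str = "3.4.0") -> bool:
    """Return True if `installed` is at least `minimum`, padding short versions with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_lower_bound("3.2.1"))  # False
print(meets_lower_bound("3.5.5"))  # True
```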

@flyte-bot
Contributor

flyte-bot commented Mar 21, 2025

Code Review Agent Run #655175

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 91b3466..f4c0bbe
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

@julianStreibel julianStreibel changed the title from "Add support for ipv6 with spark >= 3.4" to "Add support for IPv6 with spark >= 3.4" Mar 22, 2025
@julianStreibel julianStreibel changed the title from "Add support for IPv6 with spark >= 3.4" to "Add support for IPv6 and iceberg with spark >= 3.4" Mar 23, 2025
@flyte-bot
Contributor

flyte-bot commented Mar 23, 2025

Code Review Agent Run #46748b

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: f4c0bbe..d0184c5
    • plugins/flytekit-spark/Dockerfile
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

@julianStreibel
Contributor Author

Hi @Future-Outlier, I signed the last commit but one action failed with a timeout on the previous run.

integration (ubuntu-latest, 3.9, integration_test_codecov)
failed 53 minutes ago in 1h 22m 37s

SSH: ssh [email protected]
or: ssh -i <path-to-private-SSH-key> [email protected]
Error: The action 'Setup tmate session' has timed out after 60 minutes.

Member

@Future-Outlier Future-Outlier left a comment

Can you run it on a Flyte cluster to prove it works?

@flyte-bot
Contributor

flyte-bot commented Mar 24, 2025

Code Review Agent Run #6b0815

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 91b3466..5e75d32
    • plugins/flytekit-spark/Dockerfile
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Signed-off-by: Julian <[email protected]>
@julianStreibel
Contributor Author

julianStreibel commented Mar 25, 2025

@Future-Outlier, to test this PR I ran Spark tasks on k8s, submitted with the Docker image built from this PR and without the IPv6 hack, and they worked as expected. I also added jars for Iceberg support and gave the spark user access to the jars directory, so one can list additional jars in the Spark config to be downloaded at runtime. The image is published at https://hub.docker.com/r/juliastreibel/flyte-spark-plugin. The Iceberg tasks now also run as expected.
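
As a sketch of what adding jars through the Spark config could look like, the following Python builds a plain dictionary of Spark settings for an Iceberg catalog. The property names come from the Spark and Iceberg documentation, but the helper name `iceberg_spark_conf`, the catalog name, the warehouse path, and the version numbers are assumptions chosen for illustration, not values taken from this PR.

```python
def iceberg_spark_conf(
    catalog: str,
    warehouse: str,
    iceberg_version: str,
    spark_minor: str = "3.5",
    scala: str = "2.12",
) -> dict:
    """Build Spark settings that fetch the Iceberg runtime jar and register a catalog."""
    runtime = f"org.apache.iceberg:iceberg-spark-runtime-{spark_minor}_{scala}:{iceberg_version}"
    return {
        # Maven coordinates that Spark resolves and downloads when the session starts.
        "spark.jars.packages": runtime,
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


# Hypothetical values: adjust the catalog, bucket, and Iceberg version to your setup.
conf = iceberg_spark_conf("demo", "s3a://example-bucket/warehouse", "1.8.1")
for key, value in conf.items():
    print(f"{key}={value}")
```

A dictionary like this could be passed as the `spark_conf` of a Flyte Spark task. If the image already ships the Iceberg jars, the `spark.jars.packages` entry is likely unnecessary.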

@flyte-bot
Contributor

flyte-bot commented Mar 25, 2025

Code Review Agent Run #31c59c

Actionable Suggestions - 1
  • plugins/flytekit-spark/Dockerfile - 1
    • Consider matching Hadoop and Spark versions · Line 15-18
Review Details
  • Files reviewed - 2 · Commit Range: 5e75d32..db4d9e6
    • plugins/flytekit-spark/Dockerfile
    • plugins/flytekit-spark/setup.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful
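
The suggestion above is about keeping the hadoop-aws jar on the same Hadoop version as the Spark distribution. One way to keep download links consistent is to derive them from a single version value, as in this Python sketch. The helper name `maven_jar_url` is hypothetical, and the version shown is only an example; check which Hadoop version your Spark build actually ships with.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group: str, artifact: str, version: str) -> str:
    """Build the Maven Central download URL for a jar from its coordinates."""
    group_path = group.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact}/{version}/{artifact}-{version}.jar"


# Assumption: example value only, to be replaced by the Hadoop version of your Spark build.
hadoop_version = "3.3.4"
print(maven_jar_url("org.apache.hadoop", "hadoop-aws", hadoop_version))
```

Deriving every Hadoop-related URL from one variable makes it harder for the jar versions to drift apart when Spark is upgraded.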

@flyte-bot
Contributor

flyte-bot commented Mar 26, 2025

Code Review Agent Run #f98abc

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: db4d9e6..f60ee18
    • plugins/flytekit-spark/Dockerfile
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

@julianStreibel julianStreibel requested a review from pingsutw March 26, 2025 12:19
@flyte-bot
Contributor

flyte-bot commented Mar 26, 2025

Code Review Agent Run #b3690b

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: f60ee18..a2c1565
    • plugins/flytekit-spark/setup.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

@julianStreibel
Contributor Author

@Future-Outlier, @pingsutw could you review this? :)

@Future-Outlier
Member

@Future-Outlier, to test this PR I ran Spark tasks on k8s, submitted with the Docker image built from this PR and without the IPv6 hack, and they worked as expected. I also added jars for Iceberg support and gave the spark user access to the jars directory, so one can list additional jars in the Spark config to be downloaded at runtime. The image is published at https://hub.docker.com/r/juliastreibel/flyte-spark-plugin. The Iceberg tasks now also run as expected.

Can you give us a screenshot or a video to prove your code works?

@codecov

codecov bot commented Mar 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.69%. Comparing base (5503ee5) to head (a2c1565).
Report is 2 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (5503ee5) and HEAD (a2c1565).

HEAD has 49 fewer uploads than BASE.
Flag (unnamed) · BASE (5503ee5): 51 · HEAD (a2c1565): 2

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3206       +/-   ##
===========================================
- Coverage   81.95%   46.69%   -35.26%     
===========================================
  Files         346      214      -132     
  Lines       27852    22276     -5576     
  Branches     2920     2919        -1     
===========================================
- Hits        22826    10402    -12424     
- Misses       4191    11346     +7155     
+ Partials      835      528      -307     

@julianStreibel
Contributor Author

@flyte-bot
Contributor

flyte-bot commented Mar 28, 2025

Code Review Agent Run #d6e6be

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: a2c1565..aafe4e5
    • plugins/flytekit-spark/Dockerfile
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

@julianStreibel
Contributor Author

@Future-Outlier, @pingsutw could you review this? :)

@julianStreibel
Contributor Author

@Future-Outlier, @pingsutw any updates? :)

@davidmirror-ops davidmirror-ops enabled auto-merge (squash) April 2, 2025 23:19
@davidmirror-ops davidmirror-ops merged commit 1a8da07 into flyteorg:master Apr 3, 2025
114 checks passed
@welcome

welcome bot commented Apr 3, 2025

Congrats on merging your first pull request! 🎉

fiedlerNr9 pushed a commit that referenced this pull request Apr 7, 2025
* Add support for ipv6 with spark >= 3.4

Signed-off-by: Julian <[email protected]>

* Integrade flyte-bot suggestion

Signed-off-by: Julian <[email protected]>

* Change to spark 3.4 for iceberg support and docker image

Signed-off-by: Julian <[email protected]>

* Add iceberg jars

Signed-off-by: Julian <[email protected]>

* Upgrade hadoop deps to match spark version

Signed-off-by: Julian <[email protected]>

* Add pyspark lower bound instead of match in the spark plugin

Signed-off-by: Julian <[email protected]>

* Trim trailing whitespace

Signed-off-by: Julian <[email protected]>

---------

Signed-off-by: Julian <[email protected]>
Atharva1723 pushed a commit to Atharva1723/flytekit that referenced this pull request Oct 5, 2025
* Add support for ipv6 with spark >= 3.4

Signed-off-by: Julian <[email protected]>

* Integrade flyte-bot suggestion

Signed-off-by: Julian <[email protected]>

* Change to spark 3.4 for iceberg support and docker image

Signed-off-by: Julian <[email protected]>

* Add iceberg jars

Signed-off-by: Julian <[email protected]>

* Upgrade hadoop deps to match spark version

Signed-off-by: Julian <[email protected]>

* Add pyspark lower bound instead of match in the spark plugin

Signed-off-by: Julian <[email protected]>

* Trim trailing whitespace

Signed-off-by: Julian <[email protected]>

---------

Signed-off-by: Julian <[email protected]>
Signed-off-by: Atharva <[email protected]>