
Add spark sql integration test for Hudi #3194

Merged
gh-yzou merged 2 commits into apache:main from rahil-c:rahil/polaris-hudi-it-main on Jan 27, 2026


rahil-c (Contributor) commented Dec 3, 2025

Description

  • This PR adds an integration test for the Polaris-Hudi integration, following the same pattern as SparkDeltaIT.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

rahil-c (Contributor Author) commented Dec 3, 2025

@flyrain @gh-yzou @singhpk234

gh-yzou (Contributor) commented Dec 3, 2025

@rahil-c there is also an ongoing work for spark 4.0 support here #3188, does the hudi change also work with 4.0 without extra change?

// TODO: extract a polaris-rest module as a thin layer for
// client to depends on.
implementation(project(":polaris-core")) { isTransitive = false }
testImplementation("org.apache.hudi:hudi-spark3.5-bundle_${scalaVersion}:1.1.0")
flyrain (Contributor) commented Dec 3, 2025

Nit: we keep versions in pluginlibs.versions.toml; please reference the version from there, the way line 35 does.
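A minimal sketch of what the nit asks for, assuming the catalog file is registered in the build under the accessor name `pluginlibs` and that a new key called `hudi` (a hypothetical name) is added under `[versions]`; the actual key name and the line-35 convention in the real build script may differ.

```kotlin
// Assumed (hypothetical) entry in pluginlibs.versions.toml:
//   [versions]
//   hudi = "1.1.0"

// Resolve the version from the catalog instead of hard-coding it in the coordinate.
val hudiVersion = pluginlibs.versions.hudi.get()

dependencies {
  testImplementation("org.apache.hudi:hudi-spark3.5-bundle_${scalaVersion}:${hudiVersion}")
}
```

Keeping the version in the catalog means a future Hudi upgrade is a one-line change in the TOML file rather than an edit to each build script that declares the dependency.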

flyrain previously approved these changes Dec 3, 2025

flyrain (Contributor) left a comment

Looks great! Thanks @rahil-c !

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Dec 3, 2025
// TODO: extract a polaris-rest module as a thin layer for
// client to depends on.
implementation(project(":polaris-core")) { isTransitive = false }
testImplementation("org.apache.hudi:hudi-spark3.5-bundle_${scalaVersion}:1.1.0")
Contributor

For the actual Spark project, we don't intend to introduce any table-format-specific dependency, even for testing. I didn't see any change in the actual Spark project; is there a reason we need this?

Contributor Author

Let me try removing it and see what happens.

exclude("org.slf4j", "jul-to-slf4j")
}

// Add spark-hive for Hudi integration - provides HiveExternalCatalog that Hudi needs
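The excerpt stops at the comment and does not show the dependency it introduces. A hedged sketch of such a declaration follows, assuming `scalaVersion` and a Spark version variable (called `spark35Version` here, a hypothetical name) are already defined in the build script. `spark-hive` is the Spark module that contains `org.apache.spark.sql.hive.HiveExternalCatalog`. The configuration name and exclusion in the real build may differ.

```kotlin
dependencies {
  // spark-hive supplies org.apache.spark.sql.hive.HiveExternalCatalog, which Hudi expects at test time.
  // `spark35Version` is an assumed variable name for the Spark 3.5 version used elsewhere in the build.
  testImplementation("org.apache.spark:spark-hive_${scalaVersion}:${spark35Version}") {
    // Mirrors the logging exclusion visible in the excerpt above.
    exclude("org.slf4j", "jul-to-slf4j")
  }
}
```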
Contributor

@rahil-c could you also update the README to include Hudi support?

It would be great to also have a notebook in getting-started to help people onboard with Hudi; we could do that in a follow-up. We should also extend the regression tests to include an actual end-to-end test for Hudi, so that any breakage of the feature is caught.

Contributor Author

I'll try to follow up on this in a separate PR, if that works?

adam-christian-software (Contributor) commented Dec 5, 2025

> @rahil-c there is also an ongoing work for spark 4.0 support here #3188, does the hudi change also work with 4.0 without extra change?

@gh-yzou I believe we can merge both this and the Spark 4 work, then address Hudi support in Spark 4 afterwards. Maybe we can file a GitHub issue to track this?

github-actions bot commented Jan 8, 2026

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 8, 2026
@github-actions github-actions bot closed this Jan 18, 2026
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jan 18, 2026
@gh-yzou gh-yzou reopened this Jan 23, 2026
@github-project-automation github-project-automation bot moved this from Done to PRs In Progress in Basic Kanban Board Jan 23, 2026
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jan 23, 2026
singhpk234 (Contributor) left a comment

LGTM too!

@github-actions github-actions bot removed the Stale label Jan 24, 2026
@gh-yzou gh-yzou merged commit a242f51 into apache:main Jan 27, 2026
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jan 27, 2026
rahil-c mentioned this pull request Feb 9, 2026
snazy added a commit to snazy/polaris that referenced this pull request Feb 11, 2026
* Replace custom token-bucket implementation with Guava's `RateLimiter` (apache#3507)

Addresses the issues discussed on the dev mailing-list discussion https://lists.apache.org/thread/gkyw7m4fcbjbzhcrlrp4kcq5lr05r0m4, opting to use Guava as the easiest replacement here.

* Move idempotency_records schema to v4 and add H2 support (apache#3386)

* Move idempotency_records schema to v4 and add H2 support

* address comments and fix test failures

* fix format

* add comment to resource_id

* (nit): Getting started examples with mc/s5cmd to aws cli (apache#3526)

* Switch mc/s3cmd to aws cli

* Switch mc/s3cmd to aws cli

* Add support for no KMS with s3-compatible backend (apache#3501)

* chore(deps): update amazon/aws-cli docker tag to v2.33.7 (apache#3558)

* Update doc for helm around rateLimiter (apache#3562)

* Disable renovate update for python version (apache#3560)

* Fix the Keycloak getting-started example for 26.5+ (apache#3568)

The example was failing because Keycloak 26.5 introduced stricter validation rules for session lifespan and timeout.

* NoSQL: Add to runtime-service (apache#3396)

* NoSQL: Add to runtime-service

This change adds the NoSQL persistence to polaris-runtime-service.

* chore(deps): update amazon/aws-cli docker tag to v2.33.8 (apache#3575)

* Add spark sql integration test for Hudi (apache#3194)

* Fix ozone getting started example (apache#3574)

* Fix Ozone getting started example

* Fix Ozone getting started example

* Change AWS CLI image to weekly (apache#3578)

* fix(deps): update dependency com.diffplug.spotless:spotless-plugin-gradle to v8.2.1 (apache#3576)

* chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.24-2.1769108682 (apache#3588)

* removed references of BEFORE/AFTER_COMMIT_VIEW (apache#3554)

* nits - post-merge fixes

* Last merged commit 2b0ca21

---------

Co-authored-by: Huaxin Gao <huaxin.gao11@gmail.com>
Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: Rahil C <32500120+rahil-c@users.noreply.github.com>
Co-authored-by: Innocent Djiofack <djiofack007@gmail.com>

Labels

None yet

Projects

None yet

5 participants