Skip to content

[HUDI-9410] Moved the code from hudi-spark3-common to hudi-spark-common module#13301

Merged
yihua merged 6 commits intoapache:masterfrom
wombatu-kun:HUDI-9410_move_spark3common_to_sparkcommon
May 15, 2025
Merged

[HUDI-9410] Moved the code from hudi-spark3-common to hudi-spark-common module#13301
yihua merged 6 commits intoapache:masterfrom
wombatu-kun:HUDI-9410_move_spark3common_to_sparkcommon

Conversation

@wombatu-kun
Copy link
Copy Markdown
Contributor

@wombatu-kun wombatu-kun commented May 14, 2025

Change Logs

Right now there are many duplicated classes which will make the maintenance Hudi Spark integration much harder.
As Spark 2 is not supported anymore, it's reasonable to merge the spark3-common code into spark-common.
So i:

  • moved all code from hudi-spark3-common to hudi-spark-common module
  • removed hudi-spark3-common module
  • renamed the package: org.apache.hudi.spark3.internal -> org.apache.hudi.spark.internal

It's a prerequisite PR to make Spark 4 support easier (#12772) and avoid code duplication in the future.

Addressing these comments:
#12772 (comment)
#12772 (comment)

Impact

reduced code duplication in spark-x modules

Risk level (write none, low medium or high below)

none

Documentation Update

none

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label May 14, 2025
@wombatu-kun wombatu-kun requested a review from yihua May 14, 2025 08:28
@wombatu-kun wombatu-kun mentioned this pull request May 14, 2025
4 tasks
@wombatu-kun wombatu-kun marked this pull request as ready for review May 14, 2025 09:40
@wombatu-kun
Copy link
Copy Markdown
Contributor Author

wombatu-kun commented May 14, 2025

@yihua Ethan, review please and merge if it's ok. and i'll continue fixing spark4 support PR.

@@ -15,7 +15,7 @@
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.plans.logcal
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, someone submitted a fix for this. Thank you!

@hudi-bot
Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should these classes be renamed to *Spark* from *Spark3*?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it too, and even tried to do so, but found that some classes already have such "brothers" (classes with the same names but without Spark version). So i decided to not rename anything. But i'll revisit this one more time in Spark 4 support PR.

Copy link
Copy Markdown
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the refactoring!

*/

package org.apache.hudi.spark3.internal;
package org.apache.hudi.spark.internal;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should revisit in a separate JIRA and PR to see if we can move these bulk insert tests to hudi-spark to avoid test code duplication across version-specific modules.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yihua Ready for review #13306

@yihua yihua merged commit b5a25f7 into apache:master May 15, 2025
58 checks passed
wombatu-kun pushed a commit to wombatu-kun/hudi that referenced this pull request May 16, 2025
…on module (apache#13301)

Co-authored-by: Vova Kolmakov <kolmakov.vladimir@huawei.com>

(cherry picked from commit b5a25f7)
alexr17 pushed a commit to alexr17/hudi that referenced this pull request Aug 25, 2025
…on module (apache#13301)

Co-authored-by: Vova Kolmakov <kolmakov.vladimir@huawei.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants