@geserdugarov geserdugarov commented Jul 24, 2025

Change Logs

The first steps towards integrating Spark Datasource V2 were taken in RFC-38, which is now marked as completed. However, there are multiple issues with advertising a Hudi table as V2 without actually implementing certain APIs, and with using a custom relation rule to fall back to the V1 API. As a result, the current implementation of HoodieCatalog and Spark3DefaultSource returns a V1Table instead of HoodieInternalV2Table in order to address a performance regression.

There was an attempt to implement Spark Datasource V2 read functionality as a regular task, but it failed due to the scope of work required. Therefore, this RFC proposes to discuss the design of the Spark Datasource V2 integration in advance and to continue working on it accordingly.

Current integration with Spark

Impact

No impact at this stage

Risk level (write none, low medium or high below)

No

Documentation Update

No need at this stage

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Jul 24, 2025
@geserdugarov

@hudi-bot run azure

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua yihua left a comment

LGTM. @geserdugarov Thanks for taking this up! Looking forward to the full RFC.

@yihua yihua merged commit 675effb into apache:master Jul 31, 2025
5 checks passed
@geserdugarov geserdugarov deleted the master-datasource-v2-claim branch August 11, 2025 06:36
alexr17 pushed a commit to alexr17/hudi that referenced this pull request Aug 25, 2025