@umehrot2 umehrot2 commented Apr 1, 2020

What is the purpose of the pull request

This is a first draft of the Spark data source implementation for bootstrapped tables. This work is in progress, and I have put this WIP PR out to gather feedback early. I will continue to work on this and push to this PR. Currently, this implementation can infer the schema, merge skeleton and data files, and perform column pruning.
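
To make the merge and column pruning steps concrete, here is a minimal sketch in plain Python rather than the actual Spark code in this PR. It assumes that a skeleton file holds only Hudi metadata columns, that the external data file holds the original columns, and that both files list rows in the same order so they can be stitched positionally. The function name `merge_skeleton_and_data` and the sample rows are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Row = Dict[str, object]

# Assumption: the skeleton file carries only Hudi metadata columns such as these.
SKELETON_COLUMNS: Set[str] = {"_hoodie_commit_time", "_hoodie_record_key"}


def merge_skeleton_and_data(
    skeleton_rows: Sequence[Row],
    data_rows: Sequence[Row],
    required_columns: Sequence[str],
    skeleton_columns: Set[str] = SKELETON_COLUMNS,
) -> List[Row]:
    """Stitch rows positionally, keeping only the requested columns."""
    if len(skeleton_rows) != len(data_rows):
        raise ValueError("skeleton and data files must have the same row count")

    merged: List[Row] = []
    for skeleton_row, data_row in zip(skeleton_rows, data_rows):
        row: Row = {}
        for column in required_columns:
            # Column pruning: read each requested column from the side that owns it.
            source = skeleton_row if column in skeleton_columns else data_row
            row[column] = source[column]
        merged.append(row)
    return merged


# Hypothetical sample rows, for illustration only.
skeleton = [
    {"_hoodie_commit_time": "001", "_hoodie_record_key": "k1"},
    {"_hoodie_commit_time": "001", "_hoodie_record_key": "k2"},
]
data = [
    {"id": 1, "name": "a", "price": 9.5},
    {"id": 2, "name": "b", "price": 3.0},
]

result = merge_skeleton_and_data(skeleton, data, ["_hoodie_record_key", "name"])
print(result)
```

A query that selects only data columns would never touch the skeleton rows in this sketch, which is the point of pruning before stitching. The real implementation works on Spark file readers rather than in-memory lists.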

Note: This depends on the PR by @bvaradar #1112 for getting the file system view of external data files.

Feedback is welcome.

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

umehrot2 commented Apr 1, 2020

@bvaradar if you glance through my implementation, one thing I might need from you: could HoodieBaseFile store the FileStatus of the external data file instead of just the string path? I need the list of FileStatus objects for the external data files to work with here.

umehrot2 commented Jun 4, 2020

Closing this pull request in favor of the new pull request #1702, where I have consolidated all the datasource-related changes into one PR for review. It includes this read datasource part as well.

@umehrot2 umehrot2 closed this Jun 4, 2020