[WIP] [HUDI-1125] build framework to support structured streaming #1880
# Conflicts: # hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
|
@linshan-ma Thanks for your contribution. Two suggestions:
|
@yanghua Thank you for your advice. 1. I checked; I will remove the irrelevant code. 2. I have tested the code. Are you asking me to submit a test class? 3. The code for building the framework is complete. The JIRA [HUDI-1126] is another sub-task that will implement the details. |
Hi,
I mean the commits from irrelevant PRs. For the first version of this PR, it would be better to contain only one commit, right?
Yes, it would be better to add test cases for your changes.
I mean we should provide a complete feature, especially for newly introduced features, so that the reviewer can make sure all the changes are good to merge into the codebase. This is just a suggestion; if you are sure this PR is the basis of subsequent PRs, please ignore it. |
|
This is a good addition. Can we implement this such that, users can do |
|
Agree with @yanghua that we should implement the full feature in this PR. |
|
@n3nash Hi, regarding this work: @pengzhiwei2018 is taking it over. |
|
Hello, Hudi will have nice features like clustering, and clustering will probably rewrite a lot of data. Is it possible for these rewrites, which add no new data, to leave downstream Spark Structured Streaming consumers unaffected? Delta Lake has something similar for its compaction operation (https://docs.delta.io/latest/best-practices.html): compaction sets .option("dataChange", "false") so that downstream consumers are not affected. Thank you. |
Hi @leesf @n3nash @rubenssoto, a new PR has been proposed at #2485; we can move the discussion there.
|
Closing this in favor of #2485 |
Tips
This PR is a sub-task that builds the framework to support structured streaming.
The implementation of override def getBatch(start: Option[Offset], end: Offset) is planned as the next step.
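For context, the getBatch(start, end) signature matches Spark's streaming Source trait, whose contract is to return the data in the offset range (start, end], beginning with the first record when start is None. The following is a minimal, Spark-free sketch of that contract only. CommitOffset, Record, and TimelineSource are hypothetical names used for illustration and are not part of Hudi or Spark; the real method returns a DataFrame rather than a Seq.

```scala
// Hypothetical model of the (start, end] contract; not Hudi's implementation.
final case class CommitOffset(commitTime: Long)
final case class Record(key: String, commitTime: Long)

class TimelineSource(records: Seq[Record]) {

  // Latest available offset, or None when nothing has been committed yet.
  def getOffset: Option[CommitOffset] =
    if (records.isEmpty) None
    else Some(CommitOffset(records.map(_.commitTime).max))

  // Records committed after `start` (exclusive) and up to `end` (inclusive).
  // When `start` is None the batch begins with the first record.
  def getBatch(start: Option[CommitOffset], end: CommitOffset): Seq[Record] =
    records.filter { r =>
      start.forall(s => r.commitTime > s.commitTime) &&
      r.commitTime <= end.commitTime
    }
}
```

Under these assumptions, a consumer that last processed commit 1 and sees commit 3 as the latest offset would call getBatch(Some(CommitOffset(1)), CommitOffset(3)) and receive only the records from commits 2 and 3, so no commit is read twice.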
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.