@linshan-ma
Contributor

Tips

This PR is a sub-task that builds the framework to support structured streaming.

override def getBatch(start: Option[Offset], end: Offset) is not implemented yet; we plan to do it next.

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@linshan-ma linshan-ma changed the title [HUDI-1125]build framework to support structured streaming [HUDI-1125] build framework to support structured streaming Jul 27, 2020
@yanghua
Contributor

yanghua commented Jul 28, 2020

@linshan-ma Thanks for your contribution. Two suggestions:

  1. It contains some irrelevant commits that you should remove;
  2. Each PR must be complete and testable before it is merged into the codebase, so you would need to provide a complete implementation.

@linshan-ma
Contributor Author

@yanghua Thank you for your advice. 1. I checked; I will remove the irrelevant code. 2. I have tested the code. Are you asking me to submit a test class? 3. The code for building the framework is complete. The JIRA [HUDI-1126] is another sub-task that implements the details.

@linshan-ma linshan-ma changed the title [HUDI-1125] build framework to support structured streaming [HUDI-1125] build framework to support structured streaming Jul 28, 2020
@yanghua
Contributor

yanghua commented Jul 28, 2020

Hi,

1. I checked; I will remove the irrelevant code.

I mean the irrelevant PRs. Since this is the first version of this PR, it would be better for it to contain only one commit, right?

2. I have tested the code. Are you asking me to submit a test class?

Yes, it would be better to add test cases for your changes.

3. The code for building the framework is complete. The JIRA [HUDI-1126] is another sub-task that implements the details.

I mean we should provide a complete feature, especially for newly introduced features, so that the reviewer can make sure all the changes are good to merge into the codebase. Just a suggestion: if you are sure this PR is the basis of subsequent PRs, please ignore it.

@vinothchandar
Member

This is a good addition.
+1 on @yanghua's comments on adding tests and completeness of the feature.

Can we implement this such that users can do readStream() using commit times? This is a highly desired feature on Spark.
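
As a rough illustration of what reading by commit time could mean for getBatch(start, end), here is a minimal, dependency-free sketch. It is not the implementation in this PR: CommitOffset, Commit, and CommitTimelineSource are hypothetical names, and the only assumptions carried over from Spark and Hudi are that a batch covers the range (start, end], with start exclusive and end inclusive, and that commit times are fixed-width timestamp strings that sort lexicographically.

```scala
// Hypothetical model for illustration; none of these types come from Hudi or Spark.
final case class CommitOffset(commitTime: String)

final case class Commit(commitTime: String, records: Seq[String])

// Toy streaming source whose offsets are commit times on a timeline.
final class CommitTimelineSource(timeline: Seq[Commit]) {
  // Assumes fixed-width timestamp strings, so string order equals time order.
  private val sorted = timeline.sortBy(_.commitTime)

  // Latest available offset, or None when the timeline is empty.
  def getOffset: Option[CommitOffset] =
    sorted.lastOption.map(c => CommitOffset(c.commitTime))

  // Records from commits in (start, end]: start is exclusive, end is inclusive.
  // A missing start means "read from the beginning of the timeline".
  def getBatch(start: Option[CommitOffset], end: CommitOffset): Seq[String] =
    sorted
      .filter(c => start.forall(s => c.commitTime > s.commitTime))
      .filter(_.commitTime <= end.commitTime)
      .flatMap(_.records)
}

object CommitTimelineDemo {
  def main(args: Array[String]): Unit = {
    val source = new CommitTimelineSource(Seq(
      Commit("20200727101500", Seq("a", "b")),
      Commit("20200728093000", Seq("c")),
      Commit("20200730120000", Seq("d", "e"))
    ))
    val end = source.getOffset.get
    // First batch: everything up to the latest commit.
    println(source.getBatch(None, end))
    // Later batch: only commits after the last processed commit time.
    println(source.getBatch(Some(CommitOffset("20200727101500")), end))
  }
}
```

Using the commit time itself as the offset means a restarted query only needs the last processed commit time to resume.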

@leesf
Contributor

leesf commented Jul 30, 2020

Agree with @yanghua that we should implement the full feature in this PR.

@vinothchandar vinothchandar changed the title [HUDI-1125] build framework to support structured streaming [WIP] [HUDI-1125] build framework to support structured streaming Aug 31, 2020
@vinothchandar vinothchandar added the status:in-progress Work in progress label Oct 4, 2020
@n3nash
Contributor

n3nash commented Jan 18, 2021

@yanghua @leesf Any update on this PR?

@leesf
Contributor

leesf commented Jan 19, 2021

@n3nash Hi, regarding this work: @pengzhiwei2018 is taking it over.

@pengzhiwei2018

Hi @n3nash @leesf, I am still working on this feature. I may be able to provide a new version of the structured streaming source next week.

@rubenssoto

Hello,

Hudi will have nice features like clustering, and clustering will probably rewrite a lot of data. Is it possible for these rewrites, which add no new data, not to affect downstream Spark structured streaming consumers?

Delta Lake has something like this for its compaction operation:

https://docs.delta.io/latest/best-practices.html

The compaction write sets .option("dataChange", "false"), so downstream consumers are not affected.
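
For reference, the Delta Lake compaction pattern described on that page looks roughly like the sketch below. It shows Delta Lake's API, not Hudi's, and assumes a Spark session with Delta Lake on the classpath; the function name compactWithoutDataChange and its parameters are placeholders chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of a Delta Lake compaction that rewrites files without changing data.
// `path` and `numFiles` are placeholder parameters supplied by the caller.
def compactWithoutDataChange(spark: SparkSession, path: String, numFiles: Int): Unit = {
  spark.read
    .format("delta")
    .load(path)
    .repartition(numFiles)
    .write
    // Marks the commit as a pure rewrite, so streaming readers do not see it as new data.
    .option("dataChange", "false")
    .format("delta")
    .mode("overwrite")
    .save(path)
}
```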

Thank you.

@pengzhiwei2018

Hi @leesf @n3nash @rubenssoto, a new PR has been proposed at #2485; we can move the discussion there.

@vinothchandar
Member

Closing this in favor of #2485

Labels

status:in-progress Work in progress

7 participants