[HUDI-2767] Enabling timeline server based marker as default #3967
@vinothchandar: Regarding structured streaming and the timeline server being closed early, this is what I see. At the end of the first micro-batch, the write client is closed, which in turn shuts down the timeline service. Subsequent micro-batches still succeed, though. I added some logs to testStructuredStreaming, which was run with direct markers for the purpose of collecting those logs. If I switch to timeline-server-based markers (the intent of this patch), the test fails because marker creation fails in the second batch.
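The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyTimelineService` and `ToyWriteClient` are hypothetical stand-ins, not Hudi classes, and the model assumes a single service instance is shared across micro-batches and is stopped when the first write client closes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the lifecycle issue; none of these types are Hudi APIs. */
class TimelineServiceLifecycleSketch {

    /** Hypothetical stand-in for the embedded timeline service. */
    static class ToyTimelineService {
        private boolean running = true;

        void stop() {
            running = false;
        }

        boolean isRunning() {
            return running;
        }
    }

    /** Hypothetical stand-in for a write client that stops the service on close. */
    static class ToyWriteClient implements AutoCloseable {
        private final ToyTimelineService service;
        private final boolean timelineServerBasedMarkers;

        ToyWriteClient(ToyTimelineService service, boolean timelineServerBasedMarkers) {
            this.service = service;
            this.timelineServerBasedMarkers = timelineServerBasedMarkers;
        }

        void createMarker(String name) {
            // Direct markers go straight to storage, so they do not need the service.
            if (timelineServerBasedMarkers && !service.isRunning()) {
                throw new IllegalStateException("timeline service is stopped: " + name);
            }
        }

        @Override
        public void close() {
            // Assumption: closing the client also shuts down the shared service.
            service.stop();
        }
    }

    /** Runs the given number of micro-batches against one shared service. */
    static List<String> runMicroBatches(boolean timelineServerBasedMarkers, int batches) {
        ToyTimelineService shared = new ToyTimelineService();
        List<String> outcomes = new ArrayList<>();
        for (int i = 1; i <= batches; i++) {
            try (ToyWriteClient client = new ToyWriteClient(shared, timelineServerBasedMarkers)) {
                client.createMarker("batch-" + i);
                outcomes.add("batch " + i + ": ok");
            } catch (IllegalStateException e) {
                outcomes.add("batch " + i + ": failed");
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println("direct markers:         " + runMicroBatches(false, 2));
        System.out.println("timeline-server-based:  " + runMicroBatches(true, 2));
    }
}
```

In this model, direct markers succeed in every batch, while timeline-server-based markers succeed only in the first batch, which matches the behaviour reported for the test.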
@hudi-bot azure run
@vinothchandar: For users who explicitly disable the timeline server, should we fall back to direct-style markers?
@nsivabalan We have jobs that disable the timeline server in production. For backward compatibility, we should fall back to direct-style markers.
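A minimal sketch of the fallback being discussed, assuming it is resolved from write properties. The config keys `hoodie.write.markers.type` and `hoodie.embed.timeline.server` are Hudi's; the `MarkerTypeFallback` class, its local `MarkerType` enum, the `resolveMarkerType` helper, and the defaults used here are hypothetical and are not the implementation in this PR.

```java
import java.util.Locale;
import java.util.Properties;

/** Illustrative only; not Hudi's actual marker-type resolution. */
class MarkerTypeFallback {
    static final String MARKERS_TYPE_KEY = "hoodie.write.markers.type";
    static final String EMBED_TIMELINE_SERVER_KEY = "hoodie.embed.timeline.server";

    /** Local stand-in for the marker types named in the discussion. */
    enum MarkerType { DIRECT, TIMELINE_SERVER_BASED }

    /**
     * Returns the marker type to use. Assumes timeline-server-based markers are
     * the default (the intent of this patch) and that the embedded timeline
     * server is enabled unless explicitly turned off.
     */
    static MarkerType resolveMarkerType(Properties props) {
        String requestedName = props.getProperty(
                MARKERS_TYPE_KEY, MarkerType.TIMELINE_SERVER_BASED.name());
        MarkerType requested = MarkerType.valueOf(requestedName.toUpperCase(Locale.ROOT));

        boolean timelineServerEnabled = Boolean.parseBoolean(
                props.getProperty(EMBED_TIMELINE_SERVER_KEY, "true"));

        // Without a timeline server there is nothing to serve marker requests,
        // so fall back to direct markers instead of failing the write.
        if (requested == MarkerType.TIMELINE_SERVER_BASED && !timelineServerEnabled) {
            return MarkerType.DIRECT;
        }
        return requested;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(EMBED_TIMELINE_SERVER_KEY, "false");
        System.out.println(resolveMarkerType(props)); // prints DIRECT
    }
}
```

Jobs that already disable the timeline server would keep working under this scheme without any config change, which is the backward-compatibility concern raised above.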
.withEngineType(EngineType.JAVA)
.withPath(basePath)
.withSchema(schema.toString())
.withMarkersType(MarkerType.DIRECT.name())
Is there a plan or ticket to convert all these tests to timeline-server-based markers? Or do they have to stay on the direct type by design?
I'm putting up another PR to flip the default and fix things: #4112. Closing this one.
What is the purpose of the pull request
We might have to rethink enabling timeline-server-based markers as the default.
In the case of structured streaming, once the source stream completes, it shuts down the timeline server, so some writes to Hudi fail while creating (timeline-server-based) markers. I need to understand this flow better to determine whether we need to fix structured streaming or whether this is an inherent constraint.
#3950 is a prerequisite for landing this PR; without it, I might have to duplicate test fixes. So I will land #3950 first and then fix the remaining test failures. I have already gone through one pass. Except for structured streaming, I don't see any other major issues.