-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-6170] Use correct zone id while calculating earliestTimeToRetain #8631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| UTC("utc", TimeZone.getTimeZone("UTC").toZoneId()); | ||
|
|
||
| private final String timeZone; | ||
| private final ZoneId zoneId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we need a zoneId member, can we calculate it on the fly in method getZoneId ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do a switch-case but I feel this is better. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, it's just a preference.
| Instant latestCommitInstant = HoodieActiveTimeline.parseDateFromInstantTime(commitTimeline.lastInstant().get().getTimestamp()).toInstant(); | ||
| ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(latestCommitInstant, ZoneId.systemDefault()); | ||
| ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(latestCommitInstant, HoodieInstantTimeGenerator.getTimelineTimeZone().getZoneId()); | ||
| String earliestTimeToRetain = HoodieActiveTimeline.formatDate(Date.from(currentDateTime.minusHours(config.getCleanerHoursRetained()).toInstant())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code only works when HoodieInstantTimeGenerator.setCommitTimeZone is executed in the same JVM process, we must asure that. Either set it up explicitly or instantiate a HoodieTableConfig within which the timezone is set up.
| Instant instant = Instant.now(); | ||
| ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(instant, ZoneId.systemDefault()); | ||
| ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(instant, HoodieInstantTimeGenerator.getTimelineTimeZone().getZoneId()); | ||
| String earliestTimeToRetain = HoodieActiveTimeline.formatDate(Date.from(currentDateTime.minusHours(hoursRetained).toInstant())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code only works when HoodieInstantTimeGenerator.setCommitTimeZone is executed in the same JVM process, we must asure that. Either set it up explicitly or instantiate a HoodieTableConfig within which the timezone is set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tableConfig is generated for the first time when table is created and the time zone is set right then. after that, we can't alter the time zone.
This is in line w/ how we generate our commit times as well.
not sure if we need any fixes here. can you help clarify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tableConfig is generated for the first time when table is created
Can we always ensure table config is initialized first? How to guard this sequence dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the issue. may be we should not set any default value for commitTimeZone in HoodieInstantTimeGenerator. we can set it to null infact. And so unless the table properties are instantiation whcih in turn will call into setCommitTimeZone, if any other callers tries to access the commit time zone, will fail fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it an option in the latest commit but I see some failures in Azure CI where option value is not present for timeline timezone. Its in hudi-cli and HoodieJavaGenerateApp, both of these use HoodieInstantTimeGenerator directly without setting timezone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that means the code is prone to making misusages, let's fix all those test falures by initialzing the zoneId manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the PR to use metaClient's tableConfig. Please take a look.
I think HoodieInstantTimeGenerator usage needs some refactoring. The static access doesn't seem right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think HoodieInstantTimeGenerator usage needs some refactoring
Yes, maybe we should pass in the table path as param and triggers a lazy initialization for the zoneId if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. We can handle it in a separate jira though.
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a suggestion
|
|
||
| public static void setCommitTimeZone(HoodieTimelineTimeZone commitTimeZone) { | ||
| commitTimeZoneOpt = Option.of(commitTimeZone); | ||
| HoodieInstantTimeGenerator.commitTimeZone = commitTimeZone; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should make it singleton by using a synchromized lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean making this method synchronized or making HoodieInstantTimeGenerator a singleton?
danny0405
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks for the contribution @lokeshj1703 ~
apache#8631) * Use correct zone id while calculating earliestTimeToRetain * Use metaClient table config
apache#8631) * Use correct zone id while calculating earliestTimeToRetain * Use metaClient table config
apache#8631) * Use correct zone id while calculating earliestTimeToRetain * Use metaClient table config
Change Logs
Currently we use default zoneId while calculating earliestTimeToRetain. Jira aims to use the configured timezone.
Impact
NA
Risk level (write none, low medium or high below)
low
Documentation Update
NA
Contributor's checklist