@lokeshj1703
Collaborator

Change Logs

Currently we use the system default zoneId when calculating earliestTimeToRetain. This Jira changes the calculation to use the configured timeline timezone.
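
To illustrate why the zone matters, here is a minimal sketch that uses only java.time and none of Hudi's classes. The `yyyyMMddHHmmssSSS` pattern, the class name, and the method signature are assumptions made for illustration. The same cutoff instant renders as a different timestamp string depending on the zone used to format it, so a cutoff rendered in the JVM default zone may not line up with commit times written in the configured zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class RetentionCutoffSketch {
  // Assumed timestamp pattern, for illustration only.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

  /** Renders (latest - hoursRetained) as a timestamp string in the given zone. */
  static String earliestTimeToRetain(Instant latest, long hoursRetained, ZoneId zone) {
    ZonedDateTime cutoff = ZonedDateTime.ofInstant(latest, zone).minusHours(hoursRetained);
    return FORMAT.format(cutoff);
  }

  public static void main(String[] args) {
    Instant latest = Instant.parse("2023-05-04T10:00:00Z");
    // Same instant, two zones: the rendered strings differ by the zone offset.
    System.out.println(earliestTimeToRetain(latest, 24, ZoneOffset.UTC));
    System.out.println(earliestTimeToRetain(latest, 24, ZoneId.of("Asia/Kolkata")));
  }
}
```

With these inputs the two lines printed are 20230503100000000 and 20230503153000000: one cutoff instant, two different timestamp strings.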

Impact

NA

Risk level (write none, low, medium or high below)

low

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@lokeshj1703 lokeshj1703 marked this pull request as ready for review May 4, 2023 10:04
@lokeshj1703 lokeshj1703 changed the title from "HUDI-6170. Use correct zone id while calculating earliestTimeToRetain" to "[HUDI-6170] Use correct zone id while calculating earliestTimeToRetain" May 4, 2023
UTC("utc", TimeZone.getTimeZone("UTC").toZoneId());

private final String timeZone;
private final ZoneId zoneId;
Contributor

I don't think we need a zoneId member. Can we calculate it on the fly in the getZoneId method?

Collaborator Author

I can do a switch-case but I feel this is better. WDYT?

Contributor

Fine, it's just a preference.
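
The two options discussed in this thread can be sketched side by side. This is a standalone illustration, not the actual Hudi enum: the `TimelineZoneSketch` name, the `LOCAL` constant, and the `computeZoneId` method are assumptions. `getZoneId` returns the member stored at construction, while `computeZoneId` derives the zone with a switch on each call.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;

enum TimelineZoneSketch {
  LOCAL("local", ZoneId.systemDefault()), // hypothetical constant for illustration
  UTC("utc", ZoneOffset.UTC);

  private final String timeZone;
  private final ZoneId zoneId;

  TimelineZoneSketch(String timeZone, ZoneId zoneId) {
    this.timeZone = timeZone;
    this.zoneId = zoneId;
  }

  String getTimeZone() {
    return timeZone;
  }

  /** Option 1: return the ZoneId stored as a member. */
  ZoneId getZoneId() {
    return zoneId;
  }

  /** Option 2: compute the ZoneId on the fly with a switch. */
  ZoneId computeZoneId() {
    switch (this) {
      case UTC:
        return ZoneOffset.UTC;
      case LOCAL:
      default:
        return ZoneId.systemDefault();
    }
  }
}
```

One practical difference: the stored member captures the system default once, when the enum is initialized, whereas the switch version re-reads the default on every call.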

  Instant latestCommitInstant = HoodieActiveTimeline.parseDateFromInstantTime(commitTimeline.lastInstant().get().getTimestamp()).toInstant();
- ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(latestCommitInstant, ZoneId.systemDefault());
+ ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(latestCommitInstant, HoodieInstantTimeGenerator.getTimelineTimeZone().getZoneId());
  String earliestTimeToRetain = HoodieActiveTimeline.formatDate(Date.from(currentDateTime.minusHours(config.getCleanerHoursRetained()).toInstant()));
Contributor

This code only works when HoodieInstantTimeGenerator.setCommitTimeZone has been executed in the same JVM process, and we must ensure that. Either set it up explicitly or instantiate a HoodieTableConfig, within which the timezone is set up.

  Instant instant = Instant.now();
- ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(instant, ZoneId.systemDefault());
+ ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(instant, HoodieInstantTimeGenerator.getTimelineTimeZone().getZoneId());
  String earliestTimeToRetain = HoodieActiveTimeline.formatDate(Date.from(currentDateTime.minusHours(hoursRetained).toInstant()));
Contributor

This code only works when HoodieInstantTimeGenerator.setCommitTimeZone has been executed in the same JVM process, and we must ensure that. Either set it up explicitly or instantiate a HoodieTableConfig, within which the timezone is set up.

Contributor

tableConfig is generated for the first time when the table is created, and the time zone is set right then. After that, we can't alter the time zone.
This is in line with how we generate our commit times as well.

Not sure if we need any fixes here. Can you help clarify?

Contributor

> tableConfig is generated for the first time when table is created

Can we always ensure the table config is initialized first? How do we guard this sequence dependency?

Contributor

I see the issue. Maybe we should not set any default value for commitTimeZone in HoodieInstantTimeGenerator; we can set it to null, in fact. Then, unless the table properties have been instantiated, which in turn calls setCommitTimeZone, any other caller that tries to access the commit time zone will fail fast.
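
The fail-fast idea can be sketched as a static holder with no default value. This is a hypothetical stand-in, not HoodieInstantTimeGenerator itself: the class name, the `clear` helper, and the exception message are assumptions. Reading the zone before it has been set throws instead of silently falling back to a default.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.Optional;

final class TimelineZoneHolderSketch {
  // No default: stays empty until a caller sets the zone explicitly.
  private static volatile Optional<ZoneId> zone = Optional.empty();

  private TimelineZoneHolderSketch() {
  }

  static void setZone(ZoneId zoneId) {
    zone = Optional.of(zoneId); // Optional.of rejects null
  }

  /** Hypothetical helper that resets the holder, e.g. between tests. */
  static void clear() {
    zone = Optional.empty();
  }

  static ZoneId getZone() {
    // Fail fast when no one has initialized the zone in this JVM.
    return zone.orElseThrow(
        () -> new IllegalStateException("Timeline time zone has not been initialized"));
  }

  public static void main(String[] args) {
    try {
      getZone();
    } catch (IllegalStateException e) {
      System.out.println("Failed fast: " + e.getMessage());
    }
    setZone(ZoneOffset.UTC);
    System.out.println("Zone after init: " + getZone());
  }
}
```

The trade-off is that every entry point that reads the zone must now set it first, or fail at runtime.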

Collaborator Author

I made it an Option in the latest commit, but I see some failures in Azure CI where the option value is not present for the timeline timezone. They are in hudi-cli and HoodieJavaGenerateApp; both of these use HoodieInstantTimeGenerator directly without setting the timezone.

Contributor

@danny0405 danny0405 May 9, 2023

Yeah, that means the code is prone to misuse. Let's fix all those test failures by initializing the zoneId manually.

Collaborator Author

I have changed the PR to use metaClient's tableConfig. Please take a look.
I think HoodieInstantTimeGenerator usage needs some refactoring. The static access doesn't seem right.

Contributor

> I think HoodieInstantTimeGenerator usage needs some refactoring

Yes, maybe we should pass in the table path as a param and trigger a lazy initialization of the zoneId if necessary.
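
One possible shape for that lazy initialization is sketched below. It is an illustration only, not a proposal for the actual refactoring: `LazyTimelineZoneSketch` and its loader function are hypothetical, and the loader stands in for whatever would read the zone from the table's config. The zone is resolved once per table path on first access and cached afterwards.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class LazyTimelineZoneSketch {
  private final ConcurrentMap<String, ZoneId> zonesByTablePath = new ConcurrentHashMap<>();
  // Hypothetical loader: resolves the zone for a table path, e.g. from table config.
  private final Function<String, ZoneId> loader;

  LazyTimelineZoneSketch(Function<String, ZoneId> loader) {
    this.loader = loader;
  }

  /** Loads the zone on first access for a path, then serves it from the cache. */
  ZoneId zoneFor(String tablePath) {
    return zonesByTablePath.computeIfAbsent(tablePath, loader);
  }

  public static void main(String[] args) {
    LazyTimelineZoneSketch zones = new LazyTimelineZoneSketch(path -> {
      System.out.println("Loading zone for " + path);
      return ZoneOffset.UTC;
    });
    zones.zoneFor("/tmp/table_a"); // triggers the loader
    zones.zoneFor("/tmp/table_a"); // served from the cache
  }
}
```

Keying the cache by table path removes the dependency on a single JVM-wide static value, at the cost of threading the path through every caller.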

Collaborator Author

Yeah. We can handle it in a separate jira though.

@danny0405 danny0405 self-assigned this May 5, 2023
Contributor

@nsivabalan nsivabalan left a comment

left a suggestion


public static void setCommitTimeZone(HoodieTimelineTimeZone commitTimeZone) {
commitTimeZoneOpt = Option.of(commitTimeZone);
HoodieInstantTimeGenerator.commitTimeZone = commitTimeZone;
Contributor

We should make it a singleton by using a synchronized lock.

Collaborator Author

@lokeshj1703 lokeshj1703 May 9, 2023

Do you mean making this method synchronized or making HoodieInstantTimeGenerator a singleton?

@hudi-bot
Collaborator

hudi-bot commented May 9, 2023

CI report:

Contributor

@danny0405 danny0405 left a comment

+1, thanks for the contribution @lokeshj1703 ~

@danny0405 danny0405 merged commit 92e52dc into apache:master May 10, 2023
yihua pushed a commit to yihua/hudi that referenced this pull request May 15, 2023
apache#8631)

* Use correct zone id while calculating earliestTimeToRetain
* Use metaClient table config
yihua pushed a commit to yihua/hudi that referenced this pull request May 17, 2023
apache#8631)

* Use correct zone id while calculating earliestTimeToRetain
* Use metaClient table config
Labels

area:table-service Table services

4 participants