[HUDI-6155] Fix cleaner based on hours for earliest commit to retain #8659
- return d.toInstant().atZone(ZoneId.systemDefault()).toLocalDateTime();
+ return d.toInstant().atZone(getZoneId()).toLocalDateTime();
  }
Can we add some unit tests (UTs) for parseDateFromInstantTime and convertDateToTemporalAccessor?
I would be happy to do it.
I added two UTs: testFormatDateWithCommitTimeZone and testInstantDateParsingWithCommitTimeZone. The latter tests the correctness of HoodieInstantTimeGenerator#convertDateToTemporalAccessor().
Also, TestHoodieActiveTimeline.java already has many UTs related to date parsing, such as testInvalidInstantDateParsing and testMillisGranularityInstantDateParsing.
- return Date.from(dt.atZone(ZoneId.systemDefault()).toInstant());
+ Instant instant = dt.atZone(getZoneId()).toInstant();
+ TimeZone.setDefault(TimeZone.getTimeZone(getZoneId()));
+ return Date.from(instant);
It is risky to set the time zone for the whole JVM process: TimeZone.setDefault(...) could affect every thread in the JVM.
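To illustrate the concern above, here is a minimal sketch of converting between LocalDateTime and Date with an explicitly passed ZoneId, so the JVM-wide default is never modified. The class name ExplicitZoneConversion and both helper methods are hypothetical and are not taken from the PR; only the java.time and java.util calls are standard.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Date;

// Hypothetical helper class, for illustration only (not part of Hudi).
public class ExplicitZoneConversion {

    // Interpret a zone-less LocalDateTime in the given zone and convert it to a Date.
    // A Date is just an instant, so no call to TimeZone.setDefault is needed.
    static Date toDate(LocalDateTime dt, ZoneId zone) {
        Instant instant = dt.atZone(zone).toInstant();
        return Date.from(instant);
    }

    // Inverse conversion, again naming the zone explicitly.
    static LocalDateTime toLocalDateTime(Date d, ZoneId zone) {
        return d.toInstant().atZone(zone).toLocalDateTime();
    }

    public static void main(String[] args) {
        LocalDateTime dt = LocalDateTime.of(2023, 5, 8, 12, 0, 0);
        Date asUtc = toDate(dt, ZoneId.of("UTC"));
        System.out.println(asUtc.toInstant());                         // 2023-05-08T12:00:00Z
        System.out.println(toLocalDateTime(asUtc, ZoneId.of("UTC")));  // 2023-05-08T12:00
    }
}
```

Because the zone is an argument, other threads that rely on the default time zone are unaffected by these conversions.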
done
One of the tests failed. I will try to find the cause and update the code.
[ERROR] Failures:
[ERROR] TestHoodieDeltaStreamer.testCleanerDeleteReplacedDataWithArchive:1120 expected: <1> but was: <0>
[ERROR] TestHoodieDeltaStreamer.testCleanerDeleteReplacedDataWithArchive:1090 expected: <false> but was: <true>
[ERROR] TestHoodieDeltaStreamer.testCleanerDeleteReplacedDataWithArchive:1120 expected: <1> but was: <0>
[ERROR] TestHoodieDeltaStreamer.testCleanerDeleteReplacedDataWithArchive:1090 expected: <false> but was: <true>
[INFO]
[ERROR] Tests run: 354, Failures: 4, Errors: 0, Skipped: 7
This is a known flaky test that has already been fixed; rebasing onto the latest master should resolve it.
private static ZoneId getZoneId() {
  return commitTimeZone.equals(HoodieTimelineTimeZone.LOCAL)
      ? ZoneId.systemDefault()
If possible, fetch the time zone from metaClient.tableConfig; HoodieTimelineTimeZone cannot guarantee that the zoneId has been initialized.
I will try to modify the code as you suggest.
In the class HoodieInstantTimeGenerator, the property commitTimeZone is given an initial value (HoodieTimelineTimeZone.LOCAL):
  private static HoodieTimelineTimeZone commitTimeZone = HoodieTimelineTimeZone.LOCAL;
The commitTimeZone value is then updated in HoodieTableConfig#create:
  if (hoodieConfig.contains(TIMELINE_TIMEZONE)) {
    HoodieInstantTimeGenerator.setCommitTimeZone(HoodieTimelineTimeZone.valueOf(hoodieConfig.getString(TIMELINE_TIMEZONE)));
  }
  public static void setCommitTimeZone(HoodieTimelineTimeZone commitTimeZone) {
    HoodieInstantTimeGenerator.commitTimeZone = commitTimeZone;
  }
So I think getting the ZoneId from HoodieTimelineTimeZone should be correct, and I don't really understand what "the HoodieTimelineTimeZone can not assure the initialization of zoneId" means.
I am not sure whether my understanding is correct. Looking forward to your reply.
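One possible reading of the reviewer's concern, sketched under assumptions: a static field that defaults to LOCAL only reflects the table's setting after some code path has called the setter, so a reader that runs earlier silently gets the system zone. The types below (TimelineZone, Generator, zoneFor) are hypothetical stand-ins and are not the actual Hudi classes; whether this matches the discussion in #8631 is not confirmed by this page.

```java
import java.time.ZoneId;

// Hypothetical sketch of the initialization-order hazard; not Hudi code.
public class StaticZoneOrderSketch {

    // Stand-in for HoodieTimelineTimeZone.
    enum TimelineZone { LOCAL, UTC }

    // Stand-in for a static holder such as the one described in the comment above.
    static class Generator {
        private static TimelineZone commitTimeZone = TimelineZone.LOCAL;

        static void setCommitTimeZone(TimelineZone tz) {
            commitTimeZone = tz;
        }

        static ZoneId getZoneId() {
            return commitTimeZone == TimelineZone.LOCAL ? ZoneId.systemDefault() : ZoneId.of("UTC");
        }
    }

    // Alternative: the caller supplies the value it read from the table's own config,
    // so the result does not depend on which code ran earlier in the process.
    static ZoneId zoneFor(TimelineZone configured) {
        return configured == TimelineZone.LOCAL ? ZoneId.systemDefault() : ZoneId.of("UTC");
    }

    public static void main(String[] args) {
        // Before any setter call, the static holder reports the system zone,
        // even if the table being read was written with UTC commit times.
        System.out.println(Generator.getZoneId().equals(ZoneId.systemDefault()));

        // Only after the setter runs does the static holder reflect the table setting.
        Generator.setCommitTimeZone(TimelineZone.UTC);
        System.out.println(Generator.getZoneId());

        // Passing the configured value explicitly avoids the ordering dependency.
        System.out.println(zoneFor(TimelineZone.UTC));
    }
}
```

Under this reading, reading the zone from the table config at the point of use removes the dependency on whether the setter has already been invoked in the current process.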
See the discussion we had in #8631.
I see. I will try to modify the code as you suggest.
It seems that there is no good way to get HoodieTimelineTimeZone through HoodieTableMetaClient in HoodieInstantTimeGenerator. I currently get HoodieTimelineTimeZone by instantiating a HoodieTableConfig. Can you give me some advice?
I currently get HoodieTimelineTimeZone by instantiate a HoodieTableConfig
If no existing table meta client or table config can be reused, we must instantiate a new one. For HoodieTableConfig, we usually fetch a meta client first and then get the config from it. Take, for example, hudi/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java (line 307 in 42b517d):
  public static Option<HoodieTableConfig> getTableConfig(String basePath, org.apache.hadoop.conf.Configuration hadoopConf) {
…udi-6155 # Conflicts: # hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
Does PR #8631 already fix the problem?
Looks like it has been fixed. Sorry for taking up your time.
Change Logs
HoodieTableConfig has a config that lets users override the time zone used for commit time generation, but there appear to be some places where we use the current system's zone instead of honoring the config.
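As a rough illustration of why this matters for an hours-based cleaner, the sketch below formats "now minus N hours" as a timestamp string in two different zones. The class RetentionCutoffSketch, the method earliestInstantToRetain, and the pattern yyyyMMddHHmmssSSS are assumptions made for this example and are not copied from the Hudi code base.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch; not the actual cleaner implementation.
public class RetentionCutoffSketch {

    // Assumed timestamp pattern, used here for illustration only.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Compute "now minus hoursRetained" and render it in the given timeline zone.
    static String earliestInstantToRetain(Instant now, int hoursRetained, ZoneId timelineZone) {
        ZonedDateTime cutoff = now.atZone(timelineZone).minusHours(hoursRetained);
        return cutoff.format(FORMAT);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-05-08T12:00:00Z");
        // Same physical moment, two different string cutoffs:
        System.out.println(earliestInstantToRetain(now, 24, ZoneId.of("UTC")));           // 20230507120000000
        System.out.println(earliestInstantToRetain(now, 24, ZoneId.of("Asia/Shanghai"))); // 20230507200000000
    }
}
```

If commit times were generated in one zone while the cutoff is rendered in another, a string comparison between them would be off by the zone offset (eight hours in this example), so the cleaner could retain more or fewer commits than the configured number of hours.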
Impact
Fix time handling so that it uses the time zone set in the table config.
Risk level (write none, low medium or high below)
low
Documentation Update
none
Contributor's checklist