Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -146,8 +146,8 @@ private static void setupConfOptions(
conf.setString(FlinkOptions.TABLE_NAME.key(), tableName);
// hoodie key about options
setupHoodieKeyOptions(conf, table);
// cleaning options
setupCleaningOptions(conf);
// compaction options
setupCompactionOptions(conf);
// infer avro schema from physical DDL schema
inferAvroSchema(conf, schema.toRowDataType().notNull().getLogicalType());
}
Expand Down Expand Up @@ -186,9 +186,9 @@ private static void setupHoodieKeyOptions(Configuration conf, CatalogTable table
}

/**
* Sets up the cleaning options from the table definition.
* Sets up the compaction options from the table definition.
*/
private static void setupCleaningOptions(Configuration conf) {
private static void setupCompactionOptions(Configuration conf) {
int commitsToRetain = conf.getInteger(FlinkOptions.CLEAN_RETAIN_COMMITS);
int minCommitsToKeep = conf.getInteger(FlinkOptions.ARCHIVE_MIN_COMMITS);
if (commitsToRetain >= minCommitsToKeep) {
Expand All @@ -199,6 +199,12 @@ private static void setupCleaningOptions(Configuration conf) {
conf.setInteger(FlinkOptions.ARCHIVE_MIN_COMMITS, commitsToRetain + 10);
conf.setInteger(FlinkOptions.ARCHIVE_MAX_COMMITS, commitsToRetain + 20);
}
if (conf.getBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just respect the config user-defined? this change the config without noticing anyone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have decided that if the option value is as the default.

Copy link
Contributor

@yanghua yanghua Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, @garyli1019 's opinion is reasonable. @danny0405 I know what's your purpose. Can we define an explicit config option for this? So that users can know this detailed information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have respected the user specified option, the value is rewritten only when the default value is set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the user may not know it would be rewritten under some condition like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the user needs to know that ? They can set up the target IO by themself if they want BTW.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewritten value is a constant value. Why not provide an SYNC_COMPACTION_TARGET_IO option and make it to be 500GB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already one, the default value is 5GB for online compaction, this patch fix the offline compaction with default value as 500GB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, a bit late on this. I don't have a strong objection here but slightly incline to not turning config inside the code.
My point is that tweaking config is ok if the user wrote their own connector or wanted to reduce the overhead of having too many configs in their app. I do this in my app as well. But this is the official connector of Hudi, so I think we should keep the config as how it is.

&& !conf.getBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED)
&& FlinkOptions.isDefaultValueDefined(conf, FlinkOptions.COMPACTION_TARGET_IO)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this line.

// if compaction schedule is on, tweak the target io to 500GB
conf.setLong(FlinkOptions.COMPACTION_TARGET_IO, 500 * 1024L);
}
}

/**
Expand Down