
Conversation

@amrishlal (Contributor)

Change Logs

Add a user-defined feature flag for disabling prepped merge.

Impact

New feature flag ENABLE_OPTIMIZED_MERGE_WRITES
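
For reference, the flag ends up exposed (per the discussion below) as the session config hoodie.spark.sql.optimized.writes.enable, defaulting to true. A minimal usage sketch for disabling the prepped merge path:

// Disable the optimized (prepped) write path for this Spark session.
spark.sql("set hoodie.spark.sql.optimized.writes.enable=false")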

Risk level (write none, low, medium, or high below)

Low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan (Contributor) left a comment

Let's also parameterize some tests for MIT as well.

@jonvex (Contributor) commented Jul 17, 2023

We can set ENABLE_PREPPED_MERGE_WRITES in MergeIntoHoodieTableCommand using hoodie.spark.sql.writes.optimized.enable. That way we don't need to introduce another public config.
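
A rough sketch of that approach (illustrative only; baseWriteOpts and the surrounding plumbing are hypothetical, not the merged code):

// Inside MergeIntoHoodieTableCommand: derive the internal prepped-merge flag
// from the existing public config instead of introducing a new one.
val optimizedWritesEnabled = sparkSession.conf
  .get("hoodie.spark.sql.writes.optimized.enable", "true").toBoolean
// Propagate it as an internal write option only; nothing new is user-facing.
val writeOpts = baseWriteOpts + (SQL_MERGE_INTO_WRITES.key -> optimizedWritesEnabled.toString)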

@amrishlal (Contributor, Author)

> Let's also parameterize some tests for MIT as well.

Parameterized the test cases to run with hoodie.spark.sql.optimized.merge.enable set to both true and false.

> We can set ENABLE_PREPPED_MERGE_WRITES in MergeIntoHoodieTableCommand using hoodie.spark.sql.writes.optimized.enable. That way we don't need to introduce another public config.

Moved SQL_MERGE_INTO_WRITES back to HoodieSparkSqlWriter.scala as it was originally, and created ENABLE_OPTIMIZED_SQL_MERGE_WRITES in DataSourceOptions.scala to serve as the public config.
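
For illustration, the parameterization might look like this (a sketch; withTempDir mirrors the test helpers quoted later in this thread):

Seq("true", "false").foreach { optimizedSqlEnabled =>
  test(s"Test merge into with optimized sql merge enable = $optimizedSqlEnabled") {
    withTempDir { tmp =>
      // Run the same MERGE INTO assertions under both settings of the flag.
      spark.sql(s"set hoodie.spark.sql.optimized.merge.enable=$optimizedSqlEnabled")
      // ... MERGE INTO statements and assertions ...
    }
  }
}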

@nsivabalan (Contributor)

@hudi-bot run azure

@codope changed the title from "[HUDI-6315] [WIP] Feature flag for disabling prepped merge." to "[HUDI-6315] Feature flag for disabling prepped merge." on Jul 18, 2023
@amrishlal (Contributor, Author)

@hudi-bot run azure

@nsivabalan (Contributor) left a comment

I realized we have two different configs across MIT and UPDATE/DELETEs. Not a good user experience, IMO; let's unify them.

So, we will have one config, "hoodie.spark.sql.optimized.writes.enable", in HoodieWriteConfig for users to enable or disable the optimized flow. The default value is true.

But internally, we can use two different configs: one for MIT (_hoodie.datasource.merge.into.prepped) and one for UPDATE and DELETE (_hoodie.datasource.writes.prepped).
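
A minimal sketch of that shape (illustrative only; it follows Hudi's ConfigProperty builder pattern but is not the merged code):

// One public switch for users, default true.
val SPARK_SQL_OPTIMIZED_WRITES: ConfigProperty[String] = ConfigProperty
  .key("hoodie.spark.sql.optimized.writes.enable")
  .defaultValue("true")
  .withDocumentation("Controls whether Spark SQL prepped (optimized) writes are enabled.")

// Two internal, undocumented keys so MIT and UPDATE/DELETE can still be told apart.
val MERGE_INTO_PREPPED_KEY = "_hoodie.datasource.merge.into.prepped"
val WRITES_PREPPED_KEY = "_hoodie.datasource.writes.prepped"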

.withDocumentation("Controls whether spark sql optimized update is enabled.")
.withDocumentation("Controls whether spark sql prepped update and delete is enabled.")

val ENABLE_OPTIMIZED_SQL_MERGE_WRITES: ConfigProperty[String] = ConfigProperty
Contributor:

I am also thinking, from a user standpoint, that we should have just one config to enable or disable the optimized flow (irrespective of whether it's MIT or UPDATE/DELETE), but internally we can use different configs if we wish to differentiate MIT from the rest.

Contributor (Author):

Fixed.

+ "The class must be a subclass of `org.apache.hudi.callback.HoodieClientInitCallback`."
+ "By default, no Hudi client init callback is executed.");

public static final String WRITE_PREPPED_MERGE_KEY = "_hoodie.datasource.merge.into.prepped";
Contributor:

Can we add Javadocs calling out the purpose of this?

Contributor:

Sorry, let's name this _hoodie.spark.sql.merge.into.prepped

Contributor:

Let's name the variable SPARK_SQL_MERGE_INTO_PREPPED

Contributor (Author):

Fixed.

val df = if (optParams.getOrDefault(DATASOURCE_WRITE_PREPPED_KEY,
optParams.getOrDefault(SQL_MERGE_INTO_WRITES.key(), SQL_MERGE_INTO_WRITES.defaultValue().toString))
.equalsIgnoreCase("true")) {
val df = if (optParams.getOrDefault(DATASOURCE_WRITE_PREPPED_KEY, "false").toBoolean || optParams.getOrDefault(WRITE_PREPPED_MERGE_KEY, "false").toBoolean) {
Contributor:

Let's also rename the variable DATASOURCE_WRITE_PREPPED_KEY to SPARK_SQL_WRITE_PREPPED_KEY

Contributor:

And the config key should be _hoodie.spark.sql.writes.prepped

Contributor (Author):

Fixed.

test("Test pkless complex merge cond") {
withRecordType()(withTempDir { tmp =>
spark.sql("set hoodie.payload.combined.schema.validate = true")
spark.sql("set hoodie.spark.sql.optimized.merge.enable=true")
Contributor:

Don't we need to fix all of these?

Contributor (Author):

Fixed.

@amrishlal (Contributor, Author)

@hudi-bot run azure

public static HoodieIndex createIndex(HoodieWriteConfig config) {
boolean mergeIntoWrites = config.getProps().getBoolean(HoodieInternalConfig.SQL_MERGE_INTO_WRITES.key(),
HoodieInternalConfig.SQL_MERGE_INTO_WRITES.defaultValue());
boolean mergeIntoWrites = config.getProps().getBoolean(HoodieWriteConfig.SPARK_SQL_MERGE_INTO_PREPPED_KEY, false);
Contributor:

Let's align the variable name with the config: boolean sqlMergeIntoPrepped

Contributor (Author):

Replaced all instances of mergeIntoWrites with sqlMergeIntoPrepped (except for the original reference in deduceWriterSchema).

* Config key with boolean value that indicates whether record being written is already prepped.
*/
val DATASOURCE_WRITE_PREPPED_KEY = "_hoodie.datasource.write.prepped";
val SPARK_SQL_WRITE_PREPPED_KEY = "_hoodie.spark.sql.writes.prepped";
Contributor:

Minor: SPARK_SQL_WRITES_PREPPED_KEY

Contributor (Author):

Fixed.

val mergeIntoWrites = parameters.getOrDefault(SQL_MERGE_INTO_WRITES.key(),
SQL_MERGE_INTO_WRITES.defaultValue.toString).toBoolean
val isPrepped = hoodieConfig.getBooleanOrDefault(SPARK_SQL_WRITE_PREPPED_KEY, false)
val mergeIntoWrites = parameters.getOrDefault(SPARK_SQL_MERGE_INTO_PREPPED_KEY, "false").toBoolean
Contributor:

Let's fix the variable name.

Contributor (Author):

Fixed.

hoodieConfig.setDefaultValue(DROP_PARTITION_COLUMNS)
hoodieConfig.setDefaultValue(KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED)
hoodieConfig.setDefaultValue(ENABLE_OPTIMIZED_SQL_WRITES)
hoodieConfig.setDefaultValue(SPARK_SQL_OPTIMIZED_WRITES)
Contributor:

Let's check if this is really required.

Contributor (Author):

Removed. From what I can see, it is not really needed.

.option(DataSourceWriteOptions.ASYNC_CLUSTERING_ENABLE().key(), "true")
.option(DataSourceWriteOptions.ENABLE_OPTIMIZED_SQL_WRITES().key(), "true")
.option(DataSourceWriteOptions.SPARK_SQL_OPTIMIZED_WRITES().key(), "true")
Contributor:

For defaults, let's try to use the actual default instead of hard-coding it: DataSourceWriteOptions.SPARK_SQL_OPTIMIZED_WRITES().default()
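
That is, something along these lines (a sketch; the exact accessor on the option is assumed to be defaultValue()):

.option(DataSourceWriteOptions.SPARK_SQL_OPTIMIZED_WRITES().key(),
  DataSourceWriteOptions.SPARK_SQL_OPTIMIZED_WRITES().defaultValue())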

Contributor (Author):

Fixed.


// test with optimized sql writes enabled / disabled.
spark.sql(s"set hoodie.spark.sql.writes.optimized.enable=$optimizedSqlEnabled")
spark.sql(s"set hoodie.spark.sql.optimized.writes.enable=$optimizedSqlEnabled")
Contributor:

Let's see if we can use the variable to avoid any missteps.
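
For example (a sketch, assuming the config constant is in scope in the test):

// Build the SET statement from the config constant rather than a hard-coded key string.
spark.sql(s"set ${DataSourceWriteOptions.SPARK_SQL_OPTIMIZED_WRITES.key}=$optimizedSqlEnabled")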

Contributor (Author):

Fixed.

@amrishlal (Contributor, Author)

@hudi-bot run azure

@hudi-bot (Collaborator)

CI report:

Bot commands supported by @hudi-bot:
  • @hudi-bot run azure: re-run the last Azure build

@nsivabalan (Contributor)

CI failed due to known flaky tests. Going ahead with merging.

@nsivabalan merged commit f2fdf8a into apache:master on Jul 20, 2023
@amrishlal deleted the prepped-mit-feature-flag branch on August 7, 2023 19:41