[HUDI-3780] improve drop partitions #5178
Conversation
Force-pushed ae1d66b to 8889b34
Force-pushed 8889b34 to ee4501b
Hi @vinothchandar @xushiyan, this PR fixes #4291, please have a look.
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.annotation.Nonnull;
import org.apache.avro.AvroTypeException;
Why change the import order?
  } catch (Exception e) {
    throw new HoodieException("Failed to get commit metadata", e);
  }
}
Why move this method from TableSchemaResolver to this class?
@vinothchandar suggests extracting this out into a separate static helper.
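A rough sketch of that idea, written in Scala for illustration only (the actual class is Java, `CommitMetadataUtils` is a hypothetical name, and the Hudi calls shown follow the common pattern but may differ across versions): keep commit-metadata resolution in one shared utility so TableSchemaResolver and this class can both call it instead of duplicating the method.

```scala
import org.apache.hudi.common.model.HoodieCommitMetadata
import org.apache.hudi.common.table.HoodieTableMetaClient
import org.apache.hudi.common.table.timeline.HoodieInstant
import org.apache.hudi.exception.HoodieException

// Hypothetical shared helper holding the logic in one place.
object CommitMetadataUtils {

  def getCommitMetadata(metaClient: HoodieTableMetaClient, instant: HoodieInstant): HoodieCommitMetadata = {
    try {
      // Read the raw instant details from the active timeline and deserialize them.
      val details: Array[Byte] = metaClient.getActiveTimeline.getInstantDetails(instant).get()
      HoodieCommitMetadata.fromBytes(details, classOf[HoodieCommitMetadata])
    } catch {
      case e: Exception => throw new HoodieException("Failed to get commit metadata", e)
    }
  }
}
```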
@Override
public void dropPartitions(String tableName, List<String> partitionsToDrop) {
  throw new UnsupportedOperationException("No support for dropPartitions yet.");
Please do not delete the original code comment:
// bigQuery discovers the new partitions automatically, so do nothing.
import org.apache.spark.sql.types.StructType

import java.util
import scala.collection.JavaConverters.propertiesAsScalaMapConverter
Java imports and Scala imports should be separated.
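For example, one way to honor this is to keep third-party, `java.*`, and `scala.*` imports in separate blank-line-separated groups (a sketch of the requested layout, not a prescribed project rule):

```scala
import org.apache.spark.sql.types.StructType

import java.util

import scala.collection.JavaConverters.propertiesAsScalaMapConverter
```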
val options = hoodieCatalogTable.catalogProperties ++ tableConfig.getProps.asScala.toMap ++ extraOptions
val options: Map[String, String] = hoodieCatalogTable.catalogProperties ++ tableConfig.getProps.asScala.toMap ++ sparkSession.sqlContext.conf.getAllConfs ++ extraOptions
val hiveSyncConfig = buildHiveSyncConfig(options, hoodieCatalogTable)
It may be good to reuse code here? I see `options` defined twice.
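A minimal sketch of the de-duplication idea, assuming a hypothetical `buildHoodieOptions` helper (not an existing method in the codebase): build the merged option map once and reuse it, rather than defining `options` twice from slightly different sources.

```scala
// Hypothetical helper: later sources win on key conflicts, mirroring the
// left-to-right `++` ordering used in the original code.
def buildHoodieOptions(catalogProperties: Map[String, String],
                       tableConfigProps: Map[String, String],
                       sqlConfs: Map[String, String],
                       extraOptions: Map[String, String]): Map[String, String] = {
  catalogProperties ++ tableConfigProps ++ sqlConfs ++ extraOptions
}
```

Both call sites could then pass the same inputs, so the precedence rules live in one place.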
val tableConfig = hoodieCatalogTable.tableConfig
val tableSchema = hoodieCatalogTable.tableSchema
val partitionColumns = tableConfig.getPartitionFieldProp.split(",").map(_.toLowerCase)
val partitionSchema = StructType(tableSchema.filter(f => partitionColumns.contains(f.name)))
.toLowerCase(Locale.ROOT)
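A small sketch of why the explicit locale matters (the helper name is illustrative): `toLowerCase()` uses the JVM default locale, so under e.g. a Turkish locale `"ID"` lowercases to `"ıd"`; passing `Locale.ROOT` keeps column-name normalization deterministic.

```scala
import java.util.Locale

// Locale-safe normalization of the comma-separated partition field list.
def normalizePartitionColumns(partitionFieldProp: String): Seq[String] =
  partitionFieldProp.split(",").map(_.toLowerCase(Locale.ROOT)).toSeq

// normalizePartitionColumns("Year,Month") == Seq("year", "month")
```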
val partitionSchema = StructType(tableSchema.filter(f => partitionColumns.contains(f.name)))

assert(hoodieCatalogTable.primaryKeys.nonEmpty,
  s"There are no primary key defined in table ${hoodieCatalogTable.table.identifier}, cannot execute delete operator")
Should this say "delete operation"?
PARTITIONPATH_FIELD.key -> tableConfig.getPartitionFieldProp,
HiveSyncConfig.HIVE_SYNC_MODE.key -> HiveSyncMode.HMS.name(),
HiveSyncConfig.HIVE_SUPPORT_TIMESTAMP_TYPE.key -> "true",
HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key -> "200",
Why 200? It would be better to make it configurable.
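A hedged sketch of making the value configurable rather than hard-coded, assuming the merged `options` map built earlier in this file (the `"200"` fallback simply preserves the current default):

```scala
import org.apache.hudi.config.HoodieWriteConfig

// Respect a user-provided value if one is set, otherwise fall back to 200.
val deleteParallelism: String =
  options.getOrElse(HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key, "200")

// ...and pass it through instead of the literal:
//   HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key -> deleteParallelism,
```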
Force-pushed ee4501b to d77964d
LGTM
…r CTAS and other commands (#7607)

### Change Logs

While following up and adding support for the BrooklynData Benchmarks we've discovered that CTAS isn't properly propagating configs due to a recent change in [#5178](https://github.com/apache/hudi/pull/5178/files#diff-560283e494c8ba8da102fc217a2201220dd4db731ec23d80884e0f001a7cc0bcR117). Unfortunately, the logic for handling configuration in `ProvidesHoodieConfig` became overly complicated and fragmented. This PR takes a stab at unifying and streamlining how options from different sources (Spark Catalog props, table properties, Spark SQL conf, overrides, etc.) are fused, making sure the different Spark SQL operations (for example `MERGE INTO`, CTAS, `INSERT INTO`) handle them in much the same way.

Changes:
- Simplify and unify `ProvidesHoodieConfig` configuration fusion from different sources
- Fix CTAS to override "hoodie.combine.before.insert" to "false"
Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.