[HUDI-3780] improve drop partitions #5178
Conversation
Force-pushed ae1d66b to 8889b34
Force-pushed 8889b34 to ee4501b
Hi @vinothchandar @xushiyan, this PR fixes #4291, please have a look.
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.annotation.Nonnull;
import org.apache.avro.AvroTypeException;
Why change the import order?
  } catch (Exception e) {
    throw new HoodieException("Failed to get commit metadata", e);
  }
}
Why move this method from TableSchemaResolver to this class?
@vinothchandar suggests extracting this out into a separate static helper.
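A rough sketch of that idea, written in Scala for illustration only (the actual class is Java, `CommitMetadataUtils` is a hypothetical name, and the Hudi calls shown follow the common pattern but may differ across versions): keep commit-metadata resolution in one shared utility so TableSchemaResolver and this class can both call it instead of duplicating the method.

```scala
import org.apache.hudi.common.model.HoodieCommitMetadata
import org.apache.hudi.common.table.HoodieTableMetaClient
import org.apache.hudi.common.table.timeline.HoodieInstant
import org.apache.hudi.exception.HoodieException

// Hypothetical shared helper holding the logic in one place.
object CommitMetadataUtils {

  def getCommitMetadata(metaClient: HoodieTableMetaClient, instant: HoodieInstant): HoodieCommitMetadata = {
    try {
      // Read the raw instant details from the active timeline and deserialize them.
      val details: Array[Byte] = metaClient.getActiveTimeline.getInstantDetails(instant).get()
      HoodieCommitMetadata.fromBytes(details, classOf[HoodieCommitMetadata])
    } catch {
      case e: Exception => throw new HoodieException("Failed to get commit metadata", e)
    }
  }
}
```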
@Override
public void dropPartitions(String tableName, List<String> partitionsToDrop) {
  throw new UnsupportedOperationException("No support for dropPartitions yet.");
Please do not delete the original code comment:
// bigQuery discovers the new partitions automatically, so do nothing.
import org.apache.spark.sql.types.StructType

import java.util
import scala.collection.JavaConverters.propertiesAsScalaMapConverter
Java imports and Scala imports should be separated.
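For example, one way to honor this is to keep third-party, `java.*`, and `scala.*` imports in separate blank-line-separated groups (a sketch of the requested layout, not a prescribed project rule):

```scala
import org.apache.spark.sql.types.StructType

import java.util

import scala.collection.JavaConverters.propertiesAsScalaMapConverter
```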
val options = hoodieCatalogTable.catalogProperties ++ tableConfig.getProps.asScala.toMap ++ extraOptions
val options: Map[String, String] = hoodieCatalogTable.catalogProperties ++ tableConfig.getProps.asScala.toMap ++ sparkSession.sqlContext.conf.getAllConfs ++ extraOptions
val hiveSyncConfig = buildHiveSyncConfig(options, hoodieCatalogTable)
It may be good to reuse code here? I see `options` defined twice.
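A minimal sketch of the de-duplication idea, assuming a hypothetical `buildHoodieOptions` helper (not an existing method in the codebase): build the merged option map once and reuse it, rather than defining `options` twice from slightly different sources.

```scala
// Hypothetical helper: later sources win on key conflicts, mirroring the
// left-to-right `++` ordering used in the original code.
def buildHoodieOptions(catalogProperties: Map[String, String],
                       tableConfigProps: Map[String, String],
                       sqlConfs: Map[String, String],
                       extraOptions: Map[String, String]): Map[String, String] = {
  catalogProperties ++ tableConfigProps ++ sqlConfs ++ extraOptions
}
```

Both call sites could then pass the same inputs, so the precedence rules live in one place.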
val tableConfig = hoodieCatalogTable.tableConfig
val tableSchema = hoodieCatalogTable.tableSchema
val partitionColumns = tableConfig.getPartitionFieldProp.split(",").map(_.toLowerCase)
val partitionSchema = StructType(tableSchema.filter(f => partitionColumns.contains(f.name)))
.toLowerCase(Locale.ROOT)
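A small sketch of why the explicit locale matters (the helper name is illustrative): `toLowerCase()` uses the JVM default locale, so under e.g. a Turkish locale `"ID"` lowercases to `"ıd"`; passing `Locale.ROOT` keeps column-name normalization deterministic.

```scala
import java.util.Locale

// Locale-safe normalization of the comma-separated partition field list.
def normalizePartitionColumns(partitionFieldProp: String): Seq[String] =
  partitionFieldProp.split(",").map(_.toLowerCase(Locale.ROOT)).toSeq

// normalizePartitionColumns("Year,Month") == Seq("year", "month")
```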
val partitionSchema = StructType(tableSchema.filter(f => partitionColumns.contains(f.name)))

assert(hoodieCatalogTable.primaryKeys.nonEmpty,
  s"There are no primary key defined in table ${hoodieCatalogTable.table.identifier}, cannot execute delete operator")
Should this say "delete operation"?
PARTITIONPATH_FIELD.key -> tableConfig.getPartitionFieldProp,
HiveSyncConfig.HIVE_SYNC_MODE.key -> HiveSyncMode.HMS.name(),
HiveSyncConfig.HIVE_SUPPORT_TIMESTAMP_TYPE.key -> "true",
HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key -> "200",
Why 200? It would be better to make it configurable.
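A hedged sketch of making the value configurable rather than hard-coded, assuming the merged `options` map built earlier in this file (the `"200"` fallback simply preserves the current default):

```scala
import org.apache.hudi.config.HoodieWriteConfig

// Respect a user-provided value if one is set, otherwise fall back to 200.
val deleteParallelism: String =
  options.getOrElse(HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key, "200")

// ...and pass it through instead of the literal:
//   HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key -> deleteParallelism,
```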
Force-pushed ee4501b to d77964d
LGTM
…r CTAS and other commands (#7607)

### Change Logs

While following up and adding support for the BrooklynData Benchmarks we've discovered that CTAS isn't properly propagating configs due to a recent change in [#5178](https://github.com/apache/hudi/pull/5178/files#diff-560283e494c8ba8da102fc217a2201220dd4db731ec23d80884e0f001a7cc0bcR117). Unfortunately, the logic for handling configuration in `ProvidesHoodieConfig` became overly complicated and fragmented. This PR takes a stab at unifying and streamlining how options from different sources (Spark Catalog props, table properties, Spark SQL conf, overrides, etc.) are fused, making sure the different Spark SQL operations (for example `MERGE INTO`, CTAS, `INSERT INTO`) handle them in much the same way.

Changes:
- Simplify and unify `ProvidesHoodieConfig` configuration fusion from different sources
- Fix CTAS to override "hoodie.combine.before.insert" to "false"
Tips
What is the purpose of the pull request
(For example: This pull request adds quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.