[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path #5377

alexeykudinkin · 2022-04-20T21:28:07Z

Tips

Thank you very much for contributing to Apache Hudi.
Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.

What is the purpose of the pull request

#5364 made extraction of partition column values from the partition path configurable and disabled it by default,
since by default Hudi persists partition columns in the data files, where they can be read directly instead of being parsed from the partition path.

This PR adds a fallback configuration that controls whether partition values should be parsed from the partition path (which is the default Spark behavior).
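
To make "parsing partition values from the partition path" concrete, here is a minimal Java sketch that splits a Hive-style relative partition path such as `year=2022/month=04` into column/value pairs. It is an illustration only: `PartitionPathSketch` and `parsePartitionValues` are hypothetical names, the `column=value` layout is an assumption, and this is not the code Hudi or Spark actually use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PartitionPathSketch {

  // Splits "col1=v1/col2=v2" into an ordered column -> value map.
  // Assumes Hive-style "column=value" segments; anything else is rejected.
  static Map<String, String> parsePartitionValues(String relativePartitionPath) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : relativePartitionPath.split("/")) {
      if (segment.isEmpty()) {
        continue; // tolerate leading, trailing, or doubled slashes
      }
      int eq = segment.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Not a column=value segment: " + segment);
      }
      values.put(segment.substring(0, eq), segment.substring(eq + 1));
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(parsePartitionValues("year=2022/month=04/day=20"));
  }
}
```

Values recovered this way are plain strings taken from the path (`04`, not `4`), whereas a column persisted in the data file keeps whatever type it was written with.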

Brief change log

Unified shouldOmitPartitionColumns and shouldExtractPartitionValuesFromPartitionPath flags
Added new EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH
Added test

Verify this pull request

This pull request is already covered by existing tests, such as (please describe tests).
This change added tests and can be verified as follows:

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

nsivabalan · 2022-04-21T04:01:22Z

hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java

      return this;
    }

-    public PropertyBuilder setDropPartitionColumnsWhenWrite(Boolean dropPartitionColumnsWhenWrite) {


Having "write" in the name makes it clear. Without it, one could read it as "should drop partition columns when reading". So, I feel we can leave it as is.

hudi-bot · 2022-04-21T05:30:41Z

CI report:

51333eb Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure re-run the last Azure build

xushiyan · 2022-04-21T08:24:32Z

hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala

  }
+
+  implicit def convert[T, U](prop: ConfigProperty[T])(implicit converter: T => U): ConfigProperty[U] = {
+    checkState(prop.hasDefaultValue)


This could implicitly break when a new config with no default is added. I see this improves code quality, but we should avoid nice-to-have changes in a last-minute patch before release.

If this breaks, it will break when the class is loaded, meaning that all the tests using the class would fail, which is very easy to diagnose.
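
The "breaks when the class is loaded" argument can be illustrated with a small Java sketch. It assumes the conversion is applied in a static (object-level) field initializer. `FailFastSketch`, `Prop`, and `BrokenOptions` are hypothetical stand-ins, not Hudi's `ConfigProperty` or `DataSourceOptions`, and the null-means-no-default convention is likewise an assumption made for the sketch.

```java
import java.util.function.Function;

final class FailFastSketch {

  // Hypothetical stand-in for a config property; not Hudi's ConfigProperty.
  static final class Prop<T> {
    final String key;
    final T defaultValue; // null means "no default" in this sketch

    Prop(String key, T defaultValue) {
      this.key = key;
      this.defaultValue = defaultValue;
    }

    // Converting requires a default, in the spirit of checkState(prop.hasDefaultValue).
    <U> Prop<U> convert(Function<T, U> converter) {
      if (defaultValue == null) {
        throw new IllegalStateException("No default value for " + key);
      }
      return new Prop<>(key, converter.apply(defaultValue));
    }
  }

  // Static fields run when the holder class is initialized,
  // so a missing default surfaces on any use of the class.
  static final class BrokenOptions {
    static final Prop<String> NO_DEFAULT = new Prop<>("sketch.no.default", null);
    static final Prop<Integer> CONVERTED = NO_DEFAULT.convert(String::length);
  }

  // Returns true if touching BrokenOptions fails during class initialization.
  static boolean failsOnUse() {
    try {
      Object unused = BrokenOptions.CONVERTED;
      return false;
    } catch (ExceptionInInitializerError | NoClassDefFoundError e) {
      // First use throws ExceptionInInitializerError; later uses throw NoClassDefFoundError.
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(failsOnUse());
  }
}
```

In this sketch every caller of `BrokenOptions` fails, whichever field it touches, which is why such a mistake would show up across all tests using the class rather than in one obscure code path.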

TengHuo · 2022-11-15T02:47:19Z

Hi @alexeykudinkin
As I understand it, when the config hoodie.datasource.read.extract.partition.values.from.path is false, it preserves the same behaviour as previous versions (< 0.11.0). Am I right?

alexeykudinkin · 2022-11-15T19:24:52Z

@TengHuo correct

TengHuo · 2022-11-16T02:38:50Z

Got it, thanks a lot
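
For readers who want to set the flag explicitly, a minimal Java sketch of building the read options follows. The key is the one quoted in the question above. `ReadOptionsSketch` and `readOptions` are hypothetical names, and handing the resulting map to a Spark reader (for example through `DataFrameReader.options`) is an assumption about usage that this thread does not show.

```java
import java.util.HashMap;
import java.util.Map;

final class ReadOptionsSketch {

  // Config key as quoted in the discussion above.
  static final String EXTRACT_FROM_PATH_KEY =
      "hoodie.datasource.read.extract.partition.values.from.path";

  // Builds a read-options map with the flag set explicitly.
  static Map<String, String> readOptions(boolean extractFromPath) {
    Map<String, String> opts = new HashMap<>();
    opts.put(EXTRACT_FROM_PATH_KEY, Boolean.toString(extractFromPath));
    return opts;
  }

  public static void main(String[] args) {
    System.out.println(readOptions(true));
  }
}
```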

Alexey Kudinkin added 3 commits April 20, 2022 12:11

Abstracted shared configuration;

5bce2e1

Added docs

Unified shouldOmitPartitionColumns and `shouldExtractPartitionValuesFromPartitionPath` flags

0a16db7

Typos

e63b627

alexeykudinkin changed the title ~~[WIP] Fixing Spark32HoodieParquetFileFormat not being compatible w/ Spark 3.2.0~~ [WIP] Adding config to fallback to appending columns Apr 20, 2022

alexeykudinkin force-pushed the ak/ts-kgen-cfg-fb branch from 08b5e4f to e63b627 Compare April 20, 2022 21:34

Alexey Kudinkin added 6 commits April 20, 2022 17:19

Tidying up

d67ba65

Added new EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH

1c27cac

Wired new config into HoodieTableMetaClient

d5591b5

Rely on new feedback to decide whether to extract partition values from path or not

662af81

Tidying up

274bdd0

Added test

8554a83

alexeykudinkin changed the title ~~[WIP] Adding config to fallback to appending columns~~ [HUDI-3935] Adding config to fallback to appending columns Apr 21, 2022

Alexey Kudinkin added 5 commits April 20, 2022 18:48

Removed config from HoodieTableConfig made it read-side only

3ec7e0f

Fixed invalid cast

d46188d

Added test

2596fc2

Handle EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH in `HoodieParquetFileFormat`

1a865d9

Fixed tests

51333eb

nsivabalan reviewed Apr 21, 2022

nsivabalan approved these changes Apr 21, 2022

nsivabalan added the priority:blocker Production down; release blocker label Apr 21, 2022

alexeykudinkin changed the title ~~[HUDI-3935] Adding config to fallback to appending columns~~ [HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path Apr 21, 2022

xushiyan approved these changes Apr 21, 2022

xushiyan merged commit 4b296f7 into apache:master Apr 21, 2022

xushiyan pushed a commit that referenced this pull request Apr 21, 2022

[HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path (#5377)

6fccca6

nsivabalan mentioned this pull request Apr 29, 2022

[HUDI-3997] Add 0.11.0 release notes #5466

Merged

codope self-assigned this Jul 26, 2022
