Skip to content

Conversation

@alexeykudinkin
Copy link
Contributor

@alexeykudinkin alexeykudinkin commented Apr 20, 2022

Tips

What is the purpose of the pull request

#5364 made extraction of values for partition columns from partition path became configurable, disabling it by default,
since by default Hudi persists partition columns in the data file which could be fetched directly instead of parsing partition values from partition path.

This PR adds a fallback configuration allowing to control whether partition values should be parsed from the partition path (which is default Spark behavior).

Brief change log

  • Unified shouldOmitPartitionColumns and shouldExtractPartitionValuesFromPartitionPath flags
  • Added new EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH
  • Added test

Verify this pull request

This pull request is already covered by existing tests, such as (please describe tests).
This change added tests and can be verified as follows:

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@alexeykudinkin alexeykudinkin changed the title [WIP] Fixing Spark32HoodieParquetFileFormat not being compatible w/ Spark 3.2.0 [WIP] Adding config to fallback to appending columns Apr 20, 2022
@alexeykudinkin alexeykudinkin changed the title [WIP] Adding config to fallback to appending columns [HUDI-3935] Adding config to fallback to appending columns Apr 21, 2022
return this;
}

public PropertyBuilder setDropPartitionColumnsWhenWrite(Boolean dropPartitionColumnsWhenWrite) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having "write" in the name makes is clear. If not, one could read it as "should drop partition columns when reading". So, I feel we can leave it as is.

@nsivabalan nsivabalan added the priority:blocker Production down; release blocker label Apr 21, 2022
@alexeykudinkin alexeykudinkin changed the title [HUDI-3935] Adding config to fallback to appending columns [HUDI-3935] Adding config to fallback to enabled Partition Values extraction from Partition path Apr 21, 2022
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

}

implicit def convert[T, U](prop: ConfigProperty[T])(implicit converter: T => U): ConfigProperty[U] = {
checkState(prop.hasDefaultValue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could implicitly break when add a new config with no default. i see this improves code quality but we should avoid nice-to-have changes in the last min patch before release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this will break it will break when the class is loaded, meaning that all the tests using class would be broken, which is very easy to diagnose

@xushiyan xushiyan merged commit 4b296f7 into apache:master Apr 21, 2022
xushiyan pushed a commit that referenced this pull request Apr 21, 2022
@codope codope self-assigned this Jul 26, 2022
@TengHuo
Copy link
Contributor

TengHuo commented Nov 15, 2022

Hi @alexeykudinkin
As I understand, when this config hoodie.datasource.read.extract.partition.values.from.path is false, it preserves the same behaviour as previous version (version < 0.11.0). Am I right?

@alexeykudinkin
Copy link
Contributor Author

@TengHuo correct

@TengHuo
Copy link
Contributor

TengHuo commented Nov 16, 2022

Got it, thanks a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:blocker Production down; release blocker

Projects

No open projects
Status: No status

Development

Successfully merging this pull request may close these issues.

6 participants