
[HUDI-2526] Make spark.sql.parquet.writeLegacyFormat configurable#3917

Merged
nsivabalan merged 1 commit into apache:master from codope:hudi-2526-spark-config
Nov 5, 2021

Conversation

codope (Member) commented Nov 3, 2021

What is the purpose of the pull request

spark.sql.parquet.writeLegacyFormat was hardcoded to false in HoodieRowParquetWriteSupport. In some cases, users need to set it to true. From a user on Slack:

Reason to use this config:
Currently, bulk insert uses the Spark DataFrame writer and doesn't do Avro conversion. The decimal columns in my DF are written as INT32 in Parquet.
The upsert path, which uses Avro conversion, generates a fixed-length byte array for decimal types, which fails with a datatype mismatch.
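
To make the mismatch above concrete, here is a minimal sketch of how, as far as I understand Spark's Parquet schema converter, the physical type for a decimal column depends on its precision and on spark.sql.parquet.writeLegacyFormat. The function names are hypothetical and written only for illustration; this is not Hudi or Spark code.

```python
# Illustrative sketch only: the helper names below are hypothetical.
# The thresholds follow Spark's documented behavior for decimals in Parquet:
# precision <= 9 fits in INT32, precision <= 18 fits in INT64.
MAX_PRECISION_FOR_INT32 = 9
MAX_PRECISION_FOR_INT64 = 18


def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte width whose signed range holds any value of this precision."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def decimal_physical_type(precision: int, write_legacy_format: bool) -> str:
    """Return the Parquet physical type chosen for a decimal of this precision."""
    # Legacy format always uses a fixed-length byte array; the newer format
    # falls back to it only when the value no longer fits in 64 bits.
    if write_legacy_format or precision > MAX_PRECISION_FOR_INT64:
        return f"FIXED_LEN_BYTE_ARRAY({min_bytes_for_precision(precision)})"
    if precision <= MAX_PRECISION_FOR_INT32:
        return "INT32"
    return "INT64"


if __name__ == "__main__":
    for precision in (9, 18, 38):
        print(precision,
              decimal_physical_type(precision, write_legacy_format=False),
              decimal_physical_type(precision, write_legacy_format=True))
```

Under these assumptions, a decimal with precision 9 or less maps to INT32 when the flag is false and to a fixed-length byte array when it is true, which matches the INT32 versus fixed-length byte array mismatch the user reported between bulk insert and upsert.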

Brief change log

  • Make spark.sql.parquet.writeLegacyFormat configurable.


Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

nsivabalan added the priority:blocker label (Production down; release blocker) on Nov 3, 2021
hudi-bot (Collaborator) commented Nov 5, 2021

CI report:

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

nsivabalan merged commit 08c35a5 into apache:master on Nov 5, 2021
