
@dujl
Contributor

@dujl dujl commented Aug 29, 2022

Issue description

When a non-partitioned Hudi table is created in Spark, it stores spark.sql.sources.schema.partCol.0 with an empty value in the Hive Metastore. This is unexpected behavior: for a non-partitioned table, spark.sql.sources.schema.partCol.0 should not be stored in the Hive Metastore at all.

Steps to reproduce the behavior:

  1. Create a non-partitioned Hudi table in Spark:
create table hudi_mor_tbl (
id int,
name string,
price double,
ts bigint
) using hudi
tblproperties (
type = 'mor',
primaryKey = 'id',
preCombineField = 'ts'
) 
  2. Insert one row into it:
insert into hudi_mor_tbl select 1, 'a1', 20, 1000; 
  3. Run cat on hoodie.properties in the table's base path;
    it includes the partition fields key with an empty value:
hoodie.table.partition.fields=
  4. Check the spark.sql.sources.schema.partCol.0 entry stored in the TABLE_PARAMS table of the Hive Metastore:
| TBL_ID | PARAM_KEY                          | PARAM_VALUE |
| 50     | spark.sql.sources.schema.partCol.0 |             |

Its value is the empty string "".
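The metastore check in the last step above can be run directly against the relational database that backs the Hive Metastore. The query below is a minimal sketch. It assumes the standard metastore schema (the DBS, TBLS and TABLE_PARAMS tables with their DB_ID, NAME, TBL_ID, TBL_NAME, PARAM_KEY and PARAM_VALUE columns). It also assumes the table hudi_mor_tbl from the CREATE TABLE statement above lives in the 'default' database; that database name is an assumption, so adjust it as needed.

```sql
-- Sketch: list the Spark partition-column properties synced to the Hive Metastore
-- for one table. Assumes the standard metastore schema (DBS, TBLS, TABLE_PARAMS)
-- and that the table lives in the 'default' database (an assumption).
SELECT p.TBL_ID, p.PARAM_KEY, p.PARAM_VALUE
FROM TABLE_PARAMS p
JOIN TBLS t ON t.TBL_ID = p.TBL_ID
JOIN DBS d ON d.DB_ID = t.DB_ID
WHERE d.NAME = 'default'
  AND t.TBL_NAME = 'hudi_mor_tbl'
  AND p.PARAM_KEY LIKE 'spark.sql.sources.schema.partCol.%';
```

Before the fix, this query would return the spark.sql.sources.schema.partCol.0 row with an empty PARAM_VALUE. For a non-partitioned table, the expected result after the fix is an empty result set.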

Change Logs

When initializing a non-partitioned Hudi table, set PartitionFields to null instead of the empty string "".
As a result, after the table metadata is synced to the Hive Metastore, spark.sql.sources.schema.partCol.* is no longer stored.

Impact

Fixes the bug that occurs when creating a non-partitioned table in Spark.
For more details, see the Jira issue: https://issues.apache.org/jira/browse/HUDI-4237
Risk level: low

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@dujl dujl changed the title [HUDI-4237] should not sync partition parametes when create non-partition table in spark [HUDI-4237] should not sync partition parameters when create non-partition table in spark Aug 29, 2022
@yihua yihua added priority:high Significant impact; potential bugs engine:spark Spark integration area:catalog Catalog integration priority:critical Production degraded; pipelines stalled and removed priority:high Significant impact; potential bugs labels Aug 30, 2022
@yihua
Contributor

yihua commented Sep 5, 2022

@XuQianJin-Stars @alexeykudinkin could you check if this is needed? Functionality-wise, is the fix necessary?

@dujl dujl closed this Sep 6, 2022
@dujl dujl reopened this Sep 6, 2022
@alexeykudinkin
Contributor

@dujl can you please update the PR description w/ the crux of the issue?

The one in Jira is very detailed (thanks for providing it!), but it's important to make sure PRs also have detailed descriptions as well.

@dujl
Contributor Author

dujl commented Sep 7, 2022

done

@dujl
Contributor Author

dujl commented Sep 7, 2022

@alexeykudinkin please help review and approve.

@alexeykudinkin
Contributor

Approved already.

@nsivabalan can you please help landing this one?

@minihippo
Contributor

Hi @XuQianJin-Stars, can you land this bugfix?

@XuQianJin-Stars
Contributor

Hi @dujl, the CI has failed.

@dujl
Contributor Author

dujl commented Sep 17, 2022 via email

@yihua
Contributor

yihua commented Sep 17, 2022

@dujl It's likely due to CI flakiness. Could you rebase this PR on the latest master?

@yihua yihua force-pushed the bug-hudi-partition-table-github branch from 5d16a4e to 7506862 September 17, 2022 06:00
@yihua
Contributor

yihua commented Sep 17, 2022

@hudi-bot run azure

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua
Contributor

yihua commented Sep 17, 2022

@dujl The failed tests in GH action are reproducible. Could you look into those?

@dujl
Contributor Author

dujl commented Sep 19, 2022

OK, I will check it.

@nsivabalan nsivabalan added priority:blocker Production down; release blocker and removed priority:critical Production degraded; pipelines stalled labels Sep 22, 2022
@xushiyan
Member

Closing in favor of #6821.

@xushiyan xushiyan closed this Sep 29, 2022

Labels

area:catalog Catalog integration engine:spark Spark integration priority:blocker Production down; release blocker

Projects

Status: Done


8 participants