Skip to content

Conversation

@linshan-ma
Copy link
Contributor

Tips

What is the purpose of the pull request

Modify hive partition synchronization

Brief change log

  • delete Collections.sort(hivePartitionValues) and Collections.sort(storagePartitionValues)

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codecov-io
Copy link

codecov-io commented Nov 19, 2020

Codecov Report

Merging #2262 (c1e36b2) into master (c8d5ea2) will decrease coverage by 1.49%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2262      +/-   ##
============================================
- Coverage     53.51%   52.01%   -1.50%     
- Complexity     2769     2952     +183     
============================================
  Files           348      395      +47     
  Lines         16107    17650    +1543     
  Branches       1643     1809     +166     
============================================
+ Hits           8619     9181     +562     
- Misses         6789     7710     +921     
- Partials        699      759      +60     
Flag Coverage Δ Complexity Δ
hudicli 38.83% <ø> (+0.45%) 0.00 <ø> (ø)
hudiclient 100.00% <ø> (ø) 0.00 <ø> (ø)
hudicommon 54.93% <ø> (-0.34%) 0.00 <ø> (ø)
hudiflink 10.20% <ø> (?) 0.00 <ø> (?)
hudihadoopmr 33.52% <ø> (+0.58%) 0.00 <ø> (ø)
hudispark ? ?
hudisparkdatasource 62.85% <ø> (?) 0.00 <ø> (?)
hudisync 48.06% <ø> (?) 0.00 <ø> (?)
huditimelineservice 65.30% <ø> (ø) 0.00 <ø> (ø)
hudiutilities 69.65% <ø> (-0.41%) 0.00 <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...in/java/org/apache/hudi/hive/HoodieHiveClient.java 72.01% <ø> (ø) 40.00 <0.00> (?)
...di/common/table/timeline/HoodieActiveTimeline.java 69.62% <0.00%> (-4.51%) 41.00% <0.00%> (ø%)
.../apache/hudi/common/table/log/HoodieLogFormat.java 71.23% <0.00%> (-4.13%) 3.00% <0.00%> (ø%)
...che/hudi/common/table/timeline/HoodieTimeline.java 89.13% <0.00%> (-4.06%) 43.00% <0.00%> (ø%)
.../apache/hudi/common/util/ObjectSizeCalculator.java 73.48% <0.00%> (-2.64%) 24.00% <0.00%> (ø%)
...g/apache/hudi/common/model/WriteOperationType.java 53.12% <0.00%> (-2.44%) 2.00% <0.00%> (ø%)
...ache/hudi/common/util/collection/DiskBasedMap.java 80.88% <0.00%> (-2.07%) 29.00% <0.00%> (+2.00%) ⬇️
...i/common/util/collection/ExternalSpillableMap.java 71.42% <0.00%> (-1.61%) 29.00% <0.00%> (ø%)
...a/org/apache/hudi/common/util/ClusteringUtils.java 89.06% <0.00%> (-1.42%) 18.00% <0.00%> (ø%)
.../hudi/common/util/collection/LazyFileIterable.java 74.41% <0.00%> (-0.59%) 2.00% <0.00%> (ø%)
... and 122 more

Copy link
Contributor

@n3nash n3nash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linshan-ma Can you please describe the need for this change ?

@n3nash n3nash self-assigned this Nov 20, 2020
@linshan-ma
Copy link
Contributor Author

linshan-ma commented Nov 23, 2020

@n3nash The hive/hudi table have contain partition 'year=2020/month=01/day=11', when you write a new partition 'year=2020/month=11/day=01' to the table, it will throw error! Because in the getPartitionEvents method logic, 'year=2020/month=01/day=11' will transfer to be "01, 11, 2020" and 'year=2020/month=11/day=01' will transfer to be "01, 11, 2020" too, So the new partition 'year=2020/month=11/day=01' is treated as a update event actually it is a new partition, and the coming processing will update the table partitions using "ALTER TABLE XX PARTITION (year='2020',month='11',day='01') SET LOCATION ..."! To update a not existed partition will throw error!

@linshan-ma
Copy link
Contributor Author

@linshan-ma Can you please describe the need for this change ?

This is described in more detail。#2234

@leesf
Copy link
Contributor

leesf commented Dec 6, 2020

@n3nash would you please review this again when free?

@yanghua
Copy link
Contributor

yanghua commented Dec 12, 2020

@liujinhui1994 Can you help to review this PR. @linshan-ma It would be better to make the title of the issue more descriable.

@vinothchandar vinothchandar self-assigned this Dec 15, 2020
@vinothchandar
Copy link
Member

@n3nash do you see any reason to keep the sort?

@nsivabalan nsivabalan added the priority:blocker Production down; release blocker label Dec 26, 2020
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bvaradar @n3nash : really surprised that we had this bug so far. How come none of our customers reported this until now nor didn't catch at Uber so far. Any interplay between month value and day value will run into issues.

Here is the gist of the bug. After getting partition value, we were sorting each partition value. For eg, "2020,12,25" will be sorted(lexicographically) as "12,2020,25". In other words, both "2020,01,02" and "2020,02,01" will be sorted as "01,02,2020".

I did some debugging and couldn't quite comprehend why we added the sorting in the first place only. I am sure I am missing something here.
I am trying to add a test to cover this scenario and having some issues on that end. In the meantime, do respond w/ your comments if any.

@nsivabalan
Copy link
Contributor

nsivabalan commented Dec 29, 2020

@linshan-ma : I see lot of commits(/rebase) in the patch even though actual fix is just one commit. guess it wasn't cleanly applied. Can you try fixing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants