-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Handle partition evolution case in PartitionStatsUtil#computeStats #12137
Conversation
@ajantha-bhat, @aokolnychyi, could you please take a look? |
ping @ajantha-bhat |
partitionData.set(0, c2); | ||
partitionData.set(1, c3); | ||
return partitionData; | ||
@org.junit.Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use junit5 like other tests. (Used wrong annotation?)
I don't think the tests are running right now.
So, please fix it and I will try to understand the test case.
The original intention is to store the unified tuple as the key itself. So, it can handle all the partition evolution. So, from the test I am not yet clear what was expected and what went wrong easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ajantha-bhat for checking the PR, I almost gave up expecting an answer :)
- removed the wrong test annotation
- I understood the intention, however, there is a bug with the key matching: PartitionMap is initialized with individual partition specs [], but the supplied key is a unified tuple.
For the created test case, the existing code returns just 3 partition values instead of 6. The test simulates partition evolution by changing the number of buckets from 2 to 4: spec(c2, bucket(c1, 2)) -> spec(c2, bucket(c1, 4))
[foo, 0, null]
[foo, 1, null]
[bar, null, 0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expecting actual:
[(PartitionData{c2=foo, c1_bucket=0, c1_bucket_4=null}, 0, 100290L, 1, 38596L, 0L, 0, 0L, 0, 0L, 1739473244037L, 1L),
(PartitionData{c2=foo, c1_bucket=1, c1_bucket_4=null}, 0, 100293L, 1, 43840L, 0L, 0, 0L, 0, 0L, 1739473244037L, 1L),
(PartitionData{c2=bar, c1_bucket=null, c1_bucket_4=0}, 1, 402736L, 4, 46675L, 0L, 0, 0L, 0, 0L, 1739473244341L, 2L)]
to contain exactly in any order:
[(PartitionData{c2=foo, c1_bucket=0, c1_bucket_4=null}, 0, 100290L, 1, 38596L, 0L, 0, 0L, 0, 0L, 1739473244037L, 1L),
(PartitionData{c2=foo, c1_bucket=1, c1_bucket_4=null}, 0, 100293L, 1, 43840L, 0L, 0, 0L, 0, 0L, 1739473244037L, 1L),
(PartitionData{c2=bar, c1_bucket=null, c1_bucket_4=0}, 1, 100747L, 1, 5616L, 0L, 0, 0L, 0, 0L, 1739473244341L, 2L),
(PartitionData{c2=bar, c1_bucket=null, c1_bucket_4=1}, 1, 100122L, 1, 7260L, 0L, 0, 0L, 0, 0L, 1739473244341L, 2L),
(PartitionData{c2=bar, c1_bucket=null, c1_bucket_4=2}, 1, 100914L, 1, 13793L, 0L, 0, 0L, 0, 0L, 1739473244341L, 2L),
(PartitionData{c2=bar, c1_bucket=null, c1_bucket_4=3}, 1, 100953L, 1, 20006L, 0L, 0, 0L, 0, 0L, 1739473244341L, 2L)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see,
PartitionMap is initialized with individual partition specs [], but the supplied key is a unified tuple
I assumed partitionMap will use the key supplied. Now I see that it is based on the individual spec
return StructLikeMap.create(spec.partitionType()); |
Since, we don't combine stats for different spec id, your solution works.
Thanks for the fix.
63d1a22
to
a36a8bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@aokolnychyi: Could you please help in merging this PR?
PR title can be |
statsMap.computeIfAbsent(specId, key, () -> new PartitionStats(key, specId)); | ||
statsMap.computeIfAbsent( | ||
specId, | ||
((PartitionData) file.partition()).copy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about these casts.
Could keyTemplate.copyFor(file.partition())
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the tests are broken by the infra 😢
Could you help me understand the original issue? I thought that I understand, but seems like I still miss something.
I get that there is difference between the partitionType
(which is the unified spec), the coercedPartition
which is the partition key for the unified partition type.
What is the key we would like to see in the statsMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed in private. @pvary, please let me know if any questions left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this with @RussellSpitzer, and he confirmed that casting is indeed correct.
core/src/test/java/org/apache/iceberg/TestPartitionStatsUtil.java
Outdated
Show resolved
Hide resolved
a4c0df5
to
6399f60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
When the infra recovers, and ww have a green run, we can commit
6399f60
to
986a53f
Compare
Merged to master. |
Description:
PartitionMap
creates internal wrappers based on table specspartitionType
, however, we are passing coercedPartition withunified partition type
considering all specs as aKEY
, leading to wrong evaluation.#12168
Repro:
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/test/queries/positive/iceberg_bucket_map_join_4.q
Testing: