-
Notifications
You must be signed in to change notification settings - Fork 2.5k
feat(metadata): Improve Logical Type Handling on Col Stats #13711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
5a8d574
d211a2c
755ea1e
7089c12
7d221fb
a852c2a
714e47e
5c34d1a
b0c0c7f
35f7030
71d846c
b41b39f
e7cfe3e
017a4cc
9b88584
789bbd1
ac9702f
b495642
ecf1b14
7d067bb
b9f3fc3
e2c834f
38a1e14
d12c482
820ca40
b10e7c8
eba6ceb
8fff23d
fc598f6
9a59641
add0ab0
672ce5b
d655091
262492c
e8d5675
bee20e5
73f2303
bad18ed
be8241b
ed16043
ac3364a
0d38bbe
5ee696f
e2d4f3e
864d78c
e1a45ee
ef6350d
079393a
9e15009
2878663
c5e59da
9687489
5e6b116
ba5e119
a955d92
6333f2c
11edda9
4f2f5c4
1bdb64b
90dc68d
8eff944
3630ba2
87bfc6f
b31de9a
357675c
9abec52
0e45de4
fbb531c
c84c787
5beb5b0
cca547d
06c79cf
7029fe7
f7dad4d
6b1fe2e
656d4e1
8f946f5
b462c63
c01ae4e
681833e
b9a3ecc
f86bbf2
f1b7f67
dade2b6
c174395
17d5552
62cfcb4
d018e59
6840594
1940eb8
fcc6153
4cfa36c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -56,7 +56,6 @@ | |
| import org.apache.hudi.metadata.HoodieTableMetadata; | ||
| import org.apache.hudi.metadata.HoodieTableMetadataUtil; | ||
| import org.apache.hudi.metadata.HoodieIndexVersion; | ||
| import org.apache.hudi.metadata.MetadataPartitionType; | ||
| import org.apache.hudi.storage.StoragePath; | ||
| import org.apache.hudi.table.HoodieTable; | ||
| import org.apache.hudi.table.action.HoodieWriteMetadata; | ||
|
|
@@ -285,23 +284,15 @@ static boolean isMetadataTableBehindDataTable(HoodieWriteConfig config, | |
| * @param table Hoodie table | ||
| * @param operationType Type of operation (upgrade/downgrade) | ||
| */ | ||
| public static void dropNonV1SecondaryIndexPartitions(HoodieWriteConfig config, HoodieEngineContext context, | ||
| HoodieTable table, SupportsUpgradeDowngrade upgradeDowngradeHelper, String operationType) { | ||
| public static void dropNonV1IndexPartitions(HoodieWriteConfig config, HoodieEngineContext context, | ||
| HoodieTable table, SupportsUpgradeDowngrade upgradeDowngradeHelper, String operationType) { | ||
| HoodieTableMetaClient metaClient = table.getMetaClient(); | ||
| try (BaseHoodieWriteClient writeClient = upgradeDowngradeHelper.getWriteClient(config, context)) { | ||
| List<String> mdtPartitions = metaClient.getTableConfig().getMetadataPartitions() | ||
| .stream() | ||
| .filter(partition -> { | ||
| // Only drop secondary indexes that are not V1 | ||
| return metaClient.getIndexForMetadataPartition(partition) | ||
| .map(indexDef -> { | ||
| if (MetadataPartitionType.fromPartitionPath(indexDef.getIndexName()).equals(MetadataPartitionType.SECONDARY_INDEX)) { | ||
| return HoodieIndexVersion.V1.lowerThan(indexDef.getVersion()); | ||
| } | ||
| return false; | ||
| }) | ||
| .orElse(false); | ||
| }) | ||
| .filter(partition -> metaClient.getIndexForMetadataPartition(partition) | ||
| .map(indexDef -> HoodieIndexVersion.V1.lowerThan(indexDef.getVersion())) | ||
|
Comment on lines
+287
to
+294
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we abstract index upgrade process and move the logic to a central place so it's easier to handle index layout upgrade/downgrade?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We'll take this separately. |
||
| .orElse(false)) | ||
| .collect(Collectors.toList()); | ||
| LOG.info("Dropping from MDT partitions for {}: {}", operationType, mdtPartitions); | ||
| if (!mdtPartitions.isEmpty()) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -266,7 +266,7 @@ void testDowngradeDropsOnlyV2OrAboveIndexes() { | |
| )).thenAnswer(invocation -> null); // Do nothing | ||
|
|
||
| // Mock the dropNonV1SecondaryIndexPartitions to simulate dropping V2 indexes | ||
| mockedUtils.when(() -> UpgradeDowngradeUtils.dropNonV1SecondaryIndexPartitions( | ||
| mockedUtils.when(() -> UpgradeDowngradeUtils.dropNonV1IndexPartitions( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have functional tests (without mocks) on dropping v2 indexes? |
||
| eq(config), | ||
| eq(context), | ||
| eq(table), | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ensuring backwards compatibility through tests, could you generate test artifacts/tables by using 0.15.0 and 1.0.2 release (and from this PR) with column stats enabled and columns of different primitive and logical types (use MOR with log files), and used that for validating col stats reading and data skipping? That'll give us much more confidence on compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonvex also, when doing the backward compatibility testing, let's enforce stricter validation by comparing the metadata records in column stats index in the Metadata Table written by 0.14 and 1.1 (this PR) using the same set of inserts/updates to make sure the storage bytes are the same.