Hive: Expose Snapshot Stats in HMS. #4456
Changes from all commits: 98a9b73, 5fe283c, c7b0e62, 1d96a1b, 8eac228, 3a11d9b, 72bd3be, 4daa05a, 8f5edaa, cd726fe, 1e0d23f, a3bea2f, 617d665
```diff
@@ -21,11 +21,17 @@
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.metastore.api.Database;
 import org.apache.iceberg.AssertHelpers;
 import org.apache.iceberg.CachingCatalog;
 import org.apache.iceberg.DataFile;
 import org.apache.iceberg.DataFiles;
 import org.apache.iceberg.FileFormat;
 import org.apache.iceberg.PartitionSpec;
 import org.apache.iceberg.Schema;
 import org.apache.iceberg.Snapshot;
 import org.apache.iceberg.SortOrder;
 import org.apache.iceberg.Table;
 import org.apache.iceberg.TableProperties;
```
```diff
@@ -42,6 +48,7 @@
 import org.apache.iceberg.transforms.Transform;
 import org.apache.iceberg.transforms.Transforms;
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.util.JsonUtil;
 import org.apache.thrift.TException;
 import org.junit.Assert;
 import org.junit.Rule;
```
```diff
@@ -52,6 +59,9 @@
 import static org.apache.iceberg.NullOrder.NULLS_FIRST;
 import static org.apache.iceberg.SortDirection.ASC;
 import static org.apache.iceberg.types.Types.NestedField.required;
 import static org.mockito.Mockito.mock;
 import static org.mockito.Mockito.spy;
 import static org.mockito.Mockito.when;

 public class TestHiveCatalog extends HiveMetastoreTest {
   private static ImmutableMap meta = ImmutableMap.of(
```
```diff
@@ -468,4 +478,85 @@ public void testUUIDinTableProperties() throws Exception {
       catalog.dropTable(tableIdentifier);
     }
   }

   @Test
   public void testSnapshotStatsTableProperties() throws Exception {
```
**Member:** We should probably have a test with the string exceeding the property size threshold, checking to make sure the behavior is as we expect.

**Contributor (Author):** Added a test case for that.
```java
    Schema schema = new Schema(
        required(1, "id", Types.IntegerType.get(), "unique ID"),
        required(2, "data", Types.StringType.get())
    );
    TableIdentifier tableIdentifier = TableIdentifier.of(DB_NAME, "tbl");
    String location = temp.newFolder("tbl").toString();

    try {
      catalog.buildTable(tableIdentifier, schema)
          .withLocation(location)
          .create();

      String tableName = tableIdentifier.name();
      org.apache.hadoop.hive.metastore.api.Table hmsTable =
          metastoreClient.getTable(tableIdentifier.namespace().level(0), tableName);

      // check whether parameters are in expected state
      Map<String, String> parameters = hmsTable.getParameters();
      Assert.assertEquals("0", parameters.get(TableProperties.SNAPSHOT_COUNT));
```
**Member:** Can we put some error messages in these asserts, like in other tests?

**Contributor (Author):** I'm not sure it is necessary. The code here explains itself quite well.
```java
      Assert.assertNull(parameters.get(TableProperties.CURRENT_SNAPSHOT_SUMMARY));
      Assert.assertNull(parameters.get(TableProperties.CURRENT_SNAPSHOT_ID));
      Assert.assertNull(parameters.get(TableProperties.CURRENT_SNAPSHOT_TIMESTAMP));

      // create a snapshot
      Table icebergTable = catalog.loadTable(tableIdentifier);
      String fileName = UUID.randomUUID().toString();
      DataFile file = DataFiles.builder(icebergTable.spec())
          .withPath(FileFormat.PARQUET.addExtension(fileName))
          .withRecordCount(2)
          .withFileSizeInBytes(0)
          .build();
      icebergTable.newFastAppend().appendFile(file).commit();

      // check whether parameters are in expected state
      hmsTable = metastoreClient.getTable(tableIdentifier.namespace().level(0), tableName);
      parameters = hmsTable.getParameters();
      Assert.assertEquals("1", parameters.get(TableProperties.SNAPSHOT_COUNT));
      String summary = JsonUtil.mapper().writeValueAsString(icebergTable.currentSnapshot().summary());
      Assert.assertEquals(summary, parameters.get(TableProperties.CURRENT_SNAPSHOT_SUMMARY));
      long snapshotId = icebergTable.currentSnapshot().snapshotId();
      Assert.assertEquals(String.valueOf(snapshotId), parameters.get(TableProperties.CURRENT_SNAPSHOT_ID));
      Assert.assertEquals(String.valueOf(icebergTable.currentSnapshot().timestampMillis()),
          parameters.get(TableProperties.CURRENT_SNAPSHOT_TIMESTAMP));
    } finally {
      catalog.dropTable(tableIdentifier);
    }
  }
```
```java
  @Test
  public void testSetSnapshotSummary() throws Exception {
    Configuration conf = new Configuration();
    conf.set("iceberg.hive.table-property-max-size", "4000");
    HiveTableOperations spyOps = spy(new HiveTableOperations(conf, null, null, catalog.name(), DB_NAME, "tbl"));
    Snapshot snapshot = mock(Snapshot.class);
    Map<String, String> summary = Maps.newHashMap();
    when(snapshot.summary()).thenReturn(summary);

    // create a snapshot summary whose JSON string size is less than the limit
    for (int i = 0; i < 100; i++) {
      summary.put(String.valueOf(i), "value");
    }
    Assert.assertTrue(JsonUtil.mapper().writeValueAsString(summary).length() < 4000);
    Map<String, String> parameters = Maps.newHashMap();
    spyOps.setSnapshotSummary(parameters, snapshot);
    Assert.assertEquals("The snapshot summary must be in parameters", 1, parameters.size());

    // create a snapshot summary whose JSON string size exceeds the limit
    for (int i = 0; i < 1000; i++) {
      summary.put(String.valueOf(i), "value");
    }
    long summarySize = JsonUtil.mapper().writeValueAsString(summary).length();
    // the limit has been updated to 4000 instead of the default value (32672)
    Assert.assertTrue(summarySize > 4000 && summarySize < 32672);
```
**Member:** Nit: I think there's no need for the upper limit check, right? We are not really saving in Hive.

**Contributor (Author):** This is to make sure the new limit takes effect.

**Contributor (Author):** Added a comment in the code.

**Member:** Hm, I meant, can't we just do the check without the upper bound? To me the test is checking whether you can persist something beyond 4000 chars, right? The fact that it's below or above 32672 chars should not matter now that we changed the limit to 4000, right? Then, no need for the additional comment, if this is the case. Let me know if I am mistaken though.

**Contributor (Author):** The main purpose is to test whether we save the summary in HMS parameters when the size exceeds the limit. Besides, I also want to test whether the limit has changed from 32672 to 4000. That's why I check both.

**Member:** OK, got it.
```java
    parameters.remove(TableProperties.CURRENT_SNAPSHOT_SUMMARY);
    spyOps.setSnapshotSummary(parameters, snapshot);
    Assert.assertEquals("The snapshot summary must not be in parameters due to the size limit", 0, parameters.size());
  }
}
```
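The test above exercises how `setSnapshotSummary` behaves around the configured property size limit. As a standalone illustration of that guard, here is a minimal sketch, assuming a hypothetical helper and key name (this is not the actual `HiveTableOperations` code; the real implementation serializes `Snapshot.summary()` with Iceberg's `JsonUtil` and reads the limit from `iceberg.hive.table-property-max-size`):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the size guard under discussion: store the serialized
// snapshot summary in the HMS table parameters only when it fits the limit.
public class SnapshotSummarySketch {
  // assumed key name for illustration only
  static final String SUMMARY_KEY = "current-snapshot-summary";

  // maxSize models the iceberg.hive.table-property-max-size setting
  static void setSnapshotSummary(Map<String, String> parameters, String summaryJson, int maxSize) {
    if (summaryJson.length() <= maxSize) {
      parameters.put(SUMMARY_KEY, summaryJson);
    }
    // an oversized summary is simply not exposed in HMS
  }

  public static void main(String[] args) {
    Map<String, String> small = new HashMap<>();
    setSnapshotSummary(small, "{\"added-records\":\"2\"}", 4000);
    System.out.println(small.containsKey(SUMMARY_KEY)); // fits: stored

    Map<String, String> large = new HashMap<>();
    setSnapshotSummary(large, "x".repeat(5000), 4000);
    System.out.println(large.containsKey(SUMMARY_KEY)); // exceeds: skipped
  }
}
```

This mirrors the two assertions in `testSetSnapshotSummary`: one parameter present below the limit, none above it.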
**Reviewer:** I think we need a way to clear the properties if they are no longer present in the current snapshot (like the original `setHmsProperties` code). For example, if the summary suddenly exceeds the length limit in a new snapshot, the stale summary would not be overridden, which would be wrong.
**Author:** Fixed in the new commit.
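The fix the reviewer asked for amounts to clearing the previously exposed snapshot properties before setting the new ones, so a stale summary cannot survive a commit whose summary is too large to store. A minimal sketch of that ordering, using hypothetical key names and a simplified setter (not the actual PR code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the clear-then-set fix: remove the old snapshot
// properties first, so nothing stale lingers even when the new summary is
// too large to be stored.
public class ClearStalePropsSketch {
  // assumed key names for illustration only
  static final String SUMMARY_KEY = "current-snapshot-summary";
  static final String SNAPSHOT_ID_KEY = "current-snapshot-id";
  static final String TIMESTAMP_KEY = "current-snapshot-timestamp-ms";

  static void setSnapshotStats(Map<String, String> parameters, String summaryJson,
                               long snapshotId, long timestampMillis, int maxSize) {
    // clear first: guarantees stale values are dropped even if nothing new is set
    parameters.remove(SUMMARY_KEY);
    parameters.remove(SNAPSHOT_ID_KEY);
    parameters.remove(TIMESTAMP_KEY);

    parameters.put(SNAPSHOT_ID_KEY, String.valueOf(snapshotId));
    parameters.put(TIMESTAMP_KEY, String.valueOf(timestampMillis));
    if (summaryJson.length() <= maxSize) {
      parameters.put(SUMMARY_KEY, summaryJson);
    }
  }

  public static void main(String[] args) {
    Map<String, String> parameters = new HashMap<>();
    // first commit: small summary gets stored
    setSnapshotStats(parameters, "{\"added-records\":\"2\"}", 1L, 1000L, 4000);
    System.out.println(parameters.containsKey(SUMMARY_KEY));

    // second commit: oversized summary; the old summary must not linger
    setSnapshotStats(parameters, "x".repeat(5000), 2L, 2000L, 4000);
    System.out.println(parameters.containsKey(SUMMARY_KEY));
    System.out.println(parameters.get(SNAPSHOT_ID_KEY));
  }
}
```

Without the up-front `remove` calls, the second commit would leave the first commit's summary paired with the second commit's snapshot ID, which is exactly the inconsistency the reviewer flagged.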