@thomaschow thomaschow commented Sep 19, 2025

This PR fixes a table validity check for the BigQueryMetastoreCatalog introduced in #12808. We need this check in order to access metadata tables, for example to inspect them using the syntax documented at https://iceberg.apache.org/docs/1.10.0/docs/hive/?h=metadata#querying-metadata-tables. This is similar to the HiveCatalog implementation. Without this check, Iceberg attempts to load a regular table and fails in the BigQueryMetastoreCatalog implementation because the namespace has more than one level.
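To make the failure mode concrete, here is a minimal, self-contained sketch of the routing that such a validity check enables. It is a simplified model under stated assumptions, not the Iceberg code: `IdentifierRoutingSketch`, `route`, and the short `METADATA_TABLES` list are hypothetical names used only for illustration, and identifiers are modeled as a plain list of namespace levels plus a name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the routing described above.
// It does not use or reproduce Iceberg's classes.
final class IdentifierRoutingSketch {
  // Assumed subset of metadata table names, for illustration only.
  private static final List<String> METADATA_TABLES =
      Arrays.asList("files", "snapshots", "history");

  // A regular table is valid only when the namespace has exactly one level.
  static boolean isValidIdentifier(List<String> namespace) {
    return namespace.size() == 1;
  }

  // For "ns.tbl.files" the namespace is [ns, tbl] and the name is "files":
  // the last namespace level is the base table, the name is the metadata table type.
  static boolean isValidMetadataIdentifier(List<String> namespace, String name) {
    if (namespace.size() < 2 || !METADATA_TABLES.contains(name)) {
      return false;
    }
    // The base table's own namespace is everything except the last level.
    return isValidIdentifier(namespace.subList(0, namespace.size() - 1));
  }

  // Decide how an identifier would be loaded.
  static String route(List<String> namespace, String name) {
    if (isValidIdentifier(namespace)) {
      return "regular table " + namespace.get(0) + "." + name;
    }
    if (isValidMetadataIdentifier(namespace, name)) {
      return "metadata table '" + name + "' of "
          + namespace.get(0) + "." + namespace.get(1);
    }
    return "no such table";
  }

  public static void main(String[] args) {
    System.out.println(route(Arrays.asList("ns"), "tbl"));
    System.out.println(route(Arrays.asList("ns", "tbl"), "files"));
    System.out.println(route(Arrays.asList("a", "b", "c"), "files"));
  }
}
```

In this model, if `isValidIdentifier` accepted every identifier, `ns.tbl.files` would be routed to the regular-table path with a two-level namespace, which is the failure described above.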

Added tests as well.

@github-actions github-actions bot added the GCP label Sep 19, 2025
@thomaschow thomaschow changed the title fix: Add table validity check for BigQueryMetastoreCatalog Catalog: Add table validity check for BigQueryMetastoreCatalog Sep 19, 2025
}

@Test
public void testLoadMetadataTableIsCalled() {

Contributor

not sure I follow why this is being tested here, can you elaborate please?

Contributor Author

I want to test that loading a metadata table will actually happen with the fix in place, since that's technically a valid case of specifying an invalid table name.

Contributor

I am a bit confused too. If the goal is that the metadata table loading works correctly after this fix, shouldn't we enable this test instead?

@Disabled("BigQuery Metastore does not support multi layer namespaces")
@Test
public void testLoadMetadataTable() {}

Contributor Author

I will move the test case there!

Contributor

Any reason why we can't reuse the same test? If we just enable this test from the base class, it tests whether we can read the Files metadata table.

Contributor

Yeah, I agree. We should be able to remove this test method entirely and rely on the testing behavior defined in CatalogTests. The only reason this method existed here is so that it could be disabled.

Contributor

Good catch, here's the test with the same name in CatalogTests.java:

@Test
public void testLoadMetadataTable() {
  C catalog = catalog();
  TableIdentifier tableIdent = TableIdentifier.of("ns", "tbl");
  TableIdentifier metaIdent = TableIdentifier.of("ns", "tbl", "files");
  if (requiresNamespaceCreate()) {
    catalog.createNamespace(tableIdent.namespace());
  }
  catalog.buildTable(tableIdent, SCHEMA).create();
  Table table = catalog.loadTable(metaIdent);
  assertThat(table).isNotNull();
  assertThat(table).isInstanceOf(FilesTable.class);
  // check that the table metadata can be refreshed
  table.refresh();
  assertThat(table.name()).isEqualTo(catalog.name() + "." + metaIdent);
}

Contributor Author

Sounds good, thank you.

Contributor

@nastra nastra left a comment

the fix itself LGTM but I left a question for the test around metadata loading

@nastra nastra changed the title Catalog: Add table validity check for BigQueryMetastoreCatalog BigQuery: Add table validity check for BigQueryMetastoreCatalog Sep 19, 2025

nastra commented Sep 19, 2025

/cc @talatuyarer

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM! Thanks @thomaschow :)

tchow-zlai added a commit to zipline-ai/chronon that referenced this pull request Sep 19, 2025
## Summary

- Iceberg 1.10's BQMS has a bug that's fixed in
apache/iceberg#14113. Since the fix isn't included
in the 1.10 release, we'll build the jar locally and include it in
the `cloud_gcp` module for now.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- New Features
  - Bundled an updated Iceberg BigQuery integration JAR on the classpath by default.
- Bug Fixes
  - Improved identifier handling when namespaces are empty, reducing errors in BigQuery catalog operations.
- Tests
  - Added end-to-end checks for partition parsing, namespace compatibility, table reachability, and partition spec validation in BigQuery catalog flows.
- Chores
  - Removed the previous external dependency and stopped preloading additional Dataproc JARs by default.
- Style
  - Minor string formatting cleanup with no behavioral changes.

---------

Co-authored-by: thomaschow <[email protected]>

@Test
public void testIsValidIdentifierWithInvalidMultiLevelNamespace() {
TableIdentifier invalidIdentifier =

Contributor

nit: all of these can be inlined

Contributor Author

Done. Formatting still splits a few of the more verbose cases across multiple lines.

@kevinjqliu kevinjqliu merged commit 998eb87 into apache:main Sep 19, 2025
42 checks passed
@kevinjqliu

Thanks @thomaschow and thanks everyone for the review!

5 participants