Catalog: Add BigQuery Metastore Catalog Support #12808
@nastra Would you like to review a brand new Catalog which uses CatalogTests 😃
Thank you all @nastra @ebyhr @kravikumar @gkalra18 for your review. I addressed all your comments; feel free to add more :)
…viour some signature access modifiers changes on Classes
nastra
left a comment
this LGTM once the remaining comments from @danielcweeks have been addressed as I think we need to fix those
Is there any thought to make this catalog implementation available in the kafka connect runtime? I guess by adding it here (not sure if it's the only thing to do)
…id redundant API calls when setting/removing dataset properties.
…doc to accurately reflect the new behavior.
danielcweeks
left a comment
Thanks @talatuyarer, I think this is good to go.
* Initial Commit for BigQuery MetaStore
Hello @stevenzwu, I'm mentioning you because I saw you handling the 1.10.0 milestone, and I wanted to ask whether this PR will be included in the 1.10.0 release.
@locchipinti yes, this will be shipped with the next release
# Rationale for this change
This PR brings BigQuery Metastore support to Python after it was merged
into the [Java
implementation](apache/iceberg#12808).
This allows Iceberg catalog functionality to be backed by BigQuery. It
supports creating/deleting/listing namespaces (datasets in BigQuery
terminology), creating/deleting/listing tables, and registering tables.
This is my first PR of size to iceberg-python, so any advice would be
appreciated!
# Are these changes tested?
Integration and unit tests included.
# Are there any user-facing changes?
Introduces a new Catalog type.
## Summary

- Upgrade to iceberg 1.10.0 to grab column [stats](apache/iceberg#10659), and some CVE's: [CVE](apache/iceberg#13561) (and parquet, avro transitively), and [BigQueryMetastoreCatalog](apache/iceberg#12808), [Google Auth](apache/iceberg#13212).
- Column stats is the key feature here - we rely on extracting the puffin files and grabbing stats metadata.

<img width="1342" height="461" alt="Screenshot 2025-09-20 at 4 30 35 PM" src="https://github.com/user-attachments/assets/bc8eeb80-6ff7-4abe-8ffb-a0eebf48bc4e" />

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Per-partition, per-column statistics extraction with optional persistence to a new data-quality metrics KV store; platform APIs can produce a metrics-specific KV store.
* **Breaking Changes**
  * Extraction API signatures and summary/key formats changed; thrift summary shapes updated; config token renamed "groupbys" → "group_bys"; /api/summary-series now returns null.
* **Refactor**
  * Large-scale test package reorganizations and import consolidations across the codebase.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- av pr metadata
This information is embedded by the av CLI when creating PRs to track the status of stacks when using Aviator. Please do not delete or edit this section of the PR.
```
{"parent":"main","parentHead":"","trunk":"main"}
```
-->

---------

Co-authored-by: thomaschow <[email protected]>
* Initial Commit for BigQuery MetaStore
* Spotless Fix on Core module
* Addressed comments on PR from @nastra
* Addressed comments on PR from @ebyhr and @gkalra18
* Addressed comments on PR from @nastra
* Removed PROJECT_ID from GCPProperties class.
* Removed BIGQUERY_LOCATION from GCPProperties class.
* Addressed @nastra and @amogh-jahagirdar comments
* Changed Catalog initialization, removed TESTING_ENABLED property and removed optional field tests from BigQueryCatalogTests
* Used toTableReference in newTableOps method. Dropped Dataset name from Client Interface's methods, Disabled testLoadMetadataTable test due to multi level namespacing.
* Fixed failed testListNonExistingNamespace
* Fixed assertThat
* Removed BigQueryMetastoreTestUtils class and addressed @nastra's comment on pr
* Addressed comments from nastra and danielcweeks
* Missing changes from previous update
* last @SuppressWarnings("FormatStringAnnotation")
* last @SuppressWarnings("FormatStringAnnotation")
* Addressed Latest Comments from @danielcweeks about exception types
* Addressed Latest Comments from @danielcweeks about serialVersionUID and interface
* Addressed Latest Comments from @danielcweeks about Listnamespace behaviour some signature access modifiers changes on Classes
* Removed Hive and Hadoop dependencies from BigQuery Catalog
* Removed double condition in listNamespaces. Thank you @nastra
* Optimize Dataset property updates. Moves logic into the client to avoid redundant API calls when setting/removing dataset properties.
* Renamed `filterUnsupportedTables` to `listAllTables` and updated Javadoc to accurately reflect the new behavior.
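The "Optimize Dataset property updates" commit describes moving the update logic into the client so that setting or removing dataset properties avoids redundant API calls. A minimal sketch of one way that could look, assuming a single read-modify-write per update and a no-op when there is nothing to change. `InMemoryDatasetClient`, `DatasetPropertyUpdateSketch`, and their methods are hypothetical names for illustration, not the PR's actual client interface:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a metastore client (not the PR's API).
final class InMemoryDatasetClient {
  private final Map<String, String> parameters = new HashMap<>();
  private int apiCalls = 0;

  // Applies additions and removals together; counted as one simulated API call.
  Map<String, String> updateParameters(Map<String, String> toSet, Set<String> toRemove) {
    apiCalls++;
    parameters.putAll(toSet);
    parameters.keySet().removeAll(toRemove);
    return Map.copyOf(parameters);
  }

  int apiCalls() {
    return apiCalls;
  }
}

// Hypothetical catalog-side helpers that delegate the whole update to the client.
final class DatasetPropertyUpdateSketch {
  static boolean setProperties(InMemoryDatasetClient client, Map<String, String> props) {
    if (props.isEmpty()) {
      return false; // nothing to change, so no call is made
    }
    client.updateParameters(props, Set.of());
    return true;
  }

  static boolean removeProperties(InMemoryDatasetClient client, Set<String> keys) {
    if (keys.isEmpty()) {
      return false; // nothing to remove, so no call is made
    }
    client.updateParameters(Map.of(), keys);
    return true;
  }
}
```

In this sketch the caller never fetches the dataset separately before updating it; whether the merged change does the same is not visible from the commit message alone.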
This PR addresses the review comments on the earlier PR that introduced initial support for using Google BigQuery as a Metastore Catalog for Apache Iceberg. Unfortunately, the creator of that PR is on leave, so I had to open a new PR.
Key changes include:
- `BigQueryMetaStoreClient` and related classes to interact with the BigQuery API for Metastore operations.
- `BigQueryMetastoreCatalog` to provide Iceberg catalog functionality backed by BigQuery.
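
Since Iceberg namespaces correspond to BigQuery datasets here, a catalog has to translate a namespace plus table name into a BigQuery table path. The following is a rough sketch of that mapping, assuming namespaces are limited to a single level (the commit list mentions a test disabled "due to multi level namespacing"). `DatasetMappingSketch` and its methods are hypothetical and do not reproduce the PR's `toTableReference`:

```java
import java.util.List;

// Hypothetical helper illustrating the namespace-to-dataset mapping.
final class DatasetMappingSketch {
  // Maps a single-level namespace to a dataset ID; rejects empty or deeper namespaces.
  static String toDatasetId(List<String> namespaceLevels) {
    if (namespaceLevels.size() != 1) {
      throw new IllegalArgumentException(
          "Expected exactly one namespace level, got: " + namespaceLevels);
    }
    return namespaceLevels.get(0);
  }

  // Builds a project.dataset.table path from the project, namespace, and table name.
  static String toTablePath(String projectId, List<String> namespaceLevels, String table) {
    return projectId + "." + toDatasetId(namespaceLevels) + "." + table;
  }
}
```

For example, `DatasetMappingSketch.toTablePath("my-project", List.of("sales"), "orders")` yields `my-project.sales.orders`, while a two-level namespace such as `List.of("a", "b")` is rejected with an `IllegalArgumentException`.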