@talatuyarer
Contributor

This PR addresses the review comments on the earlier PR that introduced initial support for using Google BigQuery as a Metastore Catalog for Apache Iceberg. Unfortunately, the creator of that PR is on leave, so I had to open a new PR to continue the work.

Key changes include:

  • Implementation of BigQueryMetaStoreClient and related classes to interact with the BigQuery API for Metastore operations.
  • Addition of BigQueryMetastoreCatalog to provide Iceberg catalog functionality backed by BigQuery.
  • Support for basic catalog operations such as creating/listing/deleting datasets (namespaces) and tables.
  • Integration with GCP credentials and BigQuery client setup.
  • Initial CatalogTests suite and infrastructure for BigQuery Metastore catalog.
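
The namespace-to-dataset mapping listed above can be sketched roughly as follows. This is a minimal in-memory model in Python, not the Java code from this PR: `InMemoryMetastoreClient` and `SketchCatalog` are hypothetical names, no BigQuery API is called, and the single-level namespace restriction is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryMetastoreClient:
    """Hypothetical stand-in for a metastore client (no real BigQuery calls)."""

    # dataset id -> {table name: metadata location}
    datasets: dict = field(default_factory=dict)

    def create_dataset(self, dataset_id):
        if dataset_id in self.datasets:
            raise ValueError(f"Dataset already exists: {dataset_id}")
        self.datasets[dataset_id] = {}

    def list_datasets(self):
        return sorted(self.datasets)

    def delete_dataset(self, dataset_id):
        if dataset_id not in self.datasets:
            return False
        if self.datasets[dataset_id]:
            raise ValueError(f"Dataset is not empty: {dataset_id}")
        del self.datasets[dataset_id]
        return True

    def create_table(self, dataset_id, name, metadata_location):
        tables = self.datasets[dataset_id]  # KeyError if the dataset is missing
        if name in tables:
            raise ValueError(f"Table already exists: {dataset_id}.{name}")
        tables[name] = metadata_location

    def list_tables(self, dataset_id):
        return sorted(self.datasets[dataset_id])

    def delete_table(self, dataset_id, name):
        return self.datasets[dataset_id].pop(name, None) is not None


class SketchCatalog:
    """Hypothetical catalog facade: one namespace level maps to one dataset."""

    def __init__(self, client):
        self.client = client

    @staticmethod
    def _dataset_id(namespace):
        # Assumption for this sketch: namespaces have exactly one level.
        if len(namespace) != 1:
            raise ValueError(f"Expected a single-level namespace, got {namespace}")
        return namespace[0]

    def create_namespace(self, namespace):
        self.client.create_dataset(self._dataset_id(namespace))

    def list_namespaces(self):
        return [(dataset_id,) for dataset_id in self.client.list_datasets()]

    def drop_namespace(self, namespace):
        return self.client.delete_dataset(self._dataset_id(namespace))

    def create_table(self, namespace, name, metadata_location):
        self.client.create_table(self._dataset_id(namespace), name, metadata_location)

    def list_tables(self, namespace):
        dataset_id = self._dataset_id(namespace)
        return [(dataset_id, name) for name in self.client.list_tables(dataset_id)]

    def drop_table(self, namespace, name):
        return self.client.delete_table(self._dataset_id(namespace), name)
```

The catalog layer only translates identifiers and delegates storage to the client, which is why an in-memory client is enough to exercise the catalog operations without any cloud credentials.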

@talatuyarer
Contributor Author

@nastra Would you like to review a brand new Catalog which uses CatalogTests? 😃

@talatuyarer
Contributor Author

Thank you all, @nastra @ebyhr @kravikumar @gkalra18, for your reviews. I addressed all your comments; feel free to add more :)

@talatuyarer talatuyarer requested a review from nastra April 18, 2025 20:33
…viour some signature access modifiers changes on Classes
@talatuyarer talatuyarer requested a review from danielcweeks May 9, 2025 06:55
@talatuyarer talatuyarer requested a review from nastra May 9, 2025 17:42
Contributor

@nastra nastra left a comment

This LGTM once the remaining comments from @danielcweeks have been addressed, as I think we need to fix those.

@juldrixx
Contributor

Has there been any thought about making this catalog implementation available in the Kafka Connect runtime? I guess by adding it here (not sure if that's the only thing needed).

…id redundant API calls when setting/removing dataset properties.
…doc to accurately reflect the new behavior.
Contributor

@danielcweeks danielcweeks left a comment

Thanks @talatuyarer, I think this is good to go.

@danielcweeks danielcweeks merged commit 8facdaf into apache:main May 13, 2025
43 checks passed
danielcweeks pushed a commit to danielcweeks/iceberg that referenced this pull request May 14, 2025
@locchipinti

Hello @stevenzwu, I'm mentioning you because I saw you are handling the 1.10.0 milestone, and I wanted to ask whether this PR will be included in the 1.10.0 release.
Thanks!

@nastra
Contributor

nastra commented Jul 16, 2025

@locchipinti Yes, this will be shipped with the next release.

Fokko pushed a commit to apache/iceberg-python that referenced this pull request Aug 26, 2025
# Rationale for this change
This PR brings BigQuery Metastore support to Python after it was merged
into the [Java
implementation](apache/iceberg#12808).

This allows Iceberg catalog functionality to be backed by BigQuery. It
supports creating/deleting/listing namespaces (datasets in BigQuery
terminology), creating/deleting/listing tables, and registering tables.

This is my first PR of size to iceberg-python, so any advice would be
appreciated!

# Are these changes tested?
Integration and unit tests included.

# Are there any user-facing changes?
Introduces a new Catalog type.

github-merge-queue bot pushed a commit to zipline-ai/chronon that referenced this pull request Sep 22, 2025
## Summary

- Upgrade to Iceberg 1.10.0 to pick up column
[stats](apache/iceberg#10659), some CVE fixes:
[CVE](apache/iceberg#13561) (and Parquet and Avro
transitively),
[BigQueryMetastoreCatalog](apache/iceberg#12808), and
[Google Auth](apache/iceberg#13212).
- Column stats are the key feature here: we rely on extracting the
Puffin files and grabbing stats metadata.


## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
## Summary by CodeRabbit

* **New Features**
* Per-partition, per-column statistics extraction with optional
persistence to a new data-quality metrics KV store; platform APIs can
produce a metrics-specific KV store.

* **Breaking Changes**
* Extraction API signatures and summary/key formats changed; thrift
summary shapes updated; config token renamed "groupbys" → "group_bys";
/api/summary-series now returns null.

* **Refactor**
* Large-scale test package reorganizations and import consolidations
across the codebase.

---------

Co-authored-by: thomaschow
devendra-nr pushed a commit to devendra-nr/iceberg that referenced this pull request Dec 8, 2025
* Initial Commit for BigQuery MetaStore

* Spotless Fix on Core module

* Addressed comments on PR from @nastra

* Addressed comments on PR from @ebyhr and @gkalra18

* Addressed comments on PR from @nastra

* Removed PROJECT_ID from GCPProperties class.

* Removed BIGQUERY_LOCATION from GCPProperties class.

* Addressed @nastra and @amogh-jahagirdar comments

* Changed Catalog initialization, removed TESTING_ENABLED property and removed optional field tests from BigQueryCatalogTests

* Used toTableReference in newTableOps method. Dropped Dataset name from Client Interface's methods, Disabled testLoadMetadataTable test due to multi level namespacing.

* Fixed failed testListNonExistingNamespace

* Fixed assertThat

* Removed BigQueryMetastoreTestUtils class and addressed @nastra's comment on pr

* Addressed comments from nastra and danielcweeks

* Missing changes from previous update

* last   @SuppressWarnings("FormatStringAnnotation")

* last @SuppressWarnings("FormatStringAnnotation")

* Addressed Latest Comments from @danielcweeks about exception types

* Addressed Latest Comments from @danielcweeks about serialVersionUID and interface

* Addressed Latest Comments from @danielcweeks about Listnamespace behaviour some signature access modifiers changes on Classes

* Removed Hive and Hadoop dependencies from BigQuery Catalog

* Removed double condition in listNamespaces. Thank you @nastra

* Optimize Dataset property updates. Moves logic into the client to avoid redundant API calls when setting/removing dataset properties.

* Renamed `filterUnsupportedTables` to `listAllTables` and updated Javadoc to accurately reflect the new behavior.
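
One way to read the "Optimize Dataset property updates" commit above is that a caller setting and removing several properties goes through a single client method, which performs one read and at most one write. The Python sketch below illustrates that idea with a hypothetical `CountingClient` that counts simulated API calls; it is an assumption about the shape of the optimization, not the code from this PR.

```python
class CountingClient:
    """Hypothetical client that counts simulated API calls (no real BigQuery access)."""

    def __init__(self):
        self.parameters = {}  # dataset id -> dict of properties
        self.api_calls = 0

    def _get(self, dataset_id):
        # Simulates one "get dataset" round trip.
        self.api_calls += 1
        return dict(self.parameters[dataset_id])

    def _patch(self, dataset_id, params):
        # Simulates one "patch dataset" round trip.
        self.api_calls += 1
        self.parameters[dataset_id] = dict(params)

    def update_parameters(self, dataset_id, to_set=None, to_remove=()):
        """Apply all additions and removals in one read-modify-write cycle."""
        removals = set(to_remove)
        current = self._get(dataset_id)
        updated = {k: v for k, v in current.items() if k not in removals}
        updated.update(to_set or {})
        if updated == current:
            return False  # nothing changed, so skip the write entirely
        self._patch(dataset_id, updated)
        return True
```

With this shape, an update that changes nothing costs one call instead of two, and a combined set-and-remove costs two calls regardless of how many properties are involved.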
10 participants