
feat(ibis): support listing tables from multiple datasets or projects for BigQuery#1391

Merged
douenergy merged 2 commits into Canner:main from goldmedal:feat/support-multiple-dataset-bigquery
Dec 31, 2025

Conversation

@goldmedal
Contributor

@goldmedal goldmedal commented Dec 17, 2025

Metadata v3 API

Listing Tables

  • POST /v3/connector/bigquery/metadata/tables
  • Request body:
      {
          "connectionInfo": {
              "bigquery_type": "project",
              "billing_project_id": "wrenai",
              "region": "asia-east1",
              "credentials": "..."
          },
          "filterInfo": {
              "projects": [
                  {
                      "projectId": "wrenai",
                      "datasetIds": [
                          "tpch_tiny",
                          "tpch_sf1"
                      ]
                  },
                  {
                      "projectId": "bigquery-public-data",
                      "datasetIds": [
                          "austin_311"
                      ]
                  }
              ]
          }
      }
  • Sample response:
[
  {
    "name": "bigquery-public-data.austin_311.311_service_requests",
    "columns": [
      {
        "name": "city",
        "type": "STRING",
        "notNull": false,
        "description": null,
        "properties": {
          "column_order": 12
        },
        "nestedColumns": null
      },
     ...
    ],
    "description": null,
    "properties": {
    },
    "primaryKey": ""
  },
  {
    "name": "wrenai.tpch_tiny_us.orders",
    "columns": [
          ...
    ],
    "description": null,
    "properties": {
      "schema": "tpch_tiny_us",
      "catalog": "wrenai",
      "table": "orders",
      "path": null
    },
    "primaryKey": ""
  }
]
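For illustration, the request body shown above can be assembled with a small helper. The function name and shape below are hypothetical, not part of the server's code; only the payload structure follows the example.

```python
# Illustrative helper (not part of the PR) that builds the request body for
# POST /v3/connector/bigquery/metadata/tables from a projectId -> datasetIds map.

def build_table_list_request(connection_info: dict, projects: dict[str, list[str]]) -> dict:
    """Assemble the v3 metadata request body from a mapping of projectId -> datasetIds."""
    return {
        "connectionInfo": connection_info,
        "filterInfo": {
            "projects": [
                {"projectId": project_id, "datasetIds": dataset_ids}
                for project_id, dataset_ids in projects.items()
            ]
        },
    }

body = build_table_list_request(
    {
        "bigquery_type": "project",
        "billing_project_id": "wrenai",
        "region": "asia-east1",
        "credentials": "...",
    },
    {"wrenai": ["tpch_tiny", "tpch_sf1"], "bigquery-public-data": ["austin_311"]},
)
print(body["filterInfo"]["projects"][0]["projectId"])  # wrenai
```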

Listing Schemas

  • POST /v3/connector/bigquery/metadata/schemas
  • Request body:
      {
          "connectionInfo": {
              "bigquery_type": "project",
              "billing_project_id": "wrenai",
              "region": "asia-east1",
              "credentials": "..."
          },
          "filterInfo": {
              "projects": [
                  {
                      "projectId": "wrenai"
                  },
                  {
                      "projectId": "bigquery-public-data"
                  }
              ]
          }
      }
  • Sample response:
[
  {
    "name": "wrenai",
    "schemas": [
      "rudderstack_setup_test",
      "alex_tide_issue"
    ]
  },
  {
    "name": "bigquery-public-data",
    "schemas": [
      "america_health_rankings",
      "austin_311",
      "austin_bikeshare",
      "austin_crime",
      ...
     ]
  }
]
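Per the review notes further down, project-level schema listing uses region-qualified INFORMATION_SCHEMA queries issued per project. A sketch of what such generated SQL could look like follows; the exact queries in this PR's bigquery.py may differ.

```python
# Sketch of a region-qualified INFORMATION_SCHEMA query for listing schemas
# in one project. BigQuery exposes dataset metadata under a `region-<name>`
# qualifier, e.g. `my-project`.`region-us`.INFORMATION_SCHEMA.SCHEMATA.

def schema_list_sql(project_id: str, region: str) -> str:
    """Build a per-project schema-listing query for the given region."""
    return (
        f"SELECT catalog_name, schema_name "
        f"FROM `{project_id}`.`region-{region}`.INFORMATION_SCHEMA.SCHEMATA"
    )

print(schema_list_sql("wrenai", "asia-east1"))
```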

BigQuery Filter Info for listing

{
    "projects": [
        {
            "projectId": "wrenai",
            "datasetIds": ["tpch_tiny"]
        },
        {
            "projectId": "bigquery-public-data",
            "datasetIds": ["..."]
        }
    ]
}

The filter can include multiple projects; each entry has projectId and datasetIds fields.

  • projectId is required for all APIs.
  • datasetIds is required only when listing tables.
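These rules can be expressed as a small validation routine. The sketch below is illustrative (the helper name is hypothetical, not the PR's actual code), though the "project id should not be empty" message mirrors the validation error exercised in the tests.

```python
# Illustrative validation mirroring the rules above: every entry needs a
# non-empty projectId, and table listing additionally needs datasetIds.

def validate_filter_info(filter_info: dict, *, for_table_listing: bool) -> None:
    projects = filter_info.get("projects") or []
    if not projects:
        raise ValueError("projects should not be empty")
    for entry in projects:
        if not entry.get("projectId"):
            raise ValueError("project id should not be empty")
        if for_table_listing and not entry.get("datasetIds"):
            raise ValueError("dataset ids are required for listing tables")

# Passes silently: both projectId and datasetIds are present.
validate_filter_info(
    {"projects": [{"projectId": "wrenai", "datasetIds": ["tpch_tiny"]}]},
    for_table_listing=True,
)
```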

New Connection Info for BigQuery

Introduce BigQueryProjectConnectionInfo for querying multiple datasets:

{
    "bigquery_type": "project",
    "billing_project_id": "wrenai",
    "region": "asia-east1",
    "credentials": "..."
}

For BigQuery, the most important field is the region: BigQuery only allows joining tables located in the same region, even when they belong to different projects (and the service account must have the corresponding permissions).

BigQueryProjectConnectionInfo fields

  • billing_project_id: the billing project ID of your BigQuery connection
  • region: the region of your BigQuery connection
  • credentials: the Base64-encoded contents of credentials.json
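According to the review notes below, the server models the two connection variants as a Pydantic discriminated union on bigquery_type, with "dataset" as the default for backward compatibility. The dependency-free sketch here shows the same dispatch; the field names on the dataset variant are assumptions, not the PR's exact schema.

```python
from dataclasses import dataclass

# Dependency-free stand-ins for the two connection variants; the server
# itself uses Pydantic models with a `bigquery_type` discriminator.

@dataclass(frozen=True)
class BigQueryDatasetConnectionInfo:
    project_id: str       # assumed field name
    dataset_id: str       # assumed field name
    credentials: str

    def get_billing_project_id(self) -> str:
        return self.project_id

@dataclass(frozen=True)
class BigQueryProjectConnectionInfo:
    billing_project_id: str
    region: str
    credentials: str

    def get_billing_project_id(self) -> str:
        return self.billing_project_id

def parse_connection_info(payload: dict):
    # "dataset" is the default discriminator value for backward compatibility.
    kind = payload.get("bigquery_type", "dataset")
    if kind == "project":
        return BigQueryProjectConnectionInfo(
            payload["billing_project_id"], payload["region"], payload["credentials"]
        )
    return BigQueryDatasetConnectionInfo(
        payload["project_id"], payload["dataset_id"], payload["credentials"]
    )

info = parse_connection_info(
    {"bigquery_type": "project", "billing_project_id": "wrenai",
     "region": "asia-east1", "credentials": "..."}
)
print(info.get_billing_project_id())  # wrenai
```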

For listing public datasets

  • Because we generally don't have permission to access the INFORMATION_SCHEMA of the bigquery-public-data project, we use the BigQuery Python SDK to list its datasets instead. However, the SDK's listing API can't filter by region, and its results don't include region info, so we can't determine a public dataset's region while listing datasets.
  • Therefore, when listing the tables of the public data, we check whether each dataset's region matches the region in the connection info.
  • This may cause performance issues when the number of public datasets is large, because we have to call the BigQuery SDK's dataset API to check the region of each dataset.
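The region check described above can be sketched as a pure function. In the server this step would read each dataset's location via the BigQuery SDK (one call per dataset, hence the performance caveat); here the locations are passed in so the sketch stays self-contained. Case-insensitive comparison is an assumption on my part.

```python
# Pure-function sketch of the per-dataset region check for public datasets.
# In the real flow, each location would come from a BigQuery SDK lookup.

def datasets_in_region(datasets: list[tuple[str, str]], region: str) -> list[str]:
    """Keep only datasets whose location matches the connection region.

    `datasets` is a list of (dataset_id, location) pairs. Locations are
    compared case-insensitively (e.g. "US" vs "us"), which is an assumption.
    """
    return [
        dataset_id
        for dataset_id, location in datasets
        if location.lower() == region.lower()
    ]

public = [("austin_311", "US"), ("tokyo_taxi", "asia-northeast1")]
print(datasets_in_region(public, "us"))  # ['austin_311']
```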

Summary by CodeRabbit

  • New Features

    • Multi-project and multi-dataset BigQuery support, including cross-project queries.
    • New metadata endpoints to list schemas and tables with filtering and limits (v3).
    • Option to use project-level BigQuery connection for appropriate endpoints.
  • Bug Fixes

    • Stricter validation and clearer errors for BigQuery metadata (projects, datasets, regions).
    • Preserved prior behavior on unsupported project-level metadata requests (v2).
  • Tests

    • Added tests covering cross-project queries and metadata listing scenarios.


@github-actions github-actions bot added the bigquery, ibis, and python (Pull requests that update Python code) labels on Dec 17, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

Walkthrough

This PR splits BigQuery connection info into dataset/project variants, propagates the discriminator union across DTOs and routers, adds multi-project/multi-dataset metadata APIs and v3 metadata endpoints, and adapts connectors, utils, and tests to handle BigQuery-specific filtering, limits, and region/permission validation.

Changes

Cohort / File(s) Summary
BigQuery Connection Model
ibis-server/app/model/__init__.py
Added BigQueryDatasetConnectionInfo and BigQueryProjectConnectionInfo with discriminator bigquery_type, get_billing_project_id() signatures, __hash__ implementations, and BigQueryConnectionUnion. Updated QueryBigQueryDTO.connection_info and ConnectionInfo union.
Connector Implementation
ibis-server/app/model/connector.py
BigQueryConnector now constructs BigQuery client with project=connection_info.get_billing_project_id() and sets client.default_query_job_config including job_timeout_ms.
Data Source Validation
ibis-server/app/model/data_source.py
Replaced BigQueryConnectionInfo import with the two variants and added conditional validation in _build_connection_info using the bigquery_type discriminator. get_bigquery_connection signature updated to accept dataset connection info.
BigQuery Metadata Implementation
ibis-server/app/model/metadata/bigquery.py
Refactored to support multi-project/multi-dataset flows: BigQueryMetadata now accepts dataset connection info, implements get_table_list, get_schema_list, get_constraints, column/table helpers, per-project SQL generation, region validation, public dataset handling, and error propagation via WrenError.
Metadata DTOs
ibis-server/app/model/metadata/dto.py
Added V2MetadataDTO, FilterInfo, BigQueryFilterInfo, ProjectDatasets, Catalog, extended MetadataDTO with table_limit and filter_info, and added get_filter_info() helper.
Base Metadata API
ibis-server/app/model/metadata/metadata.py
Removed ABC inheritance; added concrete methods raising WrenError for get_table_list, get_schema_list, get_constraints, and get_version. Imports updated for ErrorCode, WrenError, Catalog, FilterInfo.
Utility Timeout Executors
ibis-server/app/util.py
execute_get_table_list_with_timeout now accepts optional filter_info and limit; new execute_get_schema_list_with_timeout added. Both detect BigQueryMetadata and call BigQuery-specific methods when applicable.
v2 Router Changes
ibis-server/app/routers/v2/connector.py
Switched endpoints to accept V2MetadataDTO, imported BigQueryProjectConnectionInfo, and added runtime guard to reject project-level BigQuery connection info (returns WrenError/422).
v3 Router Additions
ibis-server/app/routers/v3/connector.py
Added POST /metadata/tables and /metadata/schemas endpoints, integrated MetadataFactory, get_filter_info(), and timeout executors to fetch tables/schemas with filter and limit support.
Local Run Tool
ibis-server/tools/query_local_run.py
Replaced BigQueryConnectionInfo with BigQueryDatasetConnectionInfo validation and switched to BigQueryConnector for connection creation.
Tests — v2 & v3
ibis-server/tests/routers/v2/..., ibis-server/tests/routers/v3/...
Added v2 test to assert project-level BigQuery connection is rejected; added v3 fixtures and multiple metadata/query tests for multi-project, multi-dataset, public datasets, and validation error cases.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Router as v3 Router
    participant Factory as MetadataFactory
    participant Metadata as BigQueryMetadata
    participant Util as Timeout Executor
    participant GCP as BigQuery API

    Client->>Router: POST /{ds}/metadata/tables (MetadataDTO + filterInfo, limit)
    Router->>Router: build connection_info from headers
    Router->>Factory: MetadataFactory.get_metadata(connection_info)
    Factory->>Metadata: instantiate BigQueryMetadata(connection_info)
    Router->>Router: compute filter_info via get_filter_info(dto, data_source)
    Router->>Util: execute_get_table_list_with_timeout(metadata, filter_info, limit)
    Util->>Util: detect BigQueryMetadata type
    Util->>Metadata: call get_table_list(filter_info, limit)
    Metadata->>GCP: issue per-project/dataset metadata queries (dynamic SQL)
    GCP-->>Metadata: return table/column metadata
    Metadata->>Metadata: transform rows -> Table/Column models
    Metadata-->>Util: return list[Table]
    Util-->>Router: return results (with timeout handling)
    Router-->>Client: respond 200 list[Table]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • ibis-server/app/model/metadata/bigquery.py — dynamic SQL generation, region validation, nested column handling.
    • ibis-server/app/model/__init__.py — discriminator union, hashing, and DTO type uses.
    • Router changes in v2/v3 — ensure error guards and DTO compatibility.
    • ibis-server/app/util.py — timeout wrappers and BigQuery-detection logic.
    • New/updated tests — cross-project scenarios and error expectations.

Suggested reviewers

  • douenergy
  • wwwy3y3

Poem

🐰 I hopped through projects far and wide,

Dataset and project now side by side,
Tables, schemas bloom in multi-project light,
Routers and metadata sing through the night,
A carrot cheer for queries taking flight! 🎋

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 7.41%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed — The title clearly and concisely summarizes the main feature: adding support for listing tables from multiple datasets/projects in BigQuery, which is the primary change across all modified files.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79db9e4 and 9a1743f.

📒 Files selected for processing (3)
  • ibis-server/app/model/__init__.py (4 hunks)
  • ibis-server/app/model/metadata/bigquery.py (2 hunks)
  • ibis-server/app/routers/v2/connector.py (6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-02-04T12:08:44.141Z
Learnt from: goldmedal
Repo: Canner/wren-engine PR: 1053
File: ibis-server/app/model/__init__.py:187-196
Timestamp: 2025-02-04T12:08:44.141Z
Learning: In GcsFileConnectionInfo, both secret-based (key_id/secret_key) and credentials-based (service account) authentication methods are required fields.

Applied to files:

  • ibis-server/app/model/__init__.py
🧬 Code graph analysis (2)
ibis-server/app/model/__init__.py (2)
ibis-server/app/model/error.py (2)
  • ErrorCode (19-38)
  • WrenError (65-113)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
  • connection_info (25-30)
ibis-server/app/model/metadata/bigquery.py (3)
ibis-server/app/model/__init__.py (5)
  • BigQueryDatasetConnectionInfo (120-131)
  • BigQueryProjectConnectionInfo (134-154)
  • get_billing_project_id (113-117)
  • get_billing_project_id (127-128)
  • get_billing_project_id (144-145)
ibis-server/app/model/error.py (2)
  • ErrorCode (19-38)
  • WrenError (65-113)
ibis-server/app/model/metadata/dto.py (9)
  • BigQueryFilterInfo (24-25)
  • Catalog (114-116)
  • Column (87-93)
  • Constraint (125-131)
  • ConstraintType (119-122)
  • FilterInfo (14-15)
  • ProjectDatasets (28-30)
  • Table (106-111)
  • TableProperties (96-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: ci
  • GitHub Check: ci
🔇 Additional comments (19)
ibis-server/app/model/__init__.py (6)

8-9: LGTM: Error handling imports added.

These imports are necessary for the new error handling patterns introduced in BigQuery connection variants.


113-118: LGTM: Abstract method pattern using WrenError.

Using WrenError with NOT_IMPLEMENTED is an appropriate pattern for enforcing implementation in subclasses while providing clear error messages at runtime.


120-132: LGTM: BigQueryDatasetConnectionInfo is well-defined.

The class correctly implements:

  • Discriminator field for union type
  • get_billing_project_id() returning the project_id
  • __hash__() using all relevant secret fields

134-154: Past issue resolved: __hash__ now references only defined attributes.

The previous review flagged that __hash__ referenced undefined data_project_ids and dataset_ids. The current implementation correctly uses only region, billing_project_id, and credentials—all of which are defined in this class.


157-160: LGTM: Discriminated union properly configured.

The BigQueryConnectionUnion correctly uses Pydantic's discriminator pattern with bigquery_type and sets a sensible default of "dataset" for backward compatibility.


33-33: LGTM: Connection type unions updated consistently.

Both QueryBigQueryDTO.connection_info and the global ConnectionInfo union have been updated to use the new BigQuery connection variants, maintaining consistency across the codebase.

Also applies to: 620-621

ibis-server/app/routers/v2/connector.py (4)

21-21: LGTM: Required imports added for BigQuery project-level guards.

The imports enable proper type checking and error handling for the new BigQuery connection variants.

Also applies to: 29-34


267-267: LGTM: DTO type updated to V2MetadataDTO.

The parameter type change from MetadataDTO to V2MetadataDTO aligns with the v2/v3 API split for metadata operations.

Also applies to: 296-296, 323-323


278-285: LGTM: Appropriate guard for project-level connections.

The check correctly rejects BigQueryProjectConnectionInfo in the v2 API with a clear error message directing users to the v3 API, while maintaining backward compatibility for dataset-level connections via MetadataFactory.


307-313: Past issue resolved: Error message now correctly references constraints endpoint.

The error message now says "metadata constraints retrieval," which is appropriate for the get_constraints endpoint. The previous review concern about a copy-paste error from get_table_list has been addressed.

ibis-server/app/model/metadata/bigquery.py (9)

1-1: LGTM: Import updates support multi-project metadata.

The new imports enable proper exception handling (Forbidden, NotFound), connection variant support, error propagation, and BigQuery-specific filtering.

Also applies to: 4-22


43-43: LGTM: Public dataset constant defined.

The BIGQUERY_PUBLIC_DATASET_PROJECT_ID constant enables special handling for public datasets that require different access patterns.


47-49: Signature change restricts to dataset-level connections.

The __init__ signature now requires BigQueryDatasetConnectionInfo instead of the base BigQueryConnectionInfo. This aligns with the PR's goal of distinguishing dataset vs. project-level metadata handling.


127-187: LGTM: Result processing handles multi-project scenarios correctly.

The logic properly:

  • Unions multiple project queries with optional limit
  • Determines when to include catalog/schema in table names based on project/dataset counts
  • Processes nested columns and builds the table hierarchy
  • Provides clear error messaging when no tables are found

189-197: LGTM: Region validation for public datasets.

The _validate_dataset_region method correctly validates that public dataset regions match the connection region, which is essential for cross-project join compatibility as noted in the PR objectives.


199-271: LGTM: Schema listing supports multiple projects.

The get_schema_list implementation:

  • Handles both dataset and project-level connections appropriately
  • Uses region qualifiers for project-level queries
  • Includes special handling for public datasets via the BigQuery SDK
  • Groups results by catalog with proper error handling

273-338: LGTM: Helper methods for table/column transformation.

The helper methods (get_column, get_table, is_root_column, has_nested_columns, _format_compact_table_name, find_parent_column) provide clean abstractions for processing BigQuery's nested column structures and generating consistent table representations.


340-417: LGTM: Constraints retrieval handles access errors gracefully.

The get_constraints method:

  • Iterates through project datasets with appropriate qualifiers
  • Catches Forbidden/NotFound exceptions and continues (logging warnings)
  • Properly formats constraint names and references based on multi-project/dataset context
  • Returns a comprehensive list of foreign key constraints

419-476: LGTM: Helper methods provide robust validation.

The helper methods properly:

  • _get_project_datasets: Returns appropriate project/dataset structures for both connection types
  • _get_schema_qualifier: Constructs correct qualifiers based on connection type
  • _validate_project_datasets_for_table_list: Performs defensive validation with clear error messages


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ibis-server/app/routers/v2/connector.py (1)

321-330: Missing guard for BigQueryProjectConnectionInfo.

The get_db_version endpoint accepts V2MetadataDTO but lacks the guard that get_table_list and get_constraints have. If BigQueryProjectConnectionInfo is not supported by v2, this endpoint should also reject it for consistency.

 async def get_db_version(
     data_source: DataSource,
     dto: V2MetadataDTO,
     headers: Annotated[Headers, Depends(get_wren_headers)] = None,
 ) -> str:
     connection_info = data_source.get_connection_info(
         dto.connection_info, dict(headers)
     )
+    if isinstance(connection_info, BigQueryProjectConnectionInfo):
+        raise WrenError(
+            ErrorCode.INVALID_CONNECTION_INFO,
+            "BigQuery project-level connection info is only supported by v3 API for metadata version retrieval.",
+        )
     metadata = MetadataFactory.get_metadata(data_source, connection_info)
     return await execute_get_version_with_timeout(metadata)
🧹 Nitpick comments (4)
ibis-server/app/model/metadata/dto.py (1)

33-36: Consider adding validation for empty or malformed filter info.

The function directly passes info to BigQueryFilterInfo(**info) without validation. If info is None or missing required fields, this will raise a Pydantic validation error, which may be acceptable, but consider whether you want to handle edge cases more gracefully.

ibis-server/tests/routers/v3/connector/bigquery/test_query.py (1)

2-4: Minor import cleanup.

The split function from re is only used twice (lines 757 and 777). Consider using re.split directly inline instead of importing just split.

ibis-server/app/model/__init__.py (1)

113-117: Consider using abstract base class pattern.

The base class method raises an exception instead of being abstract. While functional, using ABC with @abstractmethod would provide clearer intent and IDE support.

ibis-server/app/model/metadata/bigquery.py (1)

259-270: Potential performance concern with public datasets.

Listing all datasets from bigquery-public-data via list_datasets could be slow as noted in the PR objectives. Consider adding a warning log or documentation about this behavior.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 969dd82 and 79db9e4.

📒 Files selected for processing (13)
  • ibis-server/app/model/__init__.py (4 hunks)
  • ibis-server/app/model/connector.py (1 hunks)
  • ibis-server/app/model/data_source.py (3 hunks)
  • ibis-server/app/model/metadata/bigquery.py (2 hunks)
  • ibis-server/app/model/metadata/dto.py (2 hunks)
  • ibis-server/app/model/metadata/metadata.py (1 hunks)
  • ibis-server/app/routers/v2/connector.py (6 hunks)
  • ibis-server/app/routers/v3/connector.py (3 hunks)
  • ibis-server/app/util.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_bigquery.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/test_query.py (3 hunks)
  • ibis-server/tools/query_local_run.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-02-06T07:58:58.830Z
Learnt from: goldmedal
Repo: Canner/wren-engine PR: 1055
File: ibis-server/app/model/connector.py:152-157
Timestamp: 2025-02-06T07:58:58.830Z
Learning: In BigQueryConnector, credentials with additional scopes (drive and cloud-platform) should only be created in the retry path when handling empty results with special types, not during initialization, to maintain lazy initialization.

Applied to files:

  • ibis-server/app/model/connector.py
📚 Learning: 2025-02-04T12:08:44.141Z
Learnt from: goldmedal
Repo: Canner/wren-engine PR: 1053
File: ibis-server/app/model/__init__.py:187-196
Timestamp: 2025-02-04T12:08:44.141Z
Learning: In GcsFileConnectionInfo, both secret-based (key_id/secret_key) and credentials-based (service account) authentication methods are required fields.

Applied to files:

  • ibis-server/app/model/__init__.py
🧬 Code graph analysis (9)
ibis-server/tests/routers/v2/connector/test_bigquery.py (1)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/app/util.py (3)
ibis-server/app/model/metadata/bigquery.py (3)
  • BigQueryMetadata (46-522)
  • get_table_list (51-187)
  • get_schema_list (199-271)
ibis-server/app/model/metadata/dto.py (1)
  • FilterInfo (14-15)
ibis-server/app/model/metadata/metadata.py (3)
  • get_table_list (10-13)
  • Metadata (6-24)
  • get_schema_list (21-24)
ibis-server/app/model/data_source.py (1)
ibis-server/app/model/__init__.py (2)
  • BigQueryDatasetConnectionInfo (120-131)
  • BigQueryProjectConnectionInfo (134-156)
ibis-server/app/routers/v3/connector.py (4)
ibis-server/app/model/metadata/dto.py (3)
  • Catalog (114-116)
  • MetadataDTO (18-21)
  • Table (106-111)
ibis-server/app/model/metadata/factory.py (2)
  • MetadataFactory (43-62)
  • get_metadata (45-62)
ibis-server/app/util.py (2)
  • execute_get_schema_list_with_timeout (377-398)
  • execute_get_table_list_with_timeout (353-374)
ibis-server/app/model/metadata/metadata.py (2)
  • get_table_list (10-13)
  • get_schema_list (21-24)
ibis-server/app/model/connector.py (2)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
  • connection_info (25-30)
ibis-server/app/model/__init__.py (3)
  • get_billing_project_id (113-117)
  • get_billing_project_id (127-128)
  • get_billing_project_id (144-145)
ibis-server/tools/query_local_run.py (2)
ibis-server/app/model/__init__.py (1)
  • BigQueryDatasetConnectionInfo (120-131)
ibis-server/app/model/connector.py (1)
  • BigQueryConnector (506-549)
ibis-server/app/model/__init__.py (1)
ibis-server/app/model/error.py (2)
  • ErrorCode (19-38)
  • WrenError (65-113)
ibis-server/app/model/metadata/dto.py (1)
ibis-server/app/model/data_source.py (1)
  • DataSource (63-224)
ibis-server/app/model/metadata/metadata.py (2)
ibis-server/app/model/error.py (2)
  • ErrorCode (19-38)
  • WrenError (65-113)
ibis-server/app/model/metadata/dto.py (4)
  • Catalog (114-116)
  • Constraint (125-131)
  • FilterInfo (14-15)
  • Table (106-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: ci
  • GitHub Check: ci
🔇 Additional comments (30)
ibis-server/tests/routers/v2/connector/test_bigquery.py (1)

395-411: LGTM! Well-structured test for v2 project connection guard.

The test correctly validates that v2 API rejects BigQuery project-level connection info and returns the expected error message. The connection info structure properly matches BigQueryProjectConnectionInfo with all required fields.

ibis-server/app/model/connector.py (1)

524-530: LGTM! Project-scoped BigQuery client initialization.

The explicit project parameter from connection_info.get_billing_project_id() and default QueryJobConfig with timeout are correctly configured. This aligns with the new BigQuery connection info variants that distinguish dataset vs project contexts.

ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)

33-40: LGTM! Project-level connection fixture properly configured.

The new fixture provides project-level BigQuery connection info with appropriate defaults and environment variable sourcing, complementing the existing dataset-level fixture.

ibis-server/tools/query_local_run.py (2)

18-19: Updated imports for BigQuery dataset connection handling.

The imports correctly reflect the new BigQuery connection type system with BigQueryDatasetConnectionInfo and BigQueryConnector.


94-95: LGTM! BigQuery connection creation updated to use new types.

The code correctly uses BigQueryDatasetConnectionInfo.model_validate_json and BigQueryConnector instead of the previous generic types, aligning with the new dataset/project connection distinction.

ibis-server/app/model/data_source.py (3)

19-20: Updated imports for BigQuery connection variants.

The imports correctly bring in the new BigQueryDatasetConnectionInfo and BigQueryProjectConnectionInfo types that enable dataset vs project-level connection handling.


156-158: LGTM! Conditional validation for BigQuery connection types.

The logic correctly distinguishes between project-level and dataset-level connections based on the bigquery_type field, routing to the appropriate connection info class.


318-318: Function signature updated for dataset-specific connections.

The get_bigquery_connection signature correctly specifies BigQueryDatasetConnectionInfo, as this function is specifically for dataset-level Ibis backend connections (project-level connections are handled separately via BigQueryConnector).

ibis-server/app/routers/v3/connector.py (3)

35-36: New imports for metadata operations.

The imports correctly bring in the necessary DTOs (Catalog, MetadataDTO, Table), helper functions (get_filter_info, MetadataFactory), and timeout executors for the new metadata endpoints.

Also applies to: 45-46


556-579: LGTM! Well-structured table metadata endpoint.

The endpoint follows the established pattern:

  • Proper tracing with span
  • Builds connection info with headers
  • Uses MetadataFactory for metadata retrieval
  • Parses filter info from DTO
  • Executes with timeout control

The implementation is clean and consistent with other v3 endpoints.


582-605: LGTM! Schema metadata endpoint mirrors table endpoint pattern.

The schema list endpoint follows the same clean pattern as the table endpoint, ensuring consistency in the v3 metadata API design.

ibis-server/app/util.py (3)

45-46: New imports for BigQuery-aware metadata handling.

The imports bring in BigQueryMetadata and FilterInfo to enable conditional execution paths for BigQuery-specific metadata operations.


353-374: LGTM! BigQuery-aware table list execution with backward compatibility.

The function correctly:

  • Accepts optional filter_info and limit parameters
  • Conditionally invokes BigQuery-specific get_table_list when metadata is BigQueryMetadata
  • Falls back to generic path for other metadata types
  • Maintains backward compatibility for non-BigQuery data sources

377-398: LGTM! Schema list execution mirrors table list pattern.

The new function follows the same clean pattern as execute_get_table_list_with_timeout, ensuring consistency in BigQuery-aware metadata retrieval.

ibis-server/app/model/metadata/metadata.py (4)

2-3: Updated imports for error handling and new DTOs.

The imports correctly bring in ErrorCode, WrenError for proper error raising, and new DTOs (Catalog, FilterInfo) for the expanded metadata interface.


6-6: Class changed from ABC to concrete with NotImplemented methods.

The removal of ABC inheritance is appropriate since all methods now have concrete implementations that raise WrenError(ErrorCode.NOT_IMPLEMENTED). This provides better error messages than abstract method errors.


10-13: New get_table_list method with filter and limit support.

The method signature correctly accepts optional filter_info and limit parameters, enabling BigQuery's multi-project/dataset filtering while maintaining a clean base interface for subclasses to override.


21-24: New get_schema_list method completes metadata interface.

The schema list method follows the same pattern as get_table_list, providing consistent filtering capabilities across metadata operations.

ibis-server/app/routers/v2/connector.py (1)

20-34: LGTM!

The imports are well-organized and properly structured for the new BigQuery connection type handling and error management.

ibis-server/app/model/metadata/dto.py (3)

10-11: LGTM!

Clean DTO definition for v2 metadata endpoints that correctly uses the field alias.


24-30: LGTM!

The BigQueryFilterInfo and ProjectDatasets models are well-structured for the multi-project/dataset filtering use case.


114-116: LGTM!

Simple and clean model for representing catalog/schema metadata.

ibis-server/tests/routers/v3/connector/bigquery/test_query.py (4)

168-186: LGTM!

Good addition to verify multi-dataset connection info works with the existing manifest.


189-254: LGTM!

Excellent test for cross-project dataset queries, including reference to a public dataset. The test validates both the response structure and data integrity.


694-735: LGTM!

Good coverage for schema listing with assertions for expected schemas and handling of public datasets.


862-877: Duplicate test case.

This block (lines 862-877) is identical to lines 845-860, testing the same "empty projectId" scenario twice. Remove the duplicate.

     assert response.status_code == 422
     assert "project id should not be empty" in response.text

-    response = await client.post(
-        url=f"{base_url}/metadata/tables",
-        json={
-            "connectionInfo": project_connection_info,
-            "filterInfo": {
-                "projects": [
-                    {
-                        "projectId": "",
-                    }
-                ],
-            },
-        },
-    )
-
-    assert response.status_code == 422
-    assert "project id should not be empty" in response.text
-
     response = await client.post(

Likely an incorrect or invalid review comment.

ibis-server/app/model/__init__.py (2)

120-131: LGTM!

Well-structured dataset connection info with proper field definitions and hash implementation.


159-162: LGTM!

Good use of Pydantic's discriminated union pattern for type-safe connection info handling.

ibis-server/app/model/metadata/bigquery.py (2)

189-197: LGTM!

Good validation to ensure dataset region matches the connection region for cross-project queries.


468-493: LGTM!

Comprehensive validation for project and dataset IDs with clear error messages.

@goldmedal
Contributor Author

BigQuery tests have passed locally:

poetry run pytest -m 'bigquery'
=================================================================================================== test session starts ===================================================================================================
platform darwin -- Python 3.11.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /Users/jax/git/wren-engine/ibis-server
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 409 items / 356 deselected / 53 selected                                                                                                                                                                        

tests/routers/v2/connector/test_bigquery.py ...................                                                                                                                                                     [ 35%]
tests/routers/v3/connector/bigquery/test_functions.py ............                                                                                                                                                  [ 58%]
tests/routers/v3/connector/bigquery/test_query.py ......................

@douenergy douenergy merged commit 8ca21f5 into Canner:main Dec 31, 2025
5 of 6 checks passed

Labels

bigquery, ibis, python (Pull requests that update Python code)

2 participants