
fix(ibis): missing to use BigQuery SDK for dry run #1381

Merged
douenergy merged 1 commit into Canner:main from goldmedal:fix/bigquery-connector on Dec 5, 2025
Conversation

@goldmedal
Contributor

@goldmedal goldmedal commented Nov 21, 2025

Summary by CodeRabbit

  • Refactor
    • Restructured database connector architecture to improve stability and ensure consistent behavior across all supported backends.
    • Enhanced connection lifecycle management with improved resource cleanup and standardized error handling.


@github-actions github-actions bot added the ibis and python labels on Nov 21, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

The PR refactors the connector architecture by introducing an abstract interface layer (ConnectorABC and IbisConnector), replacing SimpleConnector with a formal ABC-based hierarchy. All concrete connectors now inherit from this abstraction, with consistent query, dry_run, and close method implementations across all backends.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Abstract Interface Layer (ibis-server/app/model/connector.py) | Added ConnectorABC abstract base class with abstract methods query, dry_run, and close. Added IbisConnector concrete implementation of ConnectorABC to serve as the new base for specific backends. Removed SimpleConnector class. |
| Connector Implementations (ibis-server/app/model/connector.py) | Updated all connector classes (PostgresConnector, MSSqlConnector, CannerConnector, BigQueryConnector, DuckDBConnector, RedshiftConnector, DatabricksConnector) to inherit from IbisConnector or ConnectorABC. Added/standardized close() and dry_run() methods across connectors with explicit error handling and logging. BigQueryConnector refactored to store client in self.connection and reworked query delegation. |
| Connector Orchestration (ibis-server/app/model/connector.py) | Modified Connector.__init__ fallback logic to instantiate IbisConnector instead of SimpleConnector for unspecified data sources. Updated Connector.query wrapper to consistently delegate through the new interface. |
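
The fallback described under Connector Orchestration can be pictured with a small dispatch sketch. Everything in it is an assumption for illustration: the string keys, the stub classes, and the `_SPECIALIZED` mapping are hypothetical stand-ins, not the code in ibis-server/app/model/connector.py.

```python
class IbisConnector:
    """Generic stand-in used when no specialized connector exists."""

    def __init__(self, data_source: str, connection_info: dict):
        self.data_source = data_source
        self.connection_info = connection_info

    def query(self, sql: str) -> str:
        # A real connector would run the SQL; this stub only reports who handled it.
        return f"{type(self).__name__}:{sql}"


class PostgresConnector(IbisConnector):
    """Stand-in for a backend that has its own connector class."""


class BigQueryConnector(IbisConnector):
    """Stand-in for a backend that has its own connector class."""


# Hypothetical registry of data sources with a dedicated connector.
_SPECIALIZED = {
    "postgres": PostgresConnector,
    "bigquery": BigQueryConnector,
}


class Connector:
    """Thin wrapper that picks a concrete connector and delegates to it."""

    def __init__(self, data_source: str, connection_info: dict):
        # Data sources missing from the registry fall back to the generic IbisConnector.
        connector_cls = _SPECIALIZED.get(data_source, IbisConnector)
        self._connector = connector_cls(data_source, connection_info)

    def query(self, sql: str) -> str:
        return self._connector.query(sql)
```

In this sketch, `Connector("postgres", {})` delegates to the Postgres stub, while an unlisted source such as `"mysql"` is served by the generic class.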

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • BigQueryConnector changes: Significant refactoring with behavioral modifications (self.connection management, dry-run configuration, error logging) requires verification that client lifecycle is managed correctly
  • Interface consistency verification: Ensure all seven connector implementations correctly implement abstract methods and maintain backward compatibility
  • Fallback behavior: Validate that IbisConnector instantiation in Connector.__init__ handles all expected data source scenarios that SimpleConnector previously handled

Possibly related PRs

Suggested labels

bigquery, ibis, python

Suggested reviewers

  • wwwy3y3
  • douenergy

Poem

🐰 A hierarchy springs from chaos into form,
Connectors unite beneath abstraction's warm
Embrace—query, dry_run, and close in line,
Where BigQuery's client and Postgres align.
SimpleConnector bows, IbisConnector shines!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title focuses on a specific BigQuery SDK dry run issue, but the PR makes extensive refactoring changes including introducing abstract interfaces (ConnectorABC, IbisConnector) and restructuring all connector hierarchies. | Update the title to reflect the main scope: something like 'refactor(ibis): introduce connector abstract interface and standardize connector hierarchy' would better represent the primary changes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.09% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@goldmedal
Contributor Author

BigQuery is tested locally

poetry run pytest -m 'bigquery'
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.11.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /Users/jax/git/wren-engine/ibis-server
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 401 items / 354 deselected / 47 selected                                                                                                                                                                

tests/routers/v2/connector/test_bigquery.py ..................                                                                                                                                              [ 38%]
tests/routers/v3/connector/bigquery/test_functions.py ............                                                                                                                                          [ 63%]
tests/routers/v3/connector/bigquery/test_query.py .................      

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ibis-server/app/model/connector.py (1)

401-415: Remove duplicate _handle_pyarrow_unsupported_type method.

This method is identical to the inherited implementation from IbisConnector. The duplication adds maintenance burden without providing any Canner-specific behavior.

Apply this diff to remove the duplicate:

-    def _handle_pyarrow_unsupported_type(self, ibis_table: Table, **kwargs) -> Table:
-        result_table = ibis_table
-        for name, dtype in ibis_table.schema().items():
-            if isinstance(dtype, Decimal):
-                # Round decimal columns to a specified scale
-                result_table = self._round_decimal_columns(
-                    result_table=result_table, col_name=name, **kwargs
-                )
-            elif isinstance(dtype, UUID):
-                # Convert UUID to string for compatibility
-                result_table = self._cast_uuid_columns(
-                    result_table=result_table, col_name=name
-                )
-
-        return result_table
-
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51a622f and 08b30a9.

📒 Files selected for processing (1)
  • ibis-server/app/model/connector.py (10 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-18T02:23:34.040Z
Learnt from: goldmedal
Repo: Canner/wren-engine PR: 1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.

Applied to files:

  • ibis-server/app/model/connector.py
🧬 Code graph analysis (1)
ibis-server/app/model/connector.py (3)
ibis-server/app/routers/v3/connector.py (1)
  • query (61-233)
ibis-server/wren/session/__init__.py (2)
  • sql (37-52)
  • dry_run (138-145)
ibis-server/app/model/data_source.py (1)
  • DataSource (62-221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: ci
  • GitHub Check: ci
🔇 Additional comments (7)
ibis-server/app/model/connector.py (7)

183-195: LGTM! Clean abstract interface.

The ConnectorABC interface properly defines the contract for all connector implementations with the essential methods: query, dry_run, and close.
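
A minimal sketch of what such a contract looks like with Python's `abc` module follows. The three method names come from the review; the parameter lists, return types, and the `InMemoryConnector` toy backend are assumptions made for illustration, not the PR's signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class ConnectorABC(ABC):
    """Contract every connector must satisfy (signatures are assumed)."""

    @abstractmethod
    def query(self, sql: str, limit: Optional[int] = None) -> Any:
        """Run the SQL and return its result."""

    @abstractmethod
    def dry_run(self, sql: str) -> None:
        """Validate the SQL without returning data."""

    @abstractmethod
    def close(self) -> None:
        """Release the underlying connection."""


class InMemoryConnector(ConnectorABC):
    """Hypothetical toy backend that satisfies the contract."""

    def __init__(self) -> None:
        self.closed = False

    def query(self, sql: str, limit: Optional[int] = None) -> Any:
        rows = [{"sql": sql}]
        return rows if limit is None else rows[:limit]

    def dry_run(self, sql: str) -> None:
        if not sql.strip():
            raise ValueError("empty SQL")

    def close(self) -> None:
        self.closed = True
```

Because all three methods are abstract, Python refuses to instantiate `ConnectorABC` itself, or any subclass that leaves one of them unimplemented, with a `TypeError`. That is what makes the interface enforceable rather than a naming convention.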


197-274: LGTM! Solid base implementation.

The IbisConnector provides a well-structured base implementation with proper lifecycle management, PyArrow type compatibility handling, and comprehensive close logic with multiple fallback strategies.


276-318: LGTM! Postgres-specific close logic is appropriate.

The custom close method properly handles Postgres connection lifecycle with query cancellation and defensive checks to prevent segfaults.
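
The cancel-then-close pattern mentioned here can be sketched as below. This is not the PR's code: it assumes a connection object exposing `closed`, `cancel()`, and `close()` (members that psycopg2 connections provide), and `FakeConnection` and `safe_close` are hypothetical names used only to keep the example runnable without a database.

```python
import logging

logger = logging.getLogger(__name__)


class FakeConnection:
    """Hypothetical stand-in mimicking `closed`, `cancel()` and `close()`."""

    def __init__(self) -> None:
        self.closed = 0  # 0 means open, non-zero means closed
        self.calls = []

    def cancel(self) -> None:
        self.calls.append("cancel")

    def close(self) -> None:
        self.calls.append("close")
        self.closed = 1


def safe_close(connection) -> None:
    """Cancel any in-flight query, then close, tolerating failures at each step."""
    # Defensive check: skip handles that are missing or already closed.
    if connection is None or getattr(connection, "closed", False):
        return
    try:
        # Ask the server to stop a running query before tearing down the session.
        connection.cancel()
    except Exception as exc:
        logger.warning("cancel failed, closing anyway: %s", exc)
    try:
        connection.close()
    except Exception as exc:
        logger.warning("close failed: %s", exc)
```

Calling `safe_close` twice on the same handle is harmless in this sketch, since the second call returns at the `closed` check.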


320-385: LGTM! MSSql-specific handling is well implemented.

The connector properly handles MSSQL decimal rounding requirements and includes a helpful workaround for ibis issue #10331 with descriptive error messages.


450-488: LGTM! BigQuery dry run now uses SDK as intended.

The refactored BigQueryConnector properly uses the BigQuery SDK for both query and dry_run operations, addressing the omission from PR #1370. The implementation correctly:

  • Creates a persistent BigQuery client
  • Uses bigquery.QueryJobConfig(dry_run=True, use_query_cache=False) for dry run validation
  • Implements proper cleanup in the close method
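
For readers unfamiliar with the SDK call named above, a hedged sketch of a dry run with `google-cloud-bigquery` follows. The `QueryJobConfig` arguments are the ones quoted in the review; the helper name `bigquery_dry_run` and the choice to return the byte estimate are assumptions, not the connector's actual method.

```python
def bigquery_dry_run(client, sql: str) -> int:
    """Validate `sql` on BigQuery without running it; return estimated bytes scanned."""
    # Imported lazily so this module loads even where google-cloud-bigquery is absent.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    # With dry_run=True the service plans the query but does not execute it;
    # invalid SQL typically surfaces as an API error raised by this call.
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

With credentials configured, `client` would be a `bigquery.Client()` that is created once, reused across calls, and released with `client.close()` when the connector shuts down.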

490-569: LGTM! DuckDB connector is well structured.

The connector properly handles various file storage backends (S3, Minio, GCS) and implements database file attachment with appropriate error handling.


571-675: LGTM! Redshift and Databricks connectors are properly implemented.

Both connectors correctly:

  • Handle multiple authentication methods (IAM/password for Redshift, token/service principal for Databricks)
  • Implement the ConnectorABC interface consistently
  • Use appropriate dry run strategies (LIMIT 0 pattern)
  • Include proper connection cleanup with error handling
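
The LIMIT 0 dry-run strategy can be illustrated with a small sketch. SQLite from the standard library is swapped in here purely so the example runs anywhere; the Redshift and Databricks code is not shown in this conversation, so the exact wrapping query and the `dry_run` helper below are assumptions.

```python
import sqlite3


def dry_run(connection: sqlite3.Connection, sql: str) -> None:
    """Validate `sql` by wrapping it in a LIMIT 0 query, so no rows come back."""
    # The database still has to parse and plan the inner query, which is
    # enough to surface syntax errors and unknown tables or columns.
    wrapped = f"SELECT * FROM ({sql}) AS dry_run_subquery LIMIT 0"
    connection.execute(wrapped).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

dry_run(conn, "SELECT id, total FROM orders")  # valid query: returns silently

try:
    dry_run(conn, "SELECT missing_column FROM orders")
except sqlite3.OperationalError as exc:
    print(f"dry run rejected the query: {exc}")
finally:
    conn.close()
```

The trade-off of this approach is that the backend still receives and plans a real query, whereas an SDK-level dry run such as BigQuery's is a dedicated validation request.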

@goldmedal
Contributor Author

The CI failure comes from the flaky Oracle test. Let's ignore it for this PR.

@goldmedal goldmedal requested a review from douenergy November 21, 2025 06:22
@douenergy douenergy merged commit 84bd86a into Canner:main Dec 5, 2025
5 of 6 checks passed
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026

Labels

ibis, python

2 participants