
feat(ibis): add statement level timeout for BigQuery #1339

Merged
douenergy merged 2 commits into Canner:main from goldmedal:feat/bigquery-timeout
Oct 3, 2025

Conversation

Contributor

@goldmedal goldmedal commented Oct 2, 2025

Description

  • Add job_timeout_ms for BigQueryConnectionInfo
  • Apply X_WREN_DB_STATEMENT_TIMEOUT to BigQuery
  • Use disconnect to close the BigQuery connection.

How to test

BigQuery has no sleep or wait function, so to avoid adding a flaky test I only tested this locally. When a timeout occurs, the error response looks like this:

{
  "errorCode": "GENERIC_USER_ERROR",
  "message": "499 GET https://bigquery.googleapis.com/bigquery/v2/projects/....: Job execution was cancelled: Job timed out after 1 sec\n\nLocation: asia-east1\nJob ID: 4357235c-c72b-4398-96fb-2f3af803cd17\n",
  "metadata": {
    "dialectSql": "SELECT COUNT(1) AS count_40_42_41 FROM (SELECT events.`_TABLE_SUFFIX` FROM (SELECT events.`_TABLE_SUFFIX` FROM (SELECT __source.`_TABLE_SUFFIX` AS `_TABLE_SUFFIX` FROM project.analytics_446447784.`events_*` AS __source) AS events) AS events WHERE regexp_contains(events.`_TABLE_SUFFIX`, '^[0-9]+')) AS events"
  },
  "phase": "SQL_EXECUTION",
  "timestamp": "2025-09-26T17:53:33.590191",
  "correlationId": "dd043b2a-8f82-46b9-a7f6-2708c0a3bc1e"
}
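
For a quick local check, a caller could inspect the response body for the timeout marker. This is a minimal sketch, assuming the error payload has the shape shown above. The helper name `is_bigquery_job_timeout` and the substring match on "Job timed out" are illustrative assumptions and are not part of this PR.

```python
import json


def is_bigquery_job_timeout(body: str) -> bool:
    """Return True if an error response body looks like a BigQuery job timeout.

    Assumption: the timeout surfaces during the SQL_EXECUTION phase and the
    message contains the text "Job timed out", as in the sample response.
    """
    payload = json.loads(body)
    message = payload.get("message", "")
    return payload.get("phase") == "SQL_EXECUTION" and "Job timed out" in message


# Shortened copy of the sample response; the elided URL is left elided.
sample = json.dumps(
    {
        "errorCode": "GENERIC_USER_ERROR",
        "message": "499 GET ...: Job execution was cancelled: Job timed out after 1 sec",
        "phase": "SQL_EXECUTION",
    }
)
print(is_bigquery_job_timeout(sample))
```

Matching on message text is brittle, so this is only suitable for manual verification, not for a committed test.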

Summary by CodeRabbit

  • New Features

    • BigQuery connections now support client-based connections and honor an optional job timeout.
    • Job timeouts can be set via the X_WREN_DB_STATEMENT_TIMEOUT header (interpreted as milliseconds).
  • Bug Fixes

    • Connection shutdown is more reliable by also handling backends that expose a disconnect-style shutdown.

Contributor

coderabbitai bot commented Oct 2, 2025

Walkthrough

Adds optional BigQueryConnectionInfo.job_timeout_ms, derives it from X_WREN_DB_STATEMENT_TIMEOUT, initializes BigQuery via google.cloud.bigquery.Client with default QueryJobConfig including timeout, and adds a disconnect() fallback in SimpleConnector.close. No public API signature changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| BigQuery connection info: ibis-server/app/model/__init__.py | Add optional BigQueryConnectionInfo.job_timeout_ms field (default None). |
| BigQuery client-based flow: ibis-server/app/model/data_source.py | Import google.cloud.bigquery; read X_WREN_DB_STATEMENT_TIMEOUT into job_timeout_ms for BigQuery; construct bigquery.Client with project/credentials, set default_query_job_config (uses job_timeout_ms), and create ibis.bigquery backend from the client. |
| Connector close fallback: ibis-server/app/model/connector.py | Extend SimpleConnector.close to call disconnect() on the underlying connection when close()/con.close() are not available, preserving logging/error handling. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant DataSource
  participant GCP as bigquery.Client
  participant Ibis as ibis.bigquery backend

  Caller->>DataSource: get_bigquery_connection(info, headers)
  Note over DataSource: derive job_timeout_ms from X_WREN_DB_STATEMENT_TIMEOUT if unset
  DataSource->>GCP: Client(project, credentials, location)
  DataSource->>GCP: set default_query_job_config (job_timeout_ms)
  DataSource->>Ibis: ibis.bigquery.connect(client=GCP)
  Ibis-->>Caller: backend instance
sequenceDiagram
  autonumber
  participant User
  participant Connector as SimpleConnector
  participant Conn as UnderlyingConnection

  User->>Connector: close()
  alt Conn has close()
    Connector->>Conn: close()
  else Conn has disconnect()
    Connector->>Conn: disconnect()
  else Conn has con with close()
    Connector->>Conn: con.close()
  else
    Note over Connector: log / noop as existing behavior
  end
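
The fallback order in the diagram above can be sketched in plain Python. This is not the actual SimpleConnector.close implementation: the function name `close_connection`, its return values, and the `DisconnectOnly` class are hypothetical and exist only to illustrate the dispatch order.

```python
import logging

logger = logging.getLogger(__name__)


def close_connection(connection) -> str:
    """Shut down `connection` with whichever method it exposes.

    Tries close(), then disconnect(), then con.close(), mirroring the
    diagram. Returns the name of the path taken so the behavior is easy
    to observe in this sketch.
    """
    try:
        if hasattr(connection, "close"):
            connection.close()
            return "close"
        if hasattr(connection, "disconnect"):
            connection.disconnect()
            return "disconnect"
        inner = getattr(connection, "con", None)
        if inner is not None and hasattr(inner, "close"):
            inner.close()
            return "con.close"
    except Exception as exc:
        # Log and continue so a failed shutdown does not mask other errors.
        logger.warning("Error closing connection: %s", exc)
        return "error"
    return "noop"


class DisconnectOnly:
    """Hypothetical backend that only exposes disconnect()."""

    def __init__(self):
        self.closed = False

    def disconnect(self):
        self.closed = True


backend = DisconnectOnly()
print(close_connection(backend), backend.closed)
```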

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • douenergy

Poem

I hop through clouds with timeout cheer,
Setting jobs to finish near.
If close slips out, I softly click—
A gentle disconnect does the trick.
Carrots, queries — tidy and clear. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly and accurately conveys the primary enhancement of adding statement-level timeouts for BigQuery within the ibis connector, matching the changes to BigQueryConnectionInfo and timeout header handling without extraneous detail. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5834c96 and fe0efd9.

📒 Files selected for processing (1)
  • ibis-server/app/model/data_source.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/app/model/data_source.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci


@github-actions github-actions bot added the ibis and python labels Oct 2, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95bc169 and 5834c96.

📒 Files selected for processing (3)
  • ibis-server/app/model/__init__.py (1 hunks)
  • ibis-server/app/model/connector.py (1 hunks)
  • ibis-server/app/model/data_source.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: validate-pull-request-title
  • GitHub Check: ci
🔇 Additional comments (4)
ibis-server/app/model/__init__.py (1)

106-109: LGTM!

The optional job_timeout_ms field is well-defined with appropriate type annotation and clear documentation. The default of None allows backward compatibility while enabling timeout configuration when needed.

ibis-server/app/model/connector.py (1)

236-238: LGTM!

The disconnect() fallback extends the existing close pattern appropriately, ensuring compatibility with backends that use disconnect() instead of close(). The comment clearly explains the rationale, and the implementation preserves existing error handling.

ibis-server/app/model/data_source.py (2)

12-12: LGTM!

The import is necessary to support the new client-based BigQuery connection initialization.


128-131: LGTM with minor observation.

The timeout handling logic correctly converts the header value from seconds to milliseconds and only sets job_timeout_ms when not already configured.

Minor note: Since job_timeout_ms is a defined field on BigQueryConnectionInfo (with default None), the hasattr(info, "job_timeout_ms") check will always be True. The or info.job_timeout_ms is None condition handles the actual case, making the hasattr check redundant but harmless.
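
The header-to-field derivation described in this comment can be sketched as follows. The sketch assumes the header carries seconds, as this review note states (the CodeRabbit summary earlier in the thread says milliseconds, so the unit should be confirmed against the merged code). The header key string, `ConnectionInfoStub`, and `apply_statement_timeout` are hypothetical stand-ins, not names from the PR.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical header key; the PR refers to it via the constant
# X_WREN_DB_STATEMENT_TIMEOUT, whose string value is not shown here.
STATEMENT_TIMEOUT_HEADER = "x-wren-db-statement-timeout"


@dataclass
class ConnectionInfoStub:
    """Stand-in for BigQueryConnectionInfo with only the timeout field."""

    job_timeout_ms: Optional[int] = None


def apply_statement_timeout(
    info: ConnectionInfoStub, headers: Mapping[str, str]
) -> ConnectionInfoStub:
    """Fill job_timeout_ms from the header only when it is not already set.

    Assumption: the header value is in seconds and is converted to
    milliseconds here.
    """
    raw = headers.get(STATEMENT_TIMEOUT_HEADER)
    if info.job_timeout_ms is None and raw is not None:
        info.job_timeout_ms = int(raw) * 1000
    return info


info = apply_statement_timeout(ConnectionInfoStub(), {STATEMENT_TIMEOUT_HEADER: "30"})
print(info.job_timeout_ms)
```

Checking only `job_timeout_ms is None` is enough in this sketch because the field always exists on the stub, which matches the observation that the hasattr check is redundant.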

Comment on lines +273 to +279
bq_client = bigquery.Client(
    project=info.project_id.get_secret_value(), credentials=credentials
)
job_config = bigquery.QueryJobConfig()
bq_client.default_query_job_config = job_config
backend = ibis.bigquery.connect(client=bq_client, credentials=credentials)
return backend

⚠️ Potential issue | 🔴 Critical

Apply the timeout configuration to QueryJobConfig.

The QueryJobConfig is created but not configured with the job_timeout_ms value from info. This means statement-level timeouts will not be enforced, which is the primary objective of this PR.

Apply this diff to configure the timeout:

 bq_client = bigquery.Client(
     project=info.project_id.get_secret_value(), credentials=credentials
 )
 job_config = bigquery.QueryJobConfig()
+if info.job_timeout_ms is not None:
+    job_config.job_timeout_ms = info.job_timeout_ms
 bq_client.default_query_job_config = job_config
 backend = ibis.bigquery.connect(client=bq_client, credentials=credentials)
 return backend
🤖 Prompt for AI Agents
In ibis-server/app/model/data_source.py around lines 273 to 279, the
QueryJobConfig is created but never given the statement timeout from info. When
info.job_timeout_ms is not None, assign it to job_config.job_timeout_ms
(converting to int if needed), then set
bq_client.default_query_job_config = job_config and return the backend so
statement-level timeouts are enforced.
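
The suggested change can be isolated into a small helper that is easy to exercise without a BigQuery project. This is a sketch under assumptions: `apply_job_timeout` is a hypothetical name, and a `SimpleNamespace` stands in for the job config so the block runs with the standard library only.

```python
from types import SimpleNamespace
from typing import Optional


def apply_job_timeout(job_config, job_timeout_ms: Optional[int]):
    """Set job_timeout_ms on a job config only when a timeout is configured.

    Leaving the attribute untouched for None keeps the default behavior
    (no statement-level timeout).
    """
    if job_timeout_ms is not None:
        job_config.job_timeout_ms = int(job_timeout_ms)
    return job_config


# Stand-in object for illustration; with google-cloud-bigquery installed,
# a bigquery.QueryJobConfig() instance would be passed instead.
config = apply_job_timeout(SimpleNamespace(job_timeout_ms=None), 1000)
print(config.job_timeout_ms)
```

With the real client, the result would then be assigned to bq_client.default_query_job_config before the ibis backend is created, as in the diff above.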

@douenergy douenergy merged commit 923d6a7 into Canner:main Oct 3, 2025
5 checks passed

Labels

ibis, python


2 participants