
fix(ibis): using DataFusion to format the query result #1235

Merged
douenergy merged 8 commits into Canner:main from goldmedal:fix/df-formatting
Jul 1, 2025

Conversation

@goldmedal
Contributor

@goldmedal goldmedal commented Jun 27, 2025

Description

Because DuckDB doesn't support setting a timezone offset as the session timezone, we switched to DataFusion for formatting the query result. This changes the results in the following ways:

  • The timezone is always shown as an offset in the result.
    • 2011-01-01 12:00:00 UTC -> 2011-01-01 12:00:00 +00:00
  • TIMESTAMP and DATE types are no longer affected by the session timezone. The session timezone only affects the TIMESTAMP WITH TIME ZONE type.
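
The before/after rendering can be reproduced with nothing but the Python standard library (illustration only; the actual formatting is done by DataFusion):

```python
from datetime import datetime, timezone

ts = datetime(2011, 1, 1, 12, 0, tzinfo=timezone.utc)

# Old, DuckDB-style rendering with a named zone
old = ts.strftime("%Y-%m-%d %H:%M:%S") + " UTC"
# New rendering: the zone is always an explicit offset
new = ts.isoformat(sep=" ")

print(old)  # 2011-01-01 12:00:00 UTC
print(new)  # 2011-01-01 12:00:00+00:00
```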

Known Issues

  • MySQL can't distinguish between timestamp and timestamp with time zone, so we always show both with a timezone.
  • Snowflake only supports timestamp-with-time-zone literals that use an offset.

Summary by CodeRabbit

  • New Features

    • Switched SQL execution backend from DuckDB to DataFusion, improving performance and compatibility.
    • Added session timezone support for query results, ensuring timestamp fields reflect the correct timezone.
  • Bug Fixes

    • Updated timestamp and interval formatting in query results for consistency across data sources.
    • Improved handling and formatting of numeric and floating-point values in query outputs.
  • Chores

    • Updated type annotations to modern Python syntax.
    • Added DataFusion as a new dependency.
    • Enhanced test cases to reflect changes in data formatting and timezone representation.
    • Modernized string formatting style in core components for improved code clarity.

@github-actions bot added the core, bigquery, ibis, dependencies (Pull requests that update a dependency file), rust (Pull requests that update Rust code), and python (Pull requests that update Python code) labels on Jun 27, 2025
@coderabbitai
Contributor

coderabbitai bot commented Jun 27, 2025

Walkthrough

This update migrates SQL execution on Arrow tables from DuckDB to DataFusion, introduces session timezone handling for timestamp fields, and updates SQL formatting to align with DataFusion's dialect. It also modernizes Python type hints, adjusts test expectations for timestamp and numeric formatting, adds DataFusion as a dependency, and refactors Rust code to set session timezone via properties and update string formatting.

Changes

| File(s) | Change Summary |
| --- | --- |
| app/model/connector.py | Added `# noqa: PLC0415` to suppress linter warnings on inline imports. |
| app/model/data_source.py<br>app/query_cache/__init__.py<br>tests/util/sql_generator.py | Updated type hints to the modern union syntax (`T \| None`). |
| app/routers/v2/connector.py<br>app/routers/v3/connector.py | Passes `data_source` to `to_json` in query endpoints. |
| app/util.py | Migrated SQL execution from DuckDB to DataFusion, added session timezone handling, updated SQL formatting and identifier quoting, and adjusted function signatures. |
| pyproject.toml | Added `datafusion = "^47.0.0"` as a dependency. |
| tests/routers/v2/connector/test_bigquery.py<br>tests/routers/v2/connector/test_trino.py<br>tests/routers/v3/connector/bigquery/test_query.py<br>tests/routers/v3/connector/local_file/test_query.py<br>tests/routers/v3/connector/postgres/test_query.py | Updated test assertions for numeric and interval formatting, and timestamp timezone representation. |
| tests/routers/v2/connector/test_clickhouse.py<br>tests/routers/v2/connector/test_oracle.py<br>tests/routers/v2/connector/test_postgres.py<br>tests/routers/v2/connector/test_redshift.py<br>tests/routers/v2/connector/test_snowflake.py<br>tests/routers/v3/connector/oracle/test_query.py | Updated expected timestamp strings from "UTC" to "+00:00" format. |
| tests/routers/v2/connector/test_mysql.py | Updated expected timestamp values and types to be timezone-aware. |
| wren-core/core/src/mdl/context.rs | Sets session timezone in the DataFusion session context using session properties. |
| wren-core/core/src/mdl/mod.rs | Test refactor: injects timezone via a headers map, updates timestamp coercion assertions, and updates string formatting style. |
| wren-core/core/src/logical_plan/analyze/access_control.rs<br>wren-core/core/src/logical_plan/analyze/plan.rs<br>wren-core/core/src/logical_plan/utils.rs<br>wren-core/core/src/mdl/function.rs<br>wren-core/core/src/mdl/utils.rs | Modernized Rust string formatting to inline variable interpolation without changing logic or behavior. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Router
    participant Util
    participant DataFusion
    participant DataSource

    Router->>Util: to_json(arrow_table, headers, data_source)
    Util->>Util: _with_session_timezone(arrow_table, headers, data_source)
    Util->>DataFusion: get_datafusion_context(headers)
    Util->>DataFusion: Register "arrow_table"
    Util->>DataFusion: Execute SQL query
    DataFusion-->>Util: Result Table
    Util->>Router: JSON response
```

Suggested reviewers

  • douenergy

Poem

In the warren of code, a new path we weave,
DuckDB hops out, DataFusion takes leave.
Timezones align as the sun starts to set,
With tests now expecting the best formats yet.
A carrot for progress, a hop for each test,
In the garden of queries, this patch is the best! 🥕✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6938e4 and d55ed04.

📒 Files selected for processing (1)
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: cargo test (amd64)
  • GitHub Check: clippy
  • GitHub Check: check Cargo.toml formatting
  • GitHub Check: ci
  • GitHub Check: test
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos-aarch64)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
ibis-server/app/model/connector.py (3)

155-155: Remove unused noqa directive.

The static analysis indicates this noqa directive is unused because the PLC0415 rule is not enabled in the linter configuration.

-            import ibis.backends.bigquery  # noqa: PLC0415
+            import ibis.backends.bigquery

193-193: Remove unused noqa directive.

The static analysis indicates this noqa directive is unused because the PLC0415 rule is not enabled in the linter configuration.

-        import duckdb  # noqa: PLC0415
+        import duckdb

224-224: Remove unused noqa directive.

The static analysis indicates this noqa directive is unused because the PLC0415 rule is not enabled in the linter configuration.

-        import redshift_connector  # noqa: PLC0415
+        import redshift_connector
ibis-server/app/util.py (3)

70-111: Comprehensive timezone handling implementation.

The function properly handles:

  • Session timezone adjustments for timestamp fields
  • MySQL special case for missing timezone information
  • Nullable field handling to prevent casting errors

Note the TODOs indicate known limitations with MySQL and ClickHouse connectors.

Would you like me to create tracking issues for the MySQL timezone and ClickHouse nullable field TODOs?


207-211: Decimal formatting workaround implemented.

The conditional logic prevents unnecessary scientific notation for zero values. Consider implementing the suggested to_char UDF for better decimal formatting control as noted in the TODO.

Would you like me to create an issue to track the implementation of a custom to_char UDF for better decimal formatting?


215-220: Consider removing unnecessary else block.

The timestamp formatting is correct, but the code can be simplified:

 elif pa.types.is_timestamp(field.type):
     if field.type.tz is None:
         return f"to_char({column_name}, '%Y-%m-%d %H:%M:%S%.6f') as {column_name}"
-    else:
-        return (
-            f"to_char({column_name}, '%Y-%m-%d %H:%M:%S%.6f %Z') as {column_name}"
-        )
+    return (
+        f"to_char({column_name}, '%Y-%m-%d %H:%M:%S%.6f %Z') as {column_name}"
+    )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16f19b0 and c668c82.

⛔ Files ignored due to path filters (1)
  • ibis-server/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (22)
  • ibis-server/app/model/connector.py (3 hunks)
  • ibis-server/app/model/data_source.py (1 hunks)
  • ibis-server/app/query_cache/__init__.py (2 hunks)
  • ibis-server/app/routers/v2/connector.py (1 hunks)
  • ibis-server/app/routers/v3/connector.py (1 hunks)
  • ibis-server/app/util.py (4 hunks)
  • ibis-server/pyproject.toml (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_bigquery.py (4 hunks)
  • ibis-server/tests/routers/v2/connector/test_clickhouse.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_mysql.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_oracle.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_redshift.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_snowflake.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_trino.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/test_query.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/local_file/test_query.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/oracle/test_query.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (4 hunks)
  • ibis-server/tests/util/sql_generator.py (1 hunks)
  • wren-core/core/src/mdl/context.rs (2 hunks)
  • wren-core/core/src/mdl/mod.rs (6 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().
Learnt from: goldmedal
PR: Canner/wren-engine#1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use `conn.execute("DESCRIBE SELECT * FROM table").fetchall()` to get column types instead of accessing DataFrame-style attributes like `dtype` or `dtypes`.
ibis-server/app/model/connector.py (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use `conn.execute("DESCRIBE SELECT * FROM table").fetchall()` to get column types instead of accessing DataFrame-style attributes like `dtype` or `dtypes`.
ibis-server/app/routers/v3/connector.py (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:50-57
Timestamp: 2025-06-18T02:12:43.570Z
Learning: In the `to_json` function in `ibis-server/app/util.py`, the code intentionally uses `fetch_df()` to get a pandas DataFrame and then calls `to_dict(orient='split')` because this specific format is required for `orjson` serialization. The pandas conversion step is necessary to generate the correct dictionary structure for orjson.
ibis-server/app/routers/v2/connector.py (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:50-57
Timestamp: 2025-06-18T02:12:43.570Z
Learning: In the `to_json` function in `ibis-server/app/util.py`, the code intentionally uses `fetch_df()` to get a pandas DataFrame and then calls `to_dict(orient='split')` because this specific format is required for `orjson` serialization. The pandas conversion step is necessary to generate the correct dictionary structure for orjson.
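
For reference, the "split" orientation produces exactly the index/columns/data structure that gets handed to orjson (toy data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(df.to_dict(orient="split"))
# {'index': [0, 1], 'columns': ['a', 'b'], 'data': [[1, 'x'], [2, 'y']]}
```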
wren-core/core/src/mdl/context.rs (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: ibis-server/app/routers/v3/connector.py:78-83
Timestamp: 2025-05-05T02:27:29.829Z
Learning: The row-level access control implementation in Wren Engine filters headers with the prefix `X_WREN_VARIABLE_PREFIX` in `EmbeddedEngineRewriter.get_session_properties` and validates session property expressions in `access_control.rs` to ensure they only contain literal values, preventing SQL injection.
wren-core/core/src/mdl/mod.rs (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: ibis-server/app/routers/v3/connector.py:78-83
Timestamp: 2025-05-05T02:27:29.829Z
Learning: The row-level access control implementation in Wren Engine filters headers with the prefix `X_WREN_VARIABLE_PREFIX` in `EmbeddedEngineRewriter.get_session_properties` and validates session property expressions in `access_control.rs` to ensure they only contain literal values, preventing SQL injection.
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use `conn.execute("DESCRIBE SELECT * FROM table").fetchall()` to get column types instead of accessing DataFrame-style attributes like `dtype` or `dtypes`.
ibis-server/app/util.py (4)
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:50-57
Timestamp: 2025-06-18T02:12:43.570Z
Learning: In the `to_json` function in `ibis-server/app/util.py`, the code intentionally uses `fetch_df()` to get a pandas DataFrame and then calls `to_dict(orient='split')` because this specific format is required for `orjson` serialization. The pandas conversion step is necessary to generate the correct dictionary structure for orjson.
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().
Learnt from: goldmedal
PR: Canner/wren-engine#1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use `conn.execute("DESCRIBE SELECT * FROM table").fetchall()` to get column types instead of accessing DataFrame-style attributes like `dtype` or `dtypes`.
🧬 Code Graph Analysis (2)
ibis-server/tests/util/sql_generator.py (1)
ibis-server/tests/model/__init__.py (1)
  • Function (4-10)
ibis-server/app/routers/v2/connector.py (1)
ibis-server/app/util.py (1)
  • to_json (43-67)
🪛 Ruff (0.11.9)
ibis-server/app/model/connector.py

155-155: Unused noqa directive (non-enabled: PLC0415)

(RUF100)


193-193: Unused noqa directive (non-enabled: PLC0415)

(RUF100)


224-224: Unused noqa directive (non-enabled: PLC0415)

(RUF100)

🪛 Pylint (3.3.7)
ibis-server/app/util.py

[refactor] 215-220: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 205-205: Too many return statements (7/6)

(R0911)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: ci
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo check
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: test
🔇 Additional comments (56)
ibis-server/pyproject.toml (1)

49-49: LGTM! DataFusion dependency addition aligns with PR objectives.

The addition of DataFusion as a dependency is necessary for the migration from DuckDB to DataFusion for query result formatting, as mentioned in the PR objectives.

ibis-server/tests/util/sql_generator.py (1)

12-12: LGTM! Type hint modernization improves consistency.

The change from Optional[str] to str | None modernizes the type annotation to use the newer union syntax, improving code consistency across the codebase.

ibis-server/app/model/data_source.py (1)

243-243: LGTM! Type hint modernization improves consistency.

The change from Optional[ssl.SSLContext] to ssl.SSLContext | None modernizes the type annotation to use the newer union syntax, consistent with similar updates across the codebase.

ibis-server/app/query_cache/__init__.py (2)

3-3: LGTM! Import cleanup removes unused Optional.

Removing the unused Optional import after modernizing the type hints is good housekeeping.


20-20: LGTM! Type hint modernization improves consistency.

The change from Optional[Any] to Any | None modernizes the type annotation to use the newer union syntax, consistent with similar updates across the codebase.

ibis-server/tests/routers/v2/connector/test_trino.py (1)

114-114: LGTM! Numeric type assertion updated correctly.

The change from string to float representation aligns with the DataFusion migration and properly reflects the column's float type definition in the manifest.

ibis-server/tests/routers/v2/connector/test_oracle.py (1)

197-197: LGTM! Timezone format updated to use offset notation.

The change from "UTC" to "+00:00" correctly implements the timezone standardization described in the PR objectives.

ibis-server/tests/routers/v2/connector/test_redshift.py (2)

113-113: LGTM! Timezone format standardized across test scenarios.

The changes from "UTC" to "+00:00" correctly implement the timezone offset notation as described in the PR objectives.


153-153: LGTM! Consistent timezone format implementation.

This change maintains consistency with the timezone format standardization applied in the primary test function.

ibis-server/tests/routers/v3/connector/local_file/test_query.py (1)

144-144: LGTM! Calculated field type assertion updated correctly.

The change from string to float representation aligns with the column's float type definition and the new DataFusion serialization behavior.

ibis-server/tests/routers/v2/connector/test_snowflake.py (1)

97-97: LGTM! Timezone format standardization completed.

The change from "UTC" to "+00:00" ensures consistent timezone offset notation across all connector tests, completing the standardization effort.

ibis-server/tests/routers/v2/connector/test_clickhouse.py (1)

186-186: Timestamp format updated correctly for DataFusion compatibility.

The test expectation correctly reflects the new timezone format where timestamps with timezone display the offset (+00:00) instead of the named timezone (UTC). This aligns with the migration from DuckDB to DataFusion for query result formatting.

ibis-server/tests/routers/v3/connector/oracle/test_query.py (1)

100-100: Timestamp format updated correctly for DataFusion compatibility.

The test expectation correctly reflects the new timezone format where timestamps with timezone display the offset (+00:00) instead of the named timezone (UTC). This change is consistent with the DataFusion migration for improved timezone handling.

ibis-server/app/routers/v3/connector.py (1)

161-161: Data source parameter correctly added for timezone handling.

The addition of data_source=data_source parameter to the to_json function call is necessary for the DataFusion migration. This parameter enables session timezone adjustments and proper timestamp formatting in query results.

ibis-server/app/routers/v2/connector.py (1)

175-175: Data source parameter correctly added for timezone handling.

The addition of data_source=data_source parameter to the to_json function call aligns with the DataFusion migration changes. This ensures consistent timezone handling across v2 and v3 API endpoints.

ibis-server/tests/routers/v2/connector/test_mysql.py (2)

165-166: Timestamp values updated correctly for DataFusion timezone format.

The timestamp values now include explicit timezone offsets (+00:00) instead of naive timestamps, which aligns with the DataFusion migration's approach to consistent timezone representation in query results.


177-179: Data types updated to include timezone information.

The timestamp data types now correctly include timezone information (timestamp[us, tz=UTC]) instead of timezone-naive types (timestamp[us]). This reflects DataFusion's enhanced timezone handling and provides more accurate type information for timestamp columns.

ibis-server/tests/routers/v3/connector/bigquery/test_query.py (3)

85-85: LGTM: Numeric type change reflects DataFusion behavior.

The change from string "356711.63" to float 356711.63 correctly reflects DataFusion's native numeric type handling, which is more accurate than DuckDB's string representation.


89-91: LGTM: Timezone format standardized to offset notation.

The changes from "UTC" suffix to "+00:00" offset format align with the PR objective to standardize timezone representation using DataFusion. The DST calculations appear correct:

  • EST (UTC-5) for January 15th
  • EDT (UTC-4) for July 15th

296-298: LGTM: Timestamp function tests updated consistently.

The timezone format changes are consistently applied to all timestamp function test results, maintaining the same offset notation pattern throughout the test suite.

ibis-server/tests/routers/v2/connector/test_postgres.py (2)

191-191: LGTM: Timezone format standardized across connectors.

The change from "UTC" suffix to "+00:00" offset notation maintains consistency with the DataFusion timezone handling implementation across all connector tests.


1010-1010: LGTM: Improved numeric precision with DataFusion.

The change from string "74.6626535" to precise float 74.66265347816136 reflects DataFusion's improved numeric precision handling for PostGIS geometry calculations.

wren-core/core/src/mdl/context.rs (3)

42-42: LGTM: Required import for DataFusion configuration.

The ScalarValue import is necessary for setting DataFusion session configuration values.


55-58: LGTM: Safe timezone extraction with fallback.

The timezone extraction properly handles the optional nature of session properties:

  • Uses "x-wren-timezone" key as documented in PR objectives
  • Provides safe fallback to "UTC" when not specified
  • Properly handles nested Option types

61-64: LGTM: Correct DataFusion timezone configuration.

The timezone configuration correctly uses DataFusion's "datafusion.execution.time_zone" setting with ScalarValue::Utf8 wrapper, which is the proper way to set session timezone in DataFusion.

wren-core/core/src/mdl/mod.rs (9)

474-474: LGTM: SessionContext import added.

The import is necessary for the updated test functions that now create SessionContext instances with SessionContext::new().


1122-1125: LGTM: Timezone handling updated to use headers.

The test correctly implements the new approach of setting timezone via headers with the "x-wren-timezone" key instead of using SessionConfig. This aligns with the PR objective of switching from DuckDB to DataFusion for timezone handling.


1132-1132: LGTM: Using headers for session properties.

The test now correctly passes the headers containing timezone information to transform_sql_with_ctx instead of an empty HashMap.


1144-1144: LGTM: Consistent use of headers for timezone.

The test maintains consistency by using the same headers throughout the timestamp with timezone test.


1151-1157: LGTM: Timezone setup for America/New_York test case.

The test correctly creates a new headers map for testing America/New_York timezone handling, following the same pattern as the previous test case.


1165-1165: LGTM: Using America/New_York timezone headers.

The test correctly applies the timezone headers for the America/New_York test case.


1177-1177: LGTM: Consistent timezone header usage.

The test maintains consistency by using the same timezone headers for all related test cases.


1330-1331: LGTM: Updated assertion reflects stricter type casting.

The assertion now expects explicit casting of timestamp literals as TIMESTAMP WITH TIME ZONE before the final cast, which reflects stricter typing expectations in DataFusion compared to DuckDB.


1359-1361: LGTM: Consistent type casting in assertions.

The assertion follows the same pattern as the previous test case, expecting explicit casting of timestamp columns to TIMESTAMP WITH TIME ZONE before comparison operations.

ibis-server/tests/routers/v2/connector/test_bigquery.py (10)

94-94: LGTM!

The change from string to float representation for numeric values is correct and aligns with DataFusion's numeric type handling.


98-98: Timestamp format correctly updated to offset notation.

The change from "UTC" suffix to "+00:00" offset is consistent with the PR objectives for DataFusion timestamp formatting.


320-320: Interval format correctly updated for DataFusion.

The interval representation now uses DataFusion's format with months abbreviated as "mons" and simplified hour notation.


335-335: Average interval format correctly updated.

The new format explicitly labels time units (hours, mins, secs) which is DataFusion's more verbose but clearer interval representation.


365-365: Consistent interval formatting applied.

The interval format matches the earlier update, maintaining consistency across all interval test assertions.


94-94: LGTM: Numeric value formatting updated correctly.

The change from string "172799.49" to float 172799.49 aligns with proper data type representation in the DataFusion migration.


98-98: LGTM: Timezone format updated as documented.

The change from "UTC" to "+00:00" format is explicitly mentioned in the PR objectives as a key change when migrating to DataFusion for query result formatting.


320-320: LGTM: Interval format updated to DataFusion's representation.

The change from verbose format to condensed format ("112 mons 100 days 1 hours") reflects DataFusion's interval formatting conventions.


335-335: LGTM: Average interval format updated consistently.

The more explicit breakdown format ("10484 days 8 hours 54 mins 14.400000000 secs") is consistent with DataFusion's interval representation for aggregate functions.


365-365: LGTM: Consistent interval format across test cases.

This change mirrors the format update in line 320, ensuring consistency across all interval-related test assertions.

ibis-server/tests/routers/v3/connector/postgres/test_query.py (5)

144-144: Numeric formatting consistent with DataFusion.

Float representation matches the changes in other connector tests.


148-150: Timestamp formatting correctly handles timezones.

The changes properly show:

  • UTC timestamps with +00:00 offset notation
  • Correct DST conversions for America/New_York timezone

183-190: Data types correctly updated to Arrow format.

The dtype changes from pandas/numpy style to Arrow types align with DataFusion's type system.


627-627: Test cases updated to cover DataFusion's floating point formatting.

The modified test cases better reflect DataFusion's handling of scientific notation and decimal formatting thresholds.

Also applies to: 635-635


668-699: Test assertions correctly updated for DataFusion formatting.

The expected results properly reflect:

  • Consistent decimal string formatting
  • Scientific notation for small values
  • Special float values (NaN, Infinity) as null

Note the TODO comment about future scientific notation improvements.
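
As a rough analogue (plain Python's repr, not DataFusion's formatter), the cutoff where small values flip to scientific notation:

```python
print(repr(0.0001))   # '0.0001'
print(repr(0.00001))  # '1e-05'
print(float("nan"))   # nan -- rendered as null in the JSON output
```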

ibis-server/app/util.py (7)

43-44: Function signature properly extended for timezone handling.

The optional data_source parameter enables data source-specific timezone adjustments while maintaining backward compatibility.


53-62: DataFusion integration correctly implemented.

The code properly:

  • Creates timezone-aware DataFusion context
  • Registers Arrow tables using DataFusion API
  • Formats SQL for DataFusion dialect

113-123: DataFusion context correctly configured with timezone.

The implementation properly sets the session timezone from headers with a sensible UTC default.


200-203: SQL identifier quoting correctly implemented.

Standard implementation that properly escapes double quotes and prevents SQL injection.
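
A sketch of that quoting rule (a hypothetical reimplementation, not the exact code in `app/util.py`):

```python
def quote_identifier(name: str) -> str:
    # Double every embedded double quote, then wrap the whole name
    return '"' + name.replace('"', '""') + '"'

print(quote_identifier("order"))   # "order"
print(quote_identifier('we"ird'))  # "we""ird"
```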


213-213: Date formatting correctly updated for DataFusion.

Uses PostgreSQL-style to_char function with appropriate format string.


222-222: Binary encoding correctly updated for DataFusion.

Uses PostgreSQL-style encode function which is the appropriate method for DataFusion.


3-3:

```shell
#!/bin/bash
# Search for any reference to datafusion in pyproject.toml to ensure it's listed under dependencies
grep -R "datafusion" -n ibis-server/pyproject.toml
```

@goldmedal
Contributor Author

The BigQuery test has been verified locally. Let's ignore the CI failure.

@goldmedal goldmedal requested a review from douenergy June 30, 2025 02:34
@douenergy douenergy merged commit ad16d4d into Canner:main Jul 1, 2025
12 of 13 checks passed
@goldmedal goldmedal deleted the fix/df-formatting branch July 2, 2025 06:08