
fix(ibis): handle pyarrow unsupported types (decimal and uuid) #1273

Merged
douenergy merged 7 commits into Canner:main from goldmedal:fix/handle-pyarrow-unsupported-type
Jul 29, 2025

Conversation

@goldmedal
Contributor

@goldmedal goldmedal commented Jul 24, 2025

Description

UUID

PyArrow does not currently support UUID. Users will get the following error:

pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'UUID' object", 'Conversion failed for column c1 with type object')

This PR casts UUID columns to the string type whenever the result includes them.
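
The idea of turning UUID values into strings before handing a result to Arrow can be sketched with the standard library alone. This is only an illustration, not the PR's code: `cast_uuid_values` is a hypothetical helper, and rows are modeled as plain dicts rather than an Ibis table.

```python
import uuid


def cast_uuid_values(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with every uuid.UUID value replaced by its string form."""
    return [
        {
            key: str(value) if isinstance(value, uuid.UUID) else value
            for key, value in row.items()
        }
        for row in rows
    ]


# Hypothetical result set with one UUID column (c1) and one integer column (c2).
rows = [{"c1": uuid.UUID("123e4567-e89b-12d3-a456-426614174000"), "c2": 1}]
converted = cast_uuid_values(rows)
print(converted)
```

After the conversion every value in `c1` is a plain `str`, which is a type Arrow can ingest without the `Expected bytes, got a 'UUID' object` error quoted above.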

MSSQL decimal

Summary by CodeRabbit


  • Bug Fixes

    • Improved handling of unsupported data types (Decimal and UUID) in query results for multiple connectors, ensuring compatibility and consistent formatting.
    • Decimal columns are now rounded to a fixed scale, and UUID columns are returned as strings.
  • Tests

    • Added and updated tests to verify correct handling of decimal precision and UUID types in MSSQL and PostgreSQL connectors.
    • Adjusted expectations for order-by queries without explicit limits in MSSQL tests.
  • Chores

    • Internal adjustments to connection information handling for improved consistency.

@coderabbitai
Contributor

coderabbitai bot commented Jul 24, 2025

Walkthrough

This update refactors how unsupported PyArrow types (specifically Decimal and UUID) are handled in query results for several connectors. It introduces new internal methods for type conversion and rounding, removes the older round_decimal_columns utility, and updates related tests to verify decimal precision and UUID handling. Minor adjustments are made to connection info handling elsewhere.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Connector refactoring and type handling: ibis-server/app/model/connector.py | Refactored type handling in SimpleConnector, MSSqlConnector, and CannerConnector for decimals and UUIDs; added internal helper methods for rounding and casting; removed old rounding utility usage; simplified MSSQL query logic. |
| Utility cleanup: ibis-server/app/util.py | Removed the round_decimal_columns utility function and related imports. |
| MSSQL connector tests: ibis-server/tests/routers/v2/connector/test_mssql.py | Extended MSSQL tests: created UUID test table/row, added tests for decimal precision and UUID type handling, renamed and updated order-by test to reflect new behavior without limit errors. |
| Postgres connector tests: ibis-server/tests/routers/v2/connector/test_postgres.py | Added a new test for UUID type handling in PostgreSQL connector. |
| Miscellaneous minor fixes: ibis-server/wren/__main__.py | Added trailing comma in get_connection_info call argument list. |
| Data source connection info: ibis-server/app/model/data_source.py | Made headers parameter optional with default None in get_connection_info method of DataSource class, preserving existing logic. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Connector
    participant PyArrow

    Client->>Connector: query(sql, limit)
    Connector->>Connector: _handle_pyarrow_unsupported_type(ibis_table)
    alt Decimal columns present
        Connector->>Connector: _round_decimal_columns(...)
    end
    alt UUID columns present
        Connector->>Connector: _cast_uuid_columns(...)
    end
    Connector->>PyArrow: Convert Ibis table to PyArrow table
    Connector-->>Client: Return processed table
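
The flow in the diagram can be approximated in plain Python. The sketch below rests on several assumptions: it models a result as a list of dicts, and `handle_unsupported_types`, `round_decimals`, and `cast_uuids` are hypothetical names that mirror the diagram's private methods. The connector itself operates on Ibis tables, so this shows only the control flow, not the real implementation.

```python
import uuid
from decimal import Decimal


def round_decimals(rows, columns, scale=9):
    # Quantize each Decimal in the given columns to a fixed number of places.
    quantum = Decimal(1).scaleb(-scale)  # Decimal("1E-9") when scale is 9
    return [
        {k: v.quantize(quantum) if k in columns and isinstance(v, Decimal) else v
         for k, v in row.items()}
        for row in rows
    ]


def cast_uuids(rows, columns):
    # Replace UUID objects in the given columns with their string form.
    return [
        {k: str(v) if k in columns and isinstance(v, uuid.UUID) else v
         for k, v in row.items()}
        for row in rows
    ]


def handle_unsupported_types(rows):
    # Find the columns that hold Decimal or UUID values, then run each step only if needed.
    decimal_cols = {k for row in rows for k, v in row.items() if isinstance(v, Decimal)}
    uuid_cols = {k for row in rows for k, v in row.items() if isinstance(v, uuid.UUID)}
    if decimal_cols:
        rows = round_decimals(rows, decimal_cols)
    if uuid_cols:
        rows = cast_uuids(rows, uuid_cols)
    return rows


result = handle_unsupported_types(
    [{"id": uuid.UUID(int=1), "ratio": Decimal(1) / Decimal(3)}]
)
print(result)
```

Keeping the two conversions behind separate checks means a result with neither column type passes through untouched.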

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Assessment against linked issues

Objective Addressed Explanation
MSSQL: SELECT expressions without alias cause syntax error (#1246) Test test_decimal_precision verifies queries with expressions without alias work correctly.

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes detected related to the linked issue objectives.

Possibly related PRs

Suggested reviewers

  • douenergy

Poem

A decimal here, a UUID there,
The connectors now handle them with flair.
Old rounding’s gone, new helpers shine,
PyArrow types align just fine.
Tests ensure all works as planned—
A bunny’s hop through data’s land! 🐇✨



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43a0ece and 863d170.

📒 Files selected for processing (1)
  • ibis-server/app/model/data_source.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/app/model/data_source.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci

@github-actions github-actions bot added the ibis and python (Pull requests that update Python code) labels Jul 24, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
ibis-server/app/model/connector.py (1)

240-263: Consider performance implications of pandas conversion for decimal rounding

The new implementation converts the entire result to pandas DataFrame for decimal rounding, which could have performance implications for large result sets. Consider:

  1. Only converting if decimal columns exist (which is already done ✓)
  2. Document why pandas conversion is necessary over Ibis expressions
  3. Consider batch processing for very large results

Add a comment explaining why pandas conversion is used:

     def _round_decimal_columns(self, ibis_table: Table, scale: int = 9) -> pa.Table:
+        # Using pandas for decimal rounding because PyArrow's decimal type
+        # requires exact precision handling that Ibis expressions don't provide
         def round_decimal(val):
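
The diff above shows only the first line of `round_decimal`. One plausible shape for such a per-value helper is sketched below, assuming it should leave non-decimal values (including `None`) untouched and round to `scale` places. The rounding mode and the pass-through behavior are assumptions for illustration and are not taken from the PR.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_decimal(val, scale: int = 9):
    """Round a Decimal to `scale` places; pass any other value through unchanged."""
    if not isinstance(val, Decimal):
        return val
    quantum = Decimal(1).scaleb(-scale)  # Decimal("1E-9") for the default scale
    return val.quantize(quantum, rounding=ROUND_HALF_UP)


rounded = round_decimal(Decimal("1.23456789051"))
untouched = round_decimal(None)
print(rounded, untouched)
```

Applying a function like this value by value is what makes a pandas-style pass over the whole result necessary, which is the cost the review comment is pointing at.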
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f74b3c and 976aa6a.

📒 Files selected for processing (5)
  • ibis-server/app/model/connector.py (6 hunks)
  • ibis-server/app/util.py (0 hunks)
  • ibis-server/tests/routers/v2/connector/test_mssql.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/wren/__main__.py (1 hunks)
💤 Files with no reviewable changes (1)
  • ibis-server/app/util.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.
ibis-server/app/model/connector.py (3)

Learnt from: goldmedal
PR: #1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.

Learnt from: goldmedal
PR: #1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().

Learnt from: goldmedal
PR: #1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use conn.execute("DESCRIBE SELECT * FROM table").fetchall() to get column types instead of accessing DataFrame-style attributes like dtype or dtypes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (7)
ibis-server/wren/__main__.py (1)

36-43: Explicit headers={} is required
The get_connection_info method signature is defined as:

def get_connection_info(
    self,
    data: dict | ConnectionInfo,
    headers: dict
) -> ConnectionInfo:
    …

Since headers has no default value, callers must supply it explicitly. The existing headers={} arguments are therefore necessary.
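
For comparison, the change summary for this PR lists `headers` as optional with a default of `None` in `get_connection_info`. A minimal sketch of a signature of that shape follows; the `ConnectionInfo` dataclass is a stand-in and the function body is invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConnectionInfo:  # stand-in type, not the project's real model
    data: dict
    headers: dict = field(default_factory=dict)


def get_connection_info(data: dict, headers: Optional[dict] = None) -> ConnectionInfo:
    # Treat a missing headers argument the same as an empty mapping.
    return ConnectionInfo(data=data, headers=headers or {})


info = get_connection_info({"host": "localhost"})
print(info)
```

With a default in place, call sites that pass `headers={}` keep working and call sites that omit the argument become valid as well.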

ibis-server/tests/routers/v2/connector/test_postgres.py (1)

1049-1064: LGTM! Good test coverage for UUID type handling.

The test properly validates that UUID types are handled correctly in PostgreSQL queries, aligning with the PR's objective to handle PyArrow unsupported types.

ibis-server/tests/routers/v2/connector/test_mssql.py (4)

124-130: LGTM! Proper UUID test data setup.

The UUID test table creation with a uniqueidentifier column and test UUID value is well-structured for testing UUID type handling in MSSQL.


501-517: Test correctly reflects the removal of ORDER BY limitation.

The test now expects a successful response (200) for ORDER BY queries without LIMIT, indicating that the previous MSSQL-specific restriction has been removed. This aligns with the simplified query handling in the connector.


519-534: Good test for decimal precision handling.

The test verifies that decimal division results are properly formatted with the expected precision. The comment about not giving the expression an alias is helpful for understanding the test intent.


536-570: Comprehensive UUID type test for MSSQL.

The test properly validates:

  1. UUID values are returned as uppercase strings (MSSQL convention)
  2. The data type is reported as "string" in the response

This aligns with the connector's UUID to string casting implementation.
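
One detail worth keeping in mind when writing such assertions: Python's `uuid.UUID` always renders in lowercase, so a test that expects the uppercase form described above has to compare against an uppercase literal or normalize case first. A small illustration, using an arbitrary UUID value:

```python
import uuid

value = uuid.UUID("0E984725-C51C-4BF4-9960-E1C80E27ABA0")
as_text = str(value)           # canonical lowercase form produced by Python
mssql_style = as_text.upper()  # uppercase form, as the MSSQL test expects

# Parsing is case-insensitive, so both spellings denote the same UUID.
same = uuid.UUID(mssql_style) == value
print(as_text, mssql_style, same)
```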

ibis-server/app/model/connector.py (1)

141-146: Good implementation of UUID to string casting

The UUID casting implementation is clean and follows the same pattern as decimal handling. This ensures compatibility with PyArrow which doesn't natively support UUID types.

@goldmedal goldmedal requested a review from douenergy July 25, 2025 02:07
@goldmedal goldmedal force-pushed the fix/handle-pyarrow-unsupported-type branch from aa838d9 to 43a0ece Compare July 25, 2025 07:20
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
ibis-server/app/model/data_source.py (1)

88-88: Typo in docstring.

requried → required.

ibis-server/tests/routers/v2/connector/test_mssql.py (3)

124-130: Qualify fixture table with schema to avoid clashes

Consider creating the UUID fixture table as dbo.uuid_test (or quoting "dbo"."uuid_test") so tests remain isolated even if a user already has an object named uuid_test in another schema.

-conn.execute(text("CREATE TABLE uuid_test (order_uuid uniqueidentifier)"))
+conn.execute(text('CREATE TABLE "dbo"."uuid_test" (order_uuid uniqueidentifier)'))

Purely defensive, but helps when the database isn’t freshly created every run.


501-517: Add an explicit LIMIT to keep the test future-proof

null_test currently has only three rows, so omitting LIMIT works. If more rows are ever inserted, this assertion will start failing. Adding params={"limit": 3} (as in the earlier test) makes the intent crystal-clear without affecting current behaviour.


536-569: Avoid shadowing the module-level manifest variable

Inside test_uuid_type a new dict is assigned to manifest, shadowing the one defined at file top. While legal, using a distinct name (e.g. uuid_manifest) avoids confusion when scanning the file.
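
The shadowing being described looks roughly like this. The dict contents, the function names, and the name `uuid_manifest` are illustrative placeholders rather than the test file's actual values.

```python
manifest = {"models": ["orders"]}  # module-level data (placeholder contents)


def check_with_shadowing():
    manifest = {"models": ["uuid_test"]}  # local name hides the module-level one
    return manifest


def check_with_distinct_name():
    uuid_manifest = {"models": ["uuid_test"]}  # clear which dict is in use
    return uuid_manifest, manifest             # the module-level dict is still reachable


print(check_with_shadowing())
print(check_with_distinct_name())
```

Both versions behave the same at runtime; the distinct name only removes the need to check which `manifest` a given line refers to.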

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbefd04 and 43a0ece.

📒 Files selected for processing (6)
  • ibis-server/app/model/connector.py (5 hunks)
  • ibis-server/app/model/data_source.py (1 hunks)
  • ibis-server/app/util.py (0 hunks)
  • ibis-server/tests/routers/v2/connector/test_mssql.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/wren/__main__.py (1 hunks)
💤 Files with no reviewable changes (1)
  • ibis-server/app/util.py
✅ Files skipped from review due to trivial changes (1)
  • ibis-server/wren/__main__.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • ibis-server/tests/routers/v2/connector/test_postgres.py
  • ibis-server/app/model/connector.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/tests/routers/v2/connector/test_mssql.py (1)

519-534: Verify decimal rounding expectation

SQL Server returns 0.33333333 for DECIMAL(38,8) division. The test expects 0.333333 (6 dp), implying the connector now rounds/truncates two extra digits. Please double-check that this matches the new rounding logic and doesn’t mask precision loss.

If 8-dp precision is still desired, update either the connector or the assertion accordingly.
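
The size of the difference in question can be checked with Python's `decimal` module. This only illustrates the arithmetic of rounding 8 decimal places down to 6; it makes no claim about what the connector or SQL Server actually returns.

```python
from decimal import Decimal

eight_dp = Decimal("0.33333333")                 # value quoted in the review comment
six_dp = eight_dp.quantize(Decimal("0.000001"))  # rounded to 6 decimal places
lost = eight_dp - six_dp                         # what the rounding drops

print(six_dp, lost)
```

If the connector rounds to a wider scale than the value already has, `quantize` pads with zeros instead of dropping digits, so the two cases are easy to tell apart in an assertion.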

@douenergy
Contributor

Thanks @goldmedal

@douenergy douenergy merged commit e8d92f4 into Canner:main Jul 29, 2025
4 checks passed

Labels

ibis, python (Pull requests that update Python code)


Development

Successfully merging this pull request may close these issues.

MSSQL: SELECT expressions without alias cause syntax error

2 participants