fix(ibis): handle pyarrow unsupported types (decimal and uuid)#1273
fix(ibis): handle pyarrow unsupported types (decimal and uuid)#1273douenergy merged 7 commits intoCanner:mainfrom
Conversation
WalkthroughThis update refactors how unsupported PyArrow types (specifically Decimal and UUID) are handled in query results for several connectors. It introduces new internal methods for type conversion and rounding, removes the older Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Connector
participant PyArrow
Client->>Connector: query(sql, limit)
Connector->>Connector: _handle_pyarrow_unsupported_type(ibis_table)
alt Decimal columns present
Connector->>Connector: _round_decimal_columns(...)
end
alt UUID columns present
Connector->>Connector: _cast_uuid_columns(...)
end
Connector->>PyArrow: Convert Ibis table to PyArrow table
Connector-->>Client: Return processed table
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15–20 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes detected related to the linked issue objectives. Possibly related PRs
Suggested reviewers
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
ibis-server/app/model/connector.py (1)
240-263: Consider performance implications of pandas conversion for decimal roundingThe new implementation converts the entire result to pandas DataFrame for decimal rounding, which could have performance implications for large result sets. Consider:
- Only converting if decimal columns exist (which is already done ✓)
- Document why pandas conversion is necessary over Ibis expressions
- Consider batch processing for very large results
Add a comment explaining why pandas conversion is used:
def _round_decimal_columns(self, ibis_table: Table, scale: int = 9) -> pa.Table: + # Using pandas for decimal rounding because PyArrow's decimal type + # requires exact precision handling that Ibis expressions don't provide def round_decimal(val):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
ibis-server/app/model/connector.py(6 hunks)ibis-server/app/util.py(0 hunks)ibis-server/tests/routers/v2/connector/test_mssql.py(2 hunks)ibis-server/tests/routers/v2/connector/test_postgres.py(1 hunks)ibis-server/wren/__main__.py(1 hunks)
💤 Files with no reviewable changes (1)
- ibis-server/app/util.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.
ibis-server/app/model/connector.py (3)
Learnt from: goldmedal
PR: #1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object via its "replacement scan" mechanism that recognizes Python variables referencing Arrow objects as SQL tables. No conn.register() call is required.
Learnt from: goldmedal
PR: #1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().
Learnt from: goldmedal
PR: #1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use conn.execute("DESCRIBE SELECT * FROM table").fetchall() to get column types instead of accessing DataFrame-style attributes like dtype or dtypes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (7)
ibis-server/wren/__main__.py (1)
36-43: Explicitheaders={}is required
Theget_connection_infomethod signature is defined as:def get_connection_info( self, data: dict | ConnectionInfo, headers: dict ) -> ConnectionInfo: …Since
headershas no default value, callers must supply it explicitly. The existingheaders={}arguments are therefore necessary.ibis-server/tests/routers/v2/connector/test_postgres.py (1)
1049-1064: LGTM! Good test coverage for UUID type handling.The test properly validates that UUID types are handled correctly in PostgreSQL queries, aligning with the PR's objective to handle PyArrow unsupported types.
ibis-server/tests/routers/v2/connector/test_mssql.py (4)
124-130: LGTM! Proper UUID test data setup.The UUID test table creation with a
uniqueidentifiercolumn and test UUID value is well-structured for testing UUID type handling in MSSQL.
501-517: Test correctly reflects the removal of ORDER BY limitation.The test now expects a successful response (200) for ORDER BY queries without LIMIT, indicating that the previous MSSQL-specific restriction has been removed. This aligns with the simplified query handling in the connector.
519-534: Good test for decimal precision handling.The test verifies that decimal division results are properly formatted with the expected precision. The comment about not giving the expression an alias is helpful for understanding the test intent.
536-570: Comprehensive UUID type test for MSSQL.The test properly validates:
- UUID values are returned as uppercase strings (MSSQL convention)
- The data type is reported as "string" in the response
This aligns with the connector's UUID to string casting implementation.
ibis-server/app/model/connector.py (1)
141-146: Good implementation of UUID to string castingThe UUID casting implementation is clean and follows the same pattern as decimal handling. This ensures compatibility with PyArrow which doesn't natively support UUID types.
aa838d9 to
43a0ece
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
ibis-server/app/model/data_source.py (1)
88-88: Typo in docstring.
requried→required.ibis-server/tests/routers/v2/connector/test_mssql.py (3)
124-130: Qualify fixture table with schema to avoid clashesConsider creating the UUID fixture table as
dbo.uuid_test(or quoting"dbo"."uuid_test") so tests remain isolated even if a user already has an object nameduuid_testin another schema.-conn.execute(text("CREATE TABLE uuid_test (order_uuid uniqueidentifier)")) +conn.execute(text('CREATE TABLE "dbo"."uuid_test" (order_uuid uniqueidentifier)'))Purely defensive, but helps when the database isn’t freshly created every run.
501-517: Add an explicit LIMIT to keep the test future-proof
null_testcurrently has only three rows, so omittingLIMITworks. If more rows are ever inserted, this assertion will start failing. Addingparams={"limit": 3}(as in the earlier test) makes the intent crystal-clear without affecting current behaviour.
536-569: Avoid shadowing the module-levelmanifestvariableInside
test_uuid_typea new dict is assigned tomanifest, shadowing the one defined at file top. While legal, using a distinct name (e.g.uuid_manifest) avoids confusion when scanning the file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
ibis-server/app/model/connector.py(5 hunks)ibis-server/app/model/data_source.py(1 hunks)ibis-server/app/util.py(0 hunks)ibis-server/tests/routers/v2/connector/test_mssql.py(2 hunks)ibis-server/tests/routers/v2/connector/test_postgres.py(1 hunks)ibis-server/wren/__main__.py(1 hunks)
💤 Files with no reviewable changes (1)
- ibis-server/app/util.py
✅ Files skipped from review due to trivial changes (1)
- ibis-server/wren/main.py
🚧 Files skipped from review as they are similar to previous changes (2)
- ibis-server/tests/routers/v2/connector/test_postgres.py
- ibis-server/app/model/connector.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/tests/routers/v2/connector/test_mssql.py (1)
519-534: Verify decimal rounding expectationSQL Server returns
0.33333333forDECIMAL(38,8)division. The test expects0.333333(6 dp), implying the connector now rounds/truncates two extra digits. Please double-check that this matches the new rounding logic and doesn’t mask precision loss.If 8-dp precision is still desired, update either the connector or the assertion accordingly.
|
Thanks @goldmedal |
Description
UUID
PyArrow does not currently support UUID. Users will get the following error:
This PR tries to cast it to
stringtype if the result includes UUID columns.MSSQL decimal
There are some rounding issues for the mssql decimal column. We need to round them before transform the result to pyarrow table.
Summary by CodeRabbit
Summary by CodeRabbit
Bug Fixes
Tests
Chores