feat(core): support UNNEST syntax for Snowflake#1357
Conversation
WalkthroughAdds Snowflake test fixtures and an integration test, renames a TPCH fixture, and implements a Snowflake-specific InnerDialect to emit FLATTEN table functions for UNNEST with passthroughs in WrenDialect and a unit test for the transformation. Changes
Sequence Diagram(s)sequenceDiagram
participant Test as Tests
participant Analyzer as Analyzer/Planner
participant Inner as InnerDialect (Snowflake)
participant Unparser as SQL Unparser
Test->>Analyzer: analyze manifest with UNNEST targeting Snowflake
Analyzer->>Inner: unnest_as_table_factor()?
Inner-->>Analyzer: true/false
alt true (can flatten)
Analyzer->>Inner: unparse_unnest_table_factor(unnest, columns, unparser)
Note right of Inner #DDEBF7: validate args, build FLATTEN(...) TableFactor
Inner->>Unparser: provide TableFactorBuilder (FLATTEN)
Unparser-->>Analyzer: incorporate FLATTEN table factor
else false (fallback)
Inner-->>Analyzer: None (retain UNNEST)
end
Analyzer->>Inner: relation_alias_overrides(alias)
Inner-->>Analyzer: true/false
Analyzer-->>Test: produced SQL (FLATTEN or UNNEST)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
wren-core/core/src/mdl/mod.rs (1)
3777-3803: Snowflake UNNEST→FLATTEN test looks goodCovers the core translation and aliasing path. Consider adding one negative case to assert error messaging for multiple UNNEST args or multiple output columns to lock behavior.
ibis-server/tests/routers/v3/connector/snowflake/test_query.py (1)
34-56: Fix test name typo and harden assertions
- Typo: rename test_qeury → test_query.
- Optional: also assert the returned column name(s) to stabilize the contract.
-async def test_qeury(client, manifest_str, snowflake_connection_info): +async def test_query(client, manifest_str, snowflake_connection_info):ibis-server/tests/routers/v3/connector/snowflake/conftest.py (1)
82-91: Connection info fixtures look finetpch_connection_info and snowflake_connection_info are consistent and scoped to module. Consider making WAREHOUSE configurable via env for CI portability.
Also applies to: 94-103
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
349-475: UNNEST→FLATTEN mapping: validate alias position and broaden handling
- The alias list sets the flattened value at index 4 then relation_alias_overrides swaps only that index. Please confirm that index 4 corresponds to VALUE in Snowflake for all versions; otherwise SELECT t.item could bind to the wrong column. If needed, reorder aliases to match documented output or add a comment clarifying the contract.
- Current support is limited to Unnest(Projection(EmptyRelation)) i.e., UNNEST([...]). If future cases include UNNEST(column_expr) coming from a row source, consider extending the guard with additional safe shapes.
Would you like me to add a focused unit test asserting that SELECT …, UNNEST(col) AS t(item) produces TABLE(FLATTEN(col)) AS t(..., item, ...) that actually selects the JSON value at runtime (not only string compare)?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
wren-core-py/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
ibis-server/tests/routers/v3/connector/snowflake/conftest.py(3 hunks)ibis-server/tests/routers/v3/connector/snowflake/test_functions.py(2 hunks)ibis-server/tests/routers/v3/connector/snowflake/test_query.py(1 hunks)wren-core/core/src/mdl/dialect/inner_dialect.rs(4 hunks)wren-core/core/src/mdl/dialect/wren_dialect.rs(1 hunks)wren-core/core/src/mdl/mod.rs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (4)
unparse_unnest_table_factor(73-80)unparse_unnest_table_factor(358-443)relation_alias_overrides(82-88)relation_alias_overrides(445-474)
ibis-server/tests/routers/v3/connector/snowflake/test_query.py (3)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
manifest_str(31-32)ibis-server/tests/conftest.py (1)
client(18-23)ibis-server/tests/routers/v3/connector/snowflake/conftest.py (1)
snowflake_connection_info(95-103)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (4)
unparse_unnest_table_factor(104-111)relation_alias_overrides(113-120)new(130-134)unnest_as_table_factor(73-75)ibis-server/app/model/data_source.py (1)
DataSource(58-209)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (2)
ibis-server/tests/routers/v3/connector/snowflake/test_query.py (1)
manifest_str(30-31)ibis-server/tests/routers/v3/connector/snowflake/conftest.py (1)
tpch_connection_info(83-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: ci
- GitHub Check: test
- GitHub Check: cargo check
🔇 Additional comments (4)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
103-120: Passthrough hooks for unnest and alias overrides are appropriateClean delegation to the inner dialect; keeps behavior centralized per backend.
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
74-79: Fixture rename to tpch_connection_info is correctMatches conftest export; requests remain unchanged.
Also applies to: 92-97
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
73-88: New InnerDialect hooks are well-shapedDefault no-op implementations keep non-overriding dialects simple and backward compatible.
91-101: Snowflake dialect selection wired correctlyget_inner_dialect returns SnowflakeDialect for DataSource::Snowflake.
|
This PR has been tested locally. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
33-62: Test state management conflicts with autouse fixture.This test explicitly toggles the
remote_function_list_pathto verify behavior in different states (None, set, None). However, theset_remote_function_list_pathautouse fixture inconftest.py(lines 94-100) will set the path at module setup, interfering with this test's state management.See the review comment on
conftest.pylines 94-100 for the recommended fix (removeautouse=Truefrom the fixture).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ibis-server/justfile(1 hunks)ibis-server/tests/routers/v3/connector/snowflake/conftest.py(3 hunks)ibis-server/tests/routers/v3/connector/snowflake/test_functions.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
ibis-server/tests/routers/v3/connector/snowflake/conftest.py (3)
ibis-server/app/config.py (2)
get_config(98-99)set_remote_function_list_path(82-83)ibis-server/tests/conftest.py (1)
file_path(10-11)ibis-server/tests/routers/v3/connector/mssql/test_functions.py (1)
set_remote_function_list_path(36-40)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
ibis-server/tests/routers/v3/connector/snowflake/conftest.py (1)
tpch_connection_info(103-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: check Cargo.toml formatting
- GitHub Check: clippy
- GitHub Check: ci
- GitHub Check: test
🔇 Additional comments (8)
ibis-server/tests/routers/v3/connector/snowflake/conftest.py (5)
5-8: LGTM!The imports are appropriate for setting up Snowflake integration tests and configuring the remote function list.
13-13: LGTM!The function_list_path is correctly defined using the file_path helper.
23-92: Verify hardcoded warehouse name.The fixture correctly implements credential checking and resource cleanup (addressing the past review). However, the warehouse name
"COMPUTE_WH"is hardcoded on line 35. This warehouse might not exist in all Snowflake accounts, potentially causing test failures in different environments.Consider parameterizing the warehouse name via an environment variable with a default:
+ warehouse = os.getenv("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH") conn = snowflake.connector.connect( user=user, account=account, private_key=private_key, - warehouse="COMPUTE_WH", + warehouse=warehouse, )
102-112: LGTM!The fixture renaming from
connection_infototpch_connection_infoimproves clarity by explicitly indicating this is for TPCH sample data testing. The implementation correctly pulls credentials from environment variables and uses module scope for efficiency.
114-123: LGTM!The new fixture provides connection info for the WREN database created by
init_snowflake, enabling tests to work with the customcar_salestable. The structure mirrorstpch_connection_info, maintaining consistency.ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (3)
7-8: LGTM!The import changes correctly reference
DATAFUSION_FUNCTION_COUNTfrom the shared test configuration and importbase_urlandfunction_list_pathfrom the Snowflake-specific conftest, improving code organization.
64-79: LGTM!The test correctly uses the renamed
tpch_connection_infofixture and properly passes it in the request payload. The test logic for verifying scalar function execution is sound.
82-97: LGTM!The test correctly uses the renamed
tpch_connection_infofixture and properly passes it in the request payload. The test logic for verifying aggregate function execution is sound.
36789c3 to
ccac62d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
404-414: Consider clarifying the comment on line 405.The comment states "The 4th column corresponds to the flattened value," but the code assigns
columns[0]to index 4, which is the 5th column (0-indexed). Snowflake's FLATTEN output is: SEQ (0), KEY (1), PATH (2), INDEX (3), VALUE (4), THIS (5).Consider this diff for clarity:
- // To get the flattened result, we need to override the output columns of the FLATTEN function. - // The 4th column corresponds to the flattened value, which we will alias to the desired output column name. + // Override the output columns of the FLATTEN function per Snowflake's schema. + // The VALUE column (5th column, index 4) is aliased to the user-specified output column name.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
wren-core-py/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
ibis-server/justfile(2 hunks)ibis-server/tests/routers/v3/connector/snowflake/conftest.py(3 hunks)ibis-server/tests/routers/v3/connector/snowflake/test_functions.py(3 hunks)ibis-server/tests/routers/v3/connector/snowflake/test_query.py(1 hunks)wren-core/core/src/mdl/dialect/inner_dialect.rs(4 hunks)wren-core/core/src/mdl/dialect/wren_dialect.rs(1 hunks)wren-core/core/src/mdl/mod.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- ibis-server/justfile
- ibis-server/tests/routers/v3/connector/snowflake/conftest.py
🧰 Additional context used
🧬 Code graph analysis (5)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (4)
unparse_unnest_table_factor(104-112)relation_alias_overrides(114-121)new(131-135)unnest_as_table_factor(73-75)ibis-server/app/model/data_source.py (1)
DataSource(58-209)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
ibis-server/tests/routers/v3/connector/snowflake/conftest.py (1)
tpch_connection_info(103-111)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (4)
unparse_unnest_table_factor(75-82)unparse_unnest_table_factor(360-445)relation_alias_overrides(84-90)relation_alias_overrides(447-476)
wren-core/core/src/mdl/mod.rs (1)
wren-core-base/src/mdl/builder.rs (5)
new(46-58)new(105-119)new(190-205)new(289-298)new(325-337)
ibis-server/tests/routers/v3/connector/snowflake/test_query.py (3)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
manifest_str(29-30)ibis-server/tests/conftest.py (1)
client(18-23)ibis-server/tests/routers/v3/connector/snowflake/conftest.py (1)
snowflake_connection_info(115-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: ci
- GitHub Check: cargo check
🔇 Additional comments (9)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2)
104-112: LGTM! Clean delegation pattern.The passthrough to
inner_dialect.unparse_unnest_table_factorcorrectly enables dialect-specific UNNEST handling (e.g., Snowflake's FLATTEN). The underscore-prefixed parameters appropriately indicate they're forwarded without inspection.
114-121: LGTM! Proper delegation for alias overrides.The passthrough enables dialect-specific relation alias customization, supporting the Snowflake FLATTEN column mapping implemented in the inner dialect.
wren-core/core/src/mdl/dialect/inner_dialect.rs (4)
355-358: LGTM! Enables FLATTEN as table factor.Returning
truecorrectly signals that Snowflake supports UNNEST (via FLATTEN) as a table factor in FROM clauses.
360-402: LGTM! Proper validation for FLATTEN transformation.The implementation correctly:
- Validates the
Unnest(Projection(EmptyRelation))pattern to ensure safe transformation- Enforces Snowflake FLATTEN's single-argument limitation with clear error messages
- Enforces single output column per PR scope
416-444: LGTM! Correct FLATTEN function construction.The code properly constructs a
TABLE(FLATTEN(...))expression with the 6-column alias, matching Snowflake's FLATTEN output schema and generating the appropriate unnamed subquery alias.
447-476: LGTM! Alias override logic is correct.The implementation correctly updates the VALUE column alias (index 4) when a user-provided table alias is present, preserving the other FLATTEN output columns. The prefix check ensures this only applies to auto-generated FLATTEN subqueries.
wren-core/core/src/mdl/mod.rs (1)
3814-3840: LGTM! Comprehensive Snowflake UNNEST test.The test correctly validates the transformation of UNNEST syntax into Snowflake's
TABLE(FLATTEN(...))with all six output columns (SEQ, KEY, PATH, INDEX, item, THIS). The assertion matches the expected Snowflake-specific SQL output.ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (2)
64-79: LGTM! Clearer parameter naming.Renaming
connection_infototpch_connection_infoimproves clarity by indicating that this test uses the TPCH sample dataset connection, distinguishing it from the newsnowflake_connection_infofixture.
82-97: LGTM! Consistent parameter naming.The rename to
tpch_connection_infomatches the pattern established intest_scalar_functionand clarifies the data source being tested.
| async def test_qeury(client, manifest_str, snowflake_connection_info): | ||
| response = await client.post( | ||
| url=f"{base_url}/query", | ||
| json={ | ||
| "connectionInfo": snowflake_connection_info, | ||
| "manifestStr": manifest_str, | ||
| "sql": "select t.a from car_sales c, UNNEST(to_array(get_path(c.src, 'customer'))) t(a)", | ||
| }, | ||
| headers={ | ||
| X_WREN_FALLBACK_DISABLE: "true", | ||
| }, | ||
| ) | ||
| assert response.status_code == 200 | ||
| result = response.json() | ||
| assert len(result["data"]) == 2 | ||
| assert result["data"] == [ | ||
| [ | ||
| '{"address":"San Francisco, CA","name":"Joyce Ridgely","phone":"16504378889"}' | ||
| ], | ||
| [ | ||
| '{"address":"New York, NY","name":"Bradley Greenbloom","phone":"12127593751"}' | ||
| ], | ||
| ] |
There was a problem hiding this comment.
Fix typo in function name.
The test function is named test_qeury but should be test_query.
Apply this diff:
-async def test_qeury(client, manifest_str, snowflake_connection_info):
+async def test_query(client, manifest_str, snowflake_connection_info):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def test_qeury(client, manifest_str, snowflake_connection_info): | |
| response = await client.post( | |
| url=f"{base_url}/query", | |
| json={ | |
| "connectionInfo": snowflake_connection_info, | |
| "manifestStr": manifest_str, | |
| "sql": "select t.a from car_sales c, UNNEST(to_array(get_path(c.src, 'customer'))) t(a)", | |
| }, | |
| headers={ | |
| X_WREN_FALLBACK_DISABLE: "true", | |
| }, | |
| ) | |
| assert response.status_code == 200 | |
| result = response.json() | |
| assert len(result["data"]) == 2 | |
| assert result["data"] == [ | |
| [ | |
| '{"address":"San Francisco, CA","name":"Joyce Ridgely","phone":"16504378889"}' | |
| ], | |
| [ | |
| '{"address":"New York, NY","name":"Bradley Greenbloom","phone":"12127593751"}' | |
| ], | |
| ] | |
| async def test_query(client, manifest_str, snowflake_connection_info): | |
| response = await client.post( | |
| url=f"{base_url}/query", | |
| json={ | |
| "connectionInfo": snowflake_connection_info, | |
| "manifestStr": manifest_str, | |
| "sql": "select t.a from car_sales c, UNNEST(to_array(get_path(c.src, 'customer'))) t(a)", | |
| }, | |
| headers={ | |
| X_WREN_FALLBACK_DISABLE: "true", | |
| }, | |
| ) | |
| assert response.status_code == 200 | |
| result = response.json() | |
| assert len(result["data"]) == 2 | |
| assert result["data"] == [ | |
| [ | |
| '{"address":"San Francisco, CA","name":"Joyce Ridgely","phone":"16504378889"}' | |
| ], | |
| [ | |
| '{"address":"New York, NY","name":"Bradley Greenbloom","phone":"12127593751"}' | |
| ], | |
| ] |
🤖 Prompt for AI Agents
In ibis-server/tests/routers/v3/connector/snowflake/test_query.py around lines
34 to 56, the test function is misspelled as `test_qeury`; rename the function
to `test_query` (update the `def` line only) so the test runner recognizes it,
and ensure there are no other references to the old name in this file or imports
that need updating.
To support
UNNESTfor Snowflake, we transformUNNESTtoFLATTENsyntax for Snowflake. For example:It will be transformed to
There are some requirements to operate a
variantvalue byUNNEST. BecauseUNNESTonly acceptsARRAYin DataFusion, after usingGET_PATHto extract an array from a variant ('src), we need to useto_arrayto provide the type information forUNNEST`.Limitation
Cunnetly, we only support unnesting an
ARRAYvalue.structorobjectcan't be unnested correctly.Summary by CodeRabbit
New Features
Tests
Chores