fix(core): disable ReplaceDistinctWithAggregate rule#1294
fix(core): disable ReplaceDistinctWithAggregate rule#1294douenergy merged 1 commit intoCanner:mainfrom
ReplaceDistinctWithAggregate rule#1294Conversation
WalkthroughIntroduces a separate execution context in PySessionContext alongside the unparsing context, updates default and constructor wiring, and routes function discovery through the execution context. In core MDL, enables information_schema and disables the ReplaceDistinctWithAggregate rule for unparse optimization. Adds a snapshot test verifying DISTINCT is not rewritten to GROUP BY. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor PyCaller as Python Caller
participant PyCtx as PySessionContext::new
participant MDLCtx as create_ctx_with_mdl(Mode::Unparse)
participant Exec as LocalRuntime::SessionContext
PyCaller->>PyCtx: new(mdl_base64: Option<String>)
alt mdl_base64 is Some
PyCtx->>MDLCtx: build unparsing ctx
PyCtx->>Exec: build execution ctx
PyCtx-->>PyCaller: PySessionContext{ ctx=unparser_ctx, exec_ctx=exec_ctx }
else mdl_base64 is None
PyCtx->>Exec: reuse provided ctx as exec_ctx
PyCtx-->>PyCaller: PySessionContext{ ctx=ctx.clone(), exec_ctx=ctx }
end
PyCaller->>PySessionContext: get_available_functions()
PySessionContext->>Exec: query registered functions
Exec-->>PySessionContext: functions
PySessionContext-->>PyCaller: functions
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
wren-core/core/src/mdl/context.rs (1)
227-229: Improve disable comment with tracking linkTo ensure future maintainability, add a reference to the upstream issue where this fragility is tracked so it’s easy to revert once fixed.
• File: wren-core/core/src/mdl/context.rs
Lines: 227–229Suggested update:
-// Unparser has some issues for handling derived table generated by ReplaceDistinctWithAggregate rule +// Unparser has issues with derived tables generated by ReplaceDistinctWithAggregate. // See: https://github.com/apache/datafusion/issues/<issue_number> (re-enable when fixed) // Arc::new(ReplaceDistinctWithAggregate::new()),wren-core/core/src/mdl/mod.rs (1)
1709-1743: New DISTINCT test covers the intended behavior; add a cheap guard against GROUP BY regressionsSnapshot is good. Add an explicit assertion to ensure GROUP BY isn't introduced and (optionally) validate the generated SQL still round-trips via DataFusion to catch unparser regressions early.
Apply within this test:
let result = transform_sql_with_ctx( &ctx, Arc::clone(&analyzed_mdl), &[], Arc::new(HashMap::new()), sql, ) .await?; + // Ensure DISTINCT wasn't rewritten into a GROUP BY + assert!( + !result.contains(" GROUP BY "), + "Unexpected GROUP BY rewrite for DISTINCT" + ); + // Optional: sanity check the output SQL is still executable/round-trippable + // (helper is already available in this module) + assert_sql_valid_executable(&result).await?; assert_snapshot!( result, @"SELECT DISTINCT customer.c_custkey, customer.c_name FROM (SELECT customer.c_custkey, customer.c_name FROM (SELECT __source.c_custkey AS c_custkey, __source.c_name AS c_name FROM customer AS __source) AS customer) AS customer" );Follow-up ideas (no change required now):
- Add coverage for DISTINCT on a single column and for DISTINCT nested in a subquery.
wren-core-py/src/context.rs (1)
171-188: Creating distinct unparser_ctx and exec_ctx from the same base state is correct; minor readability tweakUsing create_ctx_with_mdl for both modes ensures consistent registrations (UDFs, config) via new_from_existing(state). Consider renaming the local variable ctx (the base context) to base_ctx for clarity in this constructor.
- let ctx = wren_core::SessionContext::new_with_config(config); + let base_ctx = wren_core::SessionContext::new_with_config(config); ... - .block_on(create_ctx_with_mdl( - &ctx, + .block_on(create_ctx_with_mdl( + &base_ctx, Arc::clone(&analyzed_mdl), Arc::clone(&properties_ref), mdl::context::Mode::Unparse, )) ... - .block_on(create_ctx_with_mdl( - &ctx, + .block_on(create_ctx_with_mdl( + &base_ctx, Arc::clone(&analyzed_mdl), Arc::clone(&properties_ref), mdl::context::Mode::LocalRuntime, ))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
wren-core-py/src/context.rs(6 hunks)wren-core/core/src/mdl/context.rs(2 hunks)wren-core/core/src/mdl/mod.rs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
wren-core-py/src/context.rs (1)
wren-core/core/src/mdl/context.rs (2)
new(319-341)create_ctx_with_mdl(45-107)
wren-core/core/src/mdl/mod.rs (1)
wren-core-py/src/context.rs (2)
new(81-203)default(62-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: cargo test (win64)
- GitHub Check: test
- GitHub Check: cargo test (macos)
- GitHub Check: cargo check
- GitHub Check: cargo test (macos-aarch64)
- GitHub Check: ci
🔇 Additional comments (5)
wren-core/core/src/mdl/context.rs (1)
69-71: Enable information_schema for all derived contexts — good alignment with Python function discoveryTurning on information_schema in create_ctx_with_mdl ensures every Mode (Unparse, LocalRuntime, PermissionAnalyze) exposes routines/views that PySessionContext enumerates. This matches the Python side’s query against information_schema.routines.
wren-core-py/src/context.rs (4)
49-53: exec_ctx separation from ctx (unparser) is a solid design choiceClear split between transformation (ctx) and execution (exec_ctx) avoids optimizer rule leakage into unparser behavior and enables accurate function enumeration from the runtime context.
61-66: Default initializers keep both contexts usable out of the boxInitializing exec_ctx alongside ctx in Default is consistent and avoids None-guarding elsewhere. Good.
120-126: When no MDL is provided, reusing the same configured SessionContext for both ctx and exec_ctx is fineBecause the base ctx is created with SessionConfig::default().with_information_schema(true), get_available_functions will work in this path as well.
229-235: Rename typo inget_regietered_functionstoget_registered_functionsTo keep the codebase consistent and improve discoverability, update all three occurrences:
• wren-core-py/src/context.rs:98
Change- .block_on(Self::get_regietered_functions(&ctx)) + .block_on(Self::get_registered_functions(&ctx))• wren-core-py/src/context.rs:229
Change- .block_on(Self::get_regietered_functions(&self.exec_ctx)) + .block_on(Self::get_registered_functions(&self.exec_ctx))• wren-core-py/src/context.rs:320 (function definition)
Change- async fn get_regietered_functions( + async fn get_registered_functions( ctx: &wren_core::SessionContext, ) -> PyResult<Vec<RemoteFunctionDto>> { … }All references to the old, misspelled name have been identified (rg found only these three instances), so applying this patch completes the rename.
DataFusion unparser has some issues when unparsing the derived table generated by ReplaceDistinctWithAggregate. Disable it to avoid generating invalid plans.
Summary by CodeRabbit
New Features
Bug Fixes
Tests