feat(core): introduce dialect-specific function list and refactor BigQuery function lists by goldmedal · Pull Request #1366 · Canner/wren-engine

goldmedal · 2025-11-11T05:48:50Z

Description

To provide accurate functions, this PR redefines the function list by the signature of DataFusion. Most functions could have multiple signatures; We defined the function lists by a simple record of csv. It isn't enough to define multiple signatures.

The InnerDialect has 3 methods for the supported functions, which will be registered when the session context is created

    fn supported_udfs(&self) -> Vec<Arc<datafusion::logical_expr::ScalarUDF>> {
        scalar_functions()
    }

    fn supported_udafs(&self) -> Vec<Arc<datafusion::logical_expr::AggregateUDF>> {
        aggregate_functions()
    }

    fn supported_udwfs(&self) -> Vec<Arc<datafusion::logical_expr::WindowUDF>> {
        window_functions()
    }

We can define the supported function for each data source. If the list isn't defined, we register the default function list.

BigQuery Function list

Remove the CSV-based function list.
Try to register all the common functions of BigQuery in wren-core/core/src/mdl/function/dialect/bigquery.
- BigQuery Function
- Some special functions are excluded: AEAD, geo functions, search functions, ...
Backward support the white list function in ibis-server/resources/white_function_list/bigquery.csv

Summary by CodeRabbit

New Features
- BigQuery dialect added with a large set of native scalar, aggregate and window functions for richer SQL.
Bug Fixes
- HTTP 422 mapping updated for more consistent error responses.
Improvements
- Session/context and rewrite flows now propagate data-source info; remote-function registration is gated by data source (BigQuery skips remote function registration).

coderabbitai · 2025-11-11T05:49:03Z

Warning

Rate limit exceeded

@goldmedal has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 0030b62 and cbe2cd1.

📒 Files selected for processing (1)

wren-core/core/src/mdl/dialect/inner_dialect.rs (5 hunks)

Walkthrough

Threads an optional data_source through ibis-server, wren-core-py, and wren-core; gates remote-function registration for BigQuery; adds DataSource parsing/error; exposes a BigQuery dialect with many scalar/aggregate/window UDFs; updates UDF macros, ReturnType handling, and call-sites to accept data_source.

Changes

Cohort / File(s)	Summary
ibis-server: data_source threading & remote-function gating `ibis-server/app/config.py`, `ibis-server/app/mdl/core.py`, `ibis-server/app/mdl/rewriter.py`, `ibis-server/tools/query_local_run.py`, `ibis-server/wren/session/__init__.py`	Thread `data_source` into session/context and rewriter APIs; special-case `bigquery` in remote function list path (return `None` to skip); update calls to pass `data_source` into EmbeddedEngineRewriter.rewrite_sync and SessionContext.
ibis-server: HTTP status constant `ibis-server/app/model/error.py`	Replace `HTTP_422_UNPROCESSABLE_ENTITY` with `HTTP_422_UNPROCESSABLE_CONTENT` in import and fallback mapping.
wren-core-base: DataSource parsing & error `wren-core-base/src/mdl/manifest.rs`	Add `ParsedDataSourceError` and `FromStr` impl for `DataSource` to parse strings → DataSource with explicit error.
wren-core-py: PySessionContext & registration gating `wren-core-py/src/context.rs`, `wren-core-py/src/errors.rs`	PySessionContext::new gains `data_source` param; resolve data_source from manifest or arg; register remote functions only when appropriate (skip for BigQuery); add From → CoreError mapping.
wren-core (core): create_wren_ctx & dialect-aware UDF registration `wren-core/core/src/mdl/mod.rs`, `wren-core/core/src/mdl/dialect/mod.rs`, `wren-core/core/src/mdl/dialect/inner_dialect.rs`	`create_wren_ctx` now accepts optional `data_source`; selects inner dialect and registers dialect-specific scalar/aggregate/window UDFs; make `inner_dialect` public; add supported_udfs/udafs/udwfs trait methods; refactor BigQuery date-diff handling.
wren-core: function dialect & utilities `wren-core/core/src/mdl/function/mod.rs`, `wren-core/core/src/mdl/function/dialect/mod.rs`, `wren-core/core/src/mdl/function/dialect/utils.rs`	Add public `dialect` module and `bigquery` submodule; add `build_document` helper for UDF docs; expose bigquery registry entry points.
wren-core: BigQuery scalar functions (large addition) `wren-core/core/src/mdl/function/dialect/bigquery/scalar.rs`	Add a comprehensive collection of BigQuery-like scalar UDF definitions (arrays, date/time, JSON, string, numeric, encodings, regex, etc.) with signatures and docs.
wren-core: BigQuery aggregate & window functions `wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs`, `wren-core/core/src/mdl/function/dialect/bigquery/window.rs`	Add BigQuery aggregate UDAFs and window UDWFs (e.g., any_value, approx_, group_concat, percentile_), with signatures, aliases, and documentation.
wren-core: function registration, ReturnType & macros `wren-core/core/src/mdl/function/remote_function.rs`, `wren-core/core/src/mdl/function/macros.rs`, `wren-core/core/src/mdl/function/scalar/mod.rs`	Add `ArrayOfInputFirstArgument` ReturnType; update ByPass* constructors to accept ReturnType/Signature/Documentation and aggregate aliases; rename `make_udf_function`→`make_datafusion_udf_function` and add expression-based `make_udf_function`/`make_udaf_function`/`make_udwf_function` using LazyLock singletons.
Examples & tests: call-site updates `wren-core/sqllogictest/src/test_context.rs`, `wren-core/wren-example/examples/*.rs`, `wren-core-py/tests/test_modeling_core.py`	Update call sites to pass new `data_source` argument (often `None`); adjust a test manifest dataSource from `"bigquery"` → `"datafusion"`; minor formatting adjustments.

Sequence Diagram(s)

sequenceDiagram
    participant ibis as ibis-server
    participant wren_py as wren-core-py
    participant wren_rs as wren-core (Rust)
    participant df as DataFusion

    ibis->>wren_py: PySessionContext::new(mdl_base64, remote_path, props, data_source)
    wren_py->>wren_py: parse manifest (if present)
    wren_py->>wren_py: resolve data_source (manifest preferred)
    alt data_source == "bigquery"
        wren_py->>wren_py: skip remote function registration
    else
        wren_py->>wren_py: read & register remote functions (CSV)
    end
    wren_py->>wren_rs: create_wren_ctx(Some(config), data_source)
    wren_rs->>wren_rs: get_inner_dialect(data_source)
    alt dialect present
        wren_rs->>df: register dialect-supported scalar/aggregate/window UDFs
    else
        wren_rs->>df: register default function sets
    end
    wren_rs-->>wren_py: return SessionContext
    wren_py-->>ibis: context initialized

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas to focus during review:

BigQuery scalar UDFs file (.../bigquery/scalar.rs) — large volume, many signatures and docs.
Macros and function-registration changes (macros.rs, remote_function.rs) — macro semantics, LazyLock singletons, constructor signature migrations.
DataSource parsing and error mapping (manifest.rs, wren-core-py/src/errors.rs) — FromStr and error conversions.
PySessionContext changes (wren-core-py/src/context.rs) — data_source resolution precedence and gating of remote-function registration.
create_wren_ctx and cross-language call-sites — ensure consistent data_source propagation and dialect registration.

Possibly related PRs

fix(ibis): refactor the available function list #1112 — modifies remote-function registration and data-source-aware session/context creation (BigQuery gating and data_source propagation).
refactor(core): self-manage the list of the core functions #1346 — overlaps on function-registration surface and create_wren_ctx API changes (UDF/dialect wiring).
chore: fix BigQuery v3 test cases #1031 — related to skipping BigQuery remote-function registration and test adjustments for BigQuery function lists.

Suggested reviewers

douenergy
wwwy3y3

Poem

🐰 I hopped through manifests and threaded tiny strings,

nudged data_source down tunnels to brighter things.
BigQuery hid its list, while UDF meadows grew,
crates, macros, and docs — a rabbit's tidy view.
Cheers for the code — carrots for every new queue!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 76.52% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: introducing dialect-specific function lists and refactoring BigQuery functions, which aligns with the extensive changes across dialect registration, UDF macros, and BigQuery-specific function definitions.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

goldmedal · 2025-11-11T05:52:56Z

BigQuery has been tested locally

poetry run pytest -m 'bigquery'
========================================================================================================= test session starts ==========================================================================================================
platform darwin -- Python 3.11.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /Users/jax/git/wren-engine/ibis-server
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 396 items / 349 deselected / 47 selected                                                                                                                                                                                     

tests/routers/v2/connector/test_bigquery.py ..................                                                                                                                                                                   [ 38%]
tests/routers/v3/connector/bigquery/test_functions.py ............                                                                                                                                                               [ 63%]
tests/routers/v3/connector/bigquery/test_query.py .................

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

ibis-server/app/mdl/rewriter.py (1)
159-169: Ensure consistency: rewrite_sync should also pass data_source.

The async rewrite method passes data_source to get_session_context (line 149), but rewrite_sync does not (line 164-166). This inconsistency could lead to different behavior between sync and async execution paths.

Apply this diff to align both methods:
     def rewrite_sync(
         self, manifest_str: str, sql: str, properties: dict | None = None
     ) -> str:
         try:
             processed_properties = self.get_session_properties(properties)
             session_context = get_session_context(
-                manifest_str, self.function_path, processed_properties
+                manifest_str,
+                self.function_path,
+                processed_properties,
+                self.data_source.name if self.data_source else None,
             )
             return session_context.transform_sql(sql)
         except Exception as e:
             raise WrenError(ErrorCode.INVALID_SQL, str(e), ErrorPhase.SQL_PLANNING)
wren-core-py/src/context.rs (1)

90-117: Honor the data_source argument when no manifest is supplied.

When mdl_base64 is None you currently discard the caller‑provided data_source. That leaves us unable to bootstrap a dialect-aware context (e.g. BigQuery) without first constructing a manifest, and it also means the later BigQuery remote-function skip never triggers. Please thread the parsed data_source through this branch as well (convert the &str to DataSource and pass it into create_wren_ctx).

🧹 Nitpick comments (2)

wren-core/core/src/mdl/function/dialect/utils.rs (1)

3-5: Consider unifying with remote_function.rs's build_document.

A similar build_document function exists in wren-core/core/src/mdl/function/remote_function.rs (lines 227-245) with richer parameter handling. If the two functions serve overlapping purposes, consider extracting a common builder utility to reduce duplication.

wren-core/core/src/mdl/dialect/mod.rs (1)

20-20: Consider re-exports instead of exposing inner_dialect directly.

Making inner_dialect public exposes its internal implementation. If only specific items from inner_dialect need to be public, consider using pub use re-exports to maintain a cleaner API surface and preserve encapsulation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 774e6fd and 83221b3.

⛔ Files ignored due to path filters (1)

ibis-server/resources/function_list/bigquery.csv is excluded by !**/*.csv

📒 Files selected for processing (24)

ibis-server/app/config.py (1 hunks)
ibis-server/app/mdl/core.py (1 hunks)
ibis-server/app/mdl/rewriter.py (3 hunks)
ibis-server/app/model/error.py (2 hunks)
ibis-server/tools/query_local_run.py (1 hunks)
wren-core-base/src/mdl/manifest.rs (6 hunks)
wren-core-py/src/context.rs (3 hunks)
wren-core-py/src/errors.rs (2 hunks)
wren-core/core/src/mdl/dialect/inner_dialect.rs (5 hunks)
wren-core/core/src/mdl/dialect/mod.rs (1 hunks)
wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (1 hunks)
wren-core/core/src/mdl/function/dialect/bigquery/mod.rs (1 hunks)
wren-core/core/src/mdl/function/dialect/bigquery/scalar.rs (1 hunks)
wren-core/core/src/mdl/function/dialect/bigquery/window.rs (1 hunks)
wren-core/core/src/mdl/function/dialect/mod.rs (1 hunks)
wren-core/core/src/mdl/function/dialect/utils.rs (1 hunks)
wren-core/core/src/mdl/function/macros.rs (2 hunks)
wren-core/core/src/mdl/function/mod.rs (1 hunks)
wren-core/core/src/mdl/function/remote_function.rs (16 hunks)
wren-core/core/src/mdl/function/scalar/mod.rs (1 hunks)
wren-core/core/src/mdl/mod.rs (79 hunks)
wren-core/sqllogictest/src/test_context.rs (1 hunks)
wren-core/wren-example/examples/plan-sql.rs (2 hunks)
wren-core/wren-example/examples/to-many-calculation.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (16)

wren-core/core/src/mdl/function/dialect/utils.rs (1)

wren-core/core/src/mdl/function/remote_function.rs (1)

build_document (228-246)

wren-core-py/src/errors.rs (1)

wren-core-base/src/mdl/manifest.rs (1)

new (110-114)

wren-core/core/src/mdl/function/dialect/bigquery/window.rs (2)

wren-core/core/src/mdl/function/dialect/utils.rs (1)

build_document (3-5)

wren-core/core/src/mdl/function/remote_function.rs (4)

build_document (228-246)

new (187-199)

new (286-299)

new (386-398)

wren-core/wren-example/examples/to-many-calculation.rs (1)

wren-core/core/src/mdl/mod.rs (1)

create_wren_ctx (368-396)

ibis-server/tools/query_local_run.py (1)

wren-core/core/src/mdl/mod.rs (1)

data_source (327-329)

wren-core/sqllogictest/src/test_context.rs (1)

wren-core/core/src/mdl/mod.rs (1)

create_wren_ctx (368-396)

wren-core-py/src/context.rs (4)

wren-core/core/src/mdl/mod.rs (6)

mdl (198-233)

data_source (327-329)

new (129-184)

new (567-569)

default (65-73)

create_wren_ctx (368-396)

wren-core-base/manifest-macro/src/lib.rs (2)

data_source (61-116)

manifest (26-56)

wren-core-base/src/mdl/manifest.rs (2)

new (110-114)

from_str (158-183)

wren-core-py/src/manifest.rs (1)

to_manifest (18-23)

ibis-server/app/mdl/core.py (3)

wren-core/core/src/mdl/mod.rs (1)

data_source (327-329)

wren-core-base/manifest-macro/src/lib.rs (1)

data_source (61-116)

wren-core-base/src/mdl/builder.rs (1)

data_source (90-93)

wren-core/wren-example/examples/plan-sql.rs (2)

wren-core/core/src/mdl/mod.rs (3)

mdl (198-233)

create_wren_ctx (368-396)

transform_sql_with_ctx (416-476)

wren-core-base/manifest-macro/src/lib.rs (1)

manifest (26-56)

wren-core/core/src/mdl/function/remote_function.rs (1)

wren-core-py/src/context.rs (1)

new (82-226)

wren-core-base/src/mdl/manifest.rs (2)

wren-core-py/src/context.rs (1)

new (82-226)

wren-core-base/src/mdl/builder.rs (7)

new (46-58)

new (105-119)

new (190-205)

new (289-298)

new (325-337)

new (374-382)

new (404-411)

wren-core/core/src/mdl/dialect/inner_dialect.rs (5)

wren-core/core/src/mdl/dialect/utils.rs (2)

scalar_function_to_sql_internal (51-76)

args (30-48)

wren-core/core/src/mdl/function/dialect/bigquery/mod.rs (3)

bigquery_aggregate_functions (300-338)

bigquery_scalar_functions (73-298)

bigquery_window_functions (340-356)

wren-core/core/src/mdl/function/aggregate/mod.rs (1)

aggregate_functions (11-51)

wren-core/core/src/mdl/function/scalar/mod.rs (1)

scalar_functions (18-192)

wren-core/core/src/mdl/function/window/mod.rs (1)

window_functions (5-19)

ibis-server/app/mdl/rewriter.py (1)

ibis-server/app/model/data_source.py (1)

DataSource (60-214)

wren-core/core/src/mdl/mod.rs (5)

wren-core/core/src/mdl/dialect/inner_dialect.rs (1)

get_inner_dialect (112-121)

wren-core-base/manifest-macro/src/lib.rs (2)

data_source (61-116)

manifest (26-56)

wren-core-py/src/context.rs (1)

new (82-226)

wren-core/core/src/mdl/function/remote_function.rs (3)

new_with_return_type (201-211)

new_with_return_type (317-328)

new_with_return_type (400-410)

wren-core/core/src/logical_plan/utils.rs (1)

try_map_data_type (102-107)

wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (2)

wren-core/core/src/mdl/function/remote_function.rs (8)

build_document (228-246)

new (187-199)

new (286-299)

new (386-398)

from (215-225)

from (332-342)

from (414-423)

new_with_alias (301-315)

wren-core/core/src/mdl/function/dialect/utils.rs (1)

build_document (3-5)

wren-core/core/src/mdl/function/dialect/bigquery/scalar.rs (2)

wren-core/core/src/mdl/function/remote_function.rs (4)

build_document (228-246)

new (187-199)

new (286-299)

new (386-398)

wren-core/core/src/mdl/function/dialect/utils.rs (1)

build_document (3-5)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: test
GitHub Check: cargo check
GitHub Check: ci

🔇 Additional comments (19)

ibis-server/app/config.py (1)

64-67: LGTM! BigQuery function list handled in Wren Core.

The early return for BigQuery is correct and aligns with the PR's introduction of dialect-specific function lists in Wren Core.

ibis-server/app/mdl/rewriter.py (1)

45-47: LGTM! Data source awareness added to embedded rewriter.

The change correctly threads data_source through to EmbeddedEngineRewriter for dialect-aware context creation.

ibis-server/app/model/error.py (1)

8-8: LGTM! Updated to correct Starlette constant.

The change from HTTP_422_UNPROCESSABLE_ENTITY to HTTP_422_UNPROCESSABLE_CONTENT aligns with modern Starlette conventions.

wren-core-py/src/errors.rs (1)

92-96: LGTM! Error conversion follows established patterns.

The From<ParsedDataSourceError> implementation is consistent with other error conversions in this file and properly integrates the new data source error type.

wren-core/core/src/mdl/function/dialect/mod.rs (1)

1-2: LGTM! Clean module organization.

The public bigquery module and private utils module are appropriately scoped for the dialect-specific function list feature.

wren-core/core/src/mdl/function/scalar/mod.rs (1)

12-12: LGTM! Macro rename improves clarity.

The rename from make_udf_function to make_datafusion_udf_function makes the DataFusion dependency explicit and improves code readability.

wren-core/wren-example/examples/to-many-calculation.rs (1)

20-20: LGTM: API update aligns with new data_source parameter.

The addition of the second None parameter correctly adapts to the updated create_wren_ctx signature, which now accepts an optional data_source.

wren-core/core/src/mdl/function/mod.rs (1)

2-2: LGTM: Public dialect module exposure.

The addition of pub mod dialect; appropriately exposes dialect-specific function registries for external access, supporting the BigQuery dialect implementation.

ibis-server/tools/query_local_run.py (1)

80-80: LGTM: data_source parameter threaded to SessionContext.

The addition of the data_source argument properly threads the data source context from the environment variable to the session initialization, enabling dialect-aware behavior.

wren-core/sqllogictest/src/test_context.rs (1)

66-66: LGTM: Test context updated for new API.

The addition of None as the second parameter correctly adapts the test harness to the updated create_wren_ctx signature, maintaining default (non-dialect-specific) behavior for tests.

ibis-server/app/mdl/core.py (1)

7-15: LGTM: Public API extended with data_source parameter.

The addition of the optional data_source parameter to get_session_context properly extends the public API while maintaining backward compatibility through the default None value. The parameter is correctly threaded to the underlying wren_core.SessionContext.

wren-core/wren-example/examples/plan-sql.rs (2)

7-8: LGTM: Updated imports for dialect support.

The addition of DataSource and create_wren_ctx imports correctly supports the dialect-aware context creation demonstrated in this example.

22-22: LGTM: Demonstrates BigQuery dialect usage.

The explicit use of Some(&DataSource::BigQuery) in the context creation effectively demonstrates the new dialect-aware functionality, ensuring BigQuery-specific UDFs are registered.

wren-core/core/src/mdl/function/macros.rs (4)

6-21: LGTM: Renamed macro for clarity.

Renaming the original macro to make_datafusion_udf_function clearly distinguishes it from the new expression-based make_udf_function macro, preventing confusion between the type-based (<$UDF>::new()) and expression-based approaches.

23-39: LGTM: Expression-based UDF macro added.

The new make_udf_function macro appropriately accepts a pre-constructed UDF expression ($UDF:expr) and wraps it in a singleton pattern. This enables more flexible UDF registration, particularly for dialect-specific functions.

41-57: LGTM: Expression-based UDAF macro added.

The make_udaf_function macro follows the same pattern as make_udf_function, providing consistent expression-based aggregate function registration.

59-75: LGTM: Expression-based UDWF macro added.

The make_udwf_function macro completes the set of expression-based macros, providing consistent window function registration following the same singleton pattern.

wren-core/core/src/mdl/function/dialect/bigquery/window.rs (2)

8-19: LGTM: percentile_cont window function definition.

The percentile_cont function is correctly defined with:

ReturnType::SameAsInput (appropriate for continuous percentiles)

Signature::any(2, Volatility::Immutable) (correct for 2-argument percentile function)

Clear documentation with SQL example

21-32: LGTM: percentile_disc window function definition.

The percentile_disc function is correctly defined with:

ReturnType::SameAsInput (appropriate for discrete percentiles)

Signature::any(2, Volatility::Immutable) (correct for 2-argument percentile function)

Clear documentation with SQL example

wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

wren-core-py/src/context.rs (1)

116-132: Catch disagreements between manifest and explicit data_source.

Right now we silently prefer the manifest value when both sources are present. If callers accidentally pass a conflicting data_source, they won’t discover the mismatch until much later. Please fail fast in that case so we surface the configuration error immediately.

You can parse the optional argument up front, compare, and only proceed when they agree:

-        // If the manifest has a data source, use it.
-        // Otherwise, if the data_source parameter is provided, use it.
-        // Otherwise, use None.
-        let data_source = if let Some(ds) = &manifest.data_source {
-            Some(*ds)
-        } else if let Some(ds_str) = data_source {
-            Some(DataSource::from_str(ds_str).map_err(CoreError::from)?)
-        } else {
-            None
-        };
+        let param_data_source = data_source
+            .map(|ds| DataSource::from_str(ds).map_err(CoreError::from))
+            .transpose()?;
+
+        if let (Some(manifest_ds), Some(param_ds)) = (manifest.data_source, param_data_source) {
+            if manifest_ds != param_ds {
+                return Err(CoreError::new(
+                    format!(
+                        "Manifest data_source '{manifest_ds:?}' conflicts with explicit data_source '{param_ds:?}'"
+                    )
+                    .as_str(),
+                )
+                .into());
+            }
+        }
+
+        // If the manifest has a data source, use it; otherwise fall back to the parsed argument.
+        let data_source = manifest.data_source.or(param_data_source);

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83221b3 and 253ffc0.

📒 Files selected for processing (4)

wren-core-py/src/context.rs (4 hunks)
wren-core-py/tests/test_modeling_core.py (2 hunks)
wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (1 hunks)
wren-core/core/src/mdl/mod.rs (79 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

wren-core/core/src/mdl/mod.rs (3)

wren-core/core/src/mdl/dialect/inner_dialect.rs (1)

get_inner_dialect (112-121)

wren-core-py/src/context.rs (1)

new (82-208)

wren-core/core/src/mdl/function/remote_function.rs (3)

new_with_return_type (201-211)

new_with_return_type (317-328)

new_with_return_type (400-410)

wren-core-py/src/context.rs (6)

wren-core/core/src/mdl/mod.rs (7)

mdl (198-233)

data_source (327-329)

new (129-184)

new (567-569)

default (65-73)

create_wren_ctx (368-396)

register_remote_function (529-554)

wren-core-py/src/extractor.rs (2)

mdl_base64 (174-234)

new (22-29)

wren-core-base/manifest-macro/src/lib.rs (2)

data_source (61-116)

manifest (26-56)

wren-core-py/src/errors.rs (12)

new (18-22)

from (26-28)

from (32-34)

from (38-40)

from (44-46)

from (50-52)

from (56-65)

from (69-71)

from (75-77)

from (81-83)

from (87-89)

from (93-95)

wren-core-base/src/mdl/manifest.rs (2)

new (110-114)

from_str (158-183)

wren-core-py/src/manifest.rs (1)

to_manifest (18-23)

wren-core-py/tests/test_modeling_core.py (4)

wren-core/core/src/mdl/mod.rs (1)

data_source (327-329)

wren-core-base/manifest-macro/src/lib.rs (1)

data_source (61-116)

wren-core-base/src/mdl/py_method.rs (1)

data_source (49-51)

wren-core-base/src/mdl/builder.rs (1)

data_source (90-93)

wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (2)

wren-core/core/src/mdl/function/remote_function.rs (8)

build_document (228-246)

new (187-199)

new (286-299)

new (386-398)

from (215-225)

from (332-342)

from (414-423)

new_with_alias (301-315)

wren-core/core/src/mdl/function/dialect/utils.rs (1)

build_document (3-5)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: ci
GitHub Check: test
GitHub Check: cargo check

🔇 Additional comments (4)

wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (2)

93-104: Past issue resolved: array_concat_agg return type is now correct.

The previous review flagged that array_concat_agg was incorrectly using ReturnType::ArrayOfInputFirstArgument, which would double-wrap array types. The function now correctly uses ReturnType::SameAsInput, ensuring that input arrays maintain their type structure (e.g., ARRAY<INT64> stays ARRAY<INT64> rather than becoming ARRAY<ARRAY<INT64>>).

1-209: Well-structured BigQuery aggregate function definitions.

The implementation is solid:

All 14 UDAFs have correct return types matching BigQuery semantics

Signatures appropriately balance flexibility (any(n)) with type safety (coercible signatures for logical_and/logical_or)

Consistent, clear documentation with SQL examples for each function

Proper handling of the deprecated group_concat legacy function

The countif alias registration follows convention

wren-core-py/tests/test_modeling_core.py (1)

20-21: DataFusion fixture keeps the coverage relevant.

Switching the shared manifest to datafusion means the CSV-driven remote-function tests still exercise the registration path now that BigQuery relies on the dialect registry. Nice catch.

wren-core/core/src/mdl/mod.rs (1)

369-387: Dialect-aware registration looks solid.

Passing the manifest’s data source into create_wren_ctx and letting the inner dialect supply the UDF registries keeps the BigQuery lists isolated without disturbing the generic fallback.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ibis-server/wren/session/__init__.py (1)
29-31: Constructor call missing data_source argument for consistency.

The EmbeddedEngineRewriter constructor now accepts an optional data_source parameter (see rewriter.py:135-136), and the Context has self.data_source available. However, this constructor call doesn't pass it, relying instead on passing data_source as a parameter to rewrite_sync at line 100.

This is related to the inconsistency flagged in rewriter.py. For consistency with the Rewriter usage pattern (which passes data_source to the constructor), consider updating this call.

Apply this diff:
 self.rewriter = EmbeddedEngineRewriter(
-    get_config().get_remote_function_list_path(data_source)
+    get_config().get_remote_function_list_path(data_source),
+    data_source
 )
Then update line 100 to remove the parameter (assuming rewrite_sync is refactored per the previous comment):
 self.planned_sql = self.context.rewriter.rewrite_sync(
-    self.manifest, self.wren_sql, self.properties, self.context.data_source
+    self.manifest, self.wren_sql, self.properties
 )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 253ffc0 and c17a6a5.

📒 Files selected for processing (2)

ibis-server/app/mdl/rewriter.py (4 hunks)
ibis-server/wren/session/__init__.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

ibis-server/wren/session/__init__.py (1)

wren-core-base/manifest-macro/src/lib.rs (1)

manifest (26-56)

ibis-server/app/mdl/rewriter.py (2)

ibis-server/app/model/data_source.py (1)

DataSource (60-214)

ibis-server/app/mdl/core.py (1)

get_session_context (7-15)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: ci

ibis-server/app/mdl/rewriter.py

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

wren-core/core/src/mdl/dialect/inner_dialect.rs (1)

98-111: Consider adding documentation for the new trait methods.

The implementation correctly provides default methods for dialect-specific function registration, which aligns well with the PR objectives. However, adding doc comments would improve maintainability by explaining when and how these methods should be overridden.

Example documentation:

/// Define the supported UDFs for the dialect which will be registered in the execution context.
/// 
/// By default, returns the standard DataFusion scalar functions. Override this method
/// to provide dialect-specific scalar functions.
fn supported_udfs(&self) -> Vec<Arc<datafusion::logical_expr::ScalarUDF>> {
    scalar_functions()
}

/// Define the supported UDAFs for the dialect which will be registered in the execution context.
/// 
/// By default, returns the standard DataFusion aggregate functions. Override this method
/// to provide dialect-specific aggregate functions.
fn supported_udafs(&self) -> Vec<Arc<datafusion::logical_expr::AggregateUDF>> {
    aggregate_functions()
}

/// Define the supported UDWFs for the dialect which will be registered in the execution context.
/// 
/// By default, returns the standard DataFusion window functions. Override this method
/// to provide dialect-specific window functions.
fn supported_udwfs(&self) -> Vec<Arc<datafusion::logical_expr::WindowUDF>> {
    window_functions()
}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 269eed0 and 0030b62.

📒 Files selected for processing (3)

ibis-server/app/mdl/rewriter.py (4 hunks)
ibis-server/wren/session/__init__.py (1 hunks)
wren-core/core/src/mdl/dialect/inner_dialect.rs (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

ibis-server/app/mdl/rewriter.py

🧰 Additional context used

🧬 Code graph analysis (1)

wren-core/core/src/mdl/dialect/inner_dialect.rs (5)

wren-core/core/src/mdl/dialect/utils.rs (2)

scalar_function_to_sql_internal (51-76)

args (30-48)

wren-core/core/src/mdl/function/dialect/bigquery/mod.rs (3)

bigquery_aggregate_functions (300-338)

bigquery_scalar_functions (73-298)

bigquery_window_functions (340-356)

wren-core/core/src/mdl/function/aggregate/mod.rs (1)

aggregate_functions (11-51)

wren-core/core/src/mdl/function/scalar/mod.rs (1)

scalar_functions (18-192)

wren-core/core/src/mdl/function/window/mod.rs (1)

window_functions (5-19)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: test
GitHub Check: ci
GitHub Check: cargo check

🔇 Additional comments (4)

ibis-server/wren/session/__init__.py (1)

99-103: LGTM! Formatting improvement enhances readability.

The multi-line formatting of the rewrite_sync call improves code readability while preserving the original semantics.

wren-core/core/src/mdl/dialect/inner_dialect.rs (3)

20-21: LGTM!

The Arc import is necessary for the new trait method return types.

23-26: LGTM!

The imports correctly bring in both BigQuery-specific and default function lists needed for the new dialect-specific registration mechanism.

152-162: LGTM!

The BigQuery dialect correctly overrides the new methods to provide BigQuery-specific function lists, enabling proper function registration for BigQuery data sources.

wren-core/core/src/mdl/dialect/inner_dialect.rs

goldmedal · 2025-11-11T08:41:14Z

The Oracle test is flaky in github flow. I have tested it locally

poetry run pytest -m 'oracle'
=========================================================================== test session starts ============================================================================
platform darwin -- Python 3.11.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /Users/jax/git/wren-engine/ibis-server
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 396 items / 377 deselected / 19 selected                                                                                                                         

tests/routers/v2/connector/test_oracle.py ...........                                                                                                                             [ 57%]
tests/routers/v3/connector/oracle/test_function.py ...                                                                                                                            [ 73%]
tests/routers/v3/connector/oracle/test_query.py .....

douenergy · 2025-11-18T08:19:38Z

Thanks @goldmedal

…Query function lists (Canner#1366)

goldmedal added 3 commits November 11, 2025 13:37

redefine bigquery function in wren core

459f7b6

modify the usage for SessionContext

52293a7

remove debug info

83221b3

github-actions bot added core bigquery ibis rust Pull requests that update Rust code python Pull requests that update Python code labels Nov 11, 2025

coderabbitai bot reviewed Nov 11, 2025

View reviewed changes

wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs Show resolved Hide resolved

goldmedal added 5 commits November 11, 2025 13:56

fix check

0ffc84d

fix test

1475847

fix return type for array_concat_agg

3a2b69d

fix function test

253ffc0

fix rewrite_sync

c17a6a5

coderabbitai bot reviewed Nov 11, 2025

View reviewed changes

ibis-server/app/mdl/rewriter.py Show resolved Hide resolved

goldmedal added 2 commits November 11, 2025 15:00

fix btrim test

269eed0

address comments and add doc

0030b62

coderabbitai bot reviewed Nov 11, 2025

View reviewed changes

wren-core/core/src/mdl/dialect/inner_dialect.rs Show resolved Hide resolved

wren-core/core/src/mdl/dialect/inner_dialect.rs Outdated Show resolved Hide resolved

goldmedal added 2 commits November 11, 2025 15:32

fix typo

243acd0

enhance error message

cbe2cd1

goldmedal requested a review from douenergy November 11, 2025 08:41

douenergy approved these changes Nov 18, 2025

View reviewed changes

douenergy merged commit 180af86 into Canner:main Nov 18, 2025
9 of 10 checks passed

coderabbitai bot mentioned this pull request Dec 16, 2025

feat(knowledge): introduce the generic SQL knowledge #1389

Merged

coderabbitai bot mentioned this pull request Jan 5, 2026

feat(ibis): Add Spark connector #1398

Merged

coderabbitai bot mentioned this pull request Mar 9, 2026

feat(wren-engine): add Apache Doris connector support #1430

Merged

nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026

feat(core): introduce dialect-specific function list and refactor Big…

9b78648

…Query function lists (Canner#1366)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): introduce dialect-specific function list and refactor BigQuery function lists#1366

feat(core): introduce dialect-specific function list and refactor BigQuery function lists#1366
douenergy merged 12 commits intoCanner:mainfrom
goldmedal:chore/enhance-bigquery-func

goldmedal commented Nov 11, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 11, 2025 •

edited

Loading

Rate limit exceeded

Uh oh!

goldmedal commented Nov 11, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

goldmedal commented Nov 11, 2025

Uh oh!

douenergy commented Nov 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

goldmedal commented Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

BigQuery Function list

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

goldmedal commented Nov 11, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

goldmedal commented Nov 11, 2025

Uh oh!

douenergy commented Nov 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

goldmedal commented Nov 11, 2025 •

edited

Loading

coderabbitai bot commented Nov 11, 2025 •

edited

Loading