
feat(core): add BigQuery EXTRACT function support and window frame validation #1279

Merged
douenergy merged 7 commits into Canner:main from goldmedal:chore/support-bgquery-extract
Aug 14, 2025

Conversation


@goldmedal goldmedal commented Aug 12, 2025

Description

  • The DataFusion planner turns an EXTRACT expression into a call to the date_part function, which is not a valid function in BigQuery. When unparsing for BigQuery, we should convert the date_part call back into an EXTRACT expression.
  • BigQuery only allows aggregate functions to carry a window frame clause. Use window_func_support_window_frame to check whether a window function call should generate the window frame clause.
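
The first bullet can be illustrated with a minimal sketch. It uses a hypothetical stand-in `Expr` type rather than DataFusion's real expression type, and the names `unparse` and `date_part_to_extract` are assumptions made for this example, not the functions added in this PR.

```rust
// Hypothetical stand-in for a planner expression; this is not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    StringLiteral(String),
    ScalarFunction { name: String, args: Vec<Expr> },
}

// Render an expression as SQL text. Calls to date_part are rewritten to
// EXTRACT(field FROM expr); every other function is rendered as name(args).
fn unparse(expr: &Expr) -> Result<String, String> {
    match expr {
        Expr::Column(name) => Ok(name.clone()),
        Expr::StringLiteral(s) => Ok(format!("'{s}'")),
        Expr::ScalarFunction { name, args } if name.eq_ignore_ascii_case("date_part") => {
            date_part_to_extract(args)
        }
        Expr::ScalarFunction { name, args } => {
            let rendered: Result<Vec<String>, String> = args.iter().map(unparse).collect();
            Ok(format!("{name}({})", rendered?.join(", ")))
        }
    }
}

// Validate the argument list, then build the EXTRACT form.
fn date_part_to_extract(args: &[Expr]) -> Result<String, String> {
    match args {
        [Expr::StringLiteral(field), source] => Ok(format!(
            "EXTRACT({} FROM {})",
            field.to_ascii_uppercase(),
            unparse(source)?
        )),
        [_, _] => Err("date_part expects a string literal as its first argument".to_string()),
        _ => Err(format!("date_part expects 2 arguments, got {}", args.len())),
    }
}
```

Under these assumptions, `date_part('year', o_orderdate)` would render as `EXTRACT(YEAR FROM o_orderdate)`, while a call with the wrong number of arguments yields an error instead of SQL.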

Summary by CodeRabbit

  • New Features

    • BigQuery: date_part now emits EXTRACT(field FROM expr) SQL with broader field-name support and stricter argument validation.
    • BigQuery: adjusted window-function framing behavior for several ranking/analytic functions to match BigQuery semantics.
  • Refactor

    • Safer, more idiomatic internal logic and a minor signature/lifetime clarification (no behavioral change).
  • Tests

    • Added BigQuery EXTRACT and window-function round-trip tests and minor test cleanups.


coderabbitai bot commented Aug 12, 2025

Walkthrough

BigQuery-specific rewrite of date_part to EXTRACT, a dialect hook for window-frame support with WrenDialect forwarding, added BigQuery tests, and small refactors: explicit Cow lifetime, Option accumulation refactor, and a test allocation change.

Changes

Cohort / File(s): Summary

  • Utils: explicit lifetime in Cow (wren-core-base/src/mdl/utils.rs): quote_identifier return type changed from Cow<str> to Cow<'_, str>; implementation unchanged.
  • Analyze: option-combining refactor (wren-core/core/src/logical_plan/analyze/model_generation.rs): Rewrote rls_filter accumulation to pattern-match Option values (avoids unwrap); simplified error propagation in CalculationPlan branch; no behavior change.
  • BigQuery: date_part → EXTRACT + helpers + imports (wren-core/core/src/mdl/dialect/inner_dialect.rs): BigQueryDialect now rewrites date_part into EXTRACT(field FROM expr) via scalar_function_to_sql_overrides, validates args, maps datetime field strings through datetime_field_from_expr/datetime_field_from_str, and returns plan_err on misuse; adjusted imports.
  • Dialect API: window frame hook (wren-core/core/src/mdl/dialect/inner_dialect.rs, wren-core/core/src/mdl/dialect/wren_dialect.rs): Added window_func_support_window_frame(&self, func_name, start_bound, end_bound) -> bool to InnerDialect (default true); BigQueryDialect overrides to disable framing for certain functions; WrenDialect forwards to inner dialect.
  • Tests: BigQuery extract & window-frame behavior (wren-core/core/src/mdl/mod.rs): Added async tests test_extract_roundtrip_bigquery and test_window_functions_without_frame_bigquery validating BigQuery EXTRACT rendering and window-function frame behavior in unparse mode.
  • Tests: avoid clone in construction (wren-core/core/src/mdl/function.rs): Test changed from &[list_type.clone()] to std::slice::from_ref(&list_type) to avoid cloning in the test.
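
The last entry relies on `std::slice::from_ref`, which views a single reference as a one-element slice without copying. A minimal sketch, where `ListType` and `as_single_slice` are hypothetical names used only for illustration:

```rust
// Hypothetical stand-in for the type used in the test; the real type is not shown here.
#[derive(Debug, Clone, PartialEq)]
struct ListType(String);

// Borrow one value as a slice of length 1. No clone and no allocation:
// the slice points at the same memory as `value`.
fn as_single_slice(value: &ListType) -> &[ListType] {
    std::slice::from_ref(value)
}
```

`std::slice::from_ref` returns `&[T]` directly, so there is nothing to unwrap.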

Sequence Diagram(s)

sequenceDiagram
  participant Test as Test/Client
  participant Analyzer as Analyzer
  participant Dialect as BigQueryDialect
  participant Helper as datetime_field_from_*
  participant Unparser as Unparser

  Test->>Analyzer: analyze/unparse scalar function call date_part(kind, expr)
  Analyzer->>Dialect: scalar_function_to_sql_overrides("date_part", args)
  Dialect->>Helper: datetime_field_from_expr(kind)
  Helper-->>Dialect: DateTimeField or plan_err
  Dialect->>Unparser: build EXTRACT(DateTimeField FROM unparse(expr))
  Unparser-->>Analyzer: SQL AST / SQL string
  Analyzer-->>Test: transformed SQL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

Suggested labels

bigquery

Suggested reviewers

  • douenergy
  • wwwy3y3

Poem

In burrows of bytes I hop and prance,
EXTRACTs and lifetimes join the dance.
Options fold tidy, tests applaud the sight,
Dialects whisper frames left or right.
A little rabbit cheers the code tonight. 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4657ec4 and 274e820.

📒 Files selected for processing (2)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (3 hunks)
  • wren-core/core/src/mdl/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-core/core/src/mdl/mod.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
wren-core/core/src/mdl/dialect/wren_dialect.rs (4)
  • window_func_support_window_frame (99-110)
  • scalar_function_to_sql_overrides (60-89)
  • args (127-130)
  • args (166-181)
🔇 Additional comments (7)
wren-core/core/src/mdl/dialect/inner_dialect.rs (7)

22-22: LGTM! Import added for error handling.

The plan_err import from datafusion::common is correctly added to support the error handling in the new date_part validation logic.


26-27: LGTM! New imports properly support BigQuery EXTRACT functionality.

The addition of ScalarValue for literal handling and ExtractSyntax, WindowFrameBound from sqlparser AST correctly supports the new BigQuery-specific functionality.


57-64: LGTM! Well-designed extension to InnerDialect trait.

The new window_func_support_window_frame method provides a clean way for dialects to specify whether window functions should include frame clauses. The default implementation returning true maintains backward compatibility.
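
A minimal sketch of this hook, including the forwarding wrapper, might look as follows. `FrameBound` is a hypothetical stand-in for sqlparser's window frame bound, `GenericDialect` is an invented example dialect, and the function list in the BigQuery override is an illustrative subset chosen for this example, not necessarily the list used in the PR.

```rust
// Hypothetical stand-in for a window frame bound.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum FrameBound {
    UnboundedPreceding,
    CurrentRow,
    UnboundedFollowing,
}

trait InnerDialect {
    // Default: every window function may carry a frame clause, so dialects
    // that do not override the hook keep their previous behavior.
    fn window_func_support_window_frame(
        &self,
        _func_name: &str,
        _start_bound: &FrameBound,
        _end_bound: &FrameBound,
    ) -> bool {
        true
    }
}

// Invented dialect that relies on the default.
struct GenericDialect;
impl InnerDialect for GenericDialect {}

struct BigQueryDialect;
impl InnerDialect for BigQueryDialect {
    fn window_func_support_window_frame(
        &self,
        func_name: &str,
        _start_bound: &FrameBound,
        _end_bound: &FrameBound,
    ) -> bool {
        // Illustrative subset of functions that reject a frame clause.
        !matches!(
            func_name.to_ascii_lowercase().as_str(),
            "row_number" | "rank" | "dense_rank" | "percent_rank" | "cume_dist" | "ntile"
                | "lag" | "lead"
        )
    }
}

// The outer dialect only forwards the question to whichever inner dialect it wraps.
struct WrenDialect {
    inner: Box<dyn InnerDialect>,
}

impl WrenDialect {
    fn window_func_support_window_frame(
        &self,
        func_name: &str,
        start_bound: &FrameBound,
        end_bound: &FrameBound,
    ) -> bool {
        self.inner
            .window_func_support_window_frame(func_name, start_bound, end_bound)
    }
}
```

Keeping the default at `true` means only dialects with restrictions need to know about the hook.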


130-152: LGTM! Robust date_part to EXTRACT conversion with proper validation.

The implementation correctly:

  • Validates exactly 2 arguments to prevent out-of-bounds access
  • Converts DataFusion's date_part function calls to BigQuery-compatible EXTRACT statements
  • Uses proper error handling with descriptive messages
  • Leverages helper methods for field conversion

This addresses the core requirement of making DataFusion-generated SQL compatible with BigQuery.


154-179: LGTM! Accurate BigQuery window frame support implementation.

The implementation correctly identifies BigQuery window functions that don't support frame clauses. The comprehensive list matches BigQuery's documented limitations, and the logic properly returns false for unsupported functions while defaulting to true for aggregation functions that do support frames.


182-193: LGTM! Comprehensive literal type support for datetime fields.

The implementation correctly handles both Utf8 and LargeUtf8 scalar values, preventing rejection of valid DataFusion plans that use large string literals. The error message clearly indicates the expected input type.
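
As a rough sketch of that literal handling, assuming a cut-down `ScalarValue` enum that mirrors only the variants needed here (the helper name `field_name_from_literal` is hypothetical):

```rust
// Hypothetical, reduced stand-in for a scalar literal type with two string variants.
#[derive(Debug, Clone)]
enum ScalarValue {
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Int64(Option<i64>),
}

// Accept either string variant; reject nulls and non-string literals with a
// message that names the expected input.
fn field_name_from_literal(value: &ScalarValue) -> Result<&str, String> {
    match value {
        ScalarValue::Utf8(Some(s)) | ScalarValue::LargeUtf8(Some(s)) => Ok(s.as_str()),
        other => Err(format!(
            "expected a string literal for the date part, got {other:?}"
        )),
    }
}
```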


195-231: LGTM! Comprehensive BigQuery date part validation with proper WEEK handling.

The implementation correctly:

  • Restricts to BigQuery-supported date parts only, preventing runtime SQL errors
  • Handles the complex WEEK(WEEKDAY) syntax with proper validation
  • Provides clear error messages for unsupported parts and invalid formats
  • Uses case-insensitive matching for robustness

This ensures generated EXTRACT statements remain valid on BigQuery while supporting all documented date/time fields.
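
The validation described above can be sketched as below. The `DatePart` type and `parse_date_part` function are hypothetical names for this example; the part list follows the date and time parts BigQuery documents for EXTRACT, and the real helper in the PR returns a sqlparser `DateTimeField` instead.

```rust
// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
enum DatePart {
    Simple(&'static str),
    // WEEK(<WEEKDAY>): a week that starts on the given day.
    WeekStarting(&'static str),
}

static SIMPLE_PARTS: &[&str] = &[
    "MICROSECOND", "MILLISECOND", "SECOND", "MINUTE", "HOUR", "DAYOFWEEK", "DAY",
    "DAYOFYEAR", "WEEK", "ISOWEEK", "MONTH", "QUARTER", "YEAR", "ISOYEAR", "DATE", "TIME",
];

static WEEKDAYS: &[&str] = &[
    "SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY",
];

fn parse_date_part(input: &str) -> Result<DatePart, String> {
    let trimmed = input.trim();
    // Case-insensitive match against the plain parts.
    if let Some(part) = SIMPLE_PARTS
        .iter()
        .copied()
        .find(|p| p.eq_ignore_ascii_case(trimmed))
    {
        return Ok(DatePart::Simple(part));
    }
    // WEEK(<WEEKDAY>) form: check the wrapper, then validate the weekday.
    let upper = trimmed.to_ascii_uppercase();
    if let Some(inner) = upper
        .strip_prefix("WEEK(")
        .and_then(|rest| rest.strip_suffix(')'))
    {
        return WEEKDAYS
            .iter()
            .copied()
            .find(|d| *d == inner.trim())
            .map(DatePart::WeekStarting)
            .ok_or_else(|| format!("invalid weekday in WEEK(...): {inner}"));
    }
    Err(format!("unsupported date part: {input}"))
}
```

Rejecting unknown parts at unparse time surfaces the problem as a planning error rather than as a failure when the SQL reaches BigQuery.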


@github-actions github-actions bot added the core and rust labels Aug 12, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)

145-195: Avoid allocation and locale-sensitive case transforms in mapping

to_uppercase() allocates and is locale-sensitive. These tokens are ASCII; prefer to_ascii_uppercase() to avoid surprises (e.g., Turkish locale) and reduce alloc churn.

Apply this minimal diff:

-fn datetime_field_from_str(s: &str) -> Result<ast::DateTimeField> {
-    match s.to_uppercase().as_str() {
+fn datetime_field_from_str(s: &str) -> Result<ast::DateTimeField> {
+    let s_upper = s.to_ascii_uppercase();
+    match s_upper.as_str() {

Optionally, consider switching to eq_ignore_ascii_case per arm, but the above change keeps the current structure with safer semantics.
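
The optional `eq_ignore_ascii_case` variant could look like the sketch below, with a hypothetical three-variant `Field` enum standing in for the real datetime field type. Note that `to_ascii_uppercase()` on a `&str` also returns a new `String`, so only the per-arm comparison avoids the allocation.

```rust
// Hypothetical, reduced field enum for illustration.
#[derive(Debug, PartialEq)]
enum Field {
    Year,
    Month,
    Day,
}

// Compare in place, ignoring ASCII case, without building an uppercased copy.
fn field_from_str(s: &str) -> Option<Field> {
    if s.eq_ignore_ascii_case("YEAR") {
        Some(Field::Year)
    } else if s.eq_ignore_ascii_case("MONTH") {
        Some(Field::Month)
    } else if s.eq_ignore_ascii_case("DAY") {
        Some(Field::Day)
    } else {
        None
    }
}
```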

wren-core/core/src/mdl/mod.rs (1)

933-959: Good BigQuery EXTRACT roundtrip coverage

This test validates the intended unparse behavior for BigQuery. Consider adding a case that starts with date_part('year', o_orderdate) to ensure it is also rendered as EXTRACT(YEAR FROM ...), and a couple more fields (e.g., MONTH, WEEK) to broaden coverage.

wren-core/core/src/logical_plan/analyze/model_generation.rs (1)

100-108: Simplify filter combination by flattening first

Your rewrite is correct and avoids unwraps. You can make it more concise and maintain identical semantics by flattening first and reducing with Expr::and.

Apply this diff:

-                    let rls_filter = filters
-                        .into_iter()
-                        .reduce(|acc, filter| {
-                            if let Some(acc) = acc {
-                                if let Some(filter) = filter {
-                                    Some(acc.and(filter))
-                                } else {
-                                    Some(acc)
-                                }
-                            } else {
-                                filter
-                            }
-                        })
-                        .flatten();
+                    let rls_filter = filters
+                        .into_iter()
+                        .flatten()
+                        .reduce(Expr::and);
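
To see that the two forms agree, here is a small self-contained sketch. The `Expr` type and its `and` method are hypothetical stand-ins for the planner's expression type, and the function names are invented for the comparison.

```rust
// Hypothetical stand-in expression type with an `and` combinator.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Pred(&'static str),
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn and(self, other: Expr) -> Expr {
        Expr::And(Box::new(self), Box::new(other))
    }
}

// Shape of the original code: reduce over Option values, then flatten
// the resulting Option<Option<Expr>>.
fn combine_nested(filters: Vec<Option<Expr>>) -> Option<Expr> {
    filters
        .into_iter()
        .reduce(|acc, filter| match (acc, filter) {
            (Some(acc), Some(filter)) => Some(acc.and(filter)),
            (Some(acc), None) => Some(acc),
            (None, filter) => filter,
        })
        .flatten()
}

// Suggested shape: drop the None entries first, then reduce with `Expr::and`.
fn combine_flat(filters: Vec<Option<Expr>>) -> Option<Expr> {
    filters.into_iter().flatten().reduce(Expr::and)
}
```

Both functions return `None` for an empty or all-`None` input and otherwise AND the present filters together from left to right.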
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68b8153 and 9fad2a1.

📒 Files selected for processing (4)
  • wren-core-base/src/mdl/utils.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/model_generation.rs (2 hunks)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (2 hunks)
  • wren-core/core/src/mdl/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (3)
  • scalar_function_to_sql_overrides (60-89)
  • args (114-117)
  • args (153-168)
wren-core/core/src/mdl/dialect/utils.rs (1)
  • args (30-48)
wren-core/core/src/mdl/mod.rs (3)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2)
  • new (107-111)
  • default (101-103)
wren-core/core/src/mdl/context.rs (1)
  • new (339-361)
wren-core-base/manifest-macro/src/lib.rs (1)
  • manifest (26-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: test
  • GitHub Check: ci
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo check
🔇 Additional comments (2)
wren-core-base/src/mdl/utils.rs (1)

46-52: No downstream impacts from explicit lifetime on quote_identifier
The change to pub fn quote_identifier(s: &str) -> Cow<'_, str> is semantically identical to the elided form. A quick scan shows only one Rust call site which immediately calls .to_string(), converting the Cow<'_, str> into an owned String:

• wren-core-base/src/mdl/manifest.rs:153 — .map(|x| quote_identifier(&x).to_string())

Since .to_string() always produces an owned String, there’s no behavioral change or breakage.
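
For readers unfamiliar with the `Cow<'_, str>` spelling, here is a minimal sketch. The function name `quote_identifier_sketch` and the quoting rule are assumptions for illustration; the real `quote_identifier` body is not shown in this conversation.

```rust
use std::borrow::Cow;

// `Cow<'_, str>` makes visible that the result may borrow from `s`.
// It is the same type the elided form `Cow<str>` resolves to.
fn quote_identifier_sketch(s: &str) -> Cow<'_, str> {
    // Hypothetical rule: plain ASCII identifiers pass through unquoted.
    let plain = !s.is_empty()
        && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        && !s.starts_with(|c: char| c.is_ascii_digit());
    if plain {
        // No allocation: hand back the input slice.
        Cow::Borrowed(s)
    } else {
        // Allocate only when quoting is needed, doubling embedded quotes.
        Cow::Owned(format!("\"{}\"", s.replace('"', "\"\"")))
    }
}
```

Calling `.to_string()` on either variant produces an owned `String`, which is why the call site in manifest.rs is unaffected.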

wren-core/core/src/logical_plan/analyze/model_generation.rs (1)

236-237: Return a planning error when measure lacks alias (style/flow)

Switching to a direct plan_err! in the else branch is fine and keeps control flow clear. No change requested.

@goldmedal goldmedal requested a review from douenergy August 12, 2025 02:29
@goldmedal goldmedal changed the title from "fix(core): support EXTRACT for BigQuery" to "feat(core): add BigQuery EXTRACT function support and window frame validation" Aug 12, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)

172-180: LGTM! Proper handling of both UTF8 variants.

The function correctly handles both Utf8 and LargeUtf8 scalar value types, preventing potential issues with large string literals.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f22542 and a030079.

📒 Files selected for processing (3)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (3 hunks)
  • wren-core/core/src/mdl/dialect/wren_dialect.rs (2 hunks)
  • wren-core/core/src/mdl/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-core/core/src/mdl/mod.rs
🧰 Additional context used
🧬 Code Graph Analysis (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
  • window_func_support_window_frame (57-64)
  • window_func_support_window_frame (157-169)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (4)
  • window_func_support_window_frame (100-108)
  • scalar_function_to_sql_overrides (60-89)
  • args (125-128)
  • args (164-179)
wren-core/core/src/mdl/dialect/utils.rs (1)
  • args (30-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: cargo test (macos)
  • GitHub Check: test
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo check
  • GitHub Check: ci
🔇 Additional comments (4)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)

100-108: LGTM! Clean delegation pattern for window frame support.

The implementation correctly delegates window frame support checks to the inner dialect, maintaining consistency with the existing delegation pattern used throughout the WrenDialect implementation.

wren-core/core/src/mdl/dialect/inner_dialect.rs (3)

57-64: LGTM! Well-designed trait method with sensible default.

The new trait method follows the established pattern with underscore-prefixed parameters and provides a reasonable default of true for window frame support.


130-152: LGTM! Robust implementation with proper error handling.

The date_part to EXTRACT conversion correctly addresses the BigQuery compatibility issue mentioned in the PR objectives. The implementation includes proper arity validation and leverages the helper functions for field parsing.


155-169: LGTM! Comprehensive list of unsupported window functions for BigQuery.

The implementation correctly identifies window functions that don't support framing in BigQuery, aligning with BigQuery's documented limitations.

@douenergy

BigQuery will show an error message like: Window framing clause is not allowed for analytic function ROW_NUMBER.
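
A rough sketch of how an unparser could act on that restriction, assuming a hypothetical `render_window_call` helper that receives the dialect's answer as a boolean (this is not the PR's unparser code):

```rust
// Build `func OVER (ORDER BY ... [frame])`, dropping the frame clause when the
// dialect reports that the function does not accept one.
fn render_window_call(
    func_sql: &str,
    order_by: &str,
    frame: Option<&str>,
    supports_frame: bool,
) -> String {
    let mut over = format!("ORDER BY {order_by}");
    if supports_frame {
        if let Some(frame) = frame {
            over.push(' ');
            over.push_str(frame);
        }
    }
    format!("{func_sql} OVER ({over})")
}
```

With `supports_frame` set to `false`, a `ROW_NUMBER()` call is emitted without any `ROWS BETWEEN ...` clause, which avoids the error quoted above.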

@douenergy douenergy merged commit 82174fa into Canner:main Aug 14, 2025
15 checks passed
@goldmedal goldmedal deleted the chore/support-bgquery-extract branch August 14, 2025 06:09

Labels

core rust Pull requests that update Rust code
