
fix(core): introduce DATE_DIFF function#1303

Merged
goldmedal merged 7 commits into Canner:main from goldmedal:feat/support-date-diff on Sep 2, 2025

@goldmedal
Contributor

@goldmedal goldmedal commented Sep 2, 2025

Description

Use the Canner DataFusion fork to support the DATE_DIFF function.

Now, we support the following SQL syntax:

SELECT date_diff('day', date_1, date_2)

and

SELECT date_diff(day, date_1, date_2)

BigQuery dialect

BigQuery has its own argument order for DATE_DIFF, so we need to reorder the arguments when converting for BigQuery:

 DATE_DIFF(end_date, start_date, granularity)
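
The argument swap can be illustrated with a minimal Python sketch. This is not the Rust code in inner_dialect.rs; the helper name `to_bigquery_date_diff` is hypothetical, and the arguments are treated as plain SQL expression strings.

```python
def to_bigquery_date_diff(part: str, start: str, end: str) -> str:
    """Rewrite date_diff(part, start, end) as DATE_DIFF(end, start, PART).

    Hypothetical helper for illustration only.
    """
    # Accept both date_diff('day', ...) and date_diff(day, ...):
    # drop surrounding quotes and upper-case the granularity.
    granularity = part.strip().strip("'\"").upper()
    # BigQuery takes the end value first and the granularity last.
    return f"DATE_DIFF({end}, {start}, {granularity})"


print(to_bigquery_date_diff("'day'", "date_1", "date_2"))
print(to_bigquery_date_diff("day", "date_1", "date_2"))
```

Both calls produce the same BigQuery expression, which is why the quoted and unquoted forms can share one translation path.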

Summary by CodeRabbit

  • New Features

    • BigQuery date/time diff support expanded to additional granularities (microsecond through hour) with consistent integer results.
  • Bug Fixes

    • Improved validation and clearer error messages listing supported date parts for BigQuery.
  • Tests

    • Added/expanded tests covering DATE_DIFF/DATEDIFF syntaxes and granularities, including negative cases; test expectations updated to reflect added functions and manifest-driven function count changes.

@coderabbitai
Contributor

coderabbitai bot commented Sep 2, 2025

📥 Commits

Reviewing files that changed from the base of the PR and between 2faec4f and 4a8aa3c.

📒 Files selected for processing (1)
  • ibis-server/tests/routers/v3/connector/mysql/test_functions.py (1 hunks)

Walkthrough

Adds BigQuery DATE_DIFF/DATEDIFF translation and supported date-part mappings, updates tests (unit/integration) to cover valid/invalid granularities and types, renames one bypass UDF test, adjusts function-count expectations, and adds MDL transform tests (duplicate block present).

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| BigQuery router tests: ibis-server/tests/routers/v3/connector/bigquery/test_functions.py | Adds async def test_date_diff_function(...) exercising DATE_DIFF/DATEDIFF syntaxes, dtype/shape assertions, a 422 error for unsupported parts, and an HOUR/timestamp case; adds the o_orderdate column to the manifest used by tests. |
| BigQuery dialect translation: wren-core/core/src/mdl/dialect/inner_dialect.rs | Implements the date_diff → DATE_DIFF translation (expects 3 args, first as PART/string), maps additional parts (MICROSECOND, MILLISECOND, SECOND, MINUTE, HOUR), refines the unsupported-part error text, and imports AST constructs to build the call. |
| MDL transform tests: wren-core/core/src/mdl/mod.rs | Adds the test_date_diff_bigquery tokio test asserting that SQL transforms to DATE_DIFF(...) for various syntaxes and asserting a specific error for an invalid granularity; the test block appears duplicated. |
| Bypass UDF test rename: wren-core/core/src/mdl/function.rs | Renames the bypass UDF in the test from date_diff to date_test and updates the expected projection accordingly. |
| Test constants / function list tests: ibis-server/tests/conftest.py, wren-core-py/tests/test_modeling_core.py, ibis-server/tests/routers/v3/connector/local_file/test_functions.py | Updates expected function counts: DATAFUSION_FUNCTION_COUNT 283→285, related test expected lengths 283→285 and 290→292; adjusts one local_file test expectation, reducing added items by 2. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Server as Ibis Server
  participant MDL as MDL Analyzer
  participant Dialect as BigQuery Dialect
  participant AST as SQL AST Builder

  Client->>Server: send query with date_diff/datediff(...)
  Server->>MDL: analyze & transform expression
  MDL->>Dialect: request translation (args[0]=PART, args[1], args[2])
  alt valid PART and 3 args
    Dialect->>AST: build DATE_DIFF(args[2], args[1], PART)
    AST-->>MDL: transformed SQL with DATE_DIFF(...)
    MDL-->>Server: SQL ready
    Server-->>Client: 200 + results
  else invalid PART
    Dialect-->>MDL: raise unsupported-part error (lists valid values)
    MDL-->>Server: propagate error
    Server-->>Client: 422 Unprocessable Entity
  end
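
The two branches of the diagram can be sketched in plain Python. Only the argument order and the 200/422 split come from the diagram; the function names, the use of ValueError, and the full list of supported parts (beyond MICROSECOND through HOUR, which the walkthrough names) are assumptions for illustration.

```python
# Assumed set of valid parts; only MICROSECOND..HOUR are named in the walkthrough.
SUPPORTED_PARTS = (
    "MICROSECOND", "MILLISECOND", "SECOND", "MINUTE", "HOUR",
    "DAY", "WEEK", "MONTH", "QUARTER", "YEAR",
)


def translate_date_diff(args: list[str]) -> str:
    """Stand-in for the dialect step: validate, then reorder the arguments."""
    if len(args) != 3:
        raise ValueError(f"date_diff expects 3 arguments, got {len(args)}")
    part = args[0].strip().strip("'\"").upper()
    if part not in SUPPORTED_PARTS:
        raise ValueError(
            f"Unsupported date part '{part}'. "
            f"Valid values: {', '.join(SUPPORTED_PARTS)}"
        )
    # args[1] is the start value, args[2] the end value.
    return f"DATE_DIFF({args[2]}, {args[1]}, {part})"


def handle_query(args: list[str]) -> tuple[int, str]:
    """Stand-in for the server step: map a translation error to HTTP 422."""
    try:
        return 200, f"SELECT {translate_date_diff(args)}"
    except ValueError as err:
        return 422, str(err)


print(handle_query(["'day'", "date_1", "date_2"]))
print(handle_query(["DAYS", "date_1", "date_2"]))
```

Keeping validation in the translation step and the status-code mapping in the handler mirrors the diagram's separation between the dialect and the server.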

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested reviewers

  • douenergy
  • wwwy3y3

Poem

I nibble parts of DAY and HOUR,
I swap the args and craft the power.
If 'DAYS' trips me, I thump and say—
"Unsupported!" — then hop away. 🐇✨

@github-actions github-actions bot added the core, ibis, rust, and python labels Sep 2, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (2)

16-32: Manifest should declare o_orderdate to match queries.

Queries reference orders.o_orderdate but the manifest declares only o_orderkey. Declare o_orderdate to keep the test self-consistent and reduce surprises if schema validation tightens.

Apply:

@@
         {
             "name": "orders",
             "tableReference": {
                 "schema": "tpch_tiny",
                 "table": "orders",
             },
             "columns": [
-                {"name": "o_orderkey", "type": "integer"},
+                {"name": "o_orderkey", "type": "integer"},
+                {"name": "o_orderdate", "type": "date"},
             ],
         },

165-245: Solid coverage; add arity error cases and a mixed-case part to harden it.

  • Add a negative test for wrong arity (e.g., DATE_DIFF(DAY, CURRENT_DATE()) and DATE_DIFF(DAY, a, b, c)) to assert the 3-arg contract.
  • Add a case with mixed/lowercase part (e.g., DATEDIFF('day', ...)) to verify case-insensitive handling end-to-end.

Happy to push a patch with the extra assertions if you want.

wren-core/core/src/mdl/mod.rs (4)

3119-3137: Nice coverage of DATE_DIFF/DATEDIFF variants; add a lowercase date part case.

PR text mentions support for date_diff('day', …). Consider adding one assertion for 'day' (lowercase) to lock in case-insensitivity.

Apply this diff to add a check:

+        let sql = "select date_diff('day', date_1, date_2) from date_table";
+        assert_snapshot!(
+            transform_sql_with_ctx(&ctx, Arc::clone(&analyzed_mdl), &[], Arc::clone(&headers), sql).await?,
+            @"SELECT DATE_DIFF(date_table.date_2, date_table.date_1, DAY) FROM (SELECT date_table.date_1, date_table.date_2 FROM (SELECT __source.date_1 AS date_1, __source.date_2 AS date_2 FROM date_table AS __source) AS date_table) AS date_table"
+        );

3146-3164: Error message: consider including WEEK() or clarify unsupported subparts.

BigQuery allows WEEK(weekday) (e.g., WEEK(MONDAY)) in DIFF functions. If the dialect supports it (as with EXTRACT tests), list that form in “Valid values” or explicitly state WEEK(weekday) is unsupported here to avoid user confusion.
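
If the WEEK(weekday) form were to be recognized, the check could look roughly like the Python sketch below. It is a hypothetical illustration, not code from this PR: `parse_week_part` and its regular expression are assumptions, and only the weekday names are taken from BigQuery.

```python
import re

WEEKDAYS = (
    "SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY",
    "THURSDAY", "FRIDAY", "SATURDAY",
)

# Matches WEEK(<word>) with optional whitespace, after upper-casing.
_WEEK_PATTERN = re.compile(r"^WEEK\s*\(\s*([A-Z]+)\s*\)$")


def parse_week_part(part: str):
    """Return the weekday of a WEEK(<weekday>) part, or None for any other part.

    Raises ValueError when the form matches but the weekday is unknown.
    """
    match = _WEEK_PATTERN.match(part.strip().upper())
    if match is None:
        return None
    weekday = match.group(1)
    if weekday not in WEEKDAYS:
        raise ValueError(
            f"Unsupported weekday '{weekday}'. Valid values: {', '.join(WEEKDAYS)}"
        )
    return weekday


print(parse_week_part("week(monday)"))
print(parse_week_part("DAY"))
```

Returning None for parts that are not in the WEEK(...) form lets a caller fall through to the ordinary date-part validation.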


3166-3171: Prefer TIMESTAMP_DIFF for TIMESTAMP inputs (readability/idiomatic BigQuery).

DATE_DIFF works across datetime family per BigQuery docs, but TIMESTAMP_DIFF with TIMESTAMP args is clearer. Consider emitting TIMESTAMP_DIFF for TIMESTAMP types and updating the expectation.

Apply this diff if you switch the dialect to type-specific functions:

-            @"SELECT DATE_DIFF(timestamp_table.ts_2, timestamp_table.ts_1, HOUR) FROM (SELECT timestamp_table.ts_1, timestamp_table.ts_2 FROM (SELECT __source.ts_1 AS ts_1, __source.ts_2 AS ts_2 FROM timestamp_table AS __source) AS timestamp_table) AS timestamp_table"
+            @"SELECT TIMESTAMP_DIFF(timestamp_table.ts_2, timestamp_table.ts_1, HOUR) FROM (SELECT timestamp_table.ts_1, timestamp_table.ts_2 FROM (SELECT __source.ts_1 AS ts_1, __source.ts_2 AS ts_2 FROM timestamp_table AS __source) AS timestamp_table) AS timestamp_table"

For reference on function names/semantics, see BigQuery TIMESTAMP_DIFF and DATE_DIFF docs. (cloud.google.com)
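
The suggested type-specific emission could be pictured with the following Python sketch. It is an assumption about how such a dispatch might be organized, not the dialect's Rust code; `diff_function_for` and `build_diff` are hypothetical names, while DATE_DIFF, DATETIME_DIFF, and TIMESTAMP_DIFF are BigQuery functions that all take the end value first.

```python
# Hypothetical mapping from the argument type to the BigQuery diff function.
DIFF_FUNCTION_BY_TYPE = {
    "DATE": "DATE_DIFF",
    "DATETIME": "DATETIME_DIFF",
    "TIMESTAMP": "TIMESTAMP_DIFF",
}


def diff_function_for(sql_type: str) -> str:
    """Pick the diff function for a type, falling back to DATE_DIFF.

    The fallback matches what this PR emits today for every input type.
    """
    return DIFF_FUNCTION_BY_TYPE.get(sql_type.strip().upper(), "DATE_DIFF")


def build_diff(sql_type: str, part: str, start: str, end: str) -> str:
    """Build <FUNC>(end, start, PART) for the given argument type."""
    return f"{diff_function_for(sql_type)}({end}, {start}, {part.upper()})"


print(build_diff("timestamp", "hour", "ts_1", "ts_2"))
print(build_diff("date", "day", "date_1", "date_2"))
```

A lookup table keeps the choice of function separate from the argument reordering, so adding or removing a type-specific variant would not touch the rewrite itself.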


3119-3171: Minor: reuse a single headers Arc instead of redeclaring.

Multiple let headers = Arc::new(HashMap::default()) redeclarations add noise. Define it once and reuse it.

Apply once near the first assertion and drop subsequent redeclarations:

-        let headers: Arc<HashMap<String, Option<String>>> = Arc::new(HashMap::default());
+        let headers: Arc<HashMap<String, Option<String>>> = Arc::new(HashMap::default());
...
-        let headers: Arc<HashMap<String, Option<String>>> = Arc::new(HashMap::default());
...
-        let headers: Arc<HashMap<String, Option<String>>> = Arc::new(HashMap::default());
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 857f35e and 16598e4.

⛔ Files ignored due to path filters (1)
  • wren-core-py/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (1 hunks)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (3 hunks)
  • wren-core/core/src/mdl/function.rs (1 hunks)
  • wren-core/core/src/mdl/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (2)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
  • connection_info (25-30)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
wren-core/core/src/mdl/dialect/utils.rs (1)
  • args (30-48)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)
  • expr_to_sql (658-661)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo check
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: ci
  • GitHub Check: test
🔇 Additional comments (4)
wren-core/core/src/mdl/function.rs (1)

390-399: Rename avoids collision with new date_diff handling — good change.

Switching the bypass UDF used in this unit test from date_diff to date_test prevents accidental coupling with the dialect rewrite. Expected plan string updated accordingly.

wren-core/core/src/mdl/dialect/inner_dialect.rs (2)

274-281: Nice: expanded date parts align with BigQuery’s supported units.

MICROSECOND through HOUR inclusion is helpful for TIMESTAMP diffs; error message lists valid values.


150-193: Confirm DATEDIFF alias normalization
No direct handling of DATEDIFF (no underscore) in inner_dialect.rs; verify that BigQueryDialect or the SQL parser normalizes DATEDIFF to date_diff to avoid backend‐specific surprises.
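
A normalization step of the kind being asked about might look like this minimal Python sketch. Where (or whether) this PR performs it is not stated above; the `ALIASES` table and `normalize_function_name` are hypothetical.

```python
# Hypothetical alias table: spellings that should resolve to one canonical name.
ALIASES = {"datediff": "date_diff"}


def normalize_function_name(name: str) -> str:
    """Lower-case a function name and resolve known aliases."""
    lowered = name.lower()
    return ALIASES.get(lowered, lowered)


print(normalize_function_name("DATEDIFF"))
print(normalize_function_name("Date_Diff"))
```

With the alias resolved before the dialect rewrite runs, the rewrite only needs to match the single canonical name.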

wren-core/core/src/mdl/mod.rs (1)

3091-3117: Good BigQuery-focused test scaffold.

Manifest setup and Unparse mode context look correct for exercising dialect rewrites without execution.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
wren-core-py/tests/test_modeling_core.py (2)

140-140: Strengthen the assertion to verify new functions explicitly.
Counting is brittle. Also assert the presence of the newly added functions to catch catalog drift earlier.

-    assert len(functions) == 292
+    assert len(functions) == 292
+    names = {f.name for f in functions}
+    assert {"date_diff", "date_test"}.issubset(names)

152-152: Also assert that DATE_DIFF is available in the default catalog.
Helps validate the DataFusion fork wiring beyond a raw count.

-    assert len(functions) == 285
+    assert len(functions) == 285
+    assert any(f.name == "date_diff" for f in functions)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5512709 and ab1e15f.

📒 Files selected for processing (2)
  • ibis-server/tests/conftest.py (1 hunks)
  • wren-core-py/tests/test_modeling_core.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
wren-core-py/tests/test_modeling_core.py (2)
ibis-server/app/routers/v3/connector.py (1)
  • functions (405-430)
wren-core-py/src/context.rs (1)
  • functions (100-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: cargo test (amd64)
  • GitHub Check: check Cargo.toml formatting
  • GitHub Check: clippy
  • GitHub Check: test
  • GitHub Check: ci
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo test (win64)
🔇 Additional comments (1)
ibis-server/tests/conftest.py (1)

14-14: LGTM: function count bumped to 285 matches the new DATE_DIFF support.
No issues spotted.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ab1e15f and 2faec4f.

⛔ Files ignored due to path filters (2)
  • ibis-server/resources/function_list/bigquery.csv is excluded by !**/*.csv
  • ibis-server/resources/white_function_list/bigquery.csv is excluded by !**/*.csv
📒 Files selected for processing (2)
  • ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (3 hunks)
  • ibis-server/tests/routers/v3/connector/local_file/test_functions.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/tests/routers/v3/connector/bigquery/test_functions.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: check Cargo.toml formatting
  • GitHub Check: clippy
  • GitHub Check: cargo test (amd64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (win64)
  • GitHub Check: test
  • GitHub Check: ci

@goldmedal
Contributor Author

goldmedal commented Sep 2, 2025

BigQuery has been tested locally

poetry run pytest -m 'bigquery'
======================================================================================================================================= test session starts =======================================================================================================================================
platform darwin -- Python 3.11.11, pytest-8.4.1, pluggy-1.5.0
rootdir: /Users/jax/git/wren-engine/ibis-server
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 370 items / 330 deselected / 40 selected                                                                                                                                                                                                                                                

tests/routers/v2/connector/test_bigquery.py ..................                                                                                                                                                                                                                              [ 45%]
tests/routers/v3/connector/bigquery/test_functions.py .....                                                                                                                                                                                                                                 [ 57%]
tests/routers/v3/connector/bigquery/test_query.py .................   

@goldmedal goldmedal requested a review from douenergy September 2, 2025 05:44
@douenergy
Contributor

Thanks @goldmedal

@goldmedal goldmedal merged commit d560ac3 into Canner:main Sep 2, 2025
12 of 13 checks passed
@goldmedal goldmedal deleted the feat/support-date-diff branch September 2, 2025 08:18
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026
Labels

bigquery, core, ibis, python, rust

2 participants