feat(ibis): Introduce bigquery white function list by douenergy · Pull Request #1271 · Canner/wren-engine

douenergy · 2025-07-24T02:11:21Z

Add BigQuery Function Validation

Summary

Implements automated validation for BigQuery functions to prevent runtime errors from unsupported functions.

Changes

ibis-server/resources/fuzzing/bigquery.json - 311 SQL test cases for comprehensive function coverage
ibis-server/tools/white_remote_function.py - Validation tool that categorizes functions as:

✅ Supported
❌ Unsupported (function not found)
🔧 Syntax errors (need SQL fixes)
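A minimal sketch of how API error messages might be sorted into these three buckets. The regex patterns and the `categorize` name are hypothetical examples, not the ones used in white_remote_function.py; real BigQuery and ibis-server error texts may differ.

```python
import re
from typing import Optional

# Hypothetical patterns; adjust to the error texts the engine actually returns.
UNSUPPORTED_PATTERNS = [
    r"function not found",
    r"unrecognized function",
]
SYNTAX_PATTERNS = [
    r"syntax error",
    r"expected .* but got",
    r"parse error",
]


def categorize(error: Optional[str]) -> str:
    """Map an API error message (None on success) to a result category."""
    if not error:
        return "supported"
    text = error.lower()
    if any(re.search(p, text) for p in UNSUPPORTED_PATTERNS):
        return "unsupported"
    if any(re.search(p, text) for p in SYNTAX_PATTERNS):
        return "syntax_error"
    # Unmatched messages are kept apart so they can be reviewed by hand.
    return "other"
```

Checking "unsupported" before "syntax_error" is a design choice: a message that mentions a missing function is treated as a whitelist decision rather than a SQL fix.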

Summary by CodeRabbit

New Features
- Added a comprehensive set of 311 BigQuery SQL function test cases for improved coverage.
- Introduced a new script to automate testing of SQL queries against a remote API, with detailed error categorization and reporting.
- Enabled function retrieval from a whitelist CSV for specified data sources.
- Added support for configuring and using a remote whitelist path via environment variables.
Bug Fixes
- Updated test expectations and descriptions for BigQuery function listing to reflect current results.
Chores
- Adjusted configuration and test resource paths to support new function list management.

coderabbitai · 2025-07-24T02:11:28Z


Walkthrough

This change introduces a new environment variable for configuring the BigQuery remote whitelist function list path, updates test resources and expected values to reflect this new path, and adds a comprehensive BigQuery SQL fuzzing dataset. Additionally, a new script is provided for automated testing of SQL queries against a remote API, with detailed error categorization and reporting. The /functions endpoint was enhanced to support loading function lists from a whitelist CSV file using DuckDB when enabled.
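The whitelist branch of the /functions endpoint can be pictured roughly as below. This is a sketch under assumptions: it swaps the standard-library csv module in for DuckDB, and `load_function_list` and its `default_loader` fallback are hypothetical names, not the endpoint's actual code.

```python
import csv
from typing import Callable, Optional


def load_function_list(
    white_list_path: Optional[str],
    default_loader: Callable[[], list],
) -> list:
    """Return functions from the whitelist CSV if a path is configured, else the default list."""
    if not white_list_path:
        return default_loader()
    # csv.DictReader uses the header row as keys, similar in shape to
    # duckdb.read_csv(path, header=True).to_df().to_dict("records"),
    # except that every value stays a string (DuckDB infers column types).
    with open(white_list_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```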

Changes

File(s)	Change Summary
ibis-server/app/config.py	Added support for `REMOTE_WHITE_FUNCTION_LIST_PATH` env variable; added methods to get/set whitelist path and check whitelist status for data sources.
ibis-server/app/routers/v3/connector.py	Modified `/functions` endpoint to optionally load function list from whitelist CSV file using DuckDB if enabled.
ibis-server/resources/fuzzing/bigquery.json	Added new JSON file with 311 BigQuery SQL function fuzzing test cases.
ibis-server/tests/routers/v3/connector/bigquery/conftest.py	Added `white_function_list_path` variable pointing to the whitelist resource path; updated fixture to set/unset whitelist path.
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py	Updated test to set remote whitelist function list path; changed expected function count and updated "string_agg" metadata.
ibis-server/tools/white_remote_function.py	New script for automated SQL query testing against a remote API with detailed error categorization/reporting.
ibis-server/Dockerfile	Added environment variable `REMOTE_WHITE_FUNCTION_LIST_PATH` to runtime environment.
ibis-server/tests/test_main.py	Extended `/config` endpoint test to check for `remote_white_function_list_path` key with expected value `None`.
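A sketch of the configuration pieces listed for config.py, assuming a simple class. Only the `REMOTE_WHITE_FUNCTION_LIST_PATH` variable comes from this PR; `WhiteListConfig`, its method names, the BigQuery-only set, and the `<dir>/<data_source>.csv` layout (inferred from resources/white_function_list/bigquery.csv) are assumptions.

```python
import os
from typing import Optional

# Assumption: BigQuery is the only data source with a white function list here.
WHITE_LIST_DATA_SOURCES = {"bigquery"}


class WhiteListConfig:
    def __init__(self) -> None:
        # None when the variable is unset, matching the /config test expectation.
        self._path: Optional[str] = os.getenv("REMOTE_WHITE_FUNCTION_LIST_PATH")

    def get_remote_white_function_list_path(self) -> Optional[str]:
        return self._path

    def set_remote_white_function_list_path(self, path: Optional[str]) -> None:
        self._path = path

    def is_white_list(self, data_source: str) -> bool:
        """True when a path is configured and the data source uses a whitelist."""
        return self._path is not None and data_source.lower() in WHITE_LIST_DATA_SOURCES

    def white_list_file(self, data_source: str) -> Optional[str]:
        # Assumed layout: <dir>/<data_source>.csv
        if not self.is_white_list(data_source):
            return None
        return os.path.join(self._path, f"{data_source.lower()}.csv")
```

The setter mirrors what the conftest.py fixture is described as doing: set the path for the BigQuery tests and unset it afterwards.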

Sequence Diagram(s)

sequenceDiagram
    participant Tester as white_remote_function.py
    participant API as Remote API
    participant Resource as bigquery.json

    Tester->>Resource: Load test queries (JSON)
    loop For each query
        Tester->>API: POST SQL query with connection info
        API-->>Tester: Response (success or error)
        Tester->>Tester: Categorize and log result
    end
    Tester->>Tester: Summarize and output results
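The loop in the diagram can be sketched as follows. `run_suite` is a hypothetical name; it takes the request and categorization steps as callables so the flow can be exercised without a live API. The record fields (id, function_name, sql) match the fuzzing file, and the 0.1 s default pause mirrors the delay discussed in review.

```python
import time
from collections import Counter
from typing import Callable, Optional


def run_suite(
    queries: list,
    post: Callable[[str], Optional[str]],
    categorize: Callable[[Optional[str]], str],
    delay: float = 0.1,
) -> dict:
    """Run each query, categorize the outcome, and summarize counts per category.

    `post` sends one SQL string and returns an error message, or None on success.
    """
    results = []
    for q in queries:
        error = post(q["sql"])
        results.append(
            {
                "id": q["id"],
                "function_name": q["function_name"],
                "category": categorize(error),
                "error": error,
            }
        )
        time.sleep(delay)  # simple throttle between requests
    summary = Counter(r["category"] for r in results)
    return {"summary": dict(summary), "results": results}
```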

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Possibly related PRs

chore: fix BigQuery v3 test cases #1031: The main PR and the retrieved PR both modify the test_function_list test in ibis-server/tests/routers/v3/connector/bigquery/test_functions.py by changing the expected count of functions returned, indicating a direct relation in test adjustments for BigQuery function lists.
chore(ibis-bq): add timestamp function for BigQuery #1165: Both PRs modify test_function_list in the same test file, with overlapping but different scope of changes to expected counts and function metadata, showing close relation in test evolution.

Suggested reviewers

goldmedal

Poem

In the land of SQL, the rabbits play,
With whitelists and fuzzing, they hop all day.
A script to test, a path to tweak,
BigQuery’s secrets, they happily seek.
With every query, a carrot earned—
Oh what new functions have we learned?
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89efa01 and ba79984.

📒 Files selected for processing (1)

ibis-server/tests/test_main.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: ci

🔇 Additional comments (1)

ibis-server/tests/test_main.py (1)

29-29: LGTM! Test correctly updated for new configuration field.

The addition of "remote_white_function_list_path": None properly updates the test assertion to include the new configuration field, ensuring the /config endpoint test validates the complete API contract.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (1)
48-48: Consider making the expected count more maintainable.

The hardcoded value 174 may become brittle if the white function list is updated. Consider calculating this dynamically from the actual function list file or using a constant that's updated alongside the list.
-    assert len(result) == 174
+    # Consider loading the actual function list and getting its length
+    # or defining a constant that's maintained alongside the list
+    assert len(result) == 174  # TODO: Make this dynamic
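One way to act on this nitpick, sketched under assumptions: derive the expected count from the whitelist CSV instead of hardcoding 174. `count_white_functions` is a hypothetical helper; it assumes a header row and one function per data row. If the endpoint merges in functions from other sources, the count would need adjusting.

```python
import csv
from pathlib import Path


def count_white_functions(csv_path: str) -> int:
    """Count data rows (excluding the header) in a white function list CSV."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for row in reader if row)  # csv.reader yields [] for blank lines
```

In the test this would look like `assert len(result) == count_white_functions(csv_path)`, where `csv_path` is a hypothetical variable pointing at the BigQuery CSV under the conftest.py resource path.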
ibis-server/tools/white_remote_function.py (2)
212-212: Output filename could be more descriptive.

The output filename "api_test_results2.json" appears to have a version suffix. Consider using a timestamp or configuration-based naming.
-    output_file = "api_test_results2.json"
+    import datetime
+    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+    output_file = f"api_test_results_{timestamp}.json"
164-164: Consider making the delay configurable.

The hardcoded 0.1 second delay might not be optimal for all environments. Consider making this configurable.
-        time.sleep(0.1)
+        delay = float(os.getenv("API_DELAY", "0.1"))
+        time.sleep(delay)
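A variant of this suggestion that reads `API_DELAY` once, before the loop, and falls back to the default when the value is not a usable number. `get_api_delay` is a hypothetical helper; `API_DELAY` is the variable name proposed in the review.

```python
import math
import os


def get_api_delay(default: float = 0.1) -> float:
    """Read the per-request delay in seconds from API_DELAY, falling back to default."""
    raw = os.getenv("API_DELAY")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    # Reject negative, NaN, and infinite values; time.sleep raises on these.
    if value < 0 or not math.isfinite(value):
        return default
    return value
```

Usage: call `delay = get_api_delay()` once before iterating, then `time.sleep(delay)` inside the loop.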

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad9718b and fce3136.

⛔ Files ignored due to path filters (1)

ibis-server/resources/white_function_list/bigquery.csv is excluded by !**/*.csv

📒 Files selected for processing (5)

ibis-server/app/config.py (2 hunks)
ibis-server/resources/fuzzing/bigquery.json (1 hunks)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1 hunks)
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (2 hunks)
ibis-server/tools/white_remote_function.py (1 hunks)

🔇 Additional comments (8)

ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)

13-13: LGTM! Path update aligns with the new white function list structure.

The path change from ../resources/function_list to ../resources/white_function_list/function_list correctly reflects the new directory structure for the BigQuery function whitelist.

ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (1)

57-61: Function metadata updates look correct.

The updated description "Concatenates strings with a separator" and return type "string" for the string_agg function are more accurate and specific than the previous values.

ibis-server/app/config.py (2)

23-25: Environment variable addition looks good.

The new REMOTE_WHITE_FUNCTION_LIST_PATH environment variable follows the existing pattern and is properly loaded during initialization.

70-72: BigQuery-specific path override is well-implemented.

The conditional logic to use the white function list for BigQuery is clear and well-documented. The comment explains the purpose of the override.

ibis-server/resources/fuzzing/bigquery.json (1)

1-1556: Comprehensive BigQuery function test dataset.

This JSON file provides excellent coverage of BigQuery functions with 311 well-structured test cases. The consistent format with id, function_name, and sql fields makes it easy to process programmatically.
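Because every record carries the same three fields, a structural check before sending anything is cheap. A sketch, assuming the file is a JSON array of objects; `validate_cases` is a hypothetical helper.

```python
def validate_cases(cases: list) -> list:
    """Return human-readable problems found in the fuzzing records (empty if none)."""
    problems = []
    seen_ids = set()
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            problems.append(f"record {i}: not an object")
            continue
        missing = {"id", "function_name", "sql"} - set(case)
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
            continue
        if case["id"] in seen_ids:
            problems.append(f"record {i}: duplicate id {case['id']!r}")
        seen_ids.add(case["id"])
        if not str(case["sql"]).strip():
            problems.append(f"record {i}: empty sql")
    return problems
```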

ibis-server/tools/white_remote_function.py (3)

6-15: Good error handling for file operations.

The load_test_queries function properly handles both FileNotFoundError and json.JSONDecodeError exceptions with informative error messages.
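A minimal sketch in the spirit of load_test_queries, handling the two exceptions mentioned above. What the real function does after an error is not stated in this review; this version prints the reason to stderr and returns an empty list, which is an assumption.

```python
import json
import sys


def load_test_queries(path: str) -> list:
    """Load fuzzing cases from a JSON file, reporting missing or malformed files."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Test file not found: {path}", file=sys.stderr)
    except json.JSONDecodeError as e:
        # JSONDecodeError exposes the position of the first parse failure.
        print(f"Invalid JSON in {path} at line {e.lineno}, column {e.colno}: {e.msg}", file=sys.stderr)
    return []
```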

17-33: HTTP request handling is solid.

The test_sql_query function includes proper timeout configuration and exception handling for network errors. The payload structure matches the expected API format.
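The request step can be written with only the standard library, as sketched below. The payload keys (`sql`, `connectionInfo`, `manifestStr`) follow the general shape of an ibis-server v3 query body but should be treated as assumptions, as should the returned dict; `post_sql` is a hypothetical name.

```python
import json
import urllib.error
import urllib.request


def post_sql(
    url: str,
    sql: str,
    connection_info: dict,
    manifest_str: str = "",
    timeout: float = 30.0,
) -> dict:
    """POST one SQL query; return status, body, and an error message (None on success)."""
    payload = {"sql": sql, "connectionInfo": connection_info, "manifestStr": manifest_str}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return {"status": resp.status, "body": resp.read().decode("utf-8"), "error": None}
    except urllib.error.HTTPError as e:
        # Non-2xx responses: the body usually carries the engine's error message.
        body = e.read().decode("utf-8", errors="replace")
        return {"status": e.code, "body": body, "error": body or str(e)}
    except OSError as e:
        # Network failures (refused connection, DNS, socket timeout) have no HTTP status.
        return {"status": None, "body": "", "error": str(e)}
```

HTTPError is caught before OSError because it is a subclass of URLError, which is itself an OSError.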

35-86: Comprehensive error categorization logic.

The categorize_error function provides thorough pattern matching for different error types. The patterns cover a wide range of potential error messages.

ibis-server/resources/fuzzing/bigquery.json

ibis-server/tools/white_remote_function.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

ibis-server/app/routers/v3/connector.py (1)
382-389: Consider caching the CSV data for better performance.

Reading the CSV file on every request could impact performance, especially for large whitelist files or high-frequency endpoint usage.

Consider implementing a simple caching mechanism:
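# Add to module level
import os
from functools import lru_cache

import duckdb


@lru_cache(maxsize=128)
def _load_whitelist_functions(file_path: str, file_mtime: float) -> list[dict]:
    """Cache CSV loading keyed on file path and modification time.

    file_mtime is not used in the body; it only takes part in the cache key,
    so a modified file yields a cache miss and is read again. The same list
    object is returned on every hit, so callers should not mutate it.
    """
    return (
        duckdb.read_csv(file_path, header=True)
        .to_df()
        .to_dict("records")
    )


# In the functions endpoint (is_white_list and white_function_list_path
# come from the surrounding endpoint code):
if is_white_list:
    try:
        file_mtime = os.path.getmtime(white_function_list_path)
        func_list = _load_whitelist_functions(white_function_list_path, file_mtime)
    except (OSError, duckdb.Error) as e:
        # Surface a clear error instead of silently returning an empty list.
        raise RuntimeError(
            f"Failed to load white function list: {white_function_list_path}"
        ) from e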
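This approach caches results while still picking up edits: because the modification time is part of the cache key, a changed file produces a new key and is re-read. Entries for older modification times remain in the cache until LRU eviction.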

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 167ee7d and 2fedd35.

📒 Files selected for processing (5)

ibis-server/Dockerfile (1 hunks)
ibis-server/app/config.py (2 hunks)
ibis-server/app/routers/v3/connector.py (2 hunks)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1 hunks)
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (3 hunks)

✅ Files skipped from review due to trivial changes (1)

ibis-server/Dockerfile

🚧 Files skipped from review as they are similar to previous changes (3)

ibis-server/tests/routers/v3/connector/bigquery/conftest.py
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py
ibis-server/app/config.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: ci

🔇 Additional comments (1)

ibis-server/app/routers/v3/connector.py (1)

3-3: ✅ DuckDB dependency verified

The duckdb import is declared in ibis-server/pyproject.toml (version 1.2.1).

No further changes to dependency management are needed; the import can be approved as is.

ibis-server/app/routers/v3/connector.py

goldmedal

Thanks @douenergy. Overall looks good 🚀

goldmedal · 2025-07-24T07:05:55Z

BigQuery has been tested locally.

github-actions bot added bigquery ibis python Pull requests that update Python code labels Jul 24, 2025

coderabbitai bot reviewed Jul 24, 2025

ibis-server/resources/fuzzing/bigquery.json (resolved)

ibis-server/tools/white_remote_function.py (resolved)

white list

167ee7d

douenergy force-pushed the bigquery-white-list branch from fce3136 to 167ee7d on July 24, 2025 04:59

fix bigquery test

2fedd35

coderabbitai bot reviewed Jul 24, 2025

ibis-server/app/routers/v3/connector.py (resolved)

douenergy added 2 commits July 24, 2025 13:55

linter

89efa01

config test

ba79984

goldmedal approved these changes Jul 24, 2025


goldmedal merged commit 5f74b3c into Canner:main Jul 24, 2025
3 of 4 checks passed

douenergy mentioned this pull request Jul 30, 2025

feat(ibis): remove to_char from BigQuery white function list #1275

Merged

coderabbitai bot mentioned this pull request Aug 27, 2025

feat(ibis): Introduce postgres function white list #1300

Merged

coderabbitai bot mentioned this pull request Sep 8, 2025

feat(ibis): Introduce mysql function white list #1311

Merged

coderabbitai bot mentioned this pull request Oct 29, 2025

feat(core): support UNNEST syntax for Snowflake #1357

Merged

coderabbitai bot mentioned this pull request Nov 11, 2025

feat(core): introduce dialect-specific function list and refactor BigQuery function lists #1366

Merged

goldmedal merged 4 commits into Canner:main from douenergy:bigquery-white-list

douenergy commented Jul 24, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jul 24, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

goldmedal left a comment

Uh oh!

goldmedal commented Jul 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

douenergy commented Jul 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add BigQuery Function Validation

Summary

Changes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

goldmedal left a comment

Choose a reason for hiding this comment

Uh oh!

goldmedal commented Jul 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

douenergy commented Jul 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 24, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)