feat(ibis): Introduce bigquery white function list#1271

Merged
goldmedal merged 4 commits into Canner:main from douenergy:bigquery-white-list
Jul 24, 2025

@douenergy

@douenergy douenergy commented Jul 24, 2025

Add BigQuery Function Validation

Summary

Implements automated validation for BigQuery functions to prevent runtime errors from unsupported functions.

Changes

ibis-server/resources/fuzzing/bigquery.json - 311 SQL test cases for comprehensive function coverage
ibis-server/tools/white_remote_function.py - Validation tool that categorizes functions as:

✅ Supported
❌ Unsupported (function not found)
🔧 Syntax errors (need SQL fixes)

Summary by CodeRabbit

  • New Features

    • Added a comprehensive set of 311 BigQuery SQL function test cases for improved coverage.
    • Introduced a new script to automate testing of SQL queries against a remote API, with detailed error categorization and reporting.
    • Enabled function retrieval from a whitelist CSV for specified data sources.
    • Added support for configuring and using a remote whitelist path via environment variables.
  • Bug Fixes

    • Updated test expectations and descriptions for BigQuery function listing to reflect current results.
  • Chores

    • Adjusted configuration and test resource paths to support new function list management.

@coderabbitai

coderabbitai bot commented Jul 24, 2025

"""

Walkthrough

This change introduces a new environment variable for configuring the BigQuery remote whitelist function list path, updates test resources and expected values to reflect this new path, and adds a comprehensive BigQuery SQL fuzzing dataset. Additionally, a new script is provided for automated testing of SQL queries against a remote API, with detailed error categorization and reporting. The /functions endpoint was enhanced to support loading function lists from a whitelist CSV file using DuckDB when enabled.

Changes

| File(s) | Change Summary |
|---|---|
| ibis-server/app/config.py | Added support for REMOTE_WHITE_FUNCTION_LIST_PATH env variable; added methods to get/set whitelist path and check whitelist status for data sources. |
| ibis-server/app/routers/v3/connector.py | Modified /functions endpoint to optionally load function list from whitelist CSV file using DuckDB if enabled. |
| ibis-server/resources/fuzzing/bigquery.json | Added new JSON file with 311 BigQuery SQL function fuzzing test cases. |
| ibis-server/tests/routers/v3/connector/bigquery/conftest.py | Added white_function_list_path variable pointing to the whitelist resource path; updated fixture to set/unset whitelist path. |
| ibis-server/tests/routers/v3/connector/bigquery/test_functions.py | Updated test to set remote whitelist function list path; changed expected function count and updated "string_agg" metadata. |
| ibis-server/tools/white_remote_function.py | New script for automated SQL query testing against a remote API with detailed error categorization/reporting. |
| ibis-server/Dockerfile | Added environment variable REMOTE_WHITE_FUNCTION_LIST_PATH to runtime environment. |
| ibis-server/tests/test_main.py | Extended /config endpoint test to check for remote_white_function_list_path key with expected value None. |

Sequence Diagram(s)

sequenceDiagram
    participant Tester as white_remote_function.py
    participant API as Remote API
    participant Resource as bigquery.json

    Tester->>Resource: Load test queries (JSON)
    loop For each query
        Tester->>API: POST SQL query with connection info
        API-->>Tester: Response (success or error)
        Tester->>Tester: Categorize and log result
    end
    Tester->>Tester: Summarize and output results
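
The flow in the diagram can be sketched in Python roughly as follows. This is a minimal sketch and not the code in white_remote_function.py: `run_suite`, `send_query`, and the result field names are hypothetical, and the HTTP call is abstracted behind a callable so the loop can be exercised without a live API. Only the `id`, `function_name`, and `sql` fields come from the dataset described in this PR.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: takes a SQL string, returns (ok, error_message).
SendQuery = Callable[[str], Tuple[bool, str]]


def run_suite(queries: List[Dict], send_query: SendQuery) -> Dict:
    """Send each query, record the outcome, and summarize the counts."""
    results = []
    for query in queries:
        ok, error = send_query(query["sql"])
        results.append(
            {
                "id": query["id"],
                "function_name": query["function_name"],
                "status": "supported" if ok else "failed",
                "error": error,
            }
        )
    # Count how many queries ended in each status.
    summary = Counter(r["status"] for r in results)
    return {"summary": dict(summary), "results": results}
```

Injecting the transport keeps the loop independent of the API client, so a real implementation would pass a function that POSTs the SQL together with the connection info.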

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Possibly related PRs

  • chore: fix BigQuery v3 test cases #1031: Both PRs modify the test_function_list test in ibis-server/tests/routers/v3/connector/bigquery/test_functions.py by changing the expected number of functions returned.
  • chore(ibis-bq): add timestamp function for BigQuery #1165: Both PRs modify test_function_list in the same test file, with overlapping but different changes to the expected counts and function metadata.

Suggested reviewers

  • goldmedal

Poem

In the land of SQL, the rabbits play,
With whitelists and fuzzing, they hop all day.
A script to test, a path to tweak,
BigQuery’s secrets, they happily seek.
With every query, a carrot earned—
Oh what new functions have we learned?
🐇✨
"""



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89efa01 and ba79984.

📒 Files selected for processing (1)
  • ibis-server/tests/test_main.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/tests/test_main.py (1)

29-29: LGTM! Test correctly updated for new configuration field.

The addition of "remote_white_function_list_path": None properly updates the test assertion to include the new configuration field, ensuring the /config endpoint test validates the complete API contract.


@github-actions github-actions bot added bigquery ibis python Pull requests that update Python code labels Jul 24, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (1)

48-48: Consider making the expected count more maintainable.

The hardcoded value 174 may become brittle if the white function list is updated. Consider calculating this dynamically from the actual function list file or using a constant that's updated alongside the list.

-    assert len(result) == 174
+    # Consider loading the actual function list and getting its length
+    # or defining a constant that's maintained alongside the list
+    assert len(result) == 174  # TODO: Make this dynamic
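
One way to derive the expected count is to count the rows of the whitelist CSV. The sketch below is only an illustration: `count_whitelisted_functions` is a hypothetical helper, and it assumes the CSV has a header line and that the endpoint returns exactly one entry per data row, which this thread does not confirm.

```python
import csv
from pathlib import Path


def count_whitelisted_functions(csv_path: Path) -> int:
    """Count data rows in a whitelist CSV (header line excluded)."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        # DictReader consumes the header and yields one dict per data row.
        return sum(1 for _ in csv.DictReader(f))
```

The test could then compare `len(result)` against this value instead of the literal 174, at the cost of no longer catching accidental edits to the CSV itself.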
ibis-server/tools/white_remote_function.py (2)

212-212: Output filename could be more descriptive.

The output filename "api_test_results2.json" appears to have a version suffix. Consider using a timestamp or configuration-based naming.

-    output_file = "api_test_results2.json"
+    import datetime
+    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+    output_file = f"api_test_results_{timestamp}.json"

164-164: Consider making the delay configurable.

The hardcoded 0.1 second delay might not be optimal for all environments. Consider making this configurable.

-        time.sleep(0.1)
+        delay = float(os.getenv("API_DELAY", "0.1"))
+        time.sleep(delay)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad9718b and fce3136.

⛔ Files ignored due to path filters (1)
  • ibis-server/resources/white_function_list/bigquery.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
  • ibis-server/app/config.py (2 hunks)
  • ibis-server/resources/fuzzing/bigquery.json (1 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (2 hunks)
  • ibis-server/tools/white_remote_function.py (1 hunks)
🔇 Additional comments (8)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)

13-13: LGTM! Path update aligns with the new white function list structure.

The path change from ../resources/function_list to ../resources/white_function_list/function_list correctly reflects the new directory structure for the BigQuery function whitelist.

ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (1)

57-61: Function metadata updates look correct.

The updated description "Concatenates strings with a separator" and return type "string" for the string_agg function are more accurate and specific than the previous values.

ibis-server/app/config.py (2)

23-25: Environment variable addition looks good.

The new REMOTE_WHITE_FUNCTION_LIST_PATH environment variable follows the existing pattern and is properly loaded during initialization.


70-72: BigQuery-specific path override is well-implemented.

The conditional logic to use the white function list for BigQuery is clear and well-documented. The comment explains the purpose of the override.
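
For readers without the diff at hand, the configuration behavior described above might look roughly like this. It is a sketch under stated assumptions, not the code in app/config.py: the class name `WhitelistConfig`, the method names, and the `<base path>/<data source>.csv` layout are all hypothetical. Only the environment variable name and the `None` default come from this PR.

```python
import os
from typing import Optional


class WhitelistConfig:
    """Hypothetical stand-in for the server config object."""

    def __init__(self) -> None:
        # Unset variable -> None, which means the whitelist is disabled.
        self.remote_white_function_list_path: Optional[str] = os.getenv(
            "REMOTE_WHITE_FUNCTION_LIST_PATH"
        )

    def set_remote_white_function_list_path(self, path: Optional[str]) -> None:
        self.remote_white_function_list_path = path

    def get_remote_white_function_list_path(self, data_source: str) -> Optional[str]:
        """Return the CSV path for a data source, or None when disabled."""
        if not self.remote_white_function_list_path:
            return None
        # Assumed layout: one CSV per data source under the base path.
        return os.path.join(self.remote_white_function_list_path, f"{data_source}.csv")
```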

ibis-server/resources/fuzzing/bigquery.json (1)

1-1556: Comprehensive BigQuery function test dataset.

This JSON file provides excellent coverage of BigQuery functions with 311 well-structured test cases. The consistent format with id, function_name, and sql fields makes it easy to process programmatically.

ibis-server/tools/white_remote_function.py (3)

6-15: Good error handling for file operations.

The load_test_queries function properly handles both FileNotFoundError and json.JSONDecodeError exceptions with informative error messages.
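
A loader with this kind of error handling could be sketched as below. The function name matches the one reviewed here, but the body is an assumption: the message wording and the choice to return an empty list on failure are illustrative and may differ from the script in this PR.

```python
import json
import sys


def load_test_queries(path: str) -> list:
    """Load test cases from a JSON file; return [] and report on stderr on failure."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Test file not found: {path}", file=sys.stderr)
    except json.JSONDecodeError as e:
        print(f"Invalid JSON in {path}: {e}", file=sys.stderr)
    # Reached only after one of the handled errors above.
    return []
```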


17-33: HTTP request handling is solid.

The test_sql_query function includes proper timeout configuration and exception handling for network errors. The payload structure matches the expected API format.


35-86: Comprehensive error categorization logic.

The categorize_error function provides thorough pattern matching for different error types. The patterns cover a wide range of potential error messages.
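
As an illustration of this kind of pattern matching, here is a minimal sketch that maps an error message to the categories listed in the PR description. The pattern lists and the category labels are hypothetical; the actual patterns in white_remote_function.py are not shown in this thread.

```python
import re

# Hypothetical patterns, chosen only for illustration.
UNSUPPORTED_PATTERNS = [r"function not found", r"unknown function", r"no function matches"]
SYNTAX_PATTERNS = [r"syntax error", r"parse error", r"unexpected token"]


def categorize_error(message: str) -> str:
    """Map an error message to 'unsupported', 'syntax_error', or 'other'."""
    text = message.lower()
    # Check "unsupported" first so a missing function is never reported as a syntax issue.
    if any(re.search(p, text) for p in UNSUPPORTED_PATTERNS):
        return "unsupported"
    if any(re.search(p, text) for p in SYNTAX_PATTERNS):
        return "syntax_error"
    return "other"
```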

@douenergy douenergy force-pushed the bigquery-white-list branch from fce3136 to 167ee7d Compare July 24, 2025 04:59

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
ibis-server/app/routers/v3/connector.py (1)

382-389: Consider caching the CSV data for better performance.

Reading the CSV file on every request could impact performance, especially for large whitelist files or high-frequency endpoint usage.

Consider implementing a simple caching mechanism:

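# Add to module level
import os
from functools import lru_cache

import duckdb

@lru_cache(maxsize=128)
def _load_whitelist_functions(file_path: str, file_mtime: float) -> list:
    """Cache CSV loading based on file path and modification time."""
    # file_mtime is unused in the body; it is part of the cache key,
    # so a modified file produces a cache miss and is re-read.
    return (
        duckdb.read_csv(file_path, header=True)
        .to_df()
        .to_dict("records")
    )

# In the functions endpoint:
if is_white_list:
    try:
        file_mtime = os.path.getmtime(white_function_list_path)
        func_list = _load_whitelist_functions(white_function_list_path, file_mtime)
    except Exception as e:
        # Placeholder handling (assumed): re-raise with context rather than leave the block empty
        raise RuntimeError(f"Failed to load whitelist: {white_function_list_path}") from e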

This approach caches results while invalidating the cache when the file is modified.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 167ee7d and 2fedd35.

📒 Files selected for processing (5)
  • ibis-server/Dockerfile (1 hunks)
  • ibis-server/app/config.py (2 hunks)
  • ibis-server/app/routers/v3/connector.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • ibis-server/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (3)
  • ibis-server/tests/routers/v3/connector/bigquery/conftest.py
  • ibis-server/tests/routers/v3/connector/bigquery/test_functions.py
  • ibis-server/app/config.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/routers/v3/connector.py (1)

3-3: ✅ DuckDB dependency verified

  • The duckdb import is declared in ibis-server/pyproject.toml (version 1.2.1).
  • No further changes to dependency management are needed; the import can be approved as is.


@goldmedal goldmedal left a comment


Thanks @douenergy. Overall looks good 🚀

@goldmedal

BigQuery has been tested locally.
