feat(ibis): Introduce bigquery white function list#1271
feat(ibis): Introduce bigquery white function list#1271goldmedal merged 4 commits intoCanner:mainfrom
Conversation
|
""" WalkthroughThis change introduces a new environment variable for configuring the BigQuery remote whitelist function list path, updates test resources and expected values to reflect this new path, and adds a comprehensive BigQuery SQL fuzzing dataset. Additionally, a new script is provided for automated testing of SQL queries against a remote API, with detailed error categorization and reporting. The Changes
Sequence Diagram(s)sequenceDiagram
participant Tester as white_remote_function.py
participant API as Remote API
participant Resource as bigquery.json
Tester->>Resource: Load test queries (JSON)
loop For each query
Tester->>API: POST SQL query with connection info
API-->>Tester: Response (success or error)
Tester->>Tester: Categorize and log result
end
Tester->>Tester: Summarize and output results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~18 minutes Possibly related PRs
Suggested reviewers
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (1)
48-48: Consider making the expected count more maintainable.The hardcoded value
174may become brittle if the white function list is updated. Consider calculating this dynamically from the actual function list file or using a constant that's updated alongside the list.- assert len(result) == 174 + # Consider loading the actual function list and getting its length + # or defining a constant that's maintained alongside the list + assert len(result) == 174 # TODO: Make this dynamicibis-server/tools/white_remote_function.py (2)
212-212: Output filename could be more descriptive.The output filename
"api_test_results2.json"appears to have a version suffix. Consider using a timestamp or configuration-based naming.- output_file = "api_test_results2.json" + import datetime + timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") + output_file = f"api_test_results_{timestamp}.json"
164-164: Consider making the delay configurable.The hardcoded 0.1 second delay might not be optimal for all environments. Consider making this configurable.
- time.sleep(0.1) + delay = float(os.getenv("API_DELAY", "0.1")) + time.sleep(delay)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
ibis-server/resources/white_function_list/bigquery.csvis excluded by!**/*.csv
📒 Files selected for processing (5)
ibis-server/app/config.py(2 hunks)ibis-server/resources/fuzzing/bigquery.json(1 hunks)ibis-server/tests/routers/v3/connector/bigquery/conftest.py(1 hunks)ibis-server/tests/routers/v3/connector/bigquery/test_functions.py(2 hunks)ibis-server/tools/white_remote_function.py(1 hunks)
🔇 Additional comments (8)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
13-13: LGTM! Path update aligns with the new white function list structure.The path change from
../resources/function_listto../resources/white_function_list/function_listcorrectly reflects the new directory structure for the BigQuery function whitelist.ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (1)
57-61: Function metadata updates look correct.The updated description "Concatenates strings with a separator" and return type "string" for the
string_aggfunction are more accurate and specific than the previous values.ibis-server/app/config.py (2)
23-25: Environment variable addition looks good.The new
REMOTE_WHITE_FUNCTION_LIST_PATHenvironment variable follows the existing pattern and is properly loaded during initialization.
70-72: BigQuery-specific path override is well-implemented.The conditional logic to use the white function list for BigQuery is clear and well-documented. The comment explains the purpose of the override.
ibis-server/resources/fuzzing/bigquery.json (1)
1-1556: Comprehensive BigQuery function test dataset.This JSON file provides excellent coverage of BigQuery functions with 311 well-structured test cases. The consistent format with
id,function_name, andsqlfields makes it easy to process programmatically.ibis-server/tools/white_remote_function.py (3)
6-15: Good error handling for file operations.The
load_test_queriesfunction properly handles bothFileNotFoundErrorandjson.JSONDecodeErrorexceptions with informative error messages.
17-33: HTTP request handling is solid.The
test_sql_queryfunction includes proper timeout configuration and exception handling for network errors. The payload structure matches the expected API format.
35-86: Comprehensive error categorization logic.The
categorize_errorfunction provides thorough pattern matching for different error types. The patterns cover a wide range of potential error messages.
fce3136 to
167ee7d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ibis-server/app/routers/v3/connector.py (1)
382-389: Consider caching the CSV data for better performance.Reading the CSV file on every request could impact performance, especially for large whitelist files or high-frequency endpoint usage.
Consider implementing a simple caching mechanism:
# Add to module level from functools import lru_cache import os @lru_cache(maxsize=128) def _load_whitelist_functions(file_path: str, file_mtime: float) -> list: """Cache CSV loading based on file path and modification time.""" return ( duckdb.read_csv(file_path, header=True) .to_df() .to_dict("records") ) # In the functions endpoint: if is_white_list: try: file_mtime = os.path.getmtime(white_function_list_path) func_list = _load_whitelist_functions(white_function_list_path, file_mtime) except Exception as e: # error handlingThis approach caches results while invalidating the cache when the file is modified.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
ibis-server/Dockerfile(1 hunks)ibis-server/app/config.py(2 hunks)ibis-server/app/routers/v3/connector.py(2 hunks)ibis-server/tests/routers/v3/connector/bigquery/conftest.py(1 hunks)ibis-server/tests/routers/v3/connector/bigquery/test_functions.py(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- ibis-server/Dockerfile
🚧 Files skipped from review as they are similar to previous changes (3)
- ibis-server/tests/routers/v3/connector/bigquery/conftest.py
- ibis-server/tests/routers/v3/connector/bigquery/test_functions.py
- ibis-server/app/config.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/routers/v3/connector.py (1)
3-3: ✅ DuckDB dependency verified
- The
duckdbimport is declared inibis-server/pyproject.toml(version 1.2.1).- No further changes to dependency management are needed; the import can be approved as is.
goldmedal
left a comment
There was a problem hiding this comment.
Thanks @douenergy. Overall looks good 🚀
|
BigQuery has been tested locally. |
Add BigQuery Function Validation
Summary
Implements automated validation for BigQuery functions to prevent runtime errors from unsupported functions.
Changes
ibis-server/resources/fuzzing/bigquery.json- 311 SQL test cases for comprehensive function coverageibis-server/tools/white_remote_function.py- Validation tool that categorizes functions as:✅ Supported
❌ Unsupported (function not found)
🔧 Syntax errors (need SQL fixes)
Summary by CodeRabbit
New Features
Bug Fixes
Chores