feat(ibis): Introduce mysql function white list #1311
Conversation
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (2)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the WalkthroughAdds MySQL to the config whitelist, introduces a MySQL fuzzing JSON dataset, and updates MySQL connector tests to use the remote white-list and adjust expected function metadata and counts. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant Router as Router (v3)
participant Config as Config.get_data_source_is_white_list
participant Source as Remote/List Provider
Client->>Router: Request function list (data_source="mysql")
Router->>Config: get_data_source_is_white_list("mysql", path)
alt path unset
Config-->>Router: false
Router-->>Client: Return default/filtered list (no remote white-list)
else path set
Config-->>Router: true
Router->>Source: Fetch function_list + white_function_list (uses `mysql.json`)
Source-->>Router: Functions (incl. updated lcase metadata)
Router-->>Client: Return functions (count = 135)
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks (2 passed, 1 warning)❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ibis-server/app/config.py (1)
68-70: Harden path validation (startswith is unsafe for directory containment checks)Using string startswith for path containment can be bypassed (e.g., with "../base_dir"). Use commonpath on absolute paths instead.
Apply to both methods:
- base_path = os.path.normpath(self.remote_function_list_path) + base_path = os.path.abspath(os.path.normpath(self.remote_function_list_path)) path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv")) - if not path.startswith(base_path): + base_common = os.path.commonpath([base_path, os.path.abspath(path)]) + if base_common != base_path: raise ValueError("Invalid data source path")- base_path = os.path.normpath(self.remote_white_function_list_path) + base_path = os.path.abspath(os.path.normpath(self.remote_white_function_list_path)) path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv")) - if not path.startswith(base_path): + base_common = os.path.commonpath([base_path, os.path.abspath(path)]) + if base_common != base_path: raise ValueError("Invalid data source path")Also applies to: 78-80
🧹 Nitpick comments (5)
ibis-server/app/config.py (1)
88-93: Avoid hardcoding whitelist; centralize and make case-safeMinor maintainability improvement: pull the allowed set into a module-level constant and compare against a normalized (lowercased) data_source.
+ALLOWED_WHITE_LIST_SOURCES = frozenset({"bigquery", "postgres", "mysql"}) @@ def get_data_source_is_white_list(self, data_source: str) -> bool: if not self.remote_white_function_list_path: return False - - return data_source in {"bigquery", "postgres", "mysql"} + return data_source.lower() in ALLOWED_WHITE_LIST_SOURCESibis-server/tests/routers/v3/connector/mysql/test_functions.py (1)
57-66: Derive expected function count from CSV to avoid brittle magic numberAdd at top of
test_functions.py:import osReplace
assert len(result) == 135with
# Dynamically compute expected number of functions from CSV expected = sum(1 for _ in open(os.path.join(function_list_path, "mysql.csv"))) - 1 assert len(result) == expectedKeep the existing metadata assertion for
"lcase".ibis-server/resources/fuzzing/mysql.json (3)
658-661: Low-effort wins: swap a few to MySQL-native equivalentsWhere MySQL has direct equivalents, switch to reduce false negatives:
- today() → current_date()
- to_unixtime(...) → unix_timestamp(...)
- random() → rand()
- to_char(ts, fmt) → date_format(ts, fmt)
- to_date('YYYY-MM-DD') → str_to_date('YYYY-MM-DD','%Y-%m-%d')
Example diffs:
- "sql": "SELECT today() AS result" + "sql": "SELECT current_date() AS result"- "sql": "SELECT to_unixtime('2023-01-01T12:00:00') AS result" + "sql": "SELECT unix_timestamp('2023-01-01 12:00:00') AS result"- "sql": "SELECT random() AS result" + "sql": "SELECT rand() AS result"- "sql": "SELECT to_char(now(), '%Y-%m-%d') AS result" + "sql": "SELECT date_format(now(), '%Y-%m-%d') AS result"- "sql": "SELECT to_date('2023-01-15') AS result" + "sql": "SELECT str_to_date('2023-01-15','%Y-%m-%d') AS result"Also applies to: 679-681, 1088-1091, 1223-1226, 1313-1316
8-11: 'decode' entry is not MySQL-style base64MySQL's base64 functions are FROM_BASE64/TO_BASE64. If this case is meant for base64, adjust accordingly; if it's Oracle/Trino-style DECODE, leave it to be filtered by the checker.
- "sql": "SELECT decode('aGVsbG8=', 'base64') AS result" + "sql": "SELECT from_base64('aGVsbG8=') AS result"
1-21: Tag or pre-filter non-MySQL functions in mysql.json
- mysql.json includes functions unsupported by MySQL (array_max, list_extract, array_join, list_distance, array_union, decode('…','base64'), contains); tag entries with their supported dialects or filter for MySQL before execution to avoid remote-check failures.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
ibis-server/resources/function_list/mysql.csvis excluded by!**/*.csvibis-server/resources/white_function_list/mysql.csvis excluded by!**/*.csv
📒 Files selected for processing (3)
ibis-server/app/config.py(1 hunks)ibis-server/resources/fuzzing/mysql.json(1 hunks)ibis-server/tests/routers/v3/connector/mysql/test_functions.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/config.py (1)
92-92: Add MySQL to whitelist — OKIncluding "mysql" in the whitelist matches the PR goal. No functional concerns here.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (2)
45-51: Consolidate autouse fixtures to avoid ordering pitfalls and duplicationMerge both autouse fixtures into one so they always set/unset in lockstep and reduce boilerplate.
Apply this diff to replace both fixtures at Lines 37-51:
-@pytest.fixture(autouse=True) -def set_remote_function_list_path(): - config = get_config() - config.set_remote_function_list_path(function_list_path) - yield - config.set_remote_function_list_path(None) - -@pytest.fixture(autouse=True) -def set_remote_white_function_list_path(): - config = get_config() - config.set_remote_white_function_list_path(white_function_list_path) - yield - config.set_remote_white_function_list_path(None) +@pytest.fixture(autouse=True) +def set_remote_function_lists(): + config = get_config() + config.set_remote_function_list_path(function_list_path) + config.set_remote_white_function_list_path(white_function_list_path) + yield + config.set_remote_function_list_path(None) + config.set_remote_white_function_list_path(None)
68-68: Avoid brittle magic number in assertionReplace 135 with a named constant to make intent clear and ease future updates.
function_list_path = file_path("../resources/function_list") white_function_list_path = file_path("../resources/white_function_list") +EXPECTED_MYSQL_FUNCTION_COUNT = 135 @@ - assert len(result) == 135 + assert len(result) == EXPECTED_MYSQL_FUNCTION_COUNT
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ibis-server/tests/routers/v3/connector/mysql/test_functions.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (3)
ibis-server/tests/conftest.py (1)
file_path(10-11)ibis-server/app/config.py (3)
set_remote_white_function_list_path(85-86)get_config(98-99)set_remote_function_list_path(82-83)ibis-server/tests/routers/v3/connector/postgres/conftest.py (1)
set_remote_function_list_path(71-77)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (5)
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (5)
29-29: LGTM: Added white function list pathPath naming aligns with config setters and mirrors Postgres tests.
57-57: Good: explicit reset to baseline before first callPrevents fixture defaults from affecting the baseline count assertion.
64-64: Correct: white-list path is set together with function-list pathKeeps config in a consistent state for the combined behavior under test.
75-76: LGTM: metadata checks for lcaseVerifies param_types and return_type corrections from the white list.
80-80: LGTM: explicit cleanup at test endRedundant with fixture teardown but harmless; maintains clarity.
goldmedal
left a comment
There was a problem hiding this comment.
Thanks @douenergy. Look great
Co-authored-by: Jax Liu <liugs963@gmail.com>
following #1300
All functions in the whitelist are valid MySQL functions.
Summary by CodeRabbit
New Features
Tests