
feat(ibis): Introduce mysql function white list #1311

Merged
goldmedal merged 6 commits into Canner:main from douenergy:mysql-function-white-list
Sep 10, 2025

Conversation

@douenergy
Contributor

@douenergy douenergy commented Sep 8, 2025

Follow-up to #1300.

All functions in the whitelist are valid MySQL functions.

  1. We first create function test cases in ibis-server/resources/fuzzing/mysql.json, generated from the original Wren Core function list.
  2. Run remote_function_check.py against ibis-server/resources/fuzzing/mysql.json to find all valid MySQL functions.
  3. Combine the functions from step 2 with the current function list, making sure no regression occurs.
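
The three steps above can be sketched roughly as follows. This is only an illustration, not the contents of remote_function_check.py: the helper names (`collect_valid_functions`, `merge_function_lists`, `find_regressions`) and the `fake_runs_ok` stand-in for a real MySQL connection are hypothetical, and the `function_name` / `sql` keys are assumed from the fields described for mysql.json.

```python
import json
from typing import Callable, Iterable

# Hypothetical sample in the assumed shape of ibis-server/resources/fuzzing/mysql.json.
SAMPLE_CASES = """
[
  {"id": 1, "function_name": "lcase", "sql": "SELECT lcase('A') AS result"},
  {"id": 2, "function_name": "today", "sql": "SELECT today() AS result"},
  {"id": 3, "function_name": "rand", "sql": "SELECT rand() AS result"}
]
"""


def collect_valid_functions(cases: Iterable[dict], runs_ok: Callable[[str], bool]) -> set:
    """Step 2: keep the names of functions whose fuzz SQL ran successfully."""
    return {case["function_name"] for case in cases if runs_ok(case["sql"])}


def merge_function_lists(current: Iterable[str], validated: Iterable[str]) -> list:
    """Step 3: union of the existing list and the newly validated names."""
    return sorted(set(current) | set(validated))


def find_regressions(before: Iterable[str], after: Iterable[str]) -> set:
    """Names present before the merge but missing afterwards (should be empty)."""
    return set(before) - set(after)


def fake_runs_ok(sql: str) -> bool:
    # Stand-in for executing the SQL against a real MySQL server (assumption):
    # pretend that only today() fails.
    return "today(" not in sql


cases = json.loads(SAMPLE_CASES)
validated = collect_valid_functions(cases, fake_runs_ok)
current = ["abs", "lcase"]
merged = merge_function_lists(current, validated)
regressions = find_regressions(current, merged)
```

Because the merge is a union, `regressions` stays empty by construction; the separate check matters mainly if the merge step is later replaced by something that filters.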

Summary by CodeRabbit

  • New Features

    • MySQL is now an allowed data source.
    • Function catalog expanded; MySQL function coverage increased to 135 entries and lcase now reports varchar parameter/return types.
  • Tests

    • Added a comprehensive MySQL fuzz dataset to improve function coverage.
    • Tests updated to reflect the new function count and metadata.

@coderabbitai
Contributor

coderabbitai bot commented Sep 8, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (2)
  • ibis-server/resources/function_list/mysql.csv is excluded by !**/*.csv
  • ibis-server/resources/white_function_list/mysql.csv is excluded by !**/*.csv


Walkthrough

Adds MySQL to the config whitelist, introduces a MySQL fuzzing JSON dataset, and updates MySQL connector tests to use the remote white-list and adjust expected function metadata and counts.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| Config whitelist update: ibis-server/app/config.py | Expanded allowed data sources in get_data_source_is_white_list to include "mysql"; logic otherwise unchanged. |
| Fuzzing dataset (MySQL): ibis-server/resources/fuzzing/mysql.json | Added static JSON with ~280 MySQL function fuzz cases (fields: id, function_name, SQL snippet); no runtime code changes. |
| MySQL tests adjustment: ibis-server/tests/routers/v3/connector/mysql/test_functions.py | Added white_function_list_path handling and autouse fixture; updated expected function count to 135 when remote white-list is used; adjusted "lcase" metadata (param_types and return_type now "varchar"). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Router as Router (v3)
  participant Config as Config.get_data_source_is_white_list
  participant Source as Remote/List Provider

  Client->>Router: Request function list (data_source="mysql")
  Router->>Config: get_data_source_is_white_list("mysql", path)
  alt path unset
    Config-->>Router: false
    Router-->>Client: Return default/filtered list (no remote white-list)
  else path set
    Config-->>Router: true
    Router->>Source: Fetch function_list + white_function_list (uses `mysql.csv`)
    Source-->>Router: Functions (incl. updated lcase metadata)
    Router-->>Client: Return functions (count = 135)
  end
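
The branch in the diagram can be approximated in a few lines. This is a sketch under assumptions: `SketchConfig` and `choose_function_source` are hypothetical names, and only the body of `get_data_source_is_white_list` mirrors the logic quoted later in this review.

```python
from dataclasses import dataclass
from typing import Optional

# Data sources that may use the white list, as named in this PR.
WHITE_LIST_SOURCES = frozenset({"bigquery", "postgres", "mysql"})


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the real config object."""

    remote_white_function_list_path: Optional[str] = None

    def get_data_source_is_white_list(self, data_source: str) -> bool:
        # With no white-list path configured, no source is treated as white-listed.
        if not self.remote_white_function_list_path:
            return False
        return data_source in WHITE_LIST_SOURCES


def choose_function_source(config: SketchConfig, data_source: str) -> str:
    """Return which list a router would consult (labels are illustrative)."""
    if config.get_data_source_is_white_list(data_source):
        return "white_function_list"
    return "default"


unset = SketchConfig()
configured = SketchConfig(remote_white_function_list_path="resources/white_function_list")
```

With `unset`, every data source falls back to the default list; with `configured`, only the three named sources take the white-list branch.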

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • goldmedal

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title succinctly and accurately summarizes the primary change by indicating the introduction of a MySQL function whitelist within the ibis scope, following conventional commit style and clearly reflecting the main purpose of the changeset. |
| Description Check | ✅ Passed | The description directly outlines the steps taken to generate the fuzzing test cases, validate MySQL functions, and integrate them with the existing list, clearly referencing relevant files and processes that align with the changes in the pull request. |

Poem

A rabbit hops through code so spry,
MySQL joins the curated sky.
Fuzzing seeds and tests aligned,
Lcase learns its varchar mind.
I nibble bytes and hum—deploy! 🥕


@github-actions github-actions bot added the ibis and python labels Sep 8, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ibis-server/app/config.py (1)

68-70: Harden path validation (startswith is unsafe for directory containment checks)

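A string startswith check is unsafe for path containment. For example, if the base directory is named base_dir, a data_source of "../base_dir" normalizes to a sibling file (base_dir.csv) whose path still starts with the base directory string, so the check passes. Compare os.path.commonpath on absolute paths instead.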

Apply to both methods:

-        base_path = os.path.normpath(self.remote_function_list_path)
+        base_path = os.path.abspath(os.path.normpath(self.remote_function_list_path))
         path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv"))
-        if not path.startswith(base_path):
+        base_common = os.path.commonpath([base_path, os.path.abspath(path)])
+        if base_common != base_path:
             raise ValueError("Invalid data source path")
-            base_path = os.path.normpath(self.remote_white_function_list_path)
+            base_path = os.path.abspath(os.path.normpath(self.remote_white_function_list_path))
             path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv"))
-            if not path.startswith(base_path):
+            base_common = os.path.commonpath([base_path, os.path.abspath(path)])
+            if base_common != base_path:
                 raise ValueError("Invalid data source path")

Also applies to: 78-80
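
To make the difference concrete, here is a small self-contained comparison of the two checks. It is a sketch, not the project's code: the function names and the `/srv/lists` directory are made up, and `posixpath` is used instead of `os.path` only so the result is the same on every platform.

```python
import posixpath


def contained_startswith(base_dir: str, data_source: str) -> bool:
    """Prefix check in the style of the current code (string comparison only)."""
    base = posixpath.normpath(base_dir)
    path = posixpath.normpath(posixpath.join(base, f"{data_source}.csv"))
    return path.startswith(base)


def contained_commonpath(base_dir: str, data_source: str) -> bool:
    """Containment check that compares whole path components."""
    base = posixpath.abspath(base_dir)
    path = posixpath.abspath(posixpath.join(base, f"{data_source}.csv"))
    return posixpath.commonpath([base, path]) == base


BASE = "/srv/lists"  # hypothetical base directory

# "../lists_evil" resolves to /srv/lists_evil.csv, a sibling of the base
# directory whose path string still begins with "/srv/lists".
bypass_startswith = contained_startswith(BASE, "../lists_evil")
bypass_commonpath = contained_commonpath(BASE, "../lists_evil")
normal_startswith = contained_startswith(BASE, "mysql")
normal_commonpath = contained_commonpath(BASE, "mysql")
```

Here the prefix check accepts the sibling path while the `commonpath` check rejects it; both accept the ordinary `mysql` case.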

🧹 Nitpick comments (5)
ibis-server/app/config.py (1)

88-93: Avoid hardcoding whitelist; centralize and make case-safe

Minor maintainability improvement: pull the allowed set into a module-level constant and compare against a normalized (lowercased) data_source.

+ALLOWED_WHITE_LIST_SOURCES = frozenset({"bigquery", "postgres", "mysql"})
@@
     def get_data_source_is_white_list(self, data_source: str) -> bool:
         if not self.remote_white_function_list_path:
             return False
-
-        return data_source in {"bigquery", "postgres", "mysql"}
+        return data_source.lower() in ALLOWED_WHITE_LIST_SOURCES
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (1)

57-66: Derive expected function count from CSV to avoid brittle magic number

Add at top of test_functions.py:

import os

Replace

assert len(result) == 135

with

# Dynamically compute the expected number of functions from the CSV (minus the header row)
with open(os.path.join(function_list_path, "mysql.csv")) as f:
    expected = sum(1 for _ in f) - 1
assert len(result) == expected

Keep the existing metadata assertion for "lcase".
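
If the count is derived from the file, counting parsed rows rather than raw lines may be slightly more robust, since trailing blank lines would otherwise inflate the total. The sketch below uses an in-memory sample with made-up column names; the real layout of mysql.csv is not shown in this thread, and whether the endpoint's count actually equals the CSV row count is not established here either.

```python
import csv
import io

# Hypothetical CSV content: header, two data rows, and a trailing blank line.
SAMPLE_CSV = (
    "name,return_type,param_types\n"
    "lcase,varchar,varchar\n"
    'concat,varchar,"varchar,varchar"\n'
    "\n"
)


def count_lines_minus_header(handle) -> int:
    """Naive count: every physical line except the first."""
    return sum(1 for _ in handle) - 1


def count_csv_rows(handle) -> int:
    """Count non-empty parsed rows after skipping the header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row if present
    return sum(1 for row in reader if row)


naive = count_lines_minus_header(io.StringIO(SAMPLE_CSV))  # 3: the blank line is counted
parsed = count_csv_rows(io.StringIO(SAMPLE_CSV))           # 2: only real data rows
```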

ibis-server/resources/fuzzing/mysql.json (3)

658-661: Low-effort wins: swap a few to MySQL-native equivalents

Where MySQL has direct equivalents, switch to reduce false negatives:

  • today() → current_date()
  • to_unixtime(...) → unix_timestamp(...)
  • random() → rand()
  • to_char(ts, fmt) → date_format(ts, fmt)
  • to_date('YYYY-MM-DD') → str_to_date('YYYY-MM-DD','%Y-%m-%d')

Example diffs:

-    "sql": "SELECT today() AS result"
+    "sql": "SELECT current_date() AS result"
-    "sql": "SELECT to_unixtime('2023-01-01T12:00:00') AS result"
+    "sql": "SELECT unix_timestamp('2023-01-01 12:00:00') AS result"
-    "sql": "SELECT random() AS result"
+    "sql": "SELECT rand() AS result"
-    "sql": "SELECT to_char(now(), '%Y-%m-%d') AS result"
+    "sql": "SELECT date_format(now(), '%Y-%m-%d') AS result"
-    "sql": "SELECT to_date('2023-01-15') AS result"
+    "sql": "SELECT str_to_date('2023-01-15','%Y-%m-%d') AS result"

Also applies to: 679-681, 1088-1091, 1223-1226, 1313-1316
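
For the swaps that are pure renames, the rewrite could be scripted instead of edited by hand. A minimal sketch, assuming a hypothetical helper `to_mysql_native`; it only renames the call and does not parse SQL, so it would also touch matching text inside string literals. The to_date case is left out because it needs an extra format argument, and the to_unixtime example above also changes the timestamp literal, which this helper does not do.

```python
import re

# Name-only swaps taken from the list above.
NAME_SWAPS = {
    "today": "current_date",
    "to_unixtime": "unix_timestamp",
    "random": "rand",
    "to_char": "date_format",
}

# Match any of the names when used as a function call, e.g. "today(".
_CALL = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in NAME_SWAPS) + r")\s*\(",
    re.IGNORECASE,
)


def to_mysql_native(sql: str) -> str:
    """Rename known non-MySQL function calls to their MySQL equivalents."""
    return _CALL.sub(lambda m: NAME_SWAPS[m.group(1).lower()] + "(", sql)


rewritten_today = to_mysql_native("SELECT today() AS result")
rewritten_random = to_mysql_native("SELECT random() AS result")
rewritten_to_char = to_mysql_native("SELECT to_char(now(), '%Y-%m-%d') AS result")
```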


8-11: 'decode' entry is not MySQL-style base64

MySQL's base64 functions are FROM_BASE64/TO_BASE64. If this case is meant for base64, adjust accordingly; if it's Oracle/Trino-style DECODE, leave it to be filtered by the checker.

-    "sql": "SELECT decode('aGVsbG8=', 'base64') AS result"
+    "sql": "SELECT from_base64('aGVsbG8=') AS result"

1-21: Tag or pre-filter non-MySQL functions in mysql.json

  • mysql.json includes functions unsupported by MySQL (array_max, list_extract, array_join, list_distance, array_union, decode('…','base64'), contains); tag entries with their supported dialects or filter for MySQL before execution to avoid remote-check failures.
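
One way the tagging idea might look in practice is sketched below. The `dialects` field and the `cases_for_dialect` helper are hypothetical (mysql.json as described has only id, function_name, and sql), and the tag value `other_engine` is a placeholder, not a claim about which engines support these functions.

```python
from typing import Iterable, List


def cases_for_dialect(cases: Iterable[dict], dialect: str) -> List[dict]:
    """Keep cases tagged for the dialect; untagged cases are kept by default."""
    return [case for case in cases if dialect in case.get("dialects", [dialect])]


# Hypothetical entries with the assumed optional "dialects" tag.
sample_cases = [
    {"id": 1, "function_name": "lcase", "sql": "SELECT lcase('A') AS result"},
    {"id": 2, "function_name": "array_max", "sql": "SELECT array_max(x) AS result",
     "dialects": ["other_engine"]},
    {"id": 3, "function_name": "rand", "sql": "SELECT rand() AS result",
     "dialects": ["mysql"]},
]

mysql_cases = cases_for_dialect(sample_cases, "mysql")
mysql_names = [case["function_name"] for case in mysql_cases]
```

Keeping untagged entries by default means the existing file would continue to work unchanged until tags are added.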
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b08706d and b892069.

⛔ Files ignored due to path filters (2)
  • ibis-server/resources/function_list/mysql.csv is excluded by !**/*.csv
  • ibis-server/resources/white_function_list/mysql.csv is excluded by !**/*.csv
📒 Files selected for processing (3)
  • ibis-server/app/config.py (1 hunks)
  • ibis-server/resources/fuzzing/mysql.json (1 hunks)
  • ibis-server/tests/routers/v3/connector/mysql/test_functions.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/config.py (1)

92-92: Add MySQL to whitelist — OK

Including "mysql" in the whitelist matches the PR goal. No functional concerns here.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (2)

45-51: Consolidate autouse fixtures to avoid ordering pitfalls and duplication

Merge both autouse fixtures into one so they always set/unset in lockstep and reduce boilerplate.

Apply this diff to replace both fixtures at Lines 37-51:

-@pytest.fixture(autouse=True)
-def set_remote_function_list_path():
-    config = get_config()
-    config.set_remote_function_list_path(function_list_path)
-    yield
-    config.set_remote_function_list_path(None)
-
-@pytest.fixture(autouse=True)
-def set_remote_white_function_list_path():
-    config = get_config()
-    config.set_remote_white_function_list_path(white_function_list_path)
-    yield
-    config.set_remote_white_function_list_path(None)
+@pytest.fixture(autouse=True)
+def set_remote_function_lists():
+    config = get_config()
+    config.set_remote_function_list_path(function_list_path)
+    config.set_remote_white_function_list_path(white_function_list_path)
+    yield
+    config.set_remote_function_list_path(None)
+    config.set_remote_white_function_list_path(None)

68-68: Avoid brittle magic number in assertion

Replace 135 with a named constant to make intent clear and ease future updates.

 function_list_path = file_path("../resources/function_list")
 white_function_list_path = file_path("../resources/white_function_list")
+EXPECTED_MYSQL_FUNCTION_COUNT = 135
@@
-    assert len(result) == 135
+    assert len(result) == EXPECTED_MYSQL_FUNCTION_COUNT
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b892069 and 306da73.

📒 Files selected for processing (1)
  • ibis-server/tests/routers/v3/connector/mysql/test_functions.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (3)
ibis-server/tests/conftest.py (1)
  • file_path (10-11)
ibis-server/app/config.py (3)
  • set_remote_white_function_list_path (85-86)
  • get_config (98-99)
  • set_remote_function_list_path (82-83)
ibis-server/tests/routers/v3/connector/postgres/conftest.py (1)
  • set_remote_function_list_path (71-77)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (5)
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (5)

29-29: LGTM: Added white function list path

Path naming aligns with config setters and mirrors Postgres tests.


57-57: Good: explicit reset to baseline before first call

Prevents fixture defaults from affecting the baseline count assertion.


64-64: Correct: white-list path is set together with function-list path

Keeps config in a consistent state for the combined behavior under test.


75-76: LGTM: metadata checks for lcase

Verifies param_types and return_type corrections from the white list.


80-80: LGTM: explicit cleanup at test end

Redundant with fixture teardown but harmless; maintains clarity.

@douenergy douenergy requested a review from goldmedal September 9, 2025 01:00
@douenergy douenergy requested a review from goldmedal September 10, 2025 05:43
@douenergy douenergy requested a review from goldmedal September 10, 2025 06:40
Contributor

@goldmedal goldmedal left a comment


Thanks @douenergy. Looks great.

@goldmedal goldmedal merged commit 551f2c8 into Canner:main Sep 10, 2025
4 checks passed
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026
Co-authored-by: Jax Liu <liugs963@gmail.com>