fix(core-py): extract the used tables using the case-sensitive table name by goldmedal · Pull Request #1320 · Canner/wren-engine

goldmedal · 2025-09-15T10:11:58Z

Description

Closes #1319: Table name case sensitivity causes "table not found" errors in MySQL connector

We enabled access to case-sensitive table names in SQL in #1305. However, we didn't use the same config for the ManifestExtractor, so the SQL submitted from the ibis server was still being normalized by the DataFusion parser.

This PR disables the normalization for the table extractor.
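
To illustrate why the mismatch produces "table not found", here is a minimal sketch of identifier normalization. It is a toy model written for this description, not DataFusion's implementation: `normalize_identifier` and `manifest_models` are hypothetical names, and the only behavior assumed is that unquoted identifiers are folded to lowercase while normalization is enabled.

```python
def normalize_identifier(ident: str, enable_ident_normalization: bool) -> str:
    """Toy model of SQL identifier handling (hypothetical, not DataFusion code)."""
    # A double-quoted identifier always keeps its exact spelling.
    if len(ident) >= 2 and ident[0] == '"' and ident[-1] == '"':
        return ident[1:-1]
    # An unquoted identifier is lowercased only while normalization is enabled.
    return ident.lower() if enable_ident_normalization else ident


# Hypothetical manifest containing one mixed-case model name.
manifest_models = {"Orders"}

with_normalization = normalize_identifier("Orders", True)
without_normalization = normalize_identifier("Orders", False)

print(with_normalization in manifest_models)     # False
print(without_normalization in manifest_models)  # True
```

With normalization on, the extractor would look up `orders` and miss the `Orders` model; with it off, the name survives parsing and the lookup succeeds.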

Summary by CodeRabbit

Bug Fixes
- Preserves case sensitivity for table and column names during SQL parsing and execution.
- Queries now correctly recognize mixed-case models and columns without requiring manual quoting.
- Transformed SQL automatically adds quotes where needed to match manifest casing.
- Postgres query results correctly map uppercase output columns to expected data types.
Tests
- Added test coverage verifying case-sensitive handling of identifiers and SQL transformation across connectors.

coderabbitai · 2025-09-15T10:12:06Z

Walkthrough

Introduces case-sensitivity handling for identifiers: disables SQL identifier normalization in the Rust extractor’s SessionConfig and adds tests in Python and Ibis server to validate unquoted, case-sensitive model and column names are preserved and correctly resolved/quoted during transformation and execution.

Changes

Cohort / File(s)	Summary
Rust extractor configuration `wren-core-py/src/extractor.rs`	Constructs SessionContext with SessionConfig and sets `options_mut().sql_parser.enable_ident_normalization = false` before resolving used table names.
Core modeling tests `wren-core-py/tests/test_modeling_core.py`	Adds `test_case_sensitive_without_quote` validating transformed SQL preserves case via quoted identifiers and correct nested SELECT structure for model and columns named with uppercase.
Ibis Postgres query tests `ibis-server/tests/routers/v3/connector/postgres/test_query.py`	Adds `test_case_sensitive_without_quote` ensuring output column names with uppercase map to expected dtypes and execute without quotes, with cache enabled and WREN fallback disabled.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as Ibis Server API
  participant Extractor as ManifestExtractor
  participant DF as DataFusion Parser
  participant SC as SessionContext (config)

  Client->>API: SQL (e.g., select O_orderkey from Orders)
  API->>Extractor: resolve_used_table_names(SQL)
  Extractor->>SC: create with SessionConfig\n(sql_parser.enable_ident_normalization=false)
  Extractor->>DF: parse(SQL) preserving identifier case
  DF-->>Extractor: AST with original cases
  Extractor->>Extractor: resolve tables vs manifest (case-sensitive/quoted)
  Extractor-->>API: used tables + extracted manifest
  API->>SC: transform SQL with manifest context
  SC-->>Client: transformed SQL / execution result
  note over SC,DF: Identifiers preserved unless explicitly quoted

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

chore(tool): add manifest extract for query_local_run #1191 — Both adjust manifest extraction and used-table resolution paths.
feat(core): disable enable_ident_normalization for case-sensitive identifier by default #1305 — Also modifies DataFusion identifier normalization behavior and adds related tests.
fix(core): fix the uppercase table name of the model tableReference #1173 — Addresses identifier case-sensitivity and quoting preservation across parsing/transform phases.

Suggested labels

core, python, rust, ibis

Suggested reviewers

douenergy

Poem

A hare with keen and twitchy nose,
Hops through models’ Uppercase rows;
“Don’t trim my names!” the bunny cries,
“Quote them true, preserve their guise.”
Now tables, columns keep their grace—
I thump approval, case by case. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues Check	⚠️ Warning	The PR removes DataFusion's identifier normalization in the ManifestExtractor and adds tests that validate case-preserving extraction, which resolves the parser/extractor mismatch described in issue #1319, but the linked issue explicitly requests case-insensitive table-name matching (queries should succeed regardless of identifier case) and this PR does not change model lookup to be case-insensitive, so it only partially addresses the reported requirement.	Either implement case-insensitive matching in model lookup functions (for example use eq_ignore_ascii_case in get_model/get_view/get_relationship) and add tests that assert mixed-case queries succeed, or update the issue/PR description to state that the chosen fix is to preserve SQL identifier case and add tests demonstrating the intended semantics across connectors.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
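
For comparison, the case-insensitive lookup proposed in the Linked Issues Check could look roughly like the sketch below. It is written in Python rather than Rust, `get_model_ignore_case` is a hypothetical helper, and `str.lower()` stands in for `eq_ignore_ascii_case` (the two agree for ASCII names only). This PR does not implement this approach.

```python
from typing import Optional


def get_model_ignore_case(model_names: list[str], requested: str) -> Optional[str]:
    """Return the manifest model name matching `requested`, ignoring case."""
    # lower() approximates Rust's eq_ignore_ascii_case for ASCII identifiers.
    matches = [name for name in model_names if name.lower() == requested.lower()]
    if len(matches) > 1:
        # Two models differing only by case make the lookup ambiguous.
        raise ValueError(f"ambiguous model name {requested!r}: {matches}")
    return matches[0] if matches else None


print(get_model_ignore_case(["Orders", "Customer"], "orders"))  # Orders
print(get_model_ignore_case(["Orders", "Customer"], "Lineitem"))  # None
```

The ambiguity branch shows one trade-off of this design: a manifest that contains both `Orders` and `orders` can no longer be addressed without quoting rules of its own.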

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The PR title accurately and concisely describes the primary change—making the core Python extractor use case-sensitive table names when extracting used tables—so it reflects the code change in wren-core-py/src/extractor.rs and the added tests.
Out of Scope Changes Check	✅ Passed	The changes are focused on identifier case handling: the extractor configuration change and the new tests in core and the server exercise case-preserving behavior, and there are no unrelated feature additions or refactors present in the provided diffs.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

wren-core-py/src/extractor.rs (2)
46-48: Ensure parser config parity with other contexts

This disables ident normalization (good for preserving case), but context.rs initializes SessionConfig::default().with_information_schema(true). For consistency and to avoid subtle parse/resolution drift, set information_schema here too.

Apply:
-    let mut config = wren_core::SessionConfig::new();
-    config.options_mut().sql_parser.enable_ident_normalization = false;
+    let mut config = wren_core::SessionConfig::default().with_information_schema(true);
+    // Preserve identifier case during parsing (must mirror transform paths).
+    config.options_mut().sql_parser.enable_ident_normalization = false;
46-48: Guard against future regressions

Add a focused unit test for resolve_used_table_names covering:

- unquoted mixed-case table (Orders)
- fully qualified quoted table ("my_catalog"."my_schema"."Orders")
- lower/upper variants when manifest uses mixed-case

This will catch DataFusion option or dialect changes early.
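
A sketch of what those cases might look like, written in Python against a toy stand-in: `used_table_names` below is a hypothetical regex-based helper, not the real resolve_used_table_names, and the expected results assume the case-preserving semantics this PR adopts (lower/upper variants do not match a mixed-case manifest entry).

```python
import re

# One identifier part: either "quoted" or a bare word.
_PART = r'(?:"[^"]+"|\w+)'
# A table reference after FROM: one or more dot-separated parts.
_FROM = re.compile(rf'\bFROM\s+({_PART}(?:\.{_PART})*)', re.IGNORECASE)


def used_table_names(sql: str) -> list[str]:
    """Toy extractor: return the last part of each FROM reference, case preserved."""
    names = []
    for ref in _FROM.findall(sql):
        parts = re.findall(_PART, ref)
        names.append(parts[-1].strip('"'))
    return names


manifest_models = {"Orders"}  # hypothetical manifest with a mixed-case model

cases = [
    ("select O_orderkey from Orders", True),                      # unquoted mixed-case
    ('select * from "my_catalog"."my_schema"."Orders"', True),    # fully qualified, quoted
    ("select * from orders", False),                              # lowercase variant
    ("select * from ORDERS", False),                              # uppercase variant
]

for sql, should_match in cases:
    found = set(used_table_names(sql)) <= manifest_models
    assert found == should_match, sql
```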
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

1224-1265: Nice coverage for unquoted mixed‑case identifiers

Test correctly asserts preserved output dtypes for Orders/O_*. Consider asserting column names in result["columns"] to also lock casing on the wire, and add a negative (or separate) test documenting behavior when querying orders against manifest Orders (depending on confirmed product semantics).
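
A minimal sketch of the extra assertion, assuming the response exposes `columns` as a list and `dtypes` as a mapping keyed by column name. The `result` dict is a hand-written stand-in, its dtype values are illustrative only, and `assert_column_casing` is a hypothetical helper.

```python
def assert_column_casing(result: dict, expected_columns: list[str]) -> None:
    """Check that column names come back with the exact casing from the manifest."""
    # Order and spelling must both match, so compare the lists directly.
    assert result["columns"] == expected_columns
    # dtypes should be keyed by the same case-preserved names.
    assert set(result["dtypes"]) == set(expected_columns)


# Hand-written stand-in for a query response (values are illustrative).
result = {
    "columns": ["O_orderkey", "O_custkey"],
    "dtypes": {"O_orderkey": "int32", "O_custkey": "int32"},
}

assert_column_casing(result, ["O_orderkey", "O_custkey"])
```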
wren-core-py/tests/test_modeling_core.py (2)
554-591: Good end‑to‑end assertion; reduce string‑match brittleness

The exact SQL string compare is brittle to harmless formatting/order changes. Prefer asserting key properties (quoted identifiers, source table path, selected columns) via regex or a lightweight parser/normalizer.

Example:
import re
assert re.search(r'SELECT\s+"Orders"\."O_orderkey"', actual)
assert re.search(r'FROM\s+main\.orders\s+AS\s+__source', actual)
for col in ["O_orderkey","O_custkey","O_orderdate"]:
    assert f'"Orders"."{col}"' in actual
577-584: Use helper for base64 encoding for consistency

You can use to_json_base64(extracted_manifest) directly to avoid double encoding logic divergence across tests.
-encoded_str = to_json_base64(extracted_manifest)
+encoded_str = to_json_base64(extracted_manifest)
(Keep using the helper earlier for the initial manifest as well, if desired.)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a4779 and a5bb5a4.

⛔ Files ignored due to path filters (1)

wren-core-py/poetry.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

ibis-server/tests/routers/v3/connector/postgres/test_query.py (1 hunks)
wren-core-py/src/extractor.rs (1 hunks)
wren-core-py/tests/test_modeling_core.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

wren-core-py/src/extractor.rs (1)

wren-core-py/src/context.rs (1)

new (81-203)

ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

wren-core-py/tests/test_modeling_core.py (1)

test_case_sensitive_without_quote (554-590)

wren-core-py/tests/test_modeling_core.py (3)

wren-core-py/src/extractor.rs (4)

extractor (237-239)

resolve_used_table_names (32-34)

resolve_used_table_names (45-67)

extract_by (40-42)

wren-core-py/src/manifest.rs (1)

to_json_base64 (10-14)

wren-core-py/src/context.rs (1)

transform_sql (207-223)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: test
GitHub Check: ci

🔇 Additional comments (1)

wren-core-py/src/extractor.rs (1)
46-48:
```shell
#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo
echo "=== Show file: wren-core-py/src/extractor.rs (first 300 lines) ==="
if [ -f wren-core-py/src/extractor.rs ]; then
  sed -n '1,300p' wren-core-py/src/extractor.rs || true
else
  echo "wren-core-py/src/extractor.rs not found"
fi

echo
echo "=== Search for enable_ident_normalization across repository ==="
rg -n --hidden --no-ignore -S "enable_ident_normalization" || true

echo
echo "=== Search for sql_parser, SqlParserOptions, normalization keywords ==="
rg -n --hidden --no-ignore -S "sql_parser|SqlParserOptions|enable_options_value_normalization|enable_ident_normalization|normaliz|normalize|to_lowercase" || true

echo
echo "=== Search for SessionConfig / SessionContext usage ==="
rg -n --hidden --no-ignore -S "SessionConfig|SessionContext::new_with_config|SessionContext" || true

echo
echo "=== Search for extract_by, extractor, matching logic ==="
rg -n --hidden --no-ignore -S "extract_by\(|extract_by\b|extractor|match_model|match_table|case_sensitive" || true

echo
echo "=== Show any files mentioning 'ident' or 'identifier' around sql parser usage ==="
rg -n --hidden --no-ignore -S "ident|identifier|ident_normalization" || true

echo
echo "=== Completed ==="
```

…name (Canner#1320)

fix manifest extractor for case sensitive extracting

a5bb5a4

github-actions bot added the ibis and python labels Sep 15, 2025

goldmedal mentioned this pull request Sep 15, 2025

Table name case sensitivity causes "table not found" errors in MySQL connector #1319

Closed

coderabbitai bot reviewed Sep 15, 2025

goldmedal requested a review from douenergy September 16, 2025 02:06

douenergy approved these changes Sep 23, 2025

douenergy merged commit 9524684 into Canner:main Sep 23, 2025
6 checks passed

nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026

fix(core-py): extract the used tables using the case-sensitive table …

64318c0

…name (Canner#1320)

douenergy merged 1 commit into Canner:main from goldmedal:fix/manifest-extract-case-sensitive