
fix(core-py): extract the used tables using the case-sensitive table name #1320

Merged
douenergy merged 1 commit into Canner:main from goldmedal:fix/manifest-extract-case-sensitive
Sep 23, 2025

Conversation


@goldmedal goldmedal commented Sep 15, 2025

Description

#1305 made it possible to access case-sensitive table names in SQL. However, the ManifestExtractor did not use the same config, so SQL submitted from the ibis server is still normalized by the DataFusion parser.

This PR disables the normalization for the table extractor.
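To illustrate why normalization breaks the lookup, here is a minimal Python sketch. It is a toy model, not the DataFusion parser: `parse_identifier` and `manifest_models` are hypothetical names, and the only behavior assumed is that identifier normalization lowercases unquoted identifiers while leaving quoted ones untouched.

```python
def parse_identifier(raw: str, enable_ident_normalization: bool) -> str:
    """Toy model of how a SQL parser might treat a single identifier."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Quoted identifiers keep their case in both modes.
        return raw[1:-1]
    # Unquoted identifiers are lowercased only when normalization is on.
    return raw.lower() if enable_ident_normalization else raw


manifest_models = {"Orders", "Customer"}  # hypothetical manifest model names

normalized = parse_identifier("Orders", enable_ident_normalization=True)
preserved = parse_identifier("Orders", enable_ident_normalization=False)

print(normalized in manifest_models)  # False: "orders" misses the "Orders" model
print(preserved in manifest_models)   # True: case is preserved, so the lookup hits
```

With normalization enabled, the extractor would look for `orders` and skip the `Orders` model, which matches the symptom described above.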

Summary by CodeRabbit

  • Bug Fixes

    • Preserves case sensitivity for table and column names during SQL parsing and execution.
    • Queries now correctly recognize mixed-case models and columns without requiring manual quoting.
    • Transformed SQL automatically adds quotes where needed to match manifest casing.
    • Postgres query results correctly map uppercase output columns to expected data types.
  • Tests

    • Added test coverage verifying case-sensitive handling of identifiers and SQL transformation across connectors.


coderabbitai bot commented Sep 15, 2025

Walkthrough

Introduces case-sensitivity handling for identifiers. The PR disables SQL identifier normalization in the Rust extractor’s SessionConfig and adds tests in Python and the Ibis server to validate that unquoted, case-sensitive model and column names are preserved and correctly resolved and quoted during transformation and execution.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Rust extractor configuration: `wren-core-py/src/extractor.rs` | Constructs SessionContext with SessionConfig and sets `options_mut().sql_parser.enable_ident_normalization = false` before resolving used table names. |
| Core modeling tests: `wren-core-py/tests/test_modeling_core.py` | Adds `test_case_sensitive_without_quote`, validating that transformed SQL preserves case via quoted identifiers and has the correct nested SELECT structure for a model and columns named with uppercase letters. |
| Ibis Postgres query tests: `ibis-server/tests/routers/v3/connector/postgres/test_query.py` | Adds `test_case_sensitive_without_quote`, ensuring output column names with uppercase letters map to the expected dtypes and execute without quotes, with cache enabled and WREN fallback disabled. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as Ibis Server API
  participant Extractor as ManifestExtractor
  participant DF as DataFusion Parser
  participant SC as SessionContext (config)

  Client->>API: SQL (e.g., select O_orderkey from Orders)
  API->>Extractor: resolve_used_table_names(SQL)
  Extractor->>SC: create with SessionConfig\n(sql_parser.enable_ident_normalization=false)
  Extractor->>DF: parse(SQL) preserving identifier case
  DF-->>Extractor: AST with original cases
  Extractor->>Extractor: resolve tables vs manifest (case-sensitive/quoted)
  Extractor-->>API: used tables + extracted manifest
  API->>SC: transform SQL with manifest context
  SC-->>Client: transformed SQL / execution result
  note over SC,DF: Identifiers preserved unless explicitly quoted

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

core, python, rust, ibis

Suggested reviewers

  • douenergy

Poem

A hare with keen and twitchy nose,
Hops through models’ Uppercase rows;
“Don’t trim my names!” the bunny cries,
“Quote them true, preserve their guise.”
Now tables, columns keep their grace—
I thump approval, case by case. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues Check | ⚠️ Warning | The PR removes DataFusion's identifier normalization in the ManifestExtractor and adds tests that validate case-preserving extraction, which resolves the parser/extractor mismatch described in issue #1319. However, the linked issue explicitly requests case-insensitive table-name matching (queries should succeed regardless of identifier case), and this PR does not change model lookup to be case-insensitive, so it only partially addresses the reported requirement. | Either implement case-insensitive matching in the model lookup functions (for example, use `eq_ignore_ascii_case` in `get_model`/`get_view`/`get_relationship`) and add tests asserting that mixed-case queries succeed, or update the issue/PR description to state that the chosen fix is to preserve SQL identifier case, and add tests demonstrating the intended semantics across connectors. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title accurately and concisely describes the primary change (making the core Python extractor use case-sensitive table names when extracting used tables), so it reflects the code change in `wren-core-py/src/extractor.rs` and the added tests. |
| Out of Scope Changes Check | ✅ Passed | The changes are focused on identifier case handling: the extractor configuration change and the new tests in core and the server exercise case-preserving behavior, and there are no unrelated feature additions or refactors present in the provided diffs. |
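
The Linked Issues Check above proposes case-insensitive model lookup as an alternative to case preservation. A rough Python analogue of that idea is sketched below; this `get_model` is a hypothetical stand-in, not the Rust function of the same name in the codebase, and the ambiguity handling is an assumption added for illustration. Note that Rust's `eq_ignore_ascii_case` folds only ASCII letters, whereas Python's `str.lower()` also folds non-ASCII characters.

```python
from typing import Optional


def get_model(models: list[str], name: str) -> Optional[str]:
    """Return the manifest model matching `name`, ignoring letter case."""
    wanted = name.lower()
    matches = [m for m in models if m.lower() == wanted]
    if len(matches) > 1:
        # Two models that differ only by case cannot be told apart.
        raise ValueError(f"ambiguous model name {name!r}: {matches}")
    return matches[0] if matches else None


models = ["Orders", "Customer"]  # hypothetical manifest model names
print(get_model(models, "orders"))    # Orders
print(get_model(models, "lineitem"))  # None
```

The trade-off is that a manifest containing both `Orders` and `orders` becomes ambiguous under this scheme, which is one reason a project might prefer the case-preserving fix taken in this PR.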


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
wren-core-py/src/extractor.rs (2)

46-48: Ensure parser config parity with other contexts

This disables ident normalization (good for preserving case), but context.rs initializes SessionConfig::default().with_information_schema(true). For consistency and to avoid subtle parse/resolution drift, set information_schema here too.

Apply:

-    let mut config = wren_core::SessionConfig::new();
-    config.options_mut().sql_parser.enable_ident_normalization = false;
+    let mut config = wren_core::SessionConfig::default().with_information_schema(true);
+    // Preserve identifier case during parsing (must mirror transform paths).
+    config.options_mut().sql_parser.enable_ident_normalization = false;

46-48: Guard against future regressions

Add a focused unit test for resolve_used_table_names covering:

  • unquoted mixed-case table (Orders)
  • fully qualified quoted table ("my_catalog"."my_schema"."Orders")
  • lower/upper variants when manifest uses mixed-case

This will catch DataFusion option or dialect changes early.
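
The three cases listed above can be sketched as a small table-driven check. This is a toy in Python, not a test against the real `resolve_used_table_names`: `table_name_of`, `is_used_model`, and `MANIFEST_MODELS` are hypothetical, and the expected results assume the intended semantics are strictly case-sensitive, which the review leaves open.

```python
import re

MANIFEST_MODELS = {"Orders"}  # hypothetical manifest with one mixed-case model


def table_name_of(reference: str) -> str:
    """Return the last part of a possibly qualified, possibly quoted reference."""
    # Each match is either a double-quoted part or a bare part between dots.
    found = re.findall(r'"([^"]*)"|([^".]+)', reference)
    parts = [quoted or bare for quoted, bare in found]
    return parts[-1]


def is_used_model(reference: str) -> bool:
    """Case-sensitive membership check against the manifest."""
    return table_name_of(reference) in MANIFEST_MODELS


CASES = [
    ("Orders", True),                               # unquoted mixed-case
    ('"my_catalog"."my_schema"."Orders"', True),    # fully qualified, quoted
    ("orders", False),                              # lowercase variant
    ("ORDERS", False),                              # uppercase variant
]

for reference, expected in CASES:
    assert is_used_model(reference) is expected, reference
```

If the product semantics turn out to be case-insensitive, the last two expectations would flip to `True`.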
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

1224-1265: Nice coverage for unquoted mixed‑case identifiers

Test correctly asserts preserved output dtypes for Orders/O_*. Consider asserting column names in result["columns"] to also lock casing on the wire, and add a negative (or separate) test documenting behavior when querying orders against manifest Orders (depending on confirmed product semantics).
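
A minimal sketch of the suggested column-name assertion follows. The `result` dictionary is a hypothetical response payload; only the `result["columns"]` key comes from the comment above, and the `data` and `dtypes` values are invented for illustration.

```python
def assert_column_casing(result: dict, expected_columns: list[str]) -> None:
    """Fail if the returned column names differ from the expected casing."""
    actual = result["columns"]
    assert actual == expected_columns, f"column casing drifted: {actual}"


# Hypothetical response shape; values are made up for this sketch.
result = {
    "columns": ["O_orderkey", "O_custkey", "O_orderdate"],
    "data": [[1, 370, "1996-01-02"]],
    "dtypes": {"O_orderkey": "int32", "O_custkey": "int32", "O_orderdate": "date32[day]"},
}

assert_column_casing(result, ["O_orderkey", "O_custkey", "O_orderdate"])
# The dtype keys should use the same casing as the column list.
assert list(result["dtypes"]) == result["columns"]
```

Comparing the full list, rather than checking membership, also locks the column order.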

wren-core-py/tests/test_modeling_core.py (2)

554-591: Good end‑to‑end assertion; reduce string‑match brittleness

The exact SQL string compare is brittle to harmless formatting/order changes. Prefer asserting key properties (quoted identifiers, source table path, selected columns) via regex or a lightweight parser/normalizer.

Example:

import re
assert re.search(r'SELECT\s+"Orders"\."O_orderkey"', actual)
assert re.search(r'FROM\s+main\.orders\s+AS\s+__source', actual)
for col in ["O_orderkey","O_custkey","O_orderdate"]:
    assert f'"Orders"."{col}"' in actual

577-584: Use helper for base64 encoding for consistency

You can use to_json_base64(extracted_manifest) directly to avoid double encoding logic divergence across tests.

-encoded_str = to_json_base64(extracted_manifest)
+encoded_str = to_json_base64(extracted_manifest)

(Keep using the helper earlier for the initial manifest as well, if desired.)
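
For readers unfamiliar with the helper, here is a pure-Python sketch of what a function with that name plausibly does. This is an assumption inferred from the name only: the real `to_json_base64` lives in `wren-core-py/src/manifest.rs`, and the `_sketch` functions below are hypothetical and may differ from it in serialization details.

```python
import base64
import json


def to_json_base64_sketch(manifest: dict) -> str:
    """Serialize a manifest to JSON, then base64-encode the UTF-8 bytes."""
    payload = json.dumps(manifest).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")


def from_json_base64_sketch(encoded: str) -> dict:
    """Inverse of the sketch above, handy for inspecting a manifest in tests."""
    return json.loads(base64.b64decode(encoded))


# Hypothetical, heavily trimmed manifest.
manifest = {"catalog": "my_catalog", "schema": "my_schema", "models": [{"name": "Orders"}]}
encoded = to_json_base64_sketch(manifest)
assert from_json_base64_sketch(encoded) == manifest
```

Routing every test through one helper keeps the encoding step identical across tests, which is the consistency the comment above is asking for.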

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a4779 and a5bb5a4.

⛔ Files ignored due to path filters (1)
  • wren-core-py/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (1 hunks)
  • wren-core-py/src/extractor.rs (1 hunks)
  • wren-core-py/tests/test_modeling_core.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
wren-core-py/src/extractor.rs (1)
wren-core-py/src/context.rs (1)
  • new (81-203)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
wren-core-py/tests/test_modeling_core.py (1)
  • test_case_sensitive_without_quote (554-590)
wren-core-py/tests/test_modeling_core.py (3)
wren-core-py/src/extractor.rs (4)
  • extractor (237-239)
  • resolve_used_table_names (32-34)
  • resolve_used_table_names (45-67)
  • extract_by (40-42)
wren-core-py/src/manifest.rs (1)
  • to_json_base64 (10-14)
wren-core-py/src/context.rs (1)
  • transform_sql (207-223)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test
  • GitHub Check: ci
🔇 Additional comments (1)
wren-core-py/src/extractor.rs (1)

46-48:

```shell
#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo
echo "=== Show file: wren-core-py/src/extractor.rs (first 300 lines) ==="
if [ -f wren-core-py/src/extractor.rs ]; then
  sed -n '1,300p' wren-core-py/src/extractor.rs || true
else
  echo "wren-core-py/src/extractor.rs not found"
fi

echo
echo "=== Search for enable_ident_normalization across repository ==="
rg -n --hidden --no-ignore -S "enable_ident_normalization" || true

echo
echo "=== Search for sql_parser, SqlParserOptions, normalization keywords ==="
rg -n --hidden --no-ignore -S "sql_parser|SqlParserOptions|enable_options_value_normalization|enable_ident_normalization|normaliz|normalize|to_lowercase" || true

echo
echo "=== Search for SessionConfig / SessionContext usage ==="
rg -n --hidden --no-ignore -S "SessionConfig|SessionContext::new_with_config|SessionContext" || true

echo
echo "=== Search for extract_by, extractor, matching logic ==="
# The opening parenthesis is escaped so the pattern is a valid regex.
rg -n --hidden --no-ignore -S "extract_by\(|extract_by\b|extractor|match_model|match_table|case_sensitive" || true

echo
echo "=== Show any files mentioning 'ident' or 'identifier' around sql parser usage ==="
rg -n --hidden --no-ignore -S "ident|identifier|ident_normalization" || true

echo
echo "=== Completed ==="
```

@goldmedal goldmedal requested a review from douenergy September 16, 2025 02:06
@douenergy douenergy merged commit 9524684 into Canner:main Sep 23, 2025
6 checks passed
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026

Labels

ibis python Pull requests that update Python code


Development

Successfully merging this pull request may close these issues.

Table name case sensitivity causes "table not found" errors in MySQL connector

2 participants