fix(core-py): extract the used tables using the case-sensitive table name#1320
Conversation
WalkthroughIntroduces case-sensitivity handling for identifiers: disables SQL identifier normalization in the Rust extractor’s SessionConfig and adds tests in Python and Ibis server to validate unquoted, case-sensitive model and column names are preserved and correctly resolved/quoted during transformation and execution. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant API as Ibis Server API
participant Extractor as ManifestExtractor
participant DF as DataFusion Parser
participant SC as SessionContext (config)
Client->>API: SQL (e.g., select O_orderkey from Orders)
API->>Extractor: resolve_used_table_names(SQL)
Extractor->>SC: create with SessionConfig\n(sql_parser.enable_ident_normalization=false)
Extractor->>DF: parse(SQL) preserving identifier case
DF-->>Extractor: AST with original cases
Extractor->>Extractor: resolve tables vs manifest (case-sensitive/quoted)
Extractor-->>API: used tables + extracted manifest
API->>SC: transform SQL with manifest context
SC-->>Client: transformed SQL / execution result
note over SC,DF: Identifiers preserved unless explicitly quoted
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
wren-core-py/src/extractor.rs (2)
46-48: Ensure parser config parity with other contextsThis disables ident normalization (good for preserving case), but
context.rsinitializesSessionConfig::default().with_information_schema(true). For consistency and to avoid subtle parse/resolution drift, set information_schema here too.Apply:
- let mut config = wren_core::SessionConfig::new(); - config.options_mut().sql_parser.enable_ident_normalization = false; + let mut config = wren_core::SessionConfig::default().with_information_schema(true); + // Preserve identifier case during parsing (must mirror transform paths). + config.options_mut().sql_parser.enable_ident_normalization = false;
46-48: Guard against future regressionsAdd a focused unit test for
resolve_used_table_namescovering:
- unquoted mixed-case table (
Orders)- fully qualified quoted table (
"my_catalog"."my_schema"."Orders")- lower/upper variants when manifest uses mixed-case
This will catch DataFusion option or dialect changes early.ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
1224-1265: Nice coverage for unquoted mixed‑case identifiersTest correctly asserts preserved output dtypes for
Orders/O_*. Consider asserting column names inresult["columns"]to also lock casing on the wire, and add a negative (or separate) test documenting behavior when queryingordersagainst manifestOrders(depending on confirmed product semantics).wren-core-py/tests/test_modeling_core.py (2)
554-591: Good end‑to‑end assertion; reduce string‑match brittlenessThe exact SQL string compare is brittle to harmless formatting/order changes. Prefer asserting key properties (quoted identifiers, source table path, selected columns) via regex or a lightweight parser/normalizer.
Example:
import re assert re.search(r'SELECT\s+"Orders"\."O_orderkey"', actual) assert re.search(r'FROM\s+main\.orders\s+AS\s+__source', actual) for col in ["O_orderkey","O_custkey","O_orderdate"]: assert f'"Orders"."{col}"' in actual
577-584: Use helper for base64 encoding for consistencyYou can use
to_json_base64(extracted_manifest)directly to avoid double encoding logic divergence across tests.-encoded_str = to_json_base64(extracted_manifest) +encoded_str = to_json_base64(extracted_manifest)(Keep using the helper earlier for the initial manifest as well, if desired.)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
wren-core-py/poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
ibis-server/tests/routers/v3/connector/postgres/test_query.py(1 hunks)wren-core-py/src/extractor.rs(1 hunks)wren-core-py/tests/test_modeling_core.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
wren-core-py/src/extractor.rs (1)
wren-core-py/src/context.rs (1)
new(81-203)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
wren-core-py/tests/test_modeling_core.py (1)
test_case_sensitive_without_quote(554-590)
wren-core-py/tests/test_modeling_core.py (3)
wren-core-py/src/extractor.rs (4)
extractor(237-239)resolve_used_table_names(32-34)resolve_used_table_names(45-67)extract_by(40-42)wren-core-py/src/manifest.rs (1)
to_json_base64(10-14)wren-core-py/src/context.rs (1)
transform_sql(207-223)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test
- GitHub Check: ci
🔇 Additional comments (1)
wren-core-py/src/extractor.rs (1)
46-48: ```shell
#!/bin/bash
set -euo pipefailecho "PWD: $(pwd)"
echo
echo "=== Show file: wren-core-py/src/extractor.rs (first 300 lines) ==="
if [ -f wren-core-py/src/extractor.rs ]; then
sed -n '1,300p' wren-core-py/src/extractor.rs || true
else
echo "wren-core-py/src/extractor.rs not found"
fiecho
echo "=== Search for enable_ident_normalization across repository ==="
rg -n --hidden --no-ignore -S "enable_ident_normalization" || trueecho
echo "=== Search for sql_parser, SqlParserOptions, normalization keywords ==="
rg -n --hidden --no-ignore -S "sql_parser|SqlParserOptions|enable_options_value_normalization|enable_ident_normalization|normaliz|normalize|to_lowercase" || trueecho
echo "=== Search for SessionConfig / SessionContext usage ==="
rg -n --hidden --no-ignore -S "SessionConfig|SessionContext::new_with_config|SessionContext" || trueecho
echo "=== Search for extract_by, extractor, matching logic ==="
rg -n --hidden --no-ignore -S "extract_by(|extract_by\b|extractor|match_model|match_table|case_sensitive" || trueecho
echo "=== Show any files mentioning 'ident' or 'identifier' around sql parser usage ==="
rg -n --hidden --no-ignore -S "ident|identifier|ident_normalization" || trueecho
echo "=== Completed ==="</blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Description
We access the case-sensitive table name in SQL by #1305. However, we didn't use the same config for the
ManifestExtractor. It causes the SQL submitted from the ibis server is still be normalized by the DataFusion parser.This PR disables the normalization for the table extractor.
Summary by CodeRabbit
Bug Fixes
Tests