Skip to content

fix(core): support the Unicode literal for MSSQL#1338

Merged
douenergy merged 1 commit intoCanner:mainfrom
goldmedal:feat/mssql-nliteral
Oct 3, 2025
Merged

fix(core): support the Unicode literal for MSSQL#1338
douenergy merged 1 commit intoCanner:mainfrom
goldmedal:feat/mssql-nliteral

Conversation

@goldmedal
Copy link
Copy Markdown
Contributor

@goldmedal goldmedal commented Oct 2, 2025

Summary by CodeRabbit

  • New Features

    • Added support for Unicode string literals in MSSQL queries, ensuring correct handling of non-ASCII characters.
  • Bug Fixes

    • Fixed mismatches between Unicode and ASCII string comparisons in MSSQL queries, improving result accuracy.
  • Tests

    • Introduced end-to-end tests for MSSQL that validate Unicode and ASCII literal handling via the query API.
    • Seeded sample data and created a Unicode test table in the MSSQL test environment to ensure reliable coverage.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 2, 2025

Walkthrough

Adds MSSQL Unicode string literal support in the core dialect by introducing MsSqlDialect and a new dialect hook, wires it into dialect selection, and adds unit tests. Extends Ibis tests: seeds MSSQL with data and verifies Unicode query behavior via the v3 connector.

Changes

Cohort / File(s) Summary
MSSQL test fixture
ibis-server/tests/routers/v3/connector/mssql/conftest.py
Creates SQLAlchemy engine (ODBC Driver 18), seeds orders table from Parquet, creates unicode_test table, inserts Unicode/ASCII rows, retains container finalizer.
MSSQL connector query tests
ibis-server/tests/routers/v3/connector/mssql/test_query.py
Adds tests hitting /query validating Unicode literal handling and ASCII path with encoded manifest; asserts expected rows and 200 responses.
Core dialect: inner
wren-core/core/src/mdl/dialect/inner_dialect.rs
Adds MsSqlDialect implementing InnerDialect; introduces to_unicode_string_literal hook; updates get_inner_dialect to return MsSqlDialect for MSSQL; emits NationalStringLiteral for non-ASCII.
Core dialect: wren
wren-core/core/src/mdl/dialect/wren_dialect.rs
Delegates Dialect::to_unicode_string_literal to inner_dialect.
Core tests
wren-core/core/src/mdl/mod.rs
Adds unit test asserting Unicode literal SQL generation for default and MSSQL sources.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as Query Engine
  participant WD as WrenDialect
  participant ID as InnerDialect (MsSql)
  participant SQL as SQL Generator
  participant DB as MSSQL

  Client->>API: Submit SQL with Unicode literal
  API->>WD: Build dialect-specific AST
  WD->>ID: to_unicode_string_literal("…")
  alt Non-ASCII input
    ID-->>WD: NationalStringLiteral (N'…')
  else ASCII input
    ID-->>WD: None (use default)
  end
  WD->>SQL: Generate SQL
  SQL->>DB: Execute
  DB-->>API: Result rows
  API-->>Client: 200 OK + data
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

core, ibis, rust, python

Suggested reviewers

  • douenergy

Poem

A nibble of N and a sprinkle of rune,
I hop through queries by Unicode moon.
Parquet to tables, tests in a row,
Dialects decide where literals go.
With whiskers twitching, I ship this feat—
N'notes and rows, so crisp and neat. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly summarizes the primary change of adding Unicode literal support for MSSQL in the core engine, using clear and specific wording without unnecessary details. It accurately reflects the main focus of the pull request.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95bc169 and 91a59d4.

⛔ Files ignored due to path filters (1)
  • wren-core-py/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • ibis-server/tests/routers/v3/connector/mssql/conftest.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/mssql/test_query.py (1 hunks)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (3 hunks)
  • wren-core/core/src/mdl/dialect/wren_dialect.rs (1 hunks)
  • wren-core/core/src/mdl/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
wren-core/core/src/mdl/dialect/wren_dialect.rs (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
  • to_unicode_string_literal (68-70)
  • to_unicode_string_literal (318-326)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2)
  • to_unicode_string_literal (100-102)
  • new (112-116)
wren-core/core/src/mdl/mod.rs (2)
ibis-server/tests/routers/v3/connector/mssql/test_query.py (1)
  • test_unicode_literal (36-65)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2)
  • new (112-116)
  • default (106-108)
ibis-server/tests/routers/v3/connector/mssql/test_query.py (2)
ibis-server/tests/routers/v3/connector/mssql/conftest.py (2)
  • mssql (24-44)
  • connection_info (48-56)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/mssql/conftest.py (2)
ibis-server/tests/routers/v2/connector/test_mssql.py (1)
  • mssql (90-131)
ibis-server/tests/conftest.py (1)
  • file_path (10-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test
  • GitHub Check: cargo check
  • GitHub Check: ci
🔇 Additional comments (8)
wren-core/core/src/mdl/mod.rs (1)

3612-3644: LGTM! Well-structured test for Unicode literal handling.

The test correctly validates both the default behavior (no transformation) and MSSQL-specific behavior (N prefix for Unicode strings). The test uses snapshot assertions and follows the existing test patterns in the codebase.

wren-core/core/src/mdl/dialect/wren_dialect.rs (1)

100-102: LGTM! Clean delegation to inner dialect.

The method correctly delegates Unicode string literal handling to the inner dialect, following the established pattern for other dialect-specific behaviors in this implementation.

ibis-server/tests/routers/v3/connector/mssql/test_query.py (2)

36-65: LGTM! Comprehensive test for Unicode literal handling.

The test thoroughly validates Unicode literal handling by:

  1. Testing Unicode character query ('真夜中') expecting correct filtering
  2. Testing ASCII character query ('ZUTOMAYO') as a control
  3. Disabling fallback to ensure the feature is actually being tested

The test structure is clean and follows pytest best practices.


21-23: MSSQL type mapping covers both varchar and nvarchar – MSSQL_TYPE_MAPPING maps both to RustWrenEngineColumnType.VARCHAR, so Unicode data is handled correctly. Updating the manifest to "nvarchar" is purely cosmetic.

ibis-server/tests/routers/v3/connector/mssql/conftest.py (1)

28-41: LGTM! Proper MSSQL Unicode setup.

The database setup correctly:

  1. Uses NVARCHAR(10) for Unicode-capable column storage
  2. Uses N'真夜中' syntax for inserting Unicode literals (line 37)
  3. Inserts plain ASCII string without N prefix for comparison (line 40)

This setup properly supports the Unicode literal tests and follows MSSQL best practices.

wren-core/core/src/mdl/dialect/inner_dialect.rs (3)

68-70: LGTM! Appropriate default implementation.

The default implementation returning None is correct for dialects that don't require special Unicode literal handling. This allows the standard string literal format to be used unless overridden by specific dialects.


79-79: LGTM! Proper routing for MSSQL dialect.

The addition correctly routes MSSQL data source to the new MsSqlDialect implementation, enabling Unicode literal support for MSSQL.


315-327: LGTM! Correct Unicode literal implementation for MSSQL.

The implementation correctly:

  1. Uses is_ascii() to detect non-ASCII characters
  2. Returns NationalStringLiteral (N'...') for Unicode strings
  3. Returns None for ASCII strings to use default handling

This aligns with MSSQL's requirement for N-prefixed literals when handling Unicode data.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added core ibis rust Pull requests that update Rust code python Pull requests that update Python code labels Oct 2, 2025
@douenergy douenergy merged commit e6fb38d into Canner:main Oct 3, 2025
11 checks passed
@goldmedal goldmedal mentioned this pull request Oct 3, 2025
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core ibis python Pull requests that update Python code rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants