
feat(clickhouse): add LowCardinality, Bool, DateTime64 and other type mapping support #1448

Merged
goldmedal merged 5 commits into Canner:main from ahmedjawedaj:feat/clickhouse-type-mapping
Mar 17, 2026


@ahmedjawedaj
Contributor

@ahmedjawedaj ahmedjawedaj commented Mar 15, 2026

Summary

Add comprehensive ClickHouse data type mapping support to _transform_column_type in ibis-server/app/model/metadata/clickhouse.py.

Problem

When connecting to a ClickHouse database, many columns are silently ignored because their data types are not recognized by the ibis-server metadata layer. Common ClickHouse types like LowCardinality(String), Bool, DateTime64(3), and Date32 produce "Unknown ClickHouse data type" warnings and result in columns being dropped from the schema.

Changes

Type mapping additions:

  • LowCardinality() wrapper — recursive unwrapping (similar to existing Nullable() handling)
  • Bool alias — maps to BOOL (ClickHouse reports Bool but only boolean was mapped)
  • Date32 — maps to DATE
  • DateTime64(precision) — maps to TIMESTAMP (with precision stripping)
  • FixedString(N) — maps to VARCHAR
  • Large integers: Int128, Int256, UInt128, UInt256 — map to NUMERIC
  • Complex types: Array(...), Map(...), Tuple(...) — map to VARCHAR
  • Nothing — maps to NULL
  • JSON — maps to JSON
  • Improved logging for truly unknown types
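
The wrapper handling described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `ColumnType`, `SIMPLE_TYPES`, and `transform_column_type` are hypothetical stand-ins for `RustWrenEngineColumnType`, the mapping table, and `_transform_column_type`, and only a handful of types are included.

```python
from enum import Enum


class ColumnType(Enum):
    """Hypothetical stand-in for RustWrenEngineColumnType."""

    BOOL = "BOOL"
    DATE = "DATE"
    VARCHAR = "VARCHAR"
    NUMERIC = "NUMERIC"
    UUID = "UUID"
    UNKNOWN = "UNKNOWN"


# Illustrative subset of a lookup table keyed by lowercase type name.
SIMPLE_TYPES = {
    "bool": ColumnType.BOOL,
    "boolean": ColumnType.BOOL,
    "date32": ColumnType.DATE,
    "string": ColumnType.VARCHAR,
    "uuid": ColumnType.UUID,
    "int128": ColumnType.NUMERIC,
    "uint256": ColumnType.NUMERIC,
}

# Wrapper prefixes that are peeled off before the lookup.
WRAPPER_PREFIXES = ("lowcardinality(", "nullable(")


def transform_column_type(data_type: str) -> ColumnType:
    """Map a ClickHouse type string to a ColumnType, unwrapping wrappers."""
    stripped = data_type.strip()
    lowered = stripped.lower()
    for prefix in WRAPPER_PREFIXES:
        if lowered.startswith(prefix) and lowered.endswith(")"):
            # Drop the wrapper name and the closing parenthesis, then recurse
            # so nested wrappers such as LowCardinality(Nullable(String)) work.
            inner = stripped[len(prefix):-1]
            return transform_column_type(inner)
    return SIMPLE_TYPES.get(lowered, ColumnType.UNKNOWN)
```

Recursing on the inner type keeps the wrapper logic in one place, so any newly added base type is automatically supported inside `LowCardinality(...)` and `Nullable(...)` as well.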

Tests:

  • 30+ unit tests covering basic types, wrapper types (LowCardinality, Nullable), nested wrappers (LowCardinality(Nullable(String))), parameterized types (DateTime64(3)), and edge cases.

Verification

All tests pass. Verified in a live WrenAI deployment with a ClickHouse database containing 60+ tables using LowCardinality(String), Bool, Nullable(UUID), DateTime64(3) etc. All previously-ignored columns are now correctly mapped.

Summary by CodeRabbit

  • New Features

    • Broader ClickHouse type support: added mappings for booleans, date32/datetime64 variants, large numeric types, JSON, NULL, fixed-length strings, enums, and improved handling for arrays, maps, and tuples.
  • Bug Fixes

    • More robust type normalization (trims whitespace) and consistent unwrapping of wrapper types (e.g., LowCardinality, Nullable), including nested wrappers.
  • Tests

    • Added comprehensive unit tests covering scalar, numeric, wrapper, complex, enum, nothing, unknown, and parameterized date/time/json cases.

… mapping support

Add comprehensive ClickHouse type mapping support to _transform_column_type:

- Handle LowCardinality() wrapper by unwrapping and recursing
- Add Bool type alias ('bool' -> BOOL)
- Add Date32 mapping (-> DATE)
- Add DateTime64 with precision handling (-> TIMESTAMP)
- Add large integer types: Int128, Int256, UInt128, UInt256 (-> NUMERIC)
- Add FixedString(N) support (-> VARCHAR)
- Add complex types: Array, Map, Tuple (-> VARCHAR)
- Add Nothing type (-> NULL)
- Add JSON type (-> JSON)
- Improve logging for unknown types

Also add unit tests covering 30+ type mapping scenarios including
wrapper types, basic types, and edge cases.

Fixes columns with LowCardinality(String), Bool, Nullable(UUID),
DateTime64(3) etc. being silently ignored during metadata discovery.
@github-actions github-actions bot added the ibis and python (Pull requests that update Python code) labels on Mar 15, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d945ddfe-7089-48b5-867c-7fe5278e0c66

📥 Commits

Reviewing files that changed from the base of the PR and between d944510 and 5e80225.

📒 Files selected for processing (1)
  • ibis-server/tests/test_clickhouse_type_mapping.py

📝 Walkthrough

Walkthrough

Extends ClickHouse type mapping and parsing: adds scalar aliases and numeric widenings, introduces JSON/Nothing, normalizes and strips input, recursively unwraps LowCardinality/Nullable, and recognizes FixedString, DateTime variants, Enum, Array/Map/Tuple; adds comprehensive unit tests validating mappings.

Changes

| Cohort / File(s) | Summary |
|---|---|
| ClickHouse Type Mapping: ibis-server/app/model/metadata/clickhouse.py | Expanded CLICKHOUSE_TYPE_MAPPING with aliases (bool, date32, datetime64, int128/int256/uint128/uint256, nothing, json); enhanced _transform_column_type to strip input, recursively unwrap LowCardinality(...) and Nullable(...), and handle FixedString(...), DateTime64(...)/DateTime(...), JSON(...), Enum8/Enum16(...), and composite patterns (Array(...), Map(...), Tuple(...)). |
| Type Mapping Tests: ibis-server/tests/test_clickhouse_type_mapping.py | Added comprehensive tests for _transform_column_type covering scalar types, date/time variants (with/without tz), large numerics/decimals, FixedString, LowCardinality/Nullable (including nested), composite types (Array, Map, Tuple), enums, Nothing, unknown types, and JSON parameterizations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested reviewers

  • goldmedal
  • douenergy

Poem

🐇 I nibble types from short to tall,

Bool and Date32 answer my call.
I peel LowCardinality's thin coat,
Unwrap Nullable—hop, float, and note.
Tests thump like drums: tidy, true, and small.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.93% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately summarizes the main changes in the PR: it adds support for LowCardinality wrapper types, Bool, DateTime64, and other ClickHouse type mappings. This is the primary focus of the changeset. |

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ibis-server/app/model/metadata/clickhouse.py`:
- Around line 159-161: The type-mapping currently only handles parameterized
DateTime64 but misses parameterized DateTime and JSON, causing them to fall
through to UNKNOWN; update the conditional in the same function that checks
normalized_type (the block that currently checks
normalized_type.startswith("datetime64(")) to also handle
normalized_type.startswith("datetime(") and normalized_type.startswith("json(")
and return RustWrenEngineColumnType.TIMESTAMP for DateTime and
RustWrenEngineColumnType.JSON for JSON, and add unit tests that assert
DateTime('UTC') maps to TIMESTAMP and JSON(max_dynamic_paths=1024) maps to JSON
to prevent regressions.
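
The prefix-based handling requested in this comment can be sketched as below. This is an assumption-laden illustration rather than the project's implementation: `ColumnType`, `BARE_TYPES`, and `map_date_time_or_json` are hypothetical names, and the real function works with `RustWrenEngineColumnType` and a larger mapping table.

```python
from enum import Enum


class ColumnType(Enum):
    """Hypothetical stand-in for RustWrenEngineColumnType."""

    TIMESTAMP = "TIMESTAMP"
    JSON = "JSON"
    UNKNOWN = "UNKNOWN"


# Bare (unparameterized) names are resolved by a plain lookup.
BARE_TYPES = {
    "datetime": ColumnType.TIMESTAMP,
    "datetime64": ColumnType.TIMESTAMP,
    "json": ColumnType.JSON,
}


def map_date_time_or_json(data_type: str) -> ColumnType:
    """Map DateTime/DateTime64/JSON, with or without parameters."""
    normalized = data_type.strip().lower()
    if normalized in BARE_TYPES:
        return BARE_TYPES[normalized]
    # Parameterized forms: the arguments (precision, timezone, JSON
    # settings) are ignored and only the type family is kept.
    if normalized.startswith(("datetime64(", "datetime(")):
        return ColumnType.TIMESTAMP
    if normalized.startswith("json("):
        return ColumnType.JSON
    return ColumnType.UNKNOWN
```

Matching on the opening parenthesis (`"datetime("` rather than `"datetime"`) avoids accidentally treating an unrelated type whose name merely begins with the same letters as a timestamp.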

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47c58ca8-2309-4073-884a-e51164a729ad

📥 Commits

Reviewing files that changed from the base of the PR and between 57f9a15 and f0a29ff.

📒 Files selected for processing (2)
  • ibis-server/app/model/metadata/clickhouse.py
  • ibis-server/tests/test_clickhouse_type_mapping.py

@goldmedal
Contributor

Hi @ahmedjawedaj, thanks for working on this. There are some ruff check failures in the ibis-server folder. They can be fixed by running just fmt, or with the following commands:

poetry run ruff format .
poetry run ruff check --fix .
taplo fmt

- Add DateTime([timezone]) handler (e.g. DateTime('UTC') -> TIMESTAMP)
- Add JSON(...) handler (e.g. JSON(max_dynamic_paths=1024) -> JSON)
- Fix PIE810: merge startswith calls for Enum8/Enum16
- Apply ruff format to all test assertions
- Add unit tests for DateTime('UTC'), DateTime('Europe/Berlin'),
  JSON(max_dynamic_paths=1024), and plain JSON
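
For context, Ruff's PIE810 rule flags chained `startswith` calls on the same string and suggests a single call with a tuple of prefixes. A small sketch of the before and after shapes for the Enum8/Enum16 check; the function names are hypothetical and the snippet does not reproduce the PR's code:

```python
def is_enum_type_chained(normalized_type: str) -> bool:
    """Form flagged by PIE810: two startswith calls joined with `or`."""
    return normalized_type.startswith("enum8(") or normalized_type.startswith(
        "enum16("
    )


def is_enum_type(normalized_type: str) -> bool:
    """Equivalent form: one startswith call with a tuple of prefixes."""
    return normalized_type.startswith(("enum8(", "enum16("))
```

Both functions return the same result for any input; the tuple form is shorter and scans the prefixes in a single call.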
@ahmedjawedaj
Contributor Author

@goldmedal Please review and merge

Contributor

@goldmedal goldmedal left a comment


Thanks @ahmedjawedaj. Overall it looks good to me; I have only one suggestion for the tests.

- Added @pytest.mark.clickhouse to TestTransformColumnType so
  that these tests run in the CI pipeline (as requested by maintainer)
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
ibis-server/tests/test_clickhouse_type_mapping.py (1)

247-251: Assert the warning side effect for unknown types.

This test currently checks only the returned enum. Since unknown-type logging is part of the behavior, verify the warning with caplog too.

Proposed patch
-    def test_unknown_type_returns_unknown(self, metadata):
-        assert (
-            metadata._transform_column_type("SomeWeirdType")
-            == RustWrenEngineColumnType.UNKNOWN
-        )
+    def test_unknown_type_returns_unknown(self, metadata, caplog):
+        with caplog.at_level("WARNING"):
+            result = metadata._transform_column_type("SomeWeirdType")
+        assert result == RustWrenEngineColumnType.UNKNOWN
+        assert "Unknown ClickHouse data type: SomeWeirdType" in caplog.text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ibis-server/tests/test_clickhouse_type_mapping.py` around lines 247 - 251,
The test test_unknown_type_returns_unknown should also assert the warning
side-effect: use the pytest caplog fixture when calling
metadata._transform_column_type("SomeWeirdType") and assert that caplog captured
a warning-level log entry mentioning the unknown type (e.g., check
"SomeWeirdType" in caplog.text or inspect caplog.records for levelname ==
"WARNING"). Keep the existing enum assertion for
RustWrenEngineColumnType.UNKNOWN and add the caplog-based check immediately
after the call to verify the warning was emitted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f933f2c-18f7-4672-9ede-297a374dd24a

📥 Commits

Reviewing files that changed from the base of the PR and between 60439f9 and 26bb864.

📒 Files selected for processing (1)
  • ibis-server/tests/test_clickhouse_type_mapping.py

- Use caplog to verify that a WARNING is emitted when an unknown
  ClickHouse type is encountered, as requested by CR feedback.
Comment on lines +247 to +251
    def test_unknown_type_returns_unknown(self, metadata, caplog):
        with caplog.at_level("WARNING"):
            result = metadata._transform_column_type("SomeWeirdType")
        assert result == RustWrenEngineColumnType.UNKNOWN
        assert "Unknown ClickHouse data type: SomeWeirdType" in caplog.text
Contributor


Suggested change
-    def test_unknown_type_returns_unknown(self, metadata, caplog):
-        with caplog.at_level("WARNING"):
-            result = metadata._transform_column_type("SomeWeirdType")
-        assert result == RustWrenEngineColumnType.UNKNOWN
-        assert "Unknown ClickHouse data type: SomeWeirdType" in caplog.text
+    def test_unknown_type_returns_unknown(self, metadata):
+        result = metadata._transform_column_type("SomeWeirdType")
+        assert result == RustWrenEngineColumnType.UNKNOWN

The CI failed because the warning is logged by loguru rather than through the standard logging module, so caplog does not capture it.
I don't think we need to assert on the warning message. Just ensure the result is what we want.

Loguru does not route its messages through the standard logging module, so pytest's caplog
cannot capture them. Simplified per maintainer feedback.
Contributor

@goldmedal goldmedal left a comment


Thanks @ahmedjawedaj 👍

@goldmedal goldmedal merged commit 926cedc into Canner:main Mar 17, 2026
7 checks passed
chilijung pushed a commit that referenced this pull request Mar 18, 2026
nhaluc1005 pushed a commit to nhaluc1005/text2sql-practice that referenced this pull request Apr 3, 2026

Labels

ibis python Pull requests that update Python code


2 participants