feat(clickhouse): add LowCardinality, Bool, DateTime64 and other type mapping support#1448
Conversation
… mapping support
Add comprehensive ClickHouse type mapping support to _transform_column_type:
- Handle LowCardinality() wrapper by unwrapping and recursing
- Add Bool type alias ('bool' -> BOOL)
- Add Date32 mapping (-> DATE)
- Add DateTime64 with precision handling (-> TIMESTAMP)
- Add large integer types: Int128, Int256, UInt128, UInt256 (-> NUMERIC)
- Add FixedString(N) support (-> VARCHAR)
- Add complex types: Array, Map, Tuple (-> VARCHAR)
- Add Nothing type (-> NULL)
- Add JSON type (-> JSON)
- Improve logging for unknown types
Also add unit tests covering 30+ type mapping scenarios including
wrapper types, basic types, and edge cases.
Fixes columns with LowCardinality(String), Bool, Nullable(UUID),
DateTime64(3) etc. being silently ignored during metadata discovery.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughExtends ClickHouse type mapping and parsing: adds scalar aliases and numeric widenings, introduces JSON/Nothing, normalizes and strips input, recursively unwraps LowCardinality/Nullable, and recognizes FixedString, DateTime variants, Enum, Array/Map/Tuple; adds comprehensive unit tests validating mappings. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use OpenGrep to find security vulnerabilities and bugs across 17+ programming languages.OpenGrep is compatible with Semgrep configurations. Add an |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ibis-server/app/model/metadata/clickhouse.py`:
- Around line 159-161: The type-mapping currently only handles parameterized
DateTime64 but misses parameterized DateTime and JSON, causing them to fall
through to UNKNOWN; update the conditional in the same function that checks
normalized_type (the block that currently checks
normalized_type.startswith("datetime64(")) to also handle
normalized_type.startswith("datetime(") and normalized_type.startswith("json(")
and return RustWrenEngineColumnType.TIMESTAMP for DateTime and
RustWrenEngineColumnType.JSON for JSON, and add unit tests that assert
DateTime('UTC') maps to TIMESTAMP and JSON(max_dynamic_paths=1024) maps to JSON
to prevent regressions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 47c58ca8-2309-4073-884a-e51164a729ad
📒 Files selected for processing (2)
ibis-server/app/model/metadata/clickhouse.pyibis-server/tests/test_clickhouse_type_mapping.py
|
Hi @ahmedjawedaj, thanks for working on this. There are some ruff-check fails in the |
- Add DateTime([timezone]) handler (e.g. DateTime('UTC') -> TIMESTAMP)
- Add JSON(...) handler (e.g. JSON(max_dynamic_paths=1024) -> JSON)
- Fix PIE810: merge startswith calls for Enum8/Enum16
- Apply ruff format to all test assertions
- Add unit tests for DateTime('UTC'), DateTime('Europe/Berlin'),
JSON(max_dynamic_paths=1024), and plain JSON
|
@goldmedal Please review and merge |
goldmedal
left a comment
There was a problem hiding this comment.
Thanks @ahmedjawedaj, Overall looks good to me—only one suggestion for the testing.
- Added @pytest.mark.clickhouse to TestTransformColumnType so that these tests run in the CI pipeline (as requested by maintainer)
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ibis-server/tests/test_clickhouse_type_mapping.py (1)
247-251: Assert the warning side effect for unknown types.This test currently checks only the returned enum. Since unknown-type logging is part of the behavior, verify the warning with
caplogtoo.Proposed patch
- def test_unknown_type_returns_unknown(self, metadata): - assert ( - metadata._transform_column_type("SomeWeirdType") - == RustWrenEngineColumnType.UNKNOWN - ) + def test_unknown_type_returns_unknown(self, metadata, caplog): + with caplog.at_level("WARNING"): + result = metadata._transform_column_type("SomeWeirdType") + assert result == RustWrenEngineColumnType.UNKNOWN + assert "Unknown ClickHouse data type: SomeWeirdType" in caplog.text🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ibis-server/tests/test_clickhouse_type_mapping.py` around lines 247 - 251, The test test_unknown_type_returns_unknown should also assert the warning side-effect: use the pytest caplog fixture when calling metadata._transform_column_type("SomeWeirdType") and assert that caplog captured a warning-level log entry mentioning the unknown type (e.g., check "SomeWeirdType" in caplog.text or inspect caplog.records for levelname == "WARNING"). Keep the existing enum assertion for RustWrenEngineColumnType.UNKNOWN and add the caplog-based check immediately after the call to verify the warning was emitted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ibis-server/tests/test_clickhouse_type_mapping.py`:
- Around line 247-251: The test test_unknown_type_returns_unknown should also
assert the warning side-effect: use the pytest caplog fixture when calling
metadata._transform_column_type("SomeWeirdType") and assert that caplog captured
a warning-level log entry mentioning the unknown type (e.g., check
"SomeWeirdType" in caplog.text or inspect caplog.records for levelname ==
"WARNING"). Keep the existing enum assertion for
RustWrenEngineColumnType.UNKNOWN and add the caplog-based check immediately
after the call to verify the warning was emitted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f933f2c-18f7-4672-9ede-297a374dd24a
📒 Files selected for processing (1)
ibis-server/tests/test_clickhouse_type_mapping.py
- Use caplog to verify that a WARNING is emitted when an unknown ClickHouse type is encountered, as requested by CR feedback.
| def test_unknown_type_returns_unknown(self, metadata, caplog): | ||
| with caplog.at_level("WARNING"): | ||
| result = metadata._transform_column_type("SomeWeirdType") | ||
| assert result == RustWrenEngineColumnType.UNKNOWN | ||
| assert "Unknown ClickHouse data type: SomeWeirdType" in caplog.text |
There was a problem hiding this comment.
| def test_unknown_type_returns_unknown(self, metadata, caplog): | |
| with caplog.at_level("WARNING"): | |
| result = metadata._transform_column_type("SomeWeirdType") | |
| assert result == RustWrenEngineColumnType.UNKNOWN | |
| assert "Unknown ClickHouse data type: SomeWeirdType" in caplog.text | |
| def test_unknown_type_returns_unknown(self, metadata): | |
| result = metadata._transform_column_type("SomeWeirdType") | |
| assert result == RustWrenEngineColumnType.UNKNOWN |
The CI failed due to the log is output by loguru, not a standard stderr.
I think we don't need to assert the warning message. Just ensure the result is what we want.
Loguru does not output to standard stderr, so pytest caplog cannot capture its messages. Simplified per maintainer feedback.
goldmedal
left a comment
There was a problem hiding this comment.
Thanks @ahmedjawedaj 👍
… mapping support (Canner#1448)
Summary
Add comprehensive ClickHouse data type mapping support to
_transform_column_typeinibis-server/app/model/metadata/clickhouse.py.Problem
When connecting to a ClickHouse database, many columns are silently ignored because their data types are not recognized by the ibis-server metadata layer. Common ClickHouse types like
LowCardinality(String),Bool,DateTime64(3), andDate32produce "Unknown ClickHouse data type" warnings and result in columns being dropped from the schema.Changes
Type mapping additions:
LowCardinality()wrapper — recursive unwrapping (similar to existingNullable()handling)Boolalias — maps toBOOL(ClickHouse reportsBoolbut onlybooleanwas mapped)Date32— maps toDATEDateTime64(precision)— maps toTIMESTAMP(with precision stripping)FixedString(N)— maps toVARCHARInt128,Int256,UInt128,UInt256— map toNUMERICArray(...),Map(...),Tuple(...)— map toVARCHARNothing— maps toNULLJSON— maps toJSONTests:
LowCardinality,Nullable), nested wrappers (LowCardinality(Nullable(String))), parameterized types (DateTime64(3)), and edge cases.Verification
All tests pass. Verified in a live WrenAI deployment with a ClickHouse database containing 60+ tables using
LowCardinality(String),Bool,Nullable(UUID),DateTime64(3)etc. All previously-ignored columns are now correctly mapped.Summary by CodeRabbit
New Features
Bug Fixes
Tests