chore(ibis): refactor the type mapping table #1302
Conversation
Walkthrough

Centralizes data type mappings across multiple metadata modules by introducing module-level mapping dictionaries, adding type hints to transformer methods, normalizing inputs, and logging warnings for unknown types. Some transformers now return union types when complex types are passed through unchanged.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Metadata as MetadataModule
    participant Mapping as TYPE_MAPPING (module-level)
    participant Logger as loguru.logger
    Metadata->>Mapping: normalize(data_type) → key
    alt key is complex (array/struct) or passthrough
        Mapping-->>Metadata: return original string (complex)
    else mapped
        Mapping-->>Metadata: return RustWrenEngineColumnType
    else unknown
        Mapping-->>Metadata: return UNKNOWN
        Mapping->>Logger: warning("unknown type: ...")
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
ibis-server/app/model/metadata/snowflake.py (1)
163-184: Strip size/precision and whitespace when normalizing types

Snowflake emits types with params (e.g., NUMBER(38,0), VARCHAR(16777216)). Without stripping, these resolve to UNKNOWN.
```diff
- # Convert to lowercase for comparison
- normalized_type = data_type.lower()
+ # Remove parameter specs and normalize (e.g., "NUMBER(38,0)" -> "number")
+ normalized_type = re.sub(r"\(.*\)", "", data_type).strip().lower()
```

ibis-server/app/model/metadata/redshift.py (1)
176-199: Strip parameterization to catch e.g., character varying(50), numeric(10,2)

Without this, many types end up UNKNOWN.
```diff
- # Convert to lowercase for comparison
- normalized_type = data_type.lower()
+ # Normalize by removing parameter specs and trimming spaces
+ normalized_type = re.sub(r"\(.*\)", "", data_type).strip().lower()
```

ibis-server/app/model/metadata/object_storage.py (2)
83-97: Fix column type extraction from DuckDB relation (current code will fail)
`conn.read_*` returns a DuckDBPyRelation; `df[col].dtypes[0]` is a pandas-ism and will error. Use relation.columns/types (or DESCRIBE). Referencing your prior learning about DuckDB DESCRIBE.

```diff
- columns = []
- try:
-     for col in df.columns:
-         duckdb_type = df[col].dtypes[0]
-         columns.append(
-             Column(
-                 name=col,
-                 type=self._to_column_type(duckdb_type.__str__()),
-                 notNull=False,
-             )
-         )
- except Exception as e:
-     logger.debug(f"Failed to read column types: {e}")
-     continue
+ # Extract schema from DuckDB relation without materializing data
+ try:
+     columns = []
+     col_names = list(df.columns)
+     col_types = [str(t) for t in df.types]
+     for name, typ in zip(col_names, col_types):
+         columns.append(
+             Column(
+                 name=name,
+                 type=self._to_column_type(typ),
+                 notNull=False,
+             )
+         )
+ except Exception as e:
+     logger.debug(f"Failed to read column types: {e}")
+     continue
```
151-182: Make DECIMAL/STRUCT/array guards case-insensitive and reuse normalized value

Avoid case fragility; ensures consistent UNKNOWN handling for STRUCT/arrays.
```diff
- if col_type.startswith("DECIMAL"):
+ lower = col_type.lower()
+ if lower.startswith("decimal"):
      return RustWrenEngineColumnType.DECIMAL
@@
- if col_type.startswith("STRUCT"):
+ if lower.startswith("struct"):
      return RustWrenEngineColumnType.UNKNOWN
@@
- if col_type.endswith("[]"):
+ if lower.endswith("[]"):
      return RustWrenEngineColumnType.UNKNOWN
- # Convert to lowercase for comparison
- normalized_type = col_type.lower()
+ # Use the lower-cased value for mapping
+ normalized_type = lower
```

ibis-server/app/model/metadata/clickhouse.py (2)
91-101: Set notNull based on Nullable(...) wrapper (currently always False)

ClickHouse encodes nullability in the type. This is a correctness issue.
```diff
- unique_tables[schema_table].columns.append(
-     Column(
-         name=row["column_name"],
-         type=self._transform_column_type(row["data_type"]),
-         notNull=False,
-         description=row["column_comment"],
-         properties=None,
-     )
- )
+ dtype = row["data_type"]
+ unique_tables[schema_table].columns.append(
+     Column(
+         name=row["column_name"],
+         type=self._transform_column_type(dtype),
+         notNull=not dtype.lower().startswith("nullable("),
+         description=row["column_comment"],
+         properties=None,
+     )
+ )
```
112-133: Handle Nullable/LowCardinality/Array wrappers and strip params (DateTime64, Decimal, FixedString)

Without unwrapping, most real-world types map to UNKNOWN.
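A standalone sketch of the wrapper unwrapping described here (the helper name is hypothetical; it mirrors the suggested change rather than code that exists in the repo):

```python
import re

# Wrappers that carry a nested type: Nullable(T), LowCardinality(T),
# SimpleAggregateFunction(func, T)
WRAPPERS = re.compile(
    r"^(Nullable|LowCardinality|SimpleAggregateFunction)\((.+)\)$", re.IGNORECASE
)

def unwrap_clickhouse_type(data_type: str) -> str:
    t = data_type.strip()
    while (m := WRAPPERS.match(t)):
        inner = m.group(2)
        # SimpleAggregateFunction(sum, UInt64): keep the trailing type argument
        if m.group(1).lower() == "simpleaggregatefunction" and "," in inner:
            inner = inner.split(",")[-1].strip()
        t = inner
    # Array(T) -> T (array-ness itself is dropped in this sketch)
    m = re.match(r"^Array\((.+)\)$", t, re.IGNORECASE)
    if m:
        t = m.group(1)
    # Drop parameter lists: Decimal(10,2) -> decimal, DateTime64(3) -> datetime64
    return re.sub(r"\([^()]*\)", "", t).lower()
```

With this, `Nullable(DateTime64(3))` normalizes to `datetime64` and `LowCardinality(Nullable(String))` to `string`, which the plain mapping table can then resolve.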
```diff
- # Convert to lowercase for comparison
- normalized_type = data_type.lower()
-
- # Use the module-level mapping table
- mapped_type = CLICKHOUSE_TYPE_MAPPING.get(
-     normalized_type, RustWrenEngineColumnType.UNKNOWN
- )
+ # Unwrap wrappers and strip parameters:
+ # - Nullable(T), LowCardinality(T), SimpleAggregateFunction(..., T)
+ # - Array(T) -> T (array element type; array-ness currently ignored)
+ # - Decimal(p,s), FixedString(n), DateTime64(s)
+ t = data_type.strip()
+ # unwrap known wrappers
+ wrap_pat = re.compile(r'^(Nullable|LowCardinality|SimpleAggregateFunction)\((.+)\)$', re.IGNORECASE)
+ while True:
+     m = wrap_pat.match(t)
+     if not m:
+         break
+     # for SimpleAggregateFunction(func, T) keep T
+     inner = m.group(2)
+     # if inner still contains a comma, keep the last part as the base type
+     if m.group(1).lower() == "simpleaggregatefunction" and "," in inner:
+         inner = inner.split(",")[-1].strip()
+     t = inner
+ # unwrap Array(T) -> T
+ m = re.match(r'^Array\((.+)\)$', t, re.IGNORECASE)
+ if m:
+     t = m.group(1)
+ # drop parameter lists: Decimal(10,2) -> Decimal, FixedString(16) -> FixedString, DateTime64(3) -> DateTime64
+ t = re.sub(r'\([^()]*\)', '', t).lower()
+
+ mapped_type = CLICKHOUSE_TYPE_MAPPING.get(t, RustWrenEngineColumnType.UNKNOWN)
```

ibis-server/app/model/metadata/postgres.py (2)
121-131: Bug: PostGIS handler updates only the first column and forces string type.
- It ignores the target column name and breaks after the first column.
- It coerces enum to str, diverging from get_table_list which sets the enum.
Apply:
```diff
- table = tables[table_name]
- for column in table.columns:
-     column.type = str(
-         self._transform_postgres_column_type(row["column_type"])
-     )
-     break
+ table = tables[table_name]
+ target = next((c for c in table.columns if c.name == row["column_name"]), None)
+ if target is None:
+     logger.warning(f"PostGIS column not found: {table_name}.{row['column_name']}")
+     continue
+ target.type = self._transform_postgres_column_type(row["column_type"])
```
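The `next(..., None)` lookup pattern used in the fix, shown standalone with a minimal stand-in `Column` (not the app's actual model):

```python
from dataclasses import dataclass

@dataclass
class Column:  # minimal stand-in for the app's Column model
    name: str
    type: str

columns = [Column("id", "int4"), Column("geom", "varchar")]

# next() with a default avoids StopIteration when the column is missing,
# and targets the named column instead of blindly taking the first one
target = next((c for c in columns if c.name == "geom"), None)
if target is not None:
    target.type = "geometry"
```

This is why the fix can update `geom` without touching `id`, and can log-and-continue when the PostGIS column is absent.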
69-101: Correct type hints for tables and extensions.

augment/postgis_handler operate on a dict[str, Table]; get_extensions returns a list of dicts. Fix annotations to match usage and avoid confusion for static analysis.
```diff
- def augment(self, tables: list[Table]) -> list[Table]:
+ def augment(self, tables: dict[str, Table]) -> dict[str, Table]:
  ...
- def get_extensions(self) -> list[str]:
+ def get_extensions(self) -> list[dict[str, str]]:
  ...
- def postgis_handler(self, tables: list[Table], schema_name: str) -> list[Table]:
+ def postgis_handler(self, tables: dict[str, Table], schema_name: str) -> dict[str, Table]:
```

ibis-server/app/model/metadata/bigquery.py (1)
77-86: Wire the transformer into get_table_list.

Currently Column.type is the raw BigQuery string, so the new mapping/logging is unused.
```diff
- return Column(
+ return Column(  # field_path supports both column & nested column
      name=row["field_path"],
-     type=row["data_type"],
+     type=self._transform_column_type(row["data_type"]),
      notNull=row["is_nullable"].lower() == "no",
      description=row["column_description"],
      properties={},
      nestedColumns=[] if has_nested_columns(row) else None,
  )
```
🧹 Nitpick comments (11)
ibis-server/app/model/metadata/canner.py (2)
240-246: Add type hints and docstring for consistency with other adapters

Align this method signature/docstring with athena/mysql/trino for readability and static checks.
Example outside-range change:
```python
def _transform_column_type(self, data_type: str) -> RustWrenEngineColumnType:
    """Transform Canner data type to RustWrenEngineColumnType."""
    ...
```
41-44: Verify FLOAT mapping semantics ("float" -> FLOAT8)

MySQL-style "float" is typically 32-bit; here it maps to 64-bit. If Canner surfaces MySQL-like types, consider mapping to FLOAT4/REAL for accuracy, or document the intentional widening.
```diff
- "float": RustWrenEngineColumnType.FLOAT8,
+ "float": RustWrenEngineColumnType.FLOAT4,
```

Do you want me to audit float/double mappings across all adapters for consistency?
ibis-server/app/model/metadata/mysql.py (2)
178-180: Align comment with implementation

We don't strip params here; DATA_TYPE already excludes them in MySQL. Update the comment to avoid confusion.
```diff
- # Remove parameter specifications like VARCHAR(255) -> VARCHAR
- normalized_type = data_type.strip().lower()
+ # MySQL information_schema.COLUMNS.DATA_TYPE is already base type (no length/precision)
+ normalized_type = data_type.strip().lower()
```
46-48: Double-check TIMESTAMP -> TIMESTAMPTZ

MySQL TIMESTAMP stores UTC but has no timezone field; many systems map it to TIMESTAMP (no tz). Please confirm intended semantics across adapters. I can batch-update mappings/tests if needed.
ibis-server/app/model/metadata/snowflake.py (1)
17-51: Expand Snowflake type coverage (LTZ/TIME/BINARY/VARIANT/OBJECT/GEOGRAPHY/GEOMETRY)

Common Snowflake types are missing, leading to unnecessary UNKNOWNs and noisy logs.
```diff
  SNOWFLAKE_TYPE_MAPPING = {
  @@
      # Date and Time Types
      "date": RustWrenEngineColumnType.DATE,
-     "datetime": RustWrenEngineColumnType.TIMESTAMP,
+     "datetime": RustWrenEngineColumnType.TIMESTAMP,
      "timestamp": RustWrenEngineColumnType.TIMESTAMP,
      "timestamp_ntz": RustWrenEngineColumnType.TIMESTAMP,
      "timestamp_tz": RustWrenEngineColumnType.TIMESTAMPTZ,
+     "timestamp_ltz": RustWrenEngineColumnType.TIMESTAMPTZ,
+     "time": RustWrenEngineColumnType.TIME,
+     # Semi-structured / binary
+     "variant": RustWrenEngineColumnType.JSON,
+     "object": RustWrenEngineColumnType.JSON,
+     "binary": RustWrenEngineColumnType.BYTES,
+     # Spatial
+     "geography": RustWrenEngineColumnType.GEOGRAPHY,
+     "geometry": RustWrenEngineColumnType.GEOMETRY,
  }
```

ibis-server/app/model/metadata/redshift.py (1)
15-51: Add Redshift SUPER type mapping

SUPER is common; mapping to JSON avoids UNKNOWN noise.
```diff
  REDSHIFT_TYPE_MAPPING = {
  @@
      "inet": RustWrenEngineColumnType.INET,
      "oid": RustWrenEngineColumnType.OID,
+     "super": RustWrenEngineColumnType.JSON,
  }
```

ibis-server/app/model/metadata/object_storage.py (1)
25-51: Align DuckDB scalar mappings with existing conventions

Use consistent integer/float targets across modules; avoid lossy/int-size mismatches.
```diff
  DUCKDB_TYPE_MAPPING = {
-     "bigint": RustWrenEngineColumnType.INT64,
-     "bit": RustWrenEngineColumnType.INT2,
+     "bigint": RustWrenEngineColumnType.BIGINT,
      "blob": RustWrenEngineColumnType.BYTES,
      "boolean": RustWrenEngineColumnType.BOOL,
      "date": RustWrenEngineColumnType.DATE,
      "double": RustWrenEngineColumnType.DOUBLE,
-     "float": RustWrenEngineColumnType.FLOAT,
-     "integer": RustWrenEngineColumnType.INT,
+     "float": RustWrenEngineColumnType.FLOAT4,
+     "real": RustWrenEngineColumnType.REAL,
+     "integer": RustWrenEngineColumnType.INTEGER,
  @@
      "interval": RustWrenEngineColumnType.INTERVAL,
      "json": RustWrenEngineColumnType.JSON,
-     "smallint": RustWrenEngineColumnType.INT2,
+     "smallint": RustWrenEngineColumnType.SMALLINT,
      "time": RustWrenEngineColumnType.TIME,
      "timestamp": RustWrenEngineColumnType.TIMESTAMP,
      "timestamp with time zone": RustWrenEngineColumnType.TIMESTAMPTZ,
-     "tinyint": RustWrenEngineColumnType.INT2,
+     "tinyint": RustWrenEngineColumnType.TINYINT,
  @@
      "uuid": RustWrenEngineColumnType.UUID,
-     "varchar": RustWrenEngineColumnType.STRING,
+     "varchar": RustWrenEngineColumnType.VARCHAR,
  }
```

ibis-server/app/model/metadata/postgres.py (3)
25-36: Normalize integer/float aliases to a single canonical target (consistency).

Today, some aliases resolve to INTEGER/DOUBLE while enum exposes INT/INT4/INT8, FLOAT/FLOAT4/FLOAT8. Pick one canonical target per family to avoid mixed semantics across engines. If you keep INTEGER/DOUBLE here, mirror that everywhere; otherwise switch these to INT/DOUBLE or INT4/FLOAT8 consistently.
145-147: Trim input before lowercasing.

Be defensive against stray whitespace from drivers.
```diff
- normalized_type = data_type.lower()
+ normalized_type = data_type.strip().lower()
```

Also applies to: 290-292
298-301: Good: unknown-type warning added. Consider richer context later.

If/when you can pass table/column into the transformer, include them in the warning to speed up triage.
ibis-server/app/model/metadata/bigquery.py (1)
204-206: Trim input before lowercasing.

```diff
- normalized_type = data_type.lower()
+ normalized_type = data_type.strip().lower()
```
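Across these comments the suggested normalization is the same: drop parameter specs, trim, lowercase. A minimal sketch (the helper name is hypothetical):

```python
import re

def normalize_type(data_type: str) -> str:
    # Drop parameter specs like "(38,0)" or "(16777216)", then trim and lowercase
    return re.sub(r"\(.*\)", "", data_type).strip().lower()
```

Keys in the module-level mapping tables can then stay bare base-type names.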
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (12)
- ibis-server/app/model/metadata/athena.py (3 hunks)
- ibis-server/app/model/metadata/bigquery.py (3 hunks)
- ibis-server/app/model/metadata/canner.py (3 hunks)
- ibis-server/app/model/metadata/clickhouse.py (3 hunks)
- ibis-server/app/model/metadata/mssql.py (3 hunks)
- ibis-server/app/model/metadata/mysql.py (3 hunks)
- ibis-server/app/model/metadata/object_storage.py (3 hunks)
- ibis-server/app/model/metadata/oracle.py (3 hunks)
- ibis-server/app/model/metadata/postgres.py (4 hunks)
- ibis-server/app/model/metadata/redshift.py (3 hunks)
- ibis-server/app/model/metadata/snowflake.py (3 hunks)
- ibis-server/app/model/metadata/trino.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-07T03:56:21.741Z
Learnt from: goldmedal
PR: Canner/wren-engine#1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use `conn.execute("DESCRIBE SELECT * FROM table").fetchall()` to get column types instead of accessing DataFrame-style attributes like `dtype` or `dtypes`.
Applied to files:
ibis-server/app/model/metadata/object_storage.py
🧬 Code graph analysis (12)
ibis-server/app/model/metadata/trino.py (2)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/athena.py (1): _transform_column_type (140-160)

ibis-server/app/model/metadata/oracle.py (2)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/athena.py (1): _transform_column_type (140-160)

ibis-server/app/model/metadata/athena.py (2)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/trino.py (1): _transform_column_type (138-158)

ibis-server/app/model/metadata/canner.py (1)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)

ibis-server/app/model/metadata/snowflake.py (2)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/athena.py (1): _transform_column_type (140-160)

ibis-server/app/model/metadata/redshift.py (1)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)

ibis-server/app/model/metadata/clickhouse.py (2)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/athena.py (1): _transform_column_type (140-160)

ibis-server/app/model/metadata/postgres.py (1)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)

ibis-server/app/model/metadata/mssql.py (3)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/athena.py (1): _transform_column_type (140-160)
- ibis-server/app/model/metadata/mysql.py (1): _transform_column_type (169-189)

ibis-server/app/model/metadata/bigquery.py (2)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/athena.py (1): _transform_column_type (140-160)

ibis-server/app/model/metadata/mysql.py (3)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/athena.py (1): _transform_column_type (140-160)
- ibis-server/app/model/metadata/trino.py (1): _transform_column_type (138-158)

ibis-server/app/model/metadata/object_storage.py (1)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (5)
ibis-server/app/model/metadata/athena.py (2)
18-51: Mapping table looks consistent with the approach across modules

No blockers here.
140-160: Confirm Athena TIMESTAMP semantics (tz vs no-tz)

Athena/Trino historically differentiate TIMESTAMP vs TIMESTAMP WITH TIME ZONE. Mapping plain "timestamp" to TIMESTAMPTZ may be inaccurate depending on engine settings. Please verify and align.
Do you want me to switch "timestamp" -> TIMESTAMP and add "timestamp with time zone" -> TIMESTAMPTZ if that matches your environment?
ibis-server/app/model/metadata/postgres.py (1)
54-58: LGTM: Extension mapping table is clear.

ibis-server/app/model/metadata/bigquery.py (2)
215-219: Verify TIMESTAMP mapping consistency across engines.

BigQuery TIMESTAMP (UTC point-in-time) is mapped to TIMESTAMPTZ here. Ensure this aligns with other connectors (e.g., Postgres TIMESTAMPTZ) and downstream expectations.
Would you like me to scan the repo for other connectors’ TIMESTAMP mappings and summarize the current consistency matrix?
195-203: Union-type annotation supported – no changes needed.
The project requires Python ≥3.11, so `str | RustWrenEngineColumnType` is valid syntax.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
ibis-server/app/model/metadata/bigquery.py (3)
79-88: Type mapping isn't applied; unknown-type logging won't trigger.

get_column sets Column.type from raw data_type, bypassing _transform_column_type. Wire it in so the mapping and warnings are effective.
```diff
  def get_column(row) -> Column:
      return Column(
          # field_path supports both column & nested column
          name=row["field_path"],
-         type=row["data_type"],
+         type=self._transform_column_type(row["data_type"]),
          notNull=row["is_nullable"].lower() == "no",
          description=row["column_description"],
          properties={},
          nestedColumns=[] if has_nested_columns(row) else None,
      )
```

Also applies to: 197-221
67-72: Quote dataset identifier to support hyphenated project IDs.

Without backticks, projects like my-proj will break. Quote {dataset_id} in INFORMATION_SCHEMA refs.
```diff
- FROM {dataset_id}.INFORMATION_SCHEMA.COLUMNS c
- JOIN {dataset_id}.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cf
+ FROM `{dataset_id}`.INFORMATION_SCHEMA.COLUMNS c
+ JOIN `{dataset_id}`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cf
      ON cf.table_name = c.table_name AND cf.column_name = c.column_name
- LEFT JOIN {dataset_id}.INFORMATION_SCHEMA.TABLE_OPTIONS table_options
+ LEFT JOIN `{dataset_id}`.INFORMATION_SCHEMA.TABLE_OPTIONS table_options
      ON c.table_name = table_options.table_name AND table_options.OPTION_NAME = 'description'
```
167-171: Syntax error: trailing comma before FROM.

SELECT list ends with a comma, which invalidates the query.
```diff
- kcu.table_name as constraintedTable, kcu.column_name as constraintedColumn,
+ kcu.table_name as constraintedTable, kcu.column_name as constraintedColumn
```
♻️ Duplicate comments (1)
ibis-server/app/model/metadata/bigquery.py (1)
15-32: Boolean mapping added — resolves the prior review.

Adding "bool"/"boolean" → BOOL fixes the unknown-type noise for booleans.
🧹 Nitpick comments (3)
ibis-server/app/model/metadata/bigquery.py (3)
15-32: Consider a few common BigQuery aliases.

Optionally cover INTEGER → INT64 and DECIMAL → NUMERIC to be defensive against variant sources that may emit synonyms.
```diff
  BIGQUERY_TYPE_MAPPING = {
      # GEOGRAPHY and RANGE columns were filtered out
      "bool": RustWrenEngineColumnType.BOOL,
      "boolean": RustWrenEngineColumnType.BOOL,
+     "integer": RustWrenEngineColumnType.INT64,
+     "decimal": RustWrenEngineColumnType.NUMERIC,
      "bytes": RustWrenEngineColumnType.BYTES,
```
106-108: Make nested-type detection case-insensitive.

Defensive against any casing variance.
```diff
- def has_nested_columns(row) -> bool:
-     return "STRUCT" in row["data_type"]
+ def has_nested_columns(row) -> bool:
+     return "struct" in row["data_type"].lower()
```
218-219: Prefer loguru's lazy formatting.

Avoid f-string evaluation when the log is filtered.
```diff
- if mapped_type == RustWrenEngineColumnType.UNKNOWN:
-     logger.warning(f"Unknown BigQuery data type: {data_type}")
+ if mapped_type == RustWrenEngineColumnType.UNKNOWN:
+     logger.warning("Unknown BigQuery data type: {}", data_type)
```
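The deferral being suggested can be seen with stdlib logging, which uses the same lazy mechanism via %-style arguments (loguru's brace-style arguments behave analogously):

```python
import logging

logger = logging.getLogger("type_mapping_demo")
logger.setLevel(logging.ERROR)  # WARNING records are filtered out here

class Expensive:
    """Counts how often the value is actually formatted."""
    renders = 0
    def __str__(self) -> str:
        Expensive.renders += 1
        return "weird_type(42)"

dtype = Expensive()

# f-string: the argument is formatted even though the record is filtered
logger.warning(f"Unknown data type: {dtype}")
assert Expensive.renders == 1

# deferred style: formatting is skipped entirely for filtered records
logger.warning("Unknown data type: %s", dtype)
assert Expensive.renders == 1  # unchanged: __str__ never ran
```

The cost difference is negligible for simple strings, but it adds up for hot paths or expensive `__str__`/`repr` implementations.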
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- ibis-server/app/model/metadata/bigquery.py (3 hunks)
- ibis-server/app/model/metadata/canner.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- ibis-server/app/model/metadata/canner.py
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/app/model/metadata/bigquery.py (2)
- ibis-server/app/model/metadata/dto.py (1): RustWrenEngineColumnType (13-57)
- ibis-server/app/model/metadata/clickhouse.py (1): _transform_column_type (112-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (2)
ibis-server/app/model/metadata/bigquery.py (2)
1-1: Logger import is fine.

Adopting loguru here is consistent with other backends.
197-197: No action needed: Python ≥3.11 supports PEP 604 unions.

The project's `requires-python = ">=3.11"` (pyproject.toml:6) ensures `str | RustWrenEngineColumnType` is valid.
BigQuery has been tested locally.
Description
Summary by CodeRabbit
- New Features
- Refactor
- Bug Fixes
- Chores
Chores