
chore(ibis): refactor the type mapping table #1302

Merged
douenergy merged 2 commits into Canner:main from goldmedal:chore/refactor-type-mapping
Sep 2, 2025

Conversation

@goldmedal
Contributor

@goldmedal goldmedal commented Sep 2, 2025

Description

  • Refactor the type mapping table
  • Fix the unknown type log

Summary by CodeRabbit

  • New Features

    • Runtime warnings now surface when unknown column types are encountered across connectors.
  • Refactor

    • Centralized, standardized type-mapping across Athena, BigQuery, Canner, ClickHouse, MSSQL, MySQL, DuckDB/Object Storage, Oracle, Postgres, Redshift, Snowflake, and Trino.
  • Bug Fixes

    • Expanded and more accurate type recognition reduces misclassification; complex/unsupported types now return safe fallbacks while emitting warnings.
  • Chores

    • Added type annotations and docstrings to improve reliability and maintainability.

@github-actions github-actions bot added bigquery ibis python Pull requests that update Python code labels Sep 2, 2025
@coderabbitai
Contributor

coderabbitai bot commented Sep 2, 2025

Walkthrough

Centralizes data type mappings across multiple metadata modules by introducing module-level mapping dictionaries, adding type hints to transformer methods, normalizing inputs, and logging warnings for unknown types. Some transformers now return union types when complex types are passed through unchanged.

Changes

Cohort / File(s) Summary of changes
Athena mapping refactor
ibis-server/app/model/metadata/athena.py
Added ATHENA_TYPE_MAPPING; refactored _transform_column_type with normalization, type hints, and logger warning on unknowns.
BigQuery mapping refactor
ibis-server/app/model/metadata/bigquery.py
Added BIGQUERY_TYPE_MAPPING; updated _transform_column_type to handle complex types (returns str) or mapped RustWrenEngineColumnType; normalization and warnings.
Canner mapping refactor
ibis-server/app/model/metadata/canner.py
Added CANNER_TYPE_MAPPING; refactored transformer to normalize input, use mapping, add type hints and warnings.
ClickHouse mapping refactor
ibis-server/app/model/metadata/clickhouse.py
Added CLICKHOUSE_TYPE_MAPPING; new typed _transform_column_type using mapping, normalization, and warnings.
MSSQL mapping refactor
ibis-server/app/model/metadata/mssql.py
Added MSSQL_TYPE_MAPPING; typed transformer replaced inline logic with mapping lookup, normalization, and warnings.
MySQL mapping refactor
ibis-server/app/model/metadata/mysql.py
Added MYSQL_TYPE_MAPPING; refactored _transform_column_type to use mapping, input normalization, type hints, and warnings.
Object storage / DuckDB mapping refactor
ibis-server/app/model/metadata/object_storage.py
Added DUCKDB_TYPE_MAPPING; replaced switch with lookup in _to_column_type for ObjectStorage/DuckDB metadata; preserved special cases; normalization and warnings.
Oracle mapping refactor
ibis-server/app/model/metadata/oracle.py
Added ORACLE_TYPE_MAPPING; typed transformer uses uppercase normalization and mapping lookup; logs warnings for unknowns.
Postgres core and extensions mapping refactor
ibis-server/app/model/metadata/postgres.py
Added POSTGRES_TYPE_MAPPING and POSTGRES_EXTENSION_TYPE_MAPPING; typed transformers in ExtensionHandler and PostgresMetadata now use mappings, normalize input, and log unknown types.
Redshift mapping refactor
ibis-server/app/model/metadata/redshift.py
Added REDSHIFT_TYPE_MAPPING; typed _transform_redshift_column_type uses mapping with normalization and warnings.
Snowflake mapping refactor
ibis-server/app/model/metadata/snowflake.py
Added SNOWFLAKE_TYPE_MAPPING; typed transformer uses mapping with normalization and warnings.
Trino mapping refactor
ibis-server/app/model/metadata/trino.py
Added TRINO_TYPE_MAPPING; typed transformer strips params, normalizes, looks up mapping, and logs warnings for unknowns.

Sequence Diagram(s)

sequenceDiagram
  participant Metadata as MetadataModule
  participant Mapping as TYPE_MAPPING (module-level)
  participant Logger as loguru.logger

  Metadata->>Mapping: normalize(data_type) → key
  alt key is complex (array/struct) or passthrough
    Mapping-->>Metadata: return original string (complex)
  else mapped
    Mapping-->>Metadata: return RustWrenEngineColumnType
  else unknown
    Mapping-->>Metadata: return UNKNOWN
    Mapping->>Logger: warning("unknown type: ...")
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • douenergy
  • wwwy3y3

Poem

I hop through types, a tidy spree,
From Snowflake, Postgres, Redshift tree —
I map each byte, timestamp, and key,
I warn when mysteries trouble me.
Carrots, columns, schema glee 🥕🐰


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
ibis-server/app/model/metadata/snowflake.py (1)

163-184: Strip size/precision and whitespace when normalizing types

Snowflake emits types with params (e.g., NUMBER(38,0), VARCHAR(16777216)). Without stripping, these resolve to UNKNOWN.

-        # Convert to lowercase for comparison
-        normalized_type = data_type.lower()
+        # Remove parameter specs and normalize (e.g., "NUMBER(38,0)" -> "number")
+        normalized_type = re.sub(r"\(.*\)", "", data_type).strip().lower()
ibis-server/app/model/metadata/redshift.py (1)

176-199: Strip parameterization to catch e.g., character varying(50), numeric(10,2)

Without this, many types end up UNKNOWN.

-        # Convert to lowercase for comparison
-        normalized_type = data_type.lower()
+        # Normalize by removing parameter specs and trimming spaces
+        normalized_type = re.sub(r"\(.*\)", "", data_type).strip().lower()
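The normalization suggested for both Snowflake and Redshift can be exercised on its own (the standalone function name is assumed for illustration; the real patches keep the expression inline):

```python
import re


def normalize_type(data_type: str) -> str:
    """Drop parameter specs and whitespace, then lower-case.

    e.g. "NUMBER(38,0)" -> "number", "character varying(50)" -> "character varying"
    """
    return re.sub(r"\(.*\)", "", data_type).strip().lower()
```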
ibis-server/app/model/metadata/object_storage.py (2)

83-97: Fix column type extraction from DuckDB relation (current code will fail)

conn.read_* returns a DuckDBPyRelation; df[col].dtypes[0] is a pandas-ism and will error. Use relation.columns/types (or DESCRIBE). Referencing your prior learning about DuckDB DESCRIBE.

-                    columns = []
-                    try:
-                        for col in df.columns:
-                            duckdb_type = df[col].dtypes[0]
-                            columns.append(
-                                Column(
-                                    name=col,
-                                    type=self._to_column_type(duckdb_type.__str__()),
-                                    notNull=False,
-                                )
-                            )
-                    except Exception as e:
-                        logger.debug(f"Failed to read column types: {e}")
-                        continue
+                    # Extract schema from DuckDB relation without materializing data
+                    try:
+                        columns = []
+                        col_names = list(df.columns)
+                        col_types = [str(t) for t in df.types]
+                        for name, typ in zip(col_names, col_types):
+                            columns.append(
+                                Column(
+                                    name=name,
+                                    type=self._to_column_type(typ),
+                                    notNull=False,
+                                )
+                            )
+                    except Exception as e:
+                        logger.debug(f"Failed to read column types: {e}")
+                        continue

151-182: Make DECIMAL/STRUCT/array guards case-insensitive and reuse normalized value

Avoid case fragility; ensures consistent UNKNOWN handling for STRUCT/arrays.

-        if col_type.startswith("DECIMAL"):
+        lower = col_type.lower()
+        if lower.startswith("decimal"):
             return RustWrenEngineColumnType.DECIMAL
@@
-        if col_type.startswith("STRUCT"):
+        if lower.startswith("struct"):
             return RustWrenEngineColumnType.UNKNOWN
@@
-        if col_type.endswith("[]"):
+        if lower.endswith("[]"):
             return RustWrenEngineColumnType.UNKNOWN
-        # Convert to lowercase for comparison
-        normalized_type = col_type.lower()
+        # Use the lower-cased value for mapping
+        normalized_type = lower
ibis-server/app/model/metadata/clickhouse.py (2)

91-101: Set notNull based on Nullable(...) wrapper (currently always False)

ClickHouse encodes nullability in the type. This is a correctness issue.

-            unique_tables[schema_table].columns.append(
-                Column(
-                    name=row["column_name"],
-                    type=self._transform_column_type(row["data_type"]),
-                    notNull=False,
-                    description=row["column_comment"],
-                    properties=None,
-                )
-            )
+            dtype = row["data_type"]
+            unique_tables[schema_table].columns.append(
+                Column(
+                    name=row["column_name"],
+                    type=self._transform_column_type(dtype),
+                    notNull=not dtype.lower().startswith("nullable("),
+                    description=row["column_comment"],
+                    properties=None,
+                )
+            )

112-133: Handle Nullable/LowCardinality/Array wrappers and strip params (DateTime64, Decimal, FixedString)

Without unwrapping, most real-world types map to UNKNOWN.

-        # Convert to lowercase for comparison
-        normalized_type = data_type.lower()
-
-        # Use the module-level mapping table
-        mapped_type = CLICKHOUSE_TYPE_MAPPING.get(
-            normalized_type, RustWrenEngineColumnType.UNKNOWN
-        )
+        # Unwrap wrappers and strip parameters:
+        # - Nullable(T), LowCardinality(T), SimpleAggregateFunction(..., T)
+        # - Array(T) -> T (array element type; array-ness currently ignored)
+        # - Decimal(p,s), FixedString(n), DateTime64(s)
+        t = data_type.strip()
+        # unwrap known wrappers
+        wrap_pat = re.compile(r'^(Nullable|LowCardinality|SimpleAggregateFunction)\((.+)\)$', re.IGNORECASE)
+        while True:
+            m = wrap_pat.match(t)
+            if not m:
+                break
+            # for SimpleAggregateFunction(func, T) keep T
+            inner = m.group(2)
+            # if inner still contains a comma, keep the last part as the base type
+            if m.group(1).lower() == "simpleaggregatefunction" and "," in inner:
+                inner = inner.split(",")[-1].strip()
+            t = inner
+        # unwrap Array(T) -> T
+        m = re.match(r'^Array\((.+)\)$', t, re.IGNORECASE)
+        if m:
+            t = m.group(1)
+        # drop parameter lists: Decimal(10,2) -> Decimal, FixedString(16) -> FixedString, DateTime64(3) -> DateTime64
+        t = re.sub(r'\([^()]*\)', '', t).lower()
+
+        mapped_type = CLICKHOUSE_TYPE_MAPPING.get(t, RustWrenEngineColumnType.UNKNOWN)
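The unwrapping steps above can be sketched as a standalone helper (function name assumed; the suggested patch keeps this logic inline in `_transform_column_type` and then feeds the result into `CLICKHOUSE_TYPE_MAPPING`):

```python
import re


def unwrap_clickhouse_type(data_type: str) -> str:
    """Reduce a ClickHouse type string to its lower-cased base type name."""
    t = data_type.strip()
    # Unwrap Nullable(T), LowCardinality(T), SimpleAggregateFunction(func, T)
    wrap_pat = re.compile(
        r"^(Nullable|LowCardinality|SimpleAggregateFunction)\((.+)\)$", re.IGNORECASE
    )
    while (m := wrap_pat.match(t)):
        inner = m.group(2)
        if m.group(1).lower() == "simpleaggregatefunction" and "," in inner:
            inner = inner.split(",")[-1].strip()
        t = inner
    # Unwrap Array(T) -> T (array-ness currently ignored)
    if (m := re.match(r"^Array\((.+)\)$", t, re.IGNORECASE)):
        t = m.group(1)
    # Drop parameter lists: Decimal(10,2) -> decimal, DateTime64(3) -> datetime64
    return re.sub(r"\([^()]*\)", "", t).lower()
```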
ibis-server/app/model/metadata/postgres.py (2)

121-131: Bug: PostGIS handler updates only the first column and forces string type.

  • It ignores the target column name and breaks after the first column.
  • It coerces enum to str, diverging from get_table_list which sets the enum.

Apply:

-            table = tables[table_name]
-            for column in table.columns:
-                column.type = str(
-                    self._transform_postgres_column_type(row["column_type"])
-                )
-                break
+            table = tables[table_name]
+            target = next((c for c in table.columns if c.name == row["column_name"]), None)
+            if target is None:
+                logger.warning(f"PostGIS column not found: {table_name}.{row['column_name']}")
+                continue
+            target.type = self._transform_postgres_column_type(row["column_type"])

69-101: Correct type hints for tables and extensions.

augment/postgis_handler operate on a dict[str, Table]; get_extensions returns a list of dicts. Fix annotations to match usage and avoid confusion for static analysis.

-    def augment(self, tables: list[Table]) -> list[Table]:
+    def augment(self, tables: dict[str, Table]) -> dict[str, Table]:
...
-    def get_extensions(self) -> list[str]:
+    def get_extensions(self) -> list[dict[str, str]]:
...
-    def postgis_handler(self, tables: list[Table], schema_name: str) -> list[Table]:
+    def postgis_handler(self, tables: dict[str, Table], schema_name: str) -> dict[str, Table]:
ibis-server/app/model/metadata/bigquery.py (1)

77-86: Wire the transformer into get_table_list.

Currently Column.type is the raw BigQuery string, so the new mapping/logging is unused.

-            return Column(
+            return Column(
                 # field_path supports both column & nested column
                 name=row["field_path"],
-                type=row["data_type"],
+                type=self._transform_column_type(row["data_type"]),
                 notNull=row["is_nullable"].lower() == "no",
                 description=row["column_description"],
                 properties={},
                 nestedColumns=[] if has_nested_columns(row) else None,
             )
🧹 Nitpick comments (11)
ibis-server/app/model/metadata/canner.py (2)

240-246: Add type hints and docstring for consistency with other adapters

Align this method signature/docstring with athena/mysql/trino for readability and static checks.

Example outside-range change:

def _transform_column_type(self, data_type: str) -> RustWrenEngineColumnType:
    """Transform Canner data type to RustWrenEngineColumnType."""
    ...

41-44: Verify FLOAT mapping semantics (“float” -> FLOAT8)

MySQL-style “float” is typically 32-bit; here it maps to 64-bit. If Canner surfaces MySQL-like types, consider mapping to FLOAT4/REAL for accuracy, or document the intentional widening.

-    "float": RustWrenEngineColumnType.FLOAT8,
+    "float": RustWrenEngineColumnType.FLOAT4,

Do you want me to audit float/double mappings across all adapters for consistency?

ibis-server/app/model/metadata/mysql.py (2)

178-180: Align comment with implementation

We don’t strip params here; DATA_TYPE already excludes them in MySQL. Update the comment to avoid confusion.

-        # Remove parameter specifications like VARCHAR(255) -> VARCHAR
-        normalized_type = data_type.strip().lower()
+        # MySQL information_schema.COLUMNS.DATA_TYPE is already base type (no length/precision)
+        normalized_type = data_type.strip().lower()

46-48: Double-check TIMESTAMP -> TIMESTAMPTZ

MySQL TIMESTAMP stores UTC but has no timezone field; many systems map it to TIMESTAMP (no tz). Please confirm intended semantics across adapters. I can batch-update mappings/tests if needed.

ibis-server/app/model/metadata/snowflake.py (1)

17-51: Expand Snowflake type coverage (LTZ/TIME/BINARY/VARIANT/OBJECT/GEOGRAPHY/GEOMETRY)

Common Snowflake types are missing, leading to unnecessary UNKNOWNs and noisy logs.

 SNOWFLAKE_TYPE_MAPPING = {
@@
     # Date and Time Types
     "date": RustWrenEngineColumnType.DATE,
-    "datetime": RustWrenEngineColumnType.TIMESTAMP,
+    "datetime": RustWrenEngineColumnType.TIMESTAMP,
     "timestamp": RustWrenEngineColumnType.TIMESTAMP,
     "timestamp_ntz": RustWrenEngineColumnType.TIMESTAMP,
     "timestamp_tz": RustWrenEngineColumnType.TIMESTAMPTZ,
+    "timestamp_ltz": RustWrenEngineColumnType.TIMESTAMPTZ,
+    "time": RustWrenEngineColumnType.TIME,
+    # Semi-structured / binary
+    "variant": RustWrenEngineColumnType.JSON,
+    "object": RustWrenEngineColumnType.JSON,
+    "binary": RustWrenEngineColumnType.BYTES,
+    # Spatial
+    "geography": RustWrenEngineColumnType.GEOGRAPHY,
+    "geometry": RustWrenEngineColumnType.GEOMETRY,
 }
ibis-server/app/model/metadata/redshift.py (1)

15-51: Add Redshift SUPER type mapping

SUPER is common; mapping to JSON avoids UNKNOWN noise.

 REDSHIFT_TYPE_MAPPING = {
@@
     "inet": RustWrenEngineColumnType.INET,
     "oid": RustWrenEngineColumnType.OID,
+    "super": RustWrenEngineColumnType.JSON,
 }
ibis-server/app/model/metadata/object_storage.py (1)

25-51: Align DuckDB scalar mappings with existing conventions

Use consistent integer/float targets across modules; avoid lossy/int-size mismatches.

 DUCKDB_TYPE_MAPPING = {
-    "bigint": RustWrenEngineColumnType.INT64,
-    "bit": RustWrenEngineColumnType.INT2,
+    "bigint": RustWrenEngineColumnType.BIGINT,
     "blob": RustWrenEngineColumnType.BYTES,
     "boolean": RustWrenEngineColumnType.BOOL,
     "date": RustWrenEngineColumnType.DATE,
     "double": RustWrenEngineColumnType.DOUBLE,
-    "float": RustWrenEngineColumnType.FLOAT,
-    "integer": RustWrenEngineColumnType.INT,
+    "float": RustWrenEngineColumnType.FLOAT4,
+    "real": RustWrenEngineColumnType.REAL,
+    "integer": RustWrenEngineColumnType.INTEGER,
@@
     "interval": RustWrenEngineColumnType.INTERVAL,
     "json": RustWrenEngineColumnType.JSON,
-    "smallint": RustWrenEngineColumnType.INT2,
+    "smallint": RustWrenEngineColumnType.SMALLINT,
     "time": RustWrenEngineColumnType.TIME,
     "timestamp": RustWrenEngineColumnType.TIMESTAMP,
     "timestamp with time zone": RustWrenEngineColumnType.TIMESTAMPTZ,
-    "tinyint": RustWrenEngineColumnType.INT2,
+    "tinyint": RustWrenEngineColumnType.TINYINT,
@@
     "uuid": RustWrenEngineColumnType.UUID,
-    "varchar": RustWrenEngineColumnType.STRING,
+    "varchar": RustWrenEngineColumnType.VARCHAR,
 }
ibis-server/app/model/metadata/postgres.py (3)

25-36: Normalize integer/float aliases to a single canonical target (consistency).

Today, some aliases resolve to INTEGER/DOUBLE while enum exposes INT/INT4/INT8, FLOAT/FLOAT4/FLOAT8. Pick one canonical target per family to avoid mixed semantics across engines. If you keep INTEGER/DOUBLE here, mirror that everywhere; otherwise switch these to INT/DOUBLE or INT4/FLOAT8 consistently.


145-147: Trim input before lowercasing.

Be defensive against stray whitespace from drivers.

-        normalized_type = data_type.lower()
+        normalized_type = data_type.strip().lower()

Also applies to: 290-292


298-301: Good: unknown-type warning added. Consider richer context later.

If/when you can pass table/column into the transformer, include them in the warning to speed up triage.

ibis-server/app/model/metadata/bigquery.py (1)

204-206: Trim input before lowercasing.

-        normalized_type = data_type.lower()
+        normalized_type = data_type.strip().lower()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 857f35e and d32d1de.

📒 Files selected for processing (12)
  • ibis-server/app/model/metadata/athena.py (3 hunks)
  • ibis-server/app/model/metadata/bigquery.py (3 hunks)
  • ibis-server/app/model/metadata/canner.py (3 hunks)
  • ibis-server/app/model/metadata/clickhouse.py (3 hunks)
  • ibis-server/app/model/metadata/mssql.py (3 hunks)
  • ibis-server/app/model/metadata/mysql.py (3 hunks)
  • ibis-server/app/model/metadata/object_storage.py (3 hunks)
  • ibis-server/app/model/metadata/oracle.py (3 hunks)
  • ibis-server/app/model/metadata/postgres.py (4 hunks)
  • ibis-server/app/model/metadata/redshift.py (3 hunks)
  • ibis-server/app/model/metadata/snowflake.py (3 hunks)
  • ibis-server/app/model/metadata/trino.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-07T03:56:21.741Z
Learnt from: goldmedal
PR: Canner/wren-engine#1029
File: ibis-server/app/model/metadata/object_storage.py:44-44
Timestamp: 2025-01-07T03:56:21.741Z
Learning: When working with DuckDB in Python, use `conn.execute("DESCRIBE SELECT * FROM table").fetchall()` to get column types instead of accessing DataFrame-style attributes like `dtype` or `dtypes`.

Applied to files:

  • ibis-server/app/model/metadata/object_storage.py
🧬 Code graph analysis (12)
ibis-server/app/model/metadata/trino.py (2)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/athena.py (1)
  • _transform_column_type (140-160)
ibis-server/app/model/metadata/oracle.py (2)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/athena.py (1)
  • _transform_column_type (140-160)
ibis-server/app/model/metadata/athena.py (2)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/trino.py (1)
  • _transform_column_type (138-158)
ibis-server/app/model/metadata/canner.py (1)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/snowflake.py (2)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/athena.py (1)
  • _transform_column_type (140-160)
ibis-server/app/model/metadata/redshift.py (1)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/clickhouse.py (2)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/athena.py (1)
  • _transform_column_type (140-160)
ibis-server/app/model/metadata/postgres.py (1)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/mssql.py (3)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/athena.py (1)
  • _transform_column_type (140-160)
ibis-server/app/model/metadata/mysql.py (1)
  • _transform_column_type (169-189)
ibis-server/app/model/metadata/bigquery.py (2)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/athena.py (1)
  • _transform_column_type (140-160)
ibis-server/app/model/metadata/mysql.py (3)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/athena.py (1)
  • _transform_column_type (140-160)
ibis-server/app/model/metadata/trino.py (1)
  • _transform_column_type (138-158)
ibis-server/app/model/metadata/object_storage.py (1)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (5)
ibis-server/app/model/metadata/athena.py (2)

18-51: Mapping table looks consistent with the approach across modules

No blockers here.


140-160: Confirm Athena TIMESTAMP semantics (tz vs no-tz)

Athena/Trino historically differentiate TIMESTAMP vs TIMESTAMP WITH TIME ZONE. Mapping plain "timestamp" to TIMESTAMPTZ may be inaccurate depending on engine settings. Please verify and align.

Do you want me to switch "timestamp" -> TIMESTAMP and add "timestamp with time zone" -> TIMESTAMPTZ if that matches your environment?

ibis-server/app/model/metadata/postgres.py (1)

54-58: LGTM: Extension mapping table is clear.

ibis-server/app/model/metadata/bigquery.py (2)

215-219: Verify TIMESTAMP mapping consistency across engines.

BigQuery TIMESTAMP (UTC point-in-time) is mapped to TIMESTAMPTZ here. Ensure this aligns with other connectors (e.g., Postgres TIMESTAMPTZ) and downstream expectations.

Would you like me to scan the repo for other connectors’ TIMESTAMP mappings and summarize the current consistency matrix?


195-203: Union-type annotation supported – no changes needed.
Project requires Python ≥3.11, so str | RustWrenEngineColumnType is valid syntax.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
ibis-server/app/model/metadata/bigquery.py (3)

79-88: Type mapping isn’t applied; unknown-type logging won’t trigger.

get_column sets Column.type from raw data_type, bypassing _transform_column_type. Wire it in so the mapping and warnings are effective.

         def get_column(row) -> Column:
             return Column(
                 # field_path supports both column & nested column
                 name=row["field_path"],
-                type=row["data_type"],
+                type=self._transform_column_type(row["data_type"]),
                 notNull=row["is_nullable"].lower() == "no",
                 description=row["column_description"],
                 properties={},
                 nestedColumns=[] if has_nested_columns(row) else None,
             )

Also applies to: 197-221


67-72: Quote dataset identifier to support hyphenated project IDs.

Without backticks, projects like my-proj will break. Quote {dataset_id} in INFORMATION_SCHEMA refs.

-            FROM {dataset_id}.INFORMATION_SCHEMA.COLUMNS c 
-            JOIN {dataset_id}.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cf 
+            FROM `{dataset_id}`.INFORMATION_SCHEMA.COLUMNS c 
+            JOIN `{dataset_id}`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cf 
                 ON cf.table_name = c.table_name 
                 AND cf.column_name = c.column_name
-            LEFT JOIN {dataset_id}.INFORMATION_SCHEMA.TABLE_OPTIONS table_options
+            LEFT JOIN `{dataset_id}`.INFORMATION_SCHEMA.TABLE_OPTIONS table_options
                 ON c.table_name = table_options.table_name AND table_options.OPTION_NAME = 'description'

167-171: Syntax error: trailing comma before FROM.

SELECT list ends with a comma, which invalidates the query.

-                kcu.table_name as constraintedTable, kcu.column_name as constraintedColumn, 
+                kcu.table_name as constraintedTable, kcu.column_name as constraintedColumn
♻️ Duplicate comments (1)
ibis-server/app/model/metadata/bigquery.py (1)

15-32: Boolean mapping added — resolves the prior review.

Adding "bool"/"boolean" → BOOL fixes the unknown-type noise for booleans.

🧹 Nitpick comments (3)
ibis-server/app/model/metadata/bigquery.py (3)

15-32: Consider a few common BigQuery aliases.

Optionally cover INTEGER → INT64 and DECIMAL → NUMERIC to be defensive against variant sources that may emit synonyms.

 BIGQUERY_TYPE_MAPPING = {
     # GEOGRAPHY and RANGE columns were filtered out
     "bool": RustWrenEngineColumnType.BOOL,
     "boolean": RustWrenEngineColumnType.BOOL,
+    "integer": RustWrenEngineColumnType.INT64,
+    "decimal": RustWrenEngineColumnType.NUMERIC,
     "bytes": RustWrenEngineColumnType.BYTES,

106-108: Make nested-type detection case-insensitive.

Defensive against any casing variance.

-        def has_nested_columns(row) -> bool:
-            return "STRUCT" in row["data_type"]
+        def has_nested_columns(row) -> bool:
+            return "struct" in row["data_type"].lower()

218-219: Prefer loguru’s lazy formatting.

Avoid f-string evaluation when the log is filtered.

-        if mapped_type == RustWrenEngineColumnType.UNKNOWN:
-            logger.warning(f"Unknown BigQuery data type: {data_type}")
+        if mapped_type == RustWrenEngineColumnType.UNKNOWN:
+            logger.warning("Unknown BigQuery data type: {}", data_type)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between d32d1de and b4706f9.

📒 Files selected for processing (2)
  • ibis-server/app/model/metadata/bigquery.py (3 hunks)
  • ibis-server/app/model/metadata/canner.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/app/model/metadata/canner.py
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/app/model/metadata/bigquery.py (2)
ibis-server/app/model/metadata/dto.py (1)
  • RustWrenEngineColumnType (13-57)
ibis-server/app/model/metadata/clickhouse.py (1)
  • _transform_column_type (112-132)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (2)
ibis-server/app/model/metadata/bigquery.py (2)

1-1: Logger import is fine.

Adopting loguru here is consistent with other backends.


197-197: No action needed: Python ≥3.11 supports PEP 604 unions.
The project’s requires-python = ">=3.11" (pyproject.toml:6) ensures str | RustWrenEngineColumnType is valid.

@goldmedal
Contributor Author

BigQuery has been tested locally.

poetry run pytest -m 'bigquery'
======================================================================================================================================= test session starts =======================================================================================================================================
platform darwin -- Python 3.11.11, pytest-8.4.1, pluggy-1.5.0
rootdir: /Users/jax/git/wren-engine/ibis-server
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 369 items / 330 deselected / 39 selected                                                                                                                                                                                                                                                

tests/routers/v2/connector/test_bigquery.py ..................                                                                                                                                                                                                                              [ 46%]
tests/routers/v3/connector/bigquery/test_functions.py ....                                                                                                                                                                                                                                  [ 56%]
tests/routers/v3/connector/bigquery/test_query.py .................  

@goldmedal goldmedal requested a review from douenergy September 2, 2025 05:07

Labels

bigquery ibis python Pull requests that update Python code
