feat(ibis): Introduce WrenError for the structured ErrorResponse #1299
goldmedal merged 12 commits into Canner:main from
Conversation
Walkthrough
Centralizes error handling by adding app/model/error.py (WrenError, ErrorCode, ErrorPhase, ErrorResponse), replaces legacy custom exceptions across many modules, changes FastAPI error responses to JSON (ORJSONResponse) with X-Correlation-ID header support, and updates tests to assert JSON error payloads while removing many /validate tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant App as FastAPI App
    participant Endpoint as Route Logic
    participant Service as Service / DB / MDL
    participant Handler as WrenError Handler
    participant Resp as ORJSONResponse
    Client->>App: HTTP request (may include X-Correlation-ID)
    App->>Endpoint: dispatch
    Endpoint->>Service: perform operation
    alt Success
        Service-->>Endpoint: result
        Endpoint-->>Client: 2xx JSON
    else WrenError raised
        Service-->>Endpoint: raise WrenError(code, phase, metadata)
        Endpoint->>Handler: build ErrorResponse (errorCode, message, phase, metadata, correlationId, timestamp)
        Handler->>Resp: serialize JSON
        Resp-->>Client: status = WrenError.get_http_status_code()
    else Unexpected Exception
        Service-->>Endpoint: raise Exception
        Endpoint->>Handler: build generic ErrorResponse (GENERIC_INTERNAL_ERROR, correlationId, timestamp)
        Resp-->>Client: 500 JSON
    end
    note right of App: Correlation header: X-Correlation-ID
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
ibis-server/app/mdl/java_engine.py (1)
37-42: Bug: r.raise_for_status() returns None — chaining .text will crash on success.

httpx.Response.raise_for_status() returns None (and raises on non-2xx). As written, successful responses attempt None.text and fail.

```diff
-        return r.raise_for_status().text.replace("\n", " ")
+        r.raise_for_status()
+        return r.text.replace("\n", " ")
```

ibis-server/app/util.py (1)
252-264: Exceptions other than “timeout” are swallowed — function may return None silently.

In both ClickHouse and Trino handlers, if the error is not a timeout, the except block falls through without re-raising, potentially returning None and hiding real failures.

```diff
-    except TimeoutError:
+    except asyncio.TimeoutError:
         raise DatabaseTimeoutError(
             f"{operation_name} timeout after {app_timeout_seconds} seconds"
         )
     except clickhouse_connect.driver.exceptions.DatabaseError as e:
         if "TIMEOUT_EXCEEDED" in str(e):
             raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
+        raise
     except trino.exceptions.TrinoQueryError as e:
         if e.error_name == "EXCEEDED_TIME_LIMIT":
             raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
+        raise
     except psycopg.errors.QueryCanceled as e:
         raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
```

ibis-server/wren/__init__.py (1)
57-74: Use specific error codes and simplify file handling for MDL file
- File readability check after a successful open is unnecessary; catch FileNotFoundError instead and map to MDL_NOT_FOUND (404).
- Malformed JSON should map to INVALID_MDL.
- Keep GENERIC_USER_ERROR only for generic 4xx conditions.
```diff
-    with open(mdl_path) as f:
-        if not f.readable():
-            raise WrenError(
-                ErrorCode.GENERIC_USER_ERROR, f"Cannot read MDL file at {mdl_path}"
-            )
-        try:
-            manifest = json.load(f)
-            manifest_base64 = (
-                base64.b64encode(json.dumps(manifest).encode("utf-8")).decode("utf-8")
-                if manifest
-                else None
-            )
-        except json.JSONDecodeError as e:
-            raise WrenError(
-                ErrorCode.GENERIC_USER_ERROR,
-                f"Invalid JSON in MDL file at {mdl_path}: {e}",
-            ) from e
+    try:
+        with open(mdl_path) as f:
+            manifest = json.load(f)
+            manifest_base64 = (
+                base64.b64encode(json.dumps(manifest).encode("utf-8")).decode("utf-8")
+                if manifest
+                else None
+            )
+    except FileNotFoundError as e:
+        raise WrenError(
+            ErrorCode.MDL_NOT_FOUND,
+            f"MDL file not found at {mdl_path}",
+        ) from e
+    except json.JSONDecodeError as e:
+        raise WrenError(
+            ErrorCode.INVALID_MDL,
+            f"Invalid JSON in MDL file at {mdl_path}: {e}",
+        ) from e
```

ibis-server/app/model/validator.py (2)
93-96: Avoid naive split on '=' when parsing relationship condition.

Use a single split to tolerate expressions containing '=' (e.g., CAST, JSON ops). Trim both sides safely.

```diff
-    columns = condition.split("=")
-    left_column = columns[0].strip().split(".")[1]
-    right_column = columns[1].strip().split(".")[1]
+    left_expr, right_expr = [p.strip() for p in condition.split("=", 1)]
+    left_column = left_expr.split(".")[1]
+    right_column = right_expr.split(".")[1]
```
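The difference between a bare split and a single split can be shown standalone. The condition strings below are illustrative, not taken from any real MDL:

```python
# Why split("=", 1) is safer: a right-hand side that itself contains '='
# is silently broken into extra pieces by a bare split("=").
condition = "orders.o_custkey = customer.c_custkey"
left_expr, right_expr = [p.strip() for p in condition.split("=", 1)]
print(left_expr)   # orders.o_custkey
print(right_expr)  # customer.c_custkey

tricky = "a.x = b.y /* note: k=v */"
parts = tricky.split("=")
print(len(parts))  # 3 -- naive split yields a surprise third piece

# maxsplit=1 keeps everything after the first '=' intact:
_, right = [p.strip() for p in tricky.split("=", 1)]
print(right)       # b.y /* note: k=v */
```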
97-104: Quote column identifiers in DISTINCT to avoid case/reserved-word issues.

The join SQL quotes columns; do the same for the uniqueness check for consistency and correctness.

```diff
-    def generate_column_is_unique_sql(model_name, column_name):
-        return f'SELECT count(*) = count(distinct {column_name}) AS result FROM "{model_name}"'
+    def generate_column_is_unique_sql(model_name, column_name):
+        return (
+            f'SELECT count(*) = count(distinct "{column_name}") AS result '
+            f'FROM "{model_name}"'
+        )
```

ibis-server/app/main.py (1)
105-114: And for 501 handler: use the runtime correlation ID.

```diff
-    return ORJSONResponse(
+    from asgi_correlation_id import correlation_id  # local import
+    corr_id = correlation_id.get() or request.headers.get(X_CORRELATION_ID)
+    return ORJSONResponse(
         status_code=501,
         content=ErrorResponse(
             error_code=ErrorCode.NOT_IMPLEMENTED.name,
             message=str(exc),
-            timestamp=datetime.now().isoformat(),
-            correlation_id=request.headers.get(X_CORRELATION_ID),
+            timestamp=datetime.now().isoformat(),
+            correlation_id=corr_id,
         ).model_dump(by_alias=True),
     )
```

ibis-server/app/model/connector.py (1)
399-441: BigQuery: return a PyArrow table for the empty-result workaround.

The method contract returns pa.Table. Returning a pandas.DataFrame here will break upstream code that expects .to_pandas()/.to_pyarrow().

```diff
-    ibis_fields = ibis_schema_mapper.to_ibis(bq_fields.schema)
-    return pd.DataFrame(columns=ibis_fields.names)
+    ibis_fields = ibis_schema_mapper.to_ibis(bq_fields.schema)
+    empty_df = pd.DataFrame(columns=ibis_fields.names)
+    return pa.Table.from_pandas(empty_df)
```
🧹 Nitpick comments (60)
ibis-server/tests/routers/v2/connector/test_local_file.py (1)
248-251: Migrate raw-text error assertions to structured JSON checks across all tests

The API now consistently returns a WrenError/ErrorResponse JSON payload rather than plain text. Beyond the single snippet in test_local_file.py, dozens of other tests still assert on `response.text` or do exact-match on `response.json()["message"]`. To improve robustness and validate the full error schema, update each test to:

- Parse the JSON once (e.g. `err = response.json()`)
- Assert the `"message"` key exists, is a string, and use `startswith` or `in` rather than exact equality
- Verify the presence (and string type) of the structured fields: `"code"`, `"phase"`, and `"correlationId"`

Key locations needing refactoring include (but are not limited to):

ibis-server/tests/routers/v2/test_relationship_valid.py
• lines 121, 134, 150: raw `response.text == …`
ibis-server/tests/routers/v2/connector/test_minio_file.py
• line 351: exact `response.text == "Failed to list files: Unsupported format: unsupported"`
ibis-server/tests/routers/v2/connector/test_local_file.py
• lines 248–251: exact `response.json()["message"] == "Failed to list files: Unsupported format: unsupported"`
ibis-server/tests/routers/v2/connector/test_s3_file.py & test_gcs_file.py
• lines 296–297: exact `response.json()["message"] == "Failed to list files: Unsupported format: unsupported"`
ibis-server/tests/routers/v2/connector/test_postgres.py
• lines 598–599, 686–687, 954–955, 966–968: exact `response.json()["message"] == …`
ibis-server/tests/routers/v2/connector/test_oracle.py
• lines 420–421: exact `response.json()["message"] == 'Ambiguous model: …'`
ibis-server/tests/routers/v3/connector/postgres/test_validate.py
• lines 43–44, 111–112, 122–123, 177–178: exact `response.json()["message"] == …`
ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py
• lines 271–272, 286–288: exact `response.json()["message"] == …`

…and all other tests under ibis-server/tests/routers/{v2,v3}/connector that still reference `response.text` or do exact JSON-message matches.

Example refactor for test_local_file.py:

```diff
-    assert (
-        response.json()["message"]
-        == "Failed to list files: Unsupported format: unsupported"
-    )
+    err = response.json()
+    # Verify message shape without brittle exact match
+    assert "message" in err and isinstance(err["message"], str)
+    assert err["message"].startswith("Failed to list files: Unsupported format")
+    # Ensure structured error fields exist
+    for key in ("code", "phase", "correlationId"):
+        assert key in err and isinstance(err[key], str)
```

Please apply similar changes across all identified tests to fully migrate to the structured ErrorResponse schema.
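The repeated assertion pattern above could be consolidated into one helper. This helper is hypothetical, not part of the PR; the field names (`code`, `phase`, `correlationId`) follow this comment's suggested schema:

```python
# Hypothetical test helper consolidating the structured-error assertions
# suggested above. Field names follow the review's suggested schema.
def assert_structured_error(err: dict, message_prefix: str) -> None:
    """Assert the payload carries a message plus the structured error fields."""
    assert isinstance(err.get("message"), str)
    assert err["message"].startswith(message_prefix)
    for key in ("code", "phase", "correlationId"):
        assert key in err and isinstance(err[key], str)


# Example payload shaped like the structured ErrorResponse:
payload = {
    "message": "Failed to list files: Unsupported format: unsupported",
    "code": "GENERIC_USER_ERROR",
    "phase": "METADATA_FETCHING",
    "correlationId": "abc-123",
}
assert_structured_error(payload, "Failed to list files")
print("ok")
```

Each test then reduces to `assert_structured_error(response.json(), "Failed to list files")`.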
ibis-server/app/dependencies.py (2)
42-43: Use the constant and exact match for filtering correlation header.

Request headers from ASGI are lowercase; prefer comparing against X_CORRELATION_ID.lower() and use equality instead of startswith to avoid unintentionally passing through similarly prefixed headers.

Apply this diff:

```diff
-    elif header_string.startswith("x-correlation-id"):
-        return True
+    elif header_string == X_CORRELATION_ID.lower():
+        return True
```
44-49: Optional: allow W3C baggage propagation.

You already pass traceparent/tracestate/sentry-trace. If you use OpenTelemetry or Sentry, consider also allowing the standardized "baggage" header for context propagation consistency.

Apply this diff:

```diff
-    elif header_string == "sentry-trace":
+    elif header_string == "sentry-trace":
         return True
+    elif header_string == "baggage":
+        return True
```

ibis-server/tests/routers/v2/connector/test_s3_file.py (1)
295-298: Strengthen assertions to validate structured ErrorResponse.

Since the API now returns a structured error payload (ErrorResponse), assert presence of errorCode and correlationId, and that the correlationId matches the response header. Keeps tests aligned with the new contract and catches serialization regressions.

Apply this diff:

```diff
-    assert (
-        response.json()["message"]
-        == "Failed to list files: Unsupported format: unsupported"
-    )
+    err = response.json()
+    assert err["message"] == "Failed to list files: Unsupported format: unsupported"
+    # Structured fields
+    assert "errorCode" in err
+    assert "correlationId" in err
+    # Correlation header is propagated and matches payload
+    cid = response.headers.get("X-Correlation-ID")
+    assert cid and err["correlationId"] == cid
```
295-298: Mirror S3 test improvements for GCS error shape.

Validate errorCode and correlationId presence, and assert correlation header parity with payload.

Apply this diff:

```diff
-    assert (
-        response.json()["message"]
-        == "Failed to list files: Unsupported format: unsupported"
-    )
+    err = response.json()
+    assert err["message"] == "Failed to list files: Unsupported format: unsupported"
+    # Structured fields
+    assert "errorCode" in err
+    assert "correlationId" in err
+    # Correlation header is propagated and matches payload
+    cid = response.headers.get("X-Correlation-ID")
+    assert cid and err["correlationId"] == cid
```

ibis-server/app/model/error.py (4)
72-85: Make timestamps timezone-aware (UTC).

Naive local timestamps complicate cross-service correlation. Use UTC and include the offset in ISO-8601.

Apply this diff:

```diff
-from datetime import datetime
+from datetime import datetime, timezone
@@
-        self.timestamp = datetime.now().isoformat()
+        self.timestamp = datetime.now(timezone.utc).isoformat()
```
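The practical difference can be demonstrated directly; only the naive form omits the UTC offset, so logs from hosts in different timezones cannot be aligned:

```python
# Naive vs. timezone-aware timestamps: only the aware one carries an
# explicit offset, so logs from different hosts can be correlated.
from datetime import datetime, timezone

naive = datetime.now().isoformat()
aware = datetime.now(timezone.utc).isoformat()
print(naive)  # no offset suffix, meaning depends on the host's local zone
print(aware)  # ends with +00:00, unambiguous on any host

# The aware form round-trips with the offset intact:
parsed = datetime.fromisoformat(aware)
print(parsed.tzinfo)  # UTC
```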
72-80: Either use the 'cause' parameter or drop it.

The constructor accepts cause but never stores or exposes it. Persist it for logging/observability, or remove the parameter to avoid confusion.

Apply this diff:

```diff
     def __init__(
         self,
         error_code: ErrorCode,
         message: str,
         phase: ErrorPhase | None = None,
         metadata: dict[str, Any] | None = None,
         cause: Exception | None = None,
     ):
         self.error_code = error_code
         self.message = message
         self.phase = phase
         self.metadata = metadata
-        self.timestamp = datetime.now().isoformat()
+        self.timestamp = datetime.now(timezone.utc).isoformat()
+        self.cause = cause  # retained for logs/diagnostics
```
96-114: Map DUCKDB_FILE_NOT_FOUND to 404.

A file-not-found condition semantically aligns with HTTP 404 alongside other "not found" codes.

Apply this diff:

```diff
         match self.error_code:
             case (
                 ErrorCode.NOT_FOUND
                 | ErrorCode.MDL_NOT_FOUND
                 | ErrorCode.VALIDATION_RULE_NOT_FOUND
+                | ErrorCode.DUCKDB_FILE_NOT_FOUND
             ):
                 return HTTP_404_NOT_FOUND
```
116-125: Optional: include structured metadata for timeouts.

Consider adding metadata (e.g., timeoutMs, engine/resource info) to aid users in remediation without parsing the message string.

Example (outside current diff scope):

```python
class DatabaseTimeoutError(WrenError):
    def __init__(self, message: str, *, timeout_ms: int | None = None, resource: str | None = None):
        meta = {"timeoutMs": timeout_ms, "resource": resource}
        super().__init__(error_code=ErrorCode.DATABASE_TIMEOUT, message=message, metadata=meta)
```

ibis-server/app/model/metadata/canner.py (1)
72-74: Attach phase and helpful metadata to WrenError.

This enriches diagnostics and keeps errors consistent with the new model.

Apply this diff:

```diff
-    raise WrenError(
-        ErrorCode.INVALID_CONNECTION_INFO, f"Workspace {ws_sql_name} not found"
-    )
+    from app.model.error import ErrorPhase
+    raise WrenError(
+        ErrorCode.INVALID_CONNECTION_INFO,
+        f"Workspace {ws_sql_name} not found",
+        phase=ErrorPhase.METADATA_FETCHING,
+        metadata={"workspaceSqlName": ws_sql_name},
+    )
```

ibis-server/app/mdl/java_engine.py (1)
31-35: Use a more precise phase and enrich metadata for observability.

Since this path guards dry-plan calls, SQL_DRY_RUN is a closer fit than SQL_PLANNING. Also consider attaching minimal metadata (e.g., which endpoint was missing) to aid debugging.

```diff
-    raise WrenError(
-        ErrorCode.GENERIC_INTERNAL_ERROR,
-        "WREN_ENGINE_ENDPOINT is not set. Cannot call dry_plan without a valid endpoint.",
-        phase=ErrorPhase.SQL_PLANNING,
-    )
+    raise WrenError(
+        ErrorCode.GENERIC_INTERNAL_ERROR,
+        "WREN_ENGINE_ENDPOINT is not set. Cannot call dry_plan without a valid endpoint.",
+        phase=ErrorPhase.SQL_DRY_RUN,
+        metadata={"reason": "missing_wren_engine_endpoint"},
+    )
```

ibis-server/app/util.py (4)
41-43: Polish the migration banner grammar (tiny nit).

```diff
-MIGRATION_MESSAGE = "Wren engine is migrating to Rust version now. \
-    Wren AI team are appreciate if you can provide the error messages and related logs for us."
+MIGRATION_MESSAGE = "Wren engine is migrating to the Rust version now. \
+    The Wren AI team would appreciate receiving error messages and related logs."
```
64-71: Typos in debug log — make log lines easier to grep.

```diff
-    logger.debug(f"formmated_sql: {formatted_sql}")
+    logger.debug(f"formatted_sql: {formatted_sql}")
```
213-234: Rename _formater → _formatter for consistency; update call site.

Minor naming nit; improves readability and greppability.

```diff
-def _formater(field: pa.Field) -> str:
+def _formatter(field: pa.Field) -> str:
@@
-        + ", ".join([_formater(field) for field in df.schema])
+        + ", ".join([_formatter(field) for field in df.schema])
```
248-256: Optional: handle the general case where operation is a Task vs. bare awaitable consistently.

Current usage is fine, but if you foresee passing coroutines directly more often, consider normalizing to a Task internally to simplify cancellation paths.
ibis-server/tests/routers/v2/connector/test_clickhouse.py (1)
471-499: Assert structured error payload for timeouts, not just substrings.

Given the unified ErrorResponse, assert errorCode and message in JSON. Keep substring check if desired.

```diff
-    assert response.status_code == 504  # Gateway Timeout
-    assert "Query was cancelled:" in response.text
+    assert response.status_code == 504  # Gateway Timeout
+    payload = response.json()
+    assert payload["errorCode"] == ErrorCode.DATABASE_TIMEOUT.name
+    assert "Query was cancelled:" in payload["message"]
@@
-    assert response.status_code == 504  # Gateway Timeout
-    assert "Query was cancelled:" in response.text
+    assert response.status_code == 504  # Gateway Timeout
+    payload = response.json()
+    assert payload["errorCode"] == ErrorCode.DATABASE_TIMEOUT.name
+    assert "Query was cancelled:" in payload["message"]
```
5-5: Consider importing ErrorPhase to tag where failures occur.

Adding phases improves troubleshooting and aligns with other modules.

```diff
-from app.model.error import ErrorCode, WrenError
+from app.model.error import ErrorCode, WrenError, ErrorPhase
```
20-25: Attach phase and minimal metadata to WrenError for better diagnostics.

Connect failures and HTTP errors benefit from phase and context (endpoint/path, status).

```diff
-        raise WrenError(
-            ErrorCode.LEGACY_ENGINE_ERROR, f"Can not connect to Java Engine: {e}"
-        ) from e
+        raise WrenError(
+            ErrorCode.LEGACY_ENGINE_ERROR,
+            f"Cannot connect to Java Engine: {e}",
+            phase=ErrorPhase.SQL_PLANNING,
+            metadata={"endpoint": wren_engine_endpoint, "path": "/v2/analysis/sql"},
+        ) from e
@@
-        raise WrenError(ErrorCode.GENERIC_USER_ERROR, e.response.text)
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            e.response.text,
+            phase=ErrorPhase.SQL_PLANNING,
+            metadata={
+                "endpoint": wren_engine_endpoint,
+                "path": "/v2/analysis/sql",
+                "status_code": e.response.status_code,
+            },
+        )
```
37-41: Mirror the same phase/metadata for analyze_batch.

```diff
-        raise WrenError(
-            ErrorCode.LEGACY_ENGINE_ERROR, f"Can not connect to Java Engine: {e}"
-        ) from e
+        raise WrenError(
+            ErrorCode.LEGACY_ENGINE_ERROR,
+            f"Cannot connect to Java Engine: {e}",
+            phase=ErrorPhase.SQL_PLANNING,
+            metadata={"endpoint": wren_engine_endpoint, "path": "/v2/analysis/sqls"},
+        ) from e
@@
-        raise WrenError(ErrorCode.GENERIC_USER_ERROR, e.response.text)
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            e.response.text,
+            phase=ErrorPhase.SQL_PLANNING,
+            metadata={
+                "endpoint": wren_engine_endpoint,
+                "path": "/v2/analysis/sqls",
+                "status_code": e.response.status_code,
+            },
+        )
```
181-185: Expose non-sensitive metadata on invalid ClickHouse URL to speed up debugging.

Including the provided scheme (only) helps identify typical mistakes (e.g., “http” vs “clickhouse”) without leaking secrets.

```diff
-        raise WrenError(
-            ErrorCode.INVALID_CONNECTION_INFO,
-            "Invalid connection URL for ClickHouse",
-        )
+        raise WrenError(
+            ErrorCode.INVALID_CONNECTION_INFO,
+            "Invalid connection URL for ClickHouse",
+            metadata={"scheme": parsed.scheme or None},
+        )
```
411-416: SSL CA requirement is correct; consider clarifying the mode name.

Message is clear. Optionally add ssl_mode value to metadata for clarity.

```diff
-        raise WrenError(
-            ErrorCode.INVALID_CONNECTION_INFO,
-            "SSL CA must be provided when SSL mode is VERIFY CA",
-        )
+        raise WrenError(
+            ErrorCode.INVALID_CONNECTION_INFO,
+            "SSL CA must be provided when SSL mode is VERIFY CA",
+            metadata={"ssl_mode": "VERIFY_CA"},
+        )
```

ibis-server/app/model/metadata/object_storage.py (2)
84-88: Avoid leaking raw exception details; include `cause` and keep the message generic.

Returning the raw exception text in the end-user message risks exposing internal paths or provider messages. Keep the message stable, attach the underlying exception via `cause`, and rely on logs/observability for details. Also keeps phase metadata intact.

Apply this diff:

```diff
-        raise WrenError(
-            ErrorCode.GENERIC_USER_ERROR,
-            f"Failed to list files: {e!s}",
-            phase=ErrorPhase.METADATA_FETCHING,
-        )
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            "Failed to list files",
+            phase=ErrorPhase.METADATA_FETCHING,
+            cause=e,
+        )
```
350-355: Add phase metadata for consistency in get_version().

Other branches use ErrorPhase.METADATA_FETCHING; aligning here improves tracing and alerting. Message content is fine.

```diff
-        raise WrenError(
-            ErrorCode.GENERIC_USER_ERROR, "Failed to get DuckDB version"
-        )
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            "Failed to get DuckDB version",
+            phase=ErrorPhase.METADATA_FETCHING,
+        )
@@
-        raise WrenError(ErrorCode.GENERIC_USER_ERROR, "DuckDB version is empty")
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            "DuckDB version is empty",
+            phase=ErrorPhase.METADATA_FETCHING,
+        )
```

ibis-server/app/mdl/rewriter.py (5)
50-60: Capture dialect metadata and attach `cause` for SQLGlot failures.

Adding read/write to metadata materially improves debuggability in telemetry, and `cause` preserves the original stack for logs while keeping the client message concise.

```diff
     try:
         read = self._get_read_dialect(self.experiment)
         write = self._get_write_dialect(self.data_source)
         return sqlglot.transpile(planned_sql, read=read, write=write)[0]
     except Exception as e:
-        raise WrenError(
-            ErrorCode.SQLGLOT_ERROR,
-            str(e),
-            phase=ErrorPhase.SQL_TRANSPILE,
-            metadata={PLANNED_SQL: planned_sql},
-        )
+        raise WrenError(
+            ErrorCode.SQLGLOT_ERROR,
+            str(e),
+            phase=ErrorPhase.SQL_TRANSPILE,
+            metadata={PLANNED_SQL: planned_sql, "readDialect": read, "writeDialect": write},
+            cause=e,
+        )
```
112-125: Consider classifying engine connectivity/timeouts as external errors and set phase.

Using LEGACY_ENGINE_ERROR (101) maps to HTTP 500. For network connectivity/timeouts to an external service, GENERIC_EXTERNAL_ERROR (maps to 502) may better communicate the nature of the failure to clients. Also attach phase=SQL_PLANNING and `cause` for parity with other error sites.

If API consumers depend on 500 today, confirm before changing. Proposed diff:

```diff
-    except httpx.ConnectError as e:
-        raise WrenError(
-            ErrorCode.LEGACY_ENGINE_ERROR, f"Can not connect to Java Engine: {e}"
-        )
+    except httpx.ConnectError as e:
+        raise WrenError(
+            ErrorCode.GENERIC_EXTERNAL_ERROR,
+            f"Cannot connect to Java Engine: {e}",
+            phase=ErrorPhase.SQL_PLANNING,
+            cause=e,
+        )
     except httpx.TimeoutException as e:
-        raise WrenError(
-            ErrorCode.LEGACY_ENGINE_ERROR,
-            f"Timeout when connecting to Java Engine: {e}",
-        )
+        raise WrenError(
+            ErrorCode.GENERIC_EXTERNAL_ERROR,
+            f"Timeout when connecting to Java Engine: {e}",
+            phase=ErrorPhase.SQL_PLANNING,
+            cause=e,
+        )
     except httpx.HTTPStatusError as e:
-        raise WrenError(
-            ErrorCode.INVALID_SQL, e.response.text, ErrorPhase.SQL_PLANNING
-        )
+        raise WrenError(
+            ErrorCode.INVALID_SQL, e.response.text, ErrorPhase.SQL_PLANNING, cause=e
+        )
```
149-150: Attach `cause` to preserve stack on embedded planning errors.

Minor enhancement; keeps client message stable and observability richer.

```diff
-    raise WrenError(ErrorCode.INVALID_SQL, str(e), ErrorPhase.SQL_PLANNING)
+    raise WrenError(
+        ErrorCode.INVALID_SQL, str(e), ErrorPhase.SQL_PLANNING, cause=e
+    )
```
162-163: Same as above for sync path — include `cause`.

```diff
-    raise WrenError(ErrorCode.INVALID_SQL, str(e), ErrorPhase.SQL_PLANNING)
+    raise WrenError(
+        ErrorCode.INVALID_SQL, str(e), ErrorPhase.SQL_PLANNING, cause=e
+    )
```
180-181: Include `cause` for extraction errors.

Consistent with other error sites; retains original exception context in logs/traces.

```diff
-    raise WrenError(ErrorCode.INVALID_MDL, str(e), ErrorPhase.MDL_EXTRACTION)
+    raise WrenError(
+        ErrorCode.INVALID_MDL, str(e), ErrorPhase.MDL_EXTRACTION, cause=e
+    )
```

ibis-server/tests/routers/v2/connector/test_postgres.py (2)
598-600: Make the assertion robust; avoid stringifying `rules` in full.

The full `rules` mapping can be order/version-dependent; assert on a stable substring instead.

```diff
-    assert (
-        response.json()["message"]
-        == f"The rule `unknown_rule` is not in the rules, rules: {rules}"
-    )
+    msg = response.json()["message"]
+    assert "The rule `unknown_rule` is not in the rules" in msg
```
685-687: Add errorCode/phase checks for consistency with preceding test.

Keeps all validation-parameter tests aligned on the same invariants.

```diff
-    assert response.status_code == 422
-    assert response.json()["message"] == "modelName is required"
+    assert response.status_code == 422
+    res = response.json()
+    assert res["errorCode"] == ErrorCode.VALIDATION_PARAMETER_ERROR.name
+    assert res["phase"] == ErrorPhase.VALIDATION.name
+    assert res["message"] == "modelName is required"
```

ibis-server/tests/routers/v2/connector/test_mysql.py (1)
364-368: Use substring matching for driver error messages.

Exact MySQL driver error strings can vary across versions/platforms. Keep the check resilient.

```diff
-    assert response.status_code == 422
-    result = response.json()
-    assert result["errorCode"] == error_code.name
-    assert result["message"] == expected_error
+    assert response.status_code == 422
+    result = response.json()
+    assert result["errorCode"] == error_code.name
+    # Driver messages vary by version; assert containment for stability.
+    assert expected_error in result["message"]
```
43-45: Stabilize assertion: prefer error_code and substring checks over full message equality

Comparing the entire message string (including the dynamic `rules` dict) is brittle and can fail on minor formatting/order changes. Since the API now returns structured errors, assert the error_code and a stable substring instead. Also consider checking the presence of the correlation ID header.

```diff
-    assert (
-        response.json()["message"]
-        == f"The rule `unknown_rule` is not in the rules, rules: {rules}"
-    )
+    result = response.json()
+    assert result["error_code"] == "VALIDATION_RULE_NOT_FOUND"
+    assert "The rule `unknown_rule` is not in the rules" in result["message"]
+    # Optional: ensure correlation id is propagated
+    assert response.headers.get("x-correlation-id")
```
111-111: Leverage structured payload: assert error_code for parameter errors

You can strengthen this test by asserting the specific error_code that maps to parameter issues.

```diff
-    assert response.json()["message"] == "columnName is required"
+    result = response.json()
+    assert result["error_code"] == "VALIDATION_PARAMETER_ERROR"
+    assert result["message"] == "columnName is required"
```
122-122: Likewise, assert the parameter error_code for modelName

Same reasoning as above.

```diff
-    assert response.json()["message"] == "modelName is required"
+    result = response.json()
+    assert result["error_code"] == "VALIDATION_PARAMETER_ERROR"
+    assert result["message"] == "modelName is required"
```
175-179: Prefer structured checks for RLAC validation failure

This is a good, clear message. To make the test more resilient (and aligned with the new WrenError contract), assert the error_code and a stable substring.

```diff
-    assert (
-        response.json()["message"]
-        == "Error during planning: The session property @session_not_found is used for `rlac_validation` rule, but not found in the session properties"
-    )
+    result = response.json()
+    assert result["error_code"] == "VALIDATION_ERROR"
+    assert "@session_not_found" in result["message"]
+    # Optional: ensure correlation id is propagated
+    assert response.headers.get("x-correlation-id")
```

ibis-server/wren/session/__init__.py (6)
12-12: Import ErrorPhase to enrich error context in Task errors

Since the error model supports phases and other modules already set them, import ErrorPhase here so Task can annotate failures with the appropriate phase.

```diff
-from app.model.error import ErrorCode, WrenError
+from app.model.error import ErrorCode, ErrorPhase, WrenError
```
139-143: Use phase and correct guidance in dry_run precondition error

“transpile()” isn’t a method on Task; users should call plan() (or Context.sql(...).plan/execute). Also add phase to help downstream tooling.

```diff
-        raise WrenError(
-            ErrorCode.GENERIC_USER_ERROR,
-            "Dialect SQL is not set. Call transpile() first.",
-        )
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            "Dialect SQL is not set. Call plan() first.",
+            phase=ErrorPhase.SQL_DRY_RUN,
+        )
```
153-158: Return a more precise code for missing connection info and set phase

INVALID_CONNECTION_INFO is available and communicates the issue better than GENERIC_USER_ERROR. Also set the execution phase.

```diff
-    if self.context.connection_info is None:
-        raise WrenError(
-            ErrorCode.GENERIC_USER_ERROR,
-            "Connection info is not set. Cannot execute without connection info.",
-        )
+    if self.context.connection_info is None:
+        raise WrenError(
+            ErrorCode.INVALID_CONNECTION_INFO,
+            "Connection info is not set. Cannot execute without connection info.",
+            phase=ErrorPhase.SQL_EXECUTION,
+        )
```
158-163: Align execute precondition with phase and correct guidance

Same as dry_run: update guidance to “plan()” and include phase.

```diff
-    if self.dialect_sql is None:
-        raise WrenError(
-            ErrorCode.GENERIC_USER_ERROR,
-            "Dialect SQL is not set. Call transpile() first.",
-        )
+    if self.dialect_sql is None:
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            "Dialect SQL is not set. Call plan() first.",
+            phase=ErrorPhase.SQL_EXECUTION,
+        )
```
168-173: Optionally, include phase for result formatting precondition

Minor consistency tweak: carry the SQL_EXECUTION phase into this guard too.

```diff
-    if self.results is None:
-        raise WrenError(
-            ErrorCode.GENERIC_USER_ERROR,
-            "Results are not set. Call execute() first.",
-        )
+    if self.results is None:
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            "Results are not set. Call execute() first.",
+            phase=ErrorPhase.SQL_EXECUTION,
+        )
```
76-88: Typo: _lowrcase_properties → _lowercase_properties

Renaming improves readability and avoids leaking typos into your public API if this method ever becomes referenced elsewhere. Low-impact change but requires updating the call site.

```diff
-        self.properties: dict = self._lowrcase_properties(properties)
+        self.properties: dict = self._lowercase_properties(properties)
@@
-    def _lowrcase_properties(self, properties: dict | None) -> dict:
+    def _lowercase_properties(self, properties: dict | None) -> dict:
```

ibis-server/wren/__main__.py (4)
15-17: Enforce both required CLI args in usage check

The CLI requires <data_source> and <mdl_path>. Adjust the guard to require at least two args beyond the module name.

```diff
-    if len(sys.argv) < 2:
+    if len(sys.argv) < 3:
         print("Usage: python -m wren <data_source> <mdl_path> [connection_info_path]")  # noqa: T201
         sys.exit(1)
```
26-33: Replace unreadable-file check with structured exception handling; validate JSON

`f.readable()` will always be True for a successfully opened file in read mode. Prefer catching FileNotFoundError and JSONDecodeError, and return richer WrenErrors (use INVALID_CONNECTION_INFO for malformed JSON).

```diff
-    with open(connection_info_path := sys.argv[3]) as f:
-        if not f.readable():
-            raise WrenError(
-                ErrorCode.GENERIC_USER_ERROR,
-                f"Cannot read connection info file at {connection_info_path}",
-            )
-
-        connection_info = json.load(f)
+    try:
+        with open(connection_info_path := sys.argv[3]) as f:
+            connection_info = json.load(f)
+    except FileNotFoundError as e:
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            f"Cannot read connection info file at {connection_info_path}",
+        ) from e
+    except json.JSONDecodeError as e:
+        raise WrenError(
+            ErrorCode.INVALID_CONNECTION_INFO,
+            f"Invalid JSON in connection info file at {connection_info_path}: {e}",
+        ) from e
```
21-22: Harden invalid data_source handling

Constructing the DataSource enum with an invalid value raises ValueError and will produce an unfriendly traceback in a CLI. Wrap and re-raise as WrenError to keep UX consistent.

```diff
-    data_source = DataSource(data_source)
+    try:
+        data_source = DataSource(data_source)
+    except ValueError as e:
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            f"Unknown data_source: {data_source}",
+        ) from e
```
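Why the wrap is needed: calling an Enum with an unknown value raises a bare ValueError. A minimal sketch (the DataSource members here are illustrative, not the project's full list, and SystemExit stands in for the WrenError-based exit):

```python
# Constructing an Enum from an unknown value raises ValueError; wrapping it
# gives the CLI a clean message instead of a traceback.
# DataSource members are illustrative; SystemExit stands in for WrenError.
from enum import Enum


class DataSource(Enum):
    postgres = "postgres"
    bigquery = "bigquery"


def parse_data_source(name: str) -> DataSource:
    try:
        return DataSource(name)
    except ValueError as e:
        # Chain the original ValueError so debugging keeps the full context.
        raise SystemExit(f"Unknown data_source: {name}") from e


print(parse_data_source("postgres").name)  # postgres
```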
46-60: Graceful CLI error handling for WrenError

Catching WrenError at the CLI boundary avoids stack traces and prints a clean message before exiting with a non-zero code.
```diff
-    session = create_session_context(
-        data_source=data_source.name,
-        connection_info=connection_info,
-        mdl_path=mdl_path,
-    )
-    print(f"Session created: {session}")  # noqa: T201
-    print("You can now interact with the Wren session using the 'wren' variable:")  # noqa: T201
-    print("> task = wren.sql('SELECT * FROM your_table').execute()")  # noqa: T201
-    print("> print(task.results)")  # noqa: T201
-    print("> print(task.formatted_result())")  # noqa: T201
-    code.interact(local={"wren": session})
+    try:
+        session = create_session_context(
+            data_source=data_source.name,
+            connection_info=connection_info,
+            mdl_path=mdl_path,
+        )
+    except WrenError as e:
+        print(f"[{e.error_code.name}] {e.message}", file=sys.stderr)  # noqa: T201
+        sys.exit(2)
+    print(f"Session created: {session}")  # noqa: T201
+    print("You can now interact with the Wren session using the 'wren' variable:")  # noqa: T201
+    print("> task = wren.sql('SELECT * FROM your_table').execute()")  # noqa: T201
+    print("> print(task.results)")  # noqa: T201
+    print("> print(task.formatted_result())")  # noqa: T201
+    code.interact(local={"wren": session})
```

ibis-server/wren/__init__.py (1)
55-56: Optionally normalize invalid data_source early

Creating the DataSource enum can raise ValueError. Consider wrapping this to produce a WrenError with a clear message.
```diff
-    data_source = DataSource(data_source)
+    try:
+        data_source = DataSource(data_source)
+    except ValueError as e:
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            f"Unknown data_source: {data_source}",
+        ) from e
```

ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (4)
106-106: Consider asserting the error payload (message/error_code) for x-user-catalog-only case

You already validate the 404. Tighten this by asserting the structured error fields to prevent regressions in error mapping.
```diff
-    assert response.status_code == 404
+    assert response.status_code == 404
+    payload = response.json()
+    assert payload["error_code"] in ("NOT_FOUND", "MDL_NOT_FOUND")
+    assert response.headers.get("x-correlation-id")
```
138-138: Likewise, assert structured error fields for x-user-schema-only case

Mirror the above for symmetry and stronger guarantees.
```diff
-    assert response.status_code == 404
+    assert response.status_code == 404
+    payload = response.json()
+    assert payload["error_code"] in ("NOT_FOUND", "MDL_NOT_FOUND")
+    assert response.headers.get("x-correlation-id")
```
286-288: Mirror the structured assertions for the headers-based request

Same enhancement as above for consistency.
```diff
-    assert response.status_code == 404
-    assert response.json()["message"] == 'Model not found: "Nation"'
+    assert response.status_code == 404
+    payload = response.json()
+    assert payload["message"] == 'Model not found: "Nation"'
+    assert payload["error_code"] in ("NOT_FOUND", "MDL_NOT_FOUND")  # set exact one after verifying
+    assert response.headers.get("x-correlation-id")
```
270-272: Augment "Model not found" test with precise errorCode and correlationId

The `substitute` implementation always raises `WrenError` with `ErrorCode.NOT_FOUND` for missing models, and the response model uses the aliases `errorCode` and `correlationId`. Let's update the assertions accordingly:

tests/routers/v3/connector/postgres/test_model_substitute.py

```diff
@@ async def test_model_substitute_non_existent_model(...):
-    assert response.status_code == 404
-    assert response.json()["message"] == 'Model not found: "Nation"'
+    assert response.status_code == 404
+    payload = response.json()
+    # message remains unchanged
+    assert payload["message"] == 'Model not found: "Nation"'
+    # implementation always uses NOT_FOUND for missing models
+    assert payload["errorCode"] == "NOT_FOUND"  # ErrorCode.NOT_FOUND
+    # correlationId is populated via the exception handler
+    assert "correlationId" in payload and payload["correlationId"] is not None
+    # the CorrelationIdMiddleware should also set the header
+    assert response.headers.get("x-correlation-id")
```

ibis-server/tests/routers/v2/connector/test_oracle.py (1)
419-422: Strengthen JSON error assertions (also assert errorCode, phase, correlationId)

You've switched to structured WrenError JSON; let's assert the key fields too for better coverage and to prevent regressions.
```diff
     assert response.status_code == 422
-    assert (
-        response.json()["message"]
-        == 'Ambiguous model: found multiple matches for "ORDERS"'
-    )
+    data = response.json()
+    assert data["message"] == 'Ambiguous model: found multiple matches for "ORDERS"'
+    assert data["errorCode"] == "GENERIC_USER_ERROR"
+    assert data.get("phase") == "SQL_SUBSTITUTE"
+    assert data.get("correlationId") is not None
```

ibis-server/app/mdl/substitute.py (2)
43-47: Add useful metadata to WrenError for ambiguous model to aid debugging

Include the offending source and duplicate keys in the error metadata. This improves observability without changing behavior.
```diff
-        raise WrenError(
-            ErrorCode.GENERIC_USER_ERROR,
-            f"Ambiguous model: found multiple matches for {source}",
-            phase=ErrorPhase.SQL_SUBSTITUTE,
-        )
+        raise WrenError(
+            ErrorCode.GENERIC_USER_ERROR,
+            f"Ambiguous model: found multiple matches for {source}",
+            phase=ErrorPhase.SQL_SUBSTITUTE,
+            metadata={"source": str(source), "duplicateKeys": duplicate_keys},
+        )
```
50-55: Attach context metadata for "Model not found"

Carrying the source that failed resolution in metadata makes operational debugging faster and keeps client messages stable.
```diff
-        raise WrenError(
-            ErrorCode.NOT_FOUND,
-            f"Model not found: {source}",
-            phase=ErrorPhase.SQL_SUBSTITUTE,
-        )
+        raise WrenError(
+            ErrorCode.NOT_FOUND,
+            f"Model not found: {source}",
+            phase=ErrorPhase.SQL_SUBSTITUTE,
+            metadata={"source": str(source)},
+        )
```

ibis-server/app/model/validator.py (4)
25-29: Set phase=VALIDATION when the rule is unknown

This aligns with the rest of the validator and improves downstream telemetry consistency.
```diff
-        raise WrenError(
-            ErrorCode.VALIDATION_RULE_NOT_FOUND,
-            f"The rule `{rule}` is not in the rules, rules: {rules}",
-        )
+        raise WrenError(
+            ErrorCode.VALIDATION_RULE_NOT_FOUND,
+            f"The rule `{rule}` is not in the rules, rules: {rules}",
+            phase=ErrorPhase.VALIDATION,
+        )
```
147-160: Prefer deterministic DataFrame accessors over Series.get()

Series.get(0) depends on the index label and can be brittle. Use .iat for positional access.
- output["result"] = str(result.get("result").get(0)) - output["is_related"] = str(result.get("is_related").get(0)) + output["result"] = str(result["result"].iat[0]) + output["is_related"] = str(result["is_related"].iat[0]) if result.get("left_table_unique") is not None: - output["left_table_unique"] = str( - result.get("left_table_unique").get(0) - ) + output["left_table_unique"] = str(result["left_table_unique"].iat[0]) if result.get("right_table_unique") is not None: - output["right_table_unique"] = str( - result.get("right_table_unique").get(0) - ) + output["right_table_unique"] = str(result["right_table_unique"].iat[0])
171-176: Include validation details in "relationship is not valid" metadata

This keeps the message stable and transports machine-readable diagnostics.
```diff
-            raise WrenError(
-                ErrorCode.VALIDATION_ERROR,
-                f"Relationship {relationship_name} is not valid: {format_result(result)}",
-                phase=ErrorPhase.VALIDATION,
-            )
+            raise WrenError(
+                ErrorCode.VALIDATION_ERROR,
+                f"Relationship {relationship_name} is not valid",
+                phase=ErrorPhase.VALIDATION,
+                metadata={
+                    "relationship": relationship_name,
+                    "details": format_result(result),
+                },
+            )
```
237-239: Preserve exception chaining for RLAC syntax validation

Add "from e" to keep the original traceback for ops while maintaining user-friendly payloads.
```diff
-            raise WrenError(
-                ErrorCode.VALIDATION_ERROR, str(e), phase=ErrorPhase.VALIDATION
-            )
+            raise WrenError(
+                ErrorCode.VALIDATION_ERROR, str(e), phase=ErrorPhase.VALIDATION
+            ) from e
```

ibis-server/app/main.py (1)
48-51: Centralize "X-Correlation-ID" constant usage

We've defined `X_CORRELATION_ID` in `app/dependencies.py` and wired it into `CorrelationIdMiddleware`, but elsewhere the literal `"X-Correlation-ID"` is still used. To avoid duplication and potential drift, import and use the constant in all places:

• `app/middleware/__init__.py` (line 12):
  • Change `request.headers.get("X-Correlation-ID")` → `request.headers.get(X_CORRELATION_ID)`
• `app/util.py` (lines 145–146):
  • Change `if "X-Correlation-ID" in header` → `if X_CORRELATION_ID in header`
  • Change `header["X-Correlation-ID"]` → `header[X_CORRELATION_ID]`

Example diff:
```diff
--- a/ibis-server/app/middleware/__init__.py
+++ b/ibis-server/app/middleware/__init__.py
@@
-from starlette.requests import Request
+from starlette.requests import Request
+from app.dependencies import X_CORRELATION_ID
@@
     async def dispatch(self, request: Request, call_next) -> Response:
-        correlation_id = request.headers.get("X-Correlation-ID")
+        correlation_id = request.headers.get(X_CORRELATION_ID)
```

```diff
--- a/ibis-server/app/util.py
+++ b/ibis-server/app/util.py
@@
 from opentelemetry.trace import Span
+from app.dependencies import X_CORRELATION_ID
@@
 def extract_and_set_correlation_id(header: Mapping[str, str], span: Span) -> None:
-    if "X-Correlation-ID" in header:
-        span.set_attribute("correlation_id", header["X-Correlation-ID"])
+    if X_CORRELATION_ID in header:
+        span.set_attribute("correlation_id", header[X_CORRELATION_ID])
```

This ensures a single source of truth for the header name and prevents accidental mismatches down the road.
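For readers following along, here is a minimal stdlib sketch of the `WrenError` shape these comments keep referring to. The class and field names mirror the PR, but the code-to-status mapping and the enum members below are assumptions for illustration, not the actual `app/model/error.py`:

```python
from enum import Enum


class ErrorPhase(Enum):
    SQL_SUBSTITUTE = "SQL_SUBSTITUTE"
    VALIDATION = "VALIDATION"
    SQL_EXECUTION = "SQL_EXECUTION"


class ErrorCode(Enum):
    # Illustrative assumption: the enum value doubles as the HTTP status
    GENERIC_USER_ERROR = 422
    NOT_FOUND = 404
    DATABASE_TIMEOUT = 504
    GENERIC_INTERNAL_ERROR = 500


class WrenError(Exception):
    def __init__(self, error_code, message, phase=None, metadata=None):
        super().__init__(message)
        self.error_code = error_code
        self.message = message
        self.phase = phase
        self.metadata = metadata or {}

    def get_http_status_code(self) -> int:
        return self.error_code.value

    def get_response(self, correlation_id=None) -> dict:
        # Field names follow the camelCase aliases asserted in the tests
        return {
            "errorCode": self.error_code.name,
            "message": self.message,
            "phase": self.phase.name if self.phase else None,
            "metadata": self.metadata,
            "correlationId": correlation_id,
        }


err = WrenError(ErrorCode.NOT_FOUND, 'Model not found: "Nation"',
                phase=ErrorPhase.SQL_SUBSTITUTE)
print(err.get_http_status_code())  # → 404
```

A FastAPI exception handler would then serialize `get_response(...)` with ORJSONResponse and set the status from `get_http_status_code()`.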
ibis-server/app/model/connector.py (2)
461-470: Push down LIMIT into DuckDB instead of slicing in memory

Slicing fetches the full result first, which is wasteful for large datasets. Compose a limited SQL instead.
```diff
-    if limit is None:
-        # If no limit is specified, we return the full result
-        return self.connection.execute(sql).fetch_arrow_table()
-    else:
-        # If a limit is specified, we slice the result
-        # DuckDB does not support LIMIT in fetch_arrow_table, so we use slice
-        # to limit the number of rows returned
-        return self.connection.execute(sql).fetch_arrow_table().slice(length=limit)
+    if limit is None:
+        return self.connection.execute(sql).fetch_arrow_table()
+    # Push down limit to the engine to avoid large transfers
+    limited_sql = f"SELECT * FROM ({sql}) AS sub LIMIT {int(limit)}"
+    return self.connection.execute(limited_sql).fetch_arrow_table()
```
511-513: Set an error phase for DuckDB file listing failures

Phase=METADATA_FETCHING clarifies where the error occurred.
```diff
-            raise WrenError(
-                ErrorCode.GENERIC_USER_ERROR, f"Failed to list files: {e!s}"
-            )
+            raise WrenError(
+                ErrorCode.GENERIC_USER_ERROR,
+                f"Failed to list files: {e!s}",
+                phase=ErrorPhase.METADATA_FETCHING,
+            )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (34)
- ibis-server/app/dependencies.py (1 hunks)
- ibis-server/app/main.py (3 hunks)
- ibis-server/app/mdl/analyzer.py (3 hunks)
- ibis-server/app/mdl/java_engine.py (2 hunks)
- ibis-server/app/mdl/rewriter.py (6 hunks)
- ibis-server/app/mdl/substitute.py (2 hunks)
- ibis-server/app/model/__init__.py (0 hunks)
- ibis-server/app/model/connector.py (8 hunks)
- ibis-server/app/model/data_source.py (4 hunks)
- ibis-server/app/model/error.py (1 hunks)
- ibis-server/app/model/metadata/canner.py (2 hunks)
- ibis-server/app/model/metadata/object_storage.py (3 hunks)
- ibis-server/app/model/validator.py (6 hunks)
- ibis-server/app/routers/v3/connector.py (1 hunks)
- ibis-server/app/util.py (1 hunks)
- ibis-server/tests/routers/v2/connector/test_athena.py (0 hunks)
- ibis-server/tests/routers/v2/connector/test_bigquery.py (0 hunks)
- ibis-server/tests/routers/v2/connector/test_canner.py (0 hunks)
- ibis-server/tests/routers/v2/connector/test_clickhouse.py (2 hunks)
- ibis-server/tests/routers/v2/connector/test_gcs_file.py (1 hunks)
- ibis-server/tests/routers/v2/connector/test_local_file.py (1 hunks)
- ibis-server/tests/routers/v2/connector/test_mssql.py (0 hunks)
- ibis-server/tests/routers/v2/connector/test_mysql.py (2 hunks)
- ibis-server/tests/routers/v2/connector/test_oracle.py (1 hunks)
- ibis-server/tests/routers/v2/connector/test_postgres.py (12 hunks)
- ibis-server/tests/routers/v2/connector/test_redshift.py (0 hunks)
- ibis-server/tests/routers/v2/connector/test_s3_file.py (1 hunks)
- ibis-server/tests/routers/v2/connector/test_snowflake.py (0 hunks)
- ibis-server/tests/routers/v2/connector/test_trino.py (0 hunks)
- ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (4 hunks)
- ibis-server/tests/routers/v3/connector/postgres/test_validate.py (4 hunks)
- ibis-server/wren/__init__.py (3 hunks)
- ibis-server/wren/__main__.py (2 hunks)
- ibis-server/wren/session/__init__.py (3 hunks)
💤 Files with no reviewable changes (8)
- ibis-server/tests/routers/v2/connector/test_bigquery.py
- ibis-server/tests/routers/v2/connector/test_canner.py
- ibis-server/tests/routers/v2/connector/test_trino.py
- ibis-server/tests/routers/v2/connector/test_snowflake.py
- ibis-server/tests/routers/v2/connector/test_athena.py
- ibis-server/tests/routers/v2/connector/test_redshift.py
- ibis-server/app/model/__init__.py
- ibis-server/tests/routers/v2/connector/test_mssql.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-18T02:23:34.040Z
Learnt from: goldmedal
PR: Canner/wren-engine#1224
File: ibis-server/app/util.py:49-56
Timestamp: 2025-06-18T02:23:34.040Z
Learning: DuckDB supports querying PyArrow Tables directly in SQL queries without needing to register them. When a pa.Table object is referenced in a FROM clause (e.g., "SELECT ... FROM df" where df is a pa.Table), DuckDB automatically handles the PyArrow object without requiring conn.register().
Applied to files:
ibis-server/app/model/connector.py
🧬 Code graph analysis (17)
ibis-server/app/model/metadata/canner.py (1)
- ibis-server/app/model/error.py (2): ErrorCode (19-38), WrenError (65-113)

ibis-server/wren/__init__.py (1)
- ibis-server/app/model/error.py (2): ErrorCode (19-38), WrenError (65-113)

ibis-server/app/model/data_source.py (1)
- ibis-server/app/model/error.py (2): ErrorCode (19-38), WrenError (65-113)

ibis-server/app/routers/v3/connector.py (1)
- ibis-server/app/model/error.py (1): DatabaseTimeoutError (116-125)

ibis-server/wren/session/__init__.py (2)
- ibis-server/app/model/error.py (2): ErrorCode (19-38), WrenError (65-113)
- ibis-server/app/model/connector.py (7): query (85-110), query (155-160), query (273-277), query (340-347), query (403-441), query (461-469), query (558-566)

ibis-server/wren/__main__.py (1)
- ibis-server/app/model/error.py (2): ErrorCode (19-38), WrenError (65-113)

ibis-server/app/mdl/substitute.py (1)
- ibis-server/app/model/error.py (3): ErrorCode (19-38), ErrorPhase (41-52), WrenError (65-113)

ibis-server/app/util.py (1)
- ibis-server/app/model/error.py (1): DatabaseTimeoutError (116-125)

ibis-server/app/model/metadata/object_storage.py (1)
- ibis-server/app/model/error.py (3): ErrorCode (19-38), ErrorPhase (41-52), WrenError (65-113)

ibis-server/tests/routers/v2/connector/test_postgres.py (2)
- ibis-server/app/model/error.py (2): ErrorCode (19-38), ErrorPhase (41-52)
- ibis-server/tests/conftest.py (1): client (18-23)

ibis-server/tests/routers/v2/connector/test_clickhouse.py (1)
- ibis-server/app/model/error.py (1): ErrorCode (19-38)

ibis-server/app/mdl/analyzer.py (1)
- ibis-server/app/model/error.py (2): ErrorCode (19-38), WrenError (65-113)

ibis-server/app/mdl/rewriter.py (2)
- ibis-server/app/model/error.py (3): ErrorCode (19-38), ErrorPhase (41-52), WrenError (65-113)
- ibis-server/wren/session/__init__.py (2): _get_read_dialect (119-120), _get_write_dialect (122-134)

ibis-server/app/model/connector.py (2)
- ibis-server/app/model/error.py (3): ErrorCode (19-38), ErrorPhase (41-52), WrenError (65-113)
- ibis-server/wren/session/__init__.py (3): sql (37-52), dry_run (136-143), execute (145-164)

ibis-server/tests/routers/v2/connector/test_mysql.py (2)
- ibis-server/app/model/error.py (1): ErrorCode (19-38)
- ibis-server/app/model/__init__.py (1): SSLMode (523-526)

ibis-server/app/model/validator.py (3)
- ibis-server/app/model/error.py (3): ErrorCode (19-38), ErrorPhase (41-52), WrenError (65-113)
- ibis-server/app/mdl/rewriter.py (3): rewrite (63-72), rewrite (107-124), rewrite (136-149)
- ibis-server/app/model/connector.py (6): dry_run (112-136), dry_run (194-195), dry_run (303-320), dry_run (366-368), dry_run (472-473), dry_run (569-571)

ibis-server/app/main.py (3)
- ibis-server/app/mdl/java_engine.py (1): JavaEngineConnector (13-68)
- ibis-server/app/middleware/__init__.py (1): RequestLogMiddleware (10-44)
- ibis-server/app/model/error.py (5): ErrorCode (19-38), ErrorResponse (55-62), WrenError (65-113), get_http_status_code (96-113), get_response (86-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ibis-server/app/model/connector.py (1)
412-451: Bug: Returns pandas DataFrame where pyarrow.Table is expected; enforce Arrow return with schema

In the "Must pass schema" branch, query currently returns a pandas DataFrame, violating the declared return type (pa.Table) and the contract used by other connectors. Convert the empty result into a correctly-typed Arrow table.
Apply:
```diff
         except ValueError as e:
             # Import here to avoid override the custom datatypes
             import ibis.backends.bigquery  # noqa: PLC0415
@@
             if "Must pass schema" in str(e):
                 with tracer.start_as_current_span(
                     "get_schema", kind=trace.SpanKind.CLIENT
                 ):
                     credits_json = loads(
                         base64.b64decode(
                             self.connection_info.credentials.get_secret_value()
                         ).decode("utf-8")
                     )
                     credentials = service_account.Credentials.from_service_account_info(
                         credits_json
                     )
                     credentials = credentials.with_scopes(
                         [
                             "https://www.googleapis.com/auth/drive",
                             "https://www.googleapis.com/auth/cloud-platform",
                         ]
                     )
                     client = bigquery.Client(credentials=credentials)
                     ibis_schema_mapper = ibis.backends.bigquery.BigQuerySchema()
                     bq_fields = client.query(sql).result()
-                    ibis_fields = ibis_schema_mapper.to_ibis(bq_fields.schema)
-                    return pd.DataFrame(columns=ibis_fields.names)
+                    ibis_fields = ibis_schema_mapper.to_ibis(bq_fields.schema)
+                    # Build an empty Arrow table with the correct schema to avoid the ibis empty-result bug
+                    pa_fields = [
+                        pa.field(name, dtype.to_pyarrow(), nullable=True)
+                        for name, dtype in zip(ibis_fields.names, ibis_fields.types)
+                    ]
+                    pa_schema = pa.schema(pa_fields)
+                    empty_cols = [pa.array([], type=f.type) for f in pa_schema]
+                    return pa.Table.from_arrays(empty_cols, schema=pa_schema)
             else:
                 raise e
```

Follow-up: consider caching the schema mapping for repeated "dry" queries of the same SQL if this path is hot.
ibis-server/app/util.py (1)
257-272: Exceptions are being swallowed if they're not recognized as timeouts

In execute_with_timeout, non-timeout ClickHouseDbError and TrinoQueryError are caught but not re-raised when they don't match the timeout condition. This causes the function to return None silently, masking real errors and breaking callers that expect a result. The timeout branch for asyncio.wait_for should also explicitly catch asyncio.TimeoutError to be unambiguous.
Apply this diff to ensure non-timeout errors propagate and to robustly catch asyncio timeouts:
```diff
 async def execute_with_timeout(operation, operation_name: str):
     """Asynchronously execute an operation with a timeout."""
     try:
         return await asyncio.wait_for(operation, timeout=app_timeout_seconds)
-    except TimeoutError:
+    except (asyncio.TimeoutError, TimeoutError):
         raise DatabaseTimeoutError(
             f"{operation_name} timeout after {app_timeout_seconds} seconds"
         )
     except ClickHouseDbError as e:
         if "TIMEOUT_EXCEEDED" in str(e):
             raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
+        raise
     except trino.exceptions.TrinoQueryError as e:
         if e.error_name == "EXCEEDED_TIME_LIMIT":
             raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
+        raise
     except psycopg.errors.QueryCanceled as e:
         raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
```
♻️ Duplicate comments (2)
ibis-server/app/model/connector.py (2)
11-19: Good guard: ClickHouse made optional with a sentinel error type

This addresses the "optional ClickHouse dependency" concern and avoids import-time failures on non-ClickHouse deployments.
If you want to be stricter, narrow the exception to ImportError to avoid masking unrelated import side effects:
-try: +try: import clickhouse_connect ClickHouseDbError = clickhouse_connect.driver.exceptions.DatabaseError -except Exception: # pragma: no cover +except ImportError: # pragma: no cover class ClickHouseDbError(Exception): pass
95-120: ClickHouse execution-phase fix is correct; verify upstream timeout mapping.
- The phase for ClickHouse non-timeout errors is now SQL_EXECUTION — this fixes the earlier mislabel (was SQL_DRY_RUN). Good.
- TIMEOUT_EXCEEDED is re-raised raw (plus Trino/QueryCanceled). Ensure the API layer (FastAPI error handler) maps these to WrenError with ErrorCode.DATABASE_TIMEOUT; otherwise clients will see backend-specific exceptions.
Run to confirm there is a central handler translating these exceptions to WrenError.DATABASE_TIMEOUT:
```shell
#!/bin/bash
# Find FastAPI exception handlers or middleware mapping DB timeouts to WrenError
rg -n -C3 -P 'TrinoQueryError|QueryCanceled|TIMEOUT_EXCEEDED|DATABASE_TIMEOUT|exception_handler' --type=py
```
🧹 Nitpick comments (9)
ibis-server/tests/routers/v2/test_relationship_valid.py (3)
134-134: Validate standardized error payload and relax exact message assertion

To make this test more robust against future wording changes and to ensure the full error schema is correct, update the assertion as follows:
- assert response.json()["message"] == "relationshipName is required" + # parse the JSON payload once + payload = response.json() + + # assert the canonical error code and phase for missing parameters + assert payload["errorCode"] == ErrorCode.VALIDATION_PARAMETER_ERROR.name + assert payload["phase"] == ErrorPhase.VALIDATION.name + + # relax the message check to a substring match + assert "relationshipName is required" in payload["message"] + + # validate correlation id in both payload and response header + assert "correlationId" in payload + assert response.headers.get("X-Correlation-ID") == payload["correlationId"]Key updates:
- Use `.name` on `ErrorCode.VALIDATION_PARAMETER_ERROR` and `ErrorPhase.VALIDATION` to lock in the right error semantics.
- Change the message assertion to a substring check so minor wording tweaks won't break this test.
- Confirm the presence of `correlationId` in the payload and that it matches the `X-Correlation-ID` header.
150-151: Stabilize test assertion on relationship validation error message

This test is currently tied to the exact formatting of Python's `dict` → `str` conversion, which makes it brittle (key order, quotes, spacing). Instead, verify only the stable prefix and the presence of important tokens. For example, update the snippet in `test_relationship_valid.py` at lines 150-151 as follows:

```diff
-    # brittle exact match on stringified dict
-    assert response.json()["message"]
-        == "Relationship invalid_t1_many_t2_id is not valid: {'result': 'False', 'is_related': 'True', 'left_table_unique': 'False', 'right_table_unique': 'True'}"
+    # check only the stable prefix…
+    msg = response.json()["message"]
+    assert msg.startswith("Relationship invalid_t1_many_t2_id is not valid:")
+
+    # …and verify key tokens without hard-coding the dict formatting
+    for key in ["left_table_unique", "right_table_unique", "is_related"]:
+        assert key in msg
```

If you'd like to surface the structured error code and phase (rather than overloading `message`), you can also assert on the top-level fields emitted in the JSON payload:

```python
payload = response.json()
assert payload["errorCode"] == ErrorCode.VALIDATION_ERROR.name
assert payload["phase"] == ErrorPhase.VALIDATION.name
```
121-121: Update test to assert structured error fields and relax message check

Please update `ibis-server/tests/routers/v2/test_relationship_valid.py` around line 121 as follows:

• Replace the exact message assertion with an "in" check:

```diff
-    assert response.json()["message"] == "Relationship not_found not found in manifest"
+    payload = response.json()
+    # message contains relationship name and not-found text
+    assert "Relationship not_found not found in manifest" in payload["message"]
```

• Ensure you import the enums at the top of the file if not already present:

```python
from app.model.error import ErrorCode, ErrorPhase
```

• Add assertions for the structured error fields and correlation ID:

```python
# structured fields
assert payload["errorCode"] == ErrorCode.VALIDATION_PARAMETER_ERROR.name
assert payload["phase"] == ErrorPhase.VALIDATION.name

# correlation ID present and matches header
assert "correlationId" in payload
assert response.headers.get("X-Correlation-ID") == payload["correlationId"]
```

These changes make the test more resilient to minor wording tweaks and lock in the new error contract.
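Since the same handful of assertions recurs across several tests above, a small shared helper could keep them from drifting. The helper name and signature below are my own suggestion, not part of the PR; the camelCase field aliases follow the response model discussed above:

```python
def assert_wren_error(payload: dict, headers: dict, *, message_contains=None,
                      error_code=None, phase=None):
    """Check the structured WrenError payload plus the correlation header.

    Uses the camelCase aliases (errorCode, correlationId) the response
    model is said to expose.
    """
    if message_contains is not None:
        assert message_contains in payload["message"]
    if error_code is not None:
        assert payload["errorCode"] == error_code
    if phase is not None:
        assert payload["phase"] == phase
    correlation_id = payload.get("correlationId")
    assert correlation_id, "correlationId missing from payload"
    assert headers.get("X-Correlation-ID") == correlation_id


# usage sketch with a hand-built payload:
payload = {"message": 'Model not found: "Nation"', "errorCode": "NOT_FOUND",
           "phase": "SQL_SUBSTITUTE", "correlationId": "abc-123"}
assert_wren_error(payload, {"X-Correlation-ID": "abc-123"},
                  message_contains="Model not found", error_code="NOT_FOUND")
```

In real tests, `payload` would be `response.json()` and `headers` would be `response.headers`.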
ibis-server/app/model/connector.py (3)
471-479: Avoid fetching full result then slicing in-memory for DuckDB LIMIT

Current approach executes the full query and slices the Arrow table, which is wasteful for large datasets. Push the LIMIT into SQL.
Apply:
```diff
     def query(self, sql: str, limit: int | None) -> pa.Table:
-        if limit is None:
-            # If no limit is specified, we return the full result
-            return self.connection.execute(sql).fetch_arrow_table()
-        else:
-            # If a limit is specified, we slice the result
-            # DuckDB does not support LIMIT in fetch_arrow_table, so we use slice
-            # to limit the number of rows returned
-            return self.connection.execute(sql).fetch_arrow_table().slice(length=limit)
+        if limit is None:
+            return self.connection.execute(sql).fetch_arrow_table()
+        # Push-down LIMIT to DuckDB to reduce compute and memory
+        limited_sql = f"SELECT * FROM ({sql}) AS _sub LIMIT {int(limit)}"
+        return self.connection.execute(limited_sql).fetch_arrow_table()
```
487-490: Clear DUCKDB_FILE_NOT_FOUND classification

Good use of a specific code instead of a generic error. Consider adding a phase (e.g. METADATA_FETCHING) for observability, though optional.
557-560: Invalid Redshift connection info → GENERIC_INTERNAL_ERROR is appropriate

Clear signal for a programming/config error as opposed to user input.
If RedshiftConnectionInfo.port is a SecretStr of a string, consider int() casting at callsite to avoid implicit coercions in the driver.
ibis-server/tests/routers/v2/connector/test_minio_file.py (3)
250-252: Align dry-run negative case with structured JSON errors

This still checks only text presence. Assert JSON structure and correlation header to stay consistent with WrenError responses.
Apply this diff:
```diff
     assert response.status_code == 422
-    assert response.text is not None
+    payload = response.json()
+    assert "errorCode" in payload and "message" in payload
+    assert "correlationId" in payload and payload["correlationId"]
+    assert response.headers.get("X-Correlation-ID") == payload["correlationId"]
```
270-270: Also validate structured errors for invalid connection info (first request)

Mirror the same structured checks here for consistency across tests.
Apply this diff:
```diff
     assert response.status_code == 422
+    payload = response.json()
+    assert "errorCode" in payload and "message" in payload
+    assert "correlationId" in payload and payload["correlationId"]
+    assert response.headers.get("X-Correlation-ID") == payload["correlationId"]
```
285-285: Also validate structured errors for invalid connection info (tables metadata request)

Same rationale: assert body fields and header correlation.
Apply this diff:
```diff
     assert response.status_code == 422
+    payload = response.json()
+    assert "errorCode" in payload and "message" in payload
+    assert "correlationId" in payload and payload["correlationId"]
+    assert response.headers.get("X-Correlation-ID") == payload["correlationId"]
```
📜 Review details
📒 Files selected for processing (4)
- ibis-server/app/model/connector.py (8 hunks)
- ibis-server/app/util.py (3 hunks)
- ibis-server/tests/routers/v2/connector/test_minio_file.py (1 hunks)
- ibis-server/tests/routers/v2/test_relationship_valid.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
ibis-server/app/model/connector.py (3)
- ibis-server/app/util.py (1): ClickHouseDbError (11-12)
- ibis-server/app/model/error.py (3): ErrorCode (19-38), ErrorPhase (41-52), WrenError (65-113)
- ibis-server/wren/session/__init__.py (3): sql (37-52), dry_run (136-143), execute (145-164)

ibis-server/app/util.py (2)
- ibis-server/app/model/connector.py (1): ClickHouseDbError (17-18)
- ibis-server/app/model/error.py (1): DatabaseTimeoutError (116-125)
🔇 Additional comments (9)
ibis-server/app/model/connector.py (7)
26-26: Imports align with downstream usage

psycopg and trino exceptions are referenced later in the centralized error wrapper; import locations look good.
Also applies to: 29-29
51-56: Consistent structured errors (WrenError) integration looks solid

Centralizing DIALECT_SQL, ErrorCode, ErrorPhase, WrenError here sets up clear, uniform error payloads.
121-146: Dry-run path mirrors query error policy; confirm consistent timeout translation

The SQL_DRY_RUN phase for ClickHouse non-timeout errors is correct. Same upstream timeout mapping verification applies here.
Use the same search from the previous comment to verify dry-run timeouts are converted to WrenError.DATABASE_TIMEOUT centrally.
319-329: MSSql dry-run workaround with structured errors is appropriate.
- INVALID_SQL for the known ibis AttributeError quirk, with a descriptive message derived via sys.dm_exec_describe_first_result_set — good call.
- IBIS_PROJECT_ERROR for other AttributeError cases with SQL_DRY_RUN phase — reasonable classification.
482-482: Direct DuckDB dry-run via execute is fine

Letting outer layers wrap exceptions into WrenError keeps concerns separated.
498-505: ATTACH errors mapped to ATTACH_DUCKDB_ERROR — looks right

Both IOException and HTTPException are covered; the message is concise.
520-523: File listing failures mapped to GENERIC_USER_ERROR

Reasonable default. Ensure that `e!s` does not include sensitive paths for remote filesystems; if it does, consider redacting in the future.
ibis-server/tests/routers/v2/connector/test_minio_file.py (1)
351-354: Once we confirm how WrenError is serialized (i.e. which fields are included in the JSON payload and how the correlation ID header is mapped), we can refine the test assertion. Please share the output of the above inspection so we can finalize the strengthened assertion snippet.

ibis-server/app/util.py (1)
44-44: Import path update looks correct and aligns with the new error model

Using DatabaseTimeoutError from app.model.error is consistent with the WrenError-based centralized error handling introduced in this PR.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ibis-server/app/util.py (1)
257-268: Use asyncio.TimeoutError instead of built-in TimeoutError
asyncio.wait_for raises asyncio.TimeoutError, not the built-in TimeoutError (the two are only aliased as of Python 3.11). On earlier versions the existing except TimeoutError clauses will miss real timeouts and won't translate them into DatabaseTimeoutError. Update all three occurrences in ibis-server/app/util.py:

- At the primary execute_with_timeout wrapper (around line 261)
- In the query-task cancellation block (around line 293)
- In the connection-cleanup block (around line 305)

```diff
--- a/ibis-server/app/util.py
@@ 258,264
-    except TimeoutError:
+    except asyncio.TimeoutError:
         raise DatabaseTimeoutError(
             f"{operation_name} timeout after {app_timeout_seconds} seconds"
         )
@@ 290,296
-    except (TimeoutError, asyncio.CancelledError):
+    except (asyncio.TimeoutError, asyncio.CancelledError):
         # Task didn't cancel in time or was cancelled
         logger.warning(
             f"{operation_name} task cancellation timed out or was cancelled"
         )
@@ 302,308
-    except TimeoutError:
+    except asyncio.TimeoutError:
         logger.warning("Connection cleanup timed out")
```
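The translation can be demonstrated with a small runnable sketch. `DatabaseTimeoutError` is a stand-in for the app.model.error class, and `execute_with_timeout` is simplified from the real wrapper.

```python
import asyncio


class DatabaseTimeoutError(Exception):
    """Stand-in for app.model.error.DatabaseTimeoutError."""


async def execute_with_timeout(coro, operation_name, app_timeout_seconds):
    try:
        return await asyncio.wait_for(coro, timeout=app_timeout_seconds)
    except asyncio.TimeoutError:  # equal to built-in TimeoutError on 3.11+
        raise DatabaseTimeoutError(
            f"{operation_name} timeout after {app_timeout_seconds} seconds"
        )


async def slow_query():
    # Simulates a database call that outlives the timeout budget.
    await asyncio.sleep(1)
    return "rows"


result = None
try:
    asyncio.run(execute_with_timeout(slow_query(), "Query", 0.01))
except DatabaseTimeoutError as e:
    result = str(e)
```

Catching asyncio.TimeoutError works on every supported Python version, which is why the diff above prefers it over the bare built-in name.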
♻️ Duplicate comments (1)
ibis-server/app/util.py (1)
5-12: Centralize ClickHouseDbError; avoid duplicate class definitions across modules. This file defines a fallback ClickHouseDbError while app/model/connector.py also defines one (lines 16–17 in that file). Split-brain exceptions can break except handlers when different imports are used. Move the guarded alias into a single shared module and import it here.
Apply in this file:
```diff
-try:
-    import clickhouse_connect
-
-    ClickHouseDbError = clickhouse_connect.driver.exceptions.DatabaseError
-except ImportError:  # pragma: no cover
-
-    class ClickHouseDbError(Exception):
-        pass
+from app.exceptions import ClickHouseDbError
```

New module (add file app/exceptions.py):

```python
try:
    import clickhouse_connect

    ClickHouseDbError = clickhouse_connect.driver.exceptions.DatabaseError
except ImportError:  # pragma: no cover

    class ClickHouseDbError(Exception):
        pass
```
🧹 Nitpick comments (1)
ibis-server/app/util.py (1)
265-272: Prefer structured timeout detection and preserve cause chaining. String matching ("TIMEOUT_EXCEEDED") is brittle. If the driver exposes structured fields (e.g., code/error_name), check those first; keep the string check as a fallback. Also chain the original exception for better diagnostics.

```diff
-    except ClickHouseDbError as e:
-        if "TIMEOUT_EXCEEDED" in str(e):
-            raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
+    except ClickHouseDbError as e:
+        code = getattr(e, "code", None) or getattr(e, "error_code", None)
+        if code == 160 or "TIMEOUT_EXCEEDED" in str(e):
+            raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}") from e
         raise
-    except trino.exceptions.TrinoQueryError as e:
-        if e.error_name == "EXCEEDED_TIME_LIMIT":
-            raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
+    except trino.exceptions.TrinoQueryError as e:
+        if getattr(e, "error_name", "") == "EXCEEDED_TIME_LIMIT":
+            raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}") from e
         raise
-    except psycopg.errors.QueryCanceled as e:
-        raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}")
+    except psycopg.errors.QueryCanceled as e:
+        raise DatabaseTimeoutError(f"{operation_name} was cancelled: {e}") from e
```
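The structured-first detection can be sketched in isolation. `FakeDbError` stands in for the driver's DatabaseError, and the numeric code used here is illustrative only — verify it against the driver's real error-code constants before relying on it.

```python
class FakeDbError(Exception):
    """Stand-in for clickhouse_connect.driver.exceptions.DatabaseError."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


def is_timeout(e):
    # Prefer a structured error code when the driver provides one;
    # fall back to the brittle string match only when it does not.
    code = getattr(e, "code", None) or getattr(e, "error_code", None)
    return code == 160 or "TIMEOUT_EXCEEDED" in str(e)


structured = is_timeout(FakeDbError("estimated execution time too long", code=160))
fallback = is_timeout(FakeDbError("DB::Exception: TIMEOUT_EXCEEDED while reading"))
other = is_timeout(FakeDbError("Syntax error near SELECT", code=62))
```

Using `getattr` with a default keeps the check safe across driver versions where the attribute may be absent.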
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- ibis-server/app/model/connector.py (8 hunks)
- ibis-server/app/util.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- ibis-server/app/model/connector.py
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/app/util.py (2)
ibis-server/app/model/connector.py (1)
- ClickHouseDbError (17-18)

ibis-server/app/model/error.py (1)
- DatabaseTimeoutError (116-125)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/util.py (1)
44-44: Import path switch to WrenError-based DatabaseTimeoutError — LGTM. Correctly imports the unified error from app.model.error, aligning with the new structured ErrorResponse.
BigQuery has been tested locally.
Description

WrenError is an error that wraps the original exception to unify the format of the error response. It includes an error code, an error phase, and some debugging information. The response now looks like the following:

```json
{
  "errorCode": "INVALID_SQL",
  "message": "Error during planning: table 'wren.public.orders1' not found",
  "metadata": null,
  "phase": "SQL_PLANNING",
  "timestamp": "2025-08-25T14:05:34.573940",
  "correlationId": "55b4302c-9c45-4956-9060-bb0da82fbf62"
}
```

- ErrorCode is used to classify the error.
- message is the original error message.
- metadata can collect some information for debugging (e.g. dialect SQL or planned SQL).
- phase indicates when the error occurs.

Summary by CodeRabbit
New Features
Improvements
Bug Fixes
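As a closing illustration, the structured response described in this PR could be modeled roughly as below. The field names mirror the example payload in the description; the class internals, enum members, and `to_response` helper are assumptions for the sketch, not the actual ibis-server implementation.

```python
import uuid
from datetime import datetime
from enum import Enum


class ErrorCode(str, Enum):
    INVALID_SQL = "INVALID_SQL"
    GENERIC_INTERNAL_ERROR = "GENERIC_INTERNAL_ERROR"


class ErrorPhase(str, Enum):
    SQL_PLANNING = "SQL_PLANNING"
    SQL_DRY_RUN = "SQL_DRY_RUN"


class WrenError(Exception):
    """Illustrative stand-in for app.model.error.WrenError."""

    def __init__(self, code, phase, message, metadata=None):
        super().__init__(message)
        self.code = code
        self.phase = phase
        self.metadata = metadata

    def to_response(self, correlation_id=None):
        # Shape matches the example payload in the PR description.
        return {
            "errorCode": self.code.value,
            "message": str(self),
            "metadata": self.metadata,
            "phase": self.phase.value,
            "timestamp": datetime.now().isoformat(),
            "correlationId": correlation_id or str(uuid.uuid4()),
        }


err = WrenError(
    ErrorCode.INVALID_SQL,
    ErrorPhase.SQL_PLANNING,
    "Error during planning: table 'wren.public.orders1' not found",
)
payload = err.to_response()
```

An exception handler would serialize this dict with ORJSONResponse and echo the X-Correlation-ID header, as described in the walkthrough.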