feat: add DuckDB data source with v3 metadata API support#1436
feat: add DuckDB data source with v3 metadata API support#1436
Conversation
- Add DataSource.duckdb enum value and LocalFileConnectionInfo mapping - Route duckdb through DuckDBConnector and DuckDBMetadata in factory - Add GET /v3/connector/duckdb/metadata/constraints endpoint - Add duckdb pytest marker - Add duckdb connector tests (query + metadata) with jaffle_shop fixture Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds DuckDB as a new data source: new QueryDuckDBDTO, DataSource.duckdb enum member, connector and metadata wiring to DuckDB, a POST /{data_source}/metadata/constraints endpoint, tightened filesystem path resolution, pytest marker, and DuckDB-focused tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Router as API Router (/v3/connector)
participant Factory as MetadataFactory
participant Metadata as DuckDBMetadata
participant FS as Filesystem
Client->>Router: POST /duckdb/metadata/constraints + connectionInfo
Router->>Factory: build connection_info, get_metadata(data_source=duckdb, connection_info)
Factory->>Metadata: instantiate DuckDBMetadata(connection_info)
Metadata->>FS: open resolved duckdb file path
FS-->>Metadata: file handle / schema info
Metadata->>Router: constraints list
Router-->>Client: 200 OK with constraints
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Construct the candidate path by joining the trusted allowed_root with the user-supplied value before resolving, matching the CodeQL-recognised safe pattern (join trusted base + user input → normalize → startswith check). Previously the path was resolved directly from user input, so CodeQL's taint tracker still flagged open(path) even though the startswith guard was present. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
ibis-server/app/routers/v3/connector.py (1)
617-636: Consider consistency with other metadata endpoints regarding filter parameters.The
get_constraintsendpoint acceptsMetadataDTObut doesn't usedto.filter_infoordto.table_limit, unlikeget_table_list(lines 583-588) andget_schema_list(lines 609-614). If theget_constraintsmethod doesn't support filtering, consider either:
- Documenting this limitation in the endpoint description.
- Using a simpler DTO without unused fields.
Otherwise, if filtering should be supported, pass the filter parameters to
execute_get_constraints_with_timeout.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ibis-server/app/routers/v3/connector.py` around lines 617 - 636, get_constraints currently accepts a MetadataDTO but never uses dto.filter_info or dto.table_limit (unlike get_table_list and get_schema_list), which is inconsistent; either update get_constraints to pass the filter parameters into execute_get_constraints_with_timeout (e.g., extract dto.filter_info and dto.table_limit after MetadataFactory.get_metadata and forward them) or change the endpoint signature/description to use a slimmer DTO or explicitly document that filtering/limit are not supported; locate the get_constraints function and adjust the call to execute_get_constraints_with_timeout (or the DTO type/endpoint description) accordingly to make behavior consistent with get_table_list/get_schema_list.ibis-server/tests/routers/v3/connector/duckdb/conftest.py (1)
10-17: Minor redundancy in marker application.The
pytest_collection_modifyitemshook adds theduckdbmarker to tests, butpytestmark = pytest.mark.duckdbat line 5 already marks all tests in this module. The hook may be useful for ensuring markers on dynamically collected tests, but for standard test files in this directory, it's redundant.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ibis-server/tests/routers/v3/connector/duckdb/conftest.py` around lines 10 - 17, The pytest_collection_modifyitems hook redundantly reapplies the pytestmark (pytestmark = pytest.mark.duckdb) to tests already marked; remove the hook implementation to avoid duplication, or if dynamic marking is required keep only the logic that targets out-of-module items—specifically delete or comment out the pytest_collection_modifyitems function (which uses current_file_dir, items, item.fspath, and item.add_marker(pytestmark)) so tests rely on the module-level pytestmark (duckdb) instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ibis-server/app/routers/v3/connector.py`:
- Around line 617-636: get_constraints currently accepts a MetadataDTO but never
uses dto.filter_info or dto.table_limit (unlike get_table_list and
get_schema_list), which is inconsistent; either update get_constraints to pass
the filter parameters into execute_get_constraints_with_timeout (e.g., extract
dto.filter_info and dto.table_limit after MetadataFactory.get_metadata and
forward them) or change the endpoint signature/description to use a slimmer DTO
or explicitly document that filtering/limit are not supported; locate the
get_constraints function and adjust the call to
execute_get_constraints_with_timeout (or the DTO type/endpoint description)
accordingly to make behavior consistent with get_table_list/get_schema_list.
In `@ibis-server/tests/routers/v3/connector/duckdb/conftest.py`:
- Around line 10-17: The pytest_collection_modifyitems hook redundantly
reapplies the pytestmark (pytestmark = pytest.mark.duckdb) to tests already
marked; remove the hook implementation to avoid duplication, or if dynamic
marking is required keep only the logic that targets out-of-module
items—specifically delete or comment out the pytest_collection_modifyitems
function (which uses current_file_dir, items, item.fspath, and
item.add_marker(pytestmark)) so tests rely on the module-level pytestmark
(duckdb) instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a6bc87f4-528a-4dba-b5cf-f2817714b6b4
📒 Files selected for processing (11)
ibis-server/app/model/__init__.pyibis-server/app/model/connector.pyibis-server/app/model/data_source.pyibis-server/app/model/metadata/factory.pyibis-server/app/routers/v3/connector.pyibis-server/pyproject.tomlibis-server/tests/resource/duckdb/jaffle_shop.duckdbibis-server/tests/routers/v3/connector/duckdb/__init__.pyibis-server/tests/routers/v3/connector/duckdb/conftest.pyibis-server/tests/routers/v3/connector/duckdb/test_metadata.pyibis-server/tests/routers/v3/connector/duckdb/test_query.py
…in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…pliance Switch from pathlib to os.path module to exactly match the CodeQL-documented safe pattern: os.path.normpath(os.path.join(trusted_root, user_input)) with a simple startswith guard. The previous pathlib / operator and relative_to() check were not recognised as sanitisers by CodeQL's taint tracker. Also wraps with realpath to prevent symlink escape. Removes now-unused pathlib import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ibis-server/app/util.py (1)
95-106:⚠️ Potential issue | 🟠 MajorNormalize other file-open failures into
INVALID_CONNECTION_INFO.Line 96 can raise
IsADirectoryErrororPermissionError, but onlyFileNotFoundErroris explicitly caught. Unhandled exceptions will surface as 500 errors instead of client errors. CatchOSErroras a fallback to handle all file-access failures:Suggested fix
try: with open(fullpath) as f: return _normalize_port(json.load(f)) except FileNotFoundError: raise WrenError( ErrorCode.INVALID_CONNECTION_INFO, f"Connection file not found: {dto.connection_file_path}", ) + except OSError as e: + raise WrenError( + ErrorCode.INVALID_CONNECTION_INFO, + f"Connection file cannot be read: {dto.connection_file_path}", + ) from e except json.JSONDecodeError as e: raise WrenError( ErrorCode.INVALID_CONNECTION_INFO, f"Invalid JSON in connection file: {e}", )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ibis-server/app/util.py` around lines 95 - 106, The file-open block that calls _normalize_port(json.load(f)) only catches FileNotFoundError and JSONDecodeError, leaving other OS-level file access errors (e.g., IsADirectoryError, PermissionError) to bubble up; update the try/except so that OSError is handled and normalized into a WrenError with ErrorCode.INVALID_CONNECTION_INFO (include the underlying error message and dto.connection_file_path for context). Concretely, modify the excepts around the open(fullpath) call to either add an except OSError as e that raises WrenError(ErrorCode.INVALID_CONNECTION_INFO, f"...: {e}") or combine FileNotFoundError and OSError into a single except, leaving the existing json.JSONDecodeError handling and still calling _normalize_port as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ibis-server/app/util.py`:
- Around line 87-90: The allowed-root path check incorrectly builds a prefix
with a duplicated separator (causing "/" to become "//") which rejects valid
paths; update the guard in the code that computes fullpath (using
allowed_root_str and dto.connection_file_path) to avoid duplicating os.sep —
either construct the prefix once using something like
os.path.join(allowed_root_str, '') or replace the startswith check with a robust
os.path.commonpath comparison between allowed_root_str and fullpath so the
containment test works for root ("/") and other roots reliably.
---
Outside diff comments:
In `@ibis-server/app/util.py`:
- Around line 95-106: The file-open block that calls
_normalize_port(json.load(f)) only catches FileNotFoundError and
JSONDecodeError, leaving other OS-level file access errors (e.g.,
IsADirectoryError, PermissionError) to bubble up; update the try/except so that
OSError is handled and normalized into a WrenError with
ErrorCode.INVALID_CONNECTION_INFO (include the underlying error message and
dto.connection_file_path for context). Concretely, modify the excepts around the
open(fullpath) call to either add an except OSError as e that raises
WrenError(ErrorCode.INVALID_CONNECTION_INFO, f"...: {e}") or combine
FileNotFoundError and OSError into a single except, leaving the existing
json.JSONDecodeError handling and still calling _normalize_port as before.
…ystem root When CONNECTION_FILE_ROOT=/, allowed_root_str is "/" and appending os.sep produced "//" which would never match any real path. Build the prefix once, adding the separator only if the root string does not already end with one. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ibis-server/app/util.py (1)
87-101: Document thatCONNECTION_FILE_ROOTmust not be user-writable.The validation at lines 87–99 uses
os.path.realpath()to canonicalize the path and resolve all symlinks before the containment check. While a TOCTOU window between validation and theopen()call on line 101 is theoretically possible ifCONNECTION_FILE_ROOTis writable, therealpath()approach already provides strong symlink protection at check time. Standard Docker deployments (as shown in compose.yaml) mount/workspaceas a controlled volume. To eliminate any ambiguity about the security model, explicitly document in code comments or deployment guidance thatCONNECTION_FILE_ROOTmust not be user-writable and should be configured as a read-only or privileged directory.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ibis-server/app/util.py` around lines 87 - 101, Add a brief comment near the path-validation block (around the use of os.path.realpath(), fullpath, root_prefix and the subsequent open(fullpath)) documenting that CONNECTION_FILE_ROOT must not be user-writable; state that realpath() protects against symlink attacks at check time but TOCTOU remains possible if the root is writable, and instruct operators to configure CONNECTION_FILE_ROOT as a read-only or privileged directory (e.g., mount as read-only in Docker/compose) to eliminate ambiguity about the security model.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ibis-server/app/util.py`:
- Around line 87-101: Add a brief comment near the path-validation block (around
the use of os.path.realpath(), fullpath, root_prefix and the subsequent
open(fullpath)) documenting that CONNECTION_FILE_ROOT must not be user-writable;
state that realpath() protects against symlink attacks at check time but TOCTOU
remains possible if the root is writable, and instruct operators to configure
CONNECTION_FILE_ROOT as a read-only or privileged directory (e.g., mount as
read-only in Docker/compose) to eliminate ambiguity about the security model.
Summary
DataSource.duckdbas a first-class data source (backed byLocalFileConnectionInfo+DuckDBConnector)DuckDBMetadatainMetadataFactoryfor theduckdbdata sourceGET /v3/connector/{data_source}/metadata/constraintsendpoint to the v3 API (ported from v2)duckdbpytest marker topyproject.tomlget_table_list,get_constraints) using ajaffle_shop.duckdbfixtureTest plan
just test duckdbpasses locallytest_metadata_list_tables— verifies tables (main.customers,main.orders) are returned with correct schema/catalog/column metadatatest_metadata_list_constraints— verifies empty constraint list is returned (DuckDB has no FK constraints ininformation_schema)test_query,test_query_with_limit,test_query_orders,test_dry_run— query execution against attached.duckdbfile🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests