Skip to content

feat: add DuckDB data source with v3 metadata API support#1436

Merged
douenergy merged 7 commits intomainfrom
fix/enhance-duckdb-usage
Mar 10, 2026
Merged

feat: add DuckDB data source with v3 metadata API support#1436
douenergy merged 7 commits intomainfrom
fix/enhance-duckdb-usage

Conversation

@goldmedal
Copy link
Copy Markdown
Contributor

@goldmedal goldmedal commented Mar 10, 2026

Summary

  • Add DataSource.duckdb as a first-class data source (backed by LocalFileConnectionInfo + DuckDBConnector)
  • Register DuckDBMetadata in MetadataFactory for the duckdb data source
  • Add GET /v3/connector/{data_source}/metadata/constraints endpoint to the v3 API (ported from v2)
  • Add duckdb pytest marker to pyproject.toml
  • Add DuckDB connector tests: query tests and metadata tests (get_table_list, get_constraints) using a jaffle_shop.duckdb fixture

Test plan

  • just test duckdb passes locally
  • test_metadata_list_tables — verifies tables (main.customers, main.orders) are returned with correct schema/catalog/column metadata
  • test_metadata_list_constraints — verifies empty constraint list is returned (DuckDB has no FK constraints in information_schema)
  • test_query, test_query_with_limit, test_query_orders, test_dry_run — query execution against attached .duckdb file

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • DuckDB is now supported as a data source connector
    • Added a metadata constraints API endpoint to retrieve constraint information
  • Bug Fixes

    • Improved local connection path validation to prevent invalid/outside-root file access
  • Tests

    • Added DuckDB-specific tests covering metadata and query endpoints and test fixtures

- Add DataSource.duckdb enum value and LocalFileConnectionInfo mapping
- Route duckdb through DuckDBConnector and DuckDBMetadata in factory
- Add GET /v3/connector/duckdb/metadata/constraints endpoint
- Add duckdb pytest marker
- Add duckdb connector tests (query + metadata) with jaffle_shop fixture

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added ibis dependencies Pull requests that update a dependency file python Pull requests that update Python code labels Mar 10, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Adds DuckDB as a new data source: new QueryDuckDBDTO, DataSource.duckdb enum member, connector and metadata wiring to DuckDB, a POST /{data_source}/metadata/constraints endpoint, tightened filesystem path resolution, pytest marker, and DuckDB-focused tests.

Changes

Cohort / File(s) Summary
Model DTOs & enums
ibis-server/app/model/__init__.py, ibis-server/app/model/data_source.py
Add QueryDuckDBDTO; add DataSource.duckdb and DataSourceExtension mapping; route duckdb connection info to LocalFileConnectionInfo.
Connector wiring
ibis-server/app/model/connector.py
Map DataSource.duckdb to DuckDBConnector during connector initialization.
Metadata factory
ibis-server/app/model/metadata/factory.py
Add DataSource.duckdb -> DuckDBMetadata mapping and conditional to return DuckDBMetadata for local/object paths whose format is "duckdb".
API router
ibis-server/app/routers/v3/connector.py
Add POST /{data_source}/metadata/constraints endpoint; import Constraint DTO and constraint-execution util; resolve connection and invoke metadata constraints execution.
Utilities
ibis-server/app/util.py
Tighten path validation: join trusted root with provided path, resolve+normalize combined path, enforce containment, and read from the resolved full path.
Tests & infra
ibis-server/pyproject.toml, ibis-server/tests/routers/v3/connector/duckdb/conftest.py, ibis-server/tests/routers/v3/connector/duckdb/test_metadata.py, ibis-server/tests/routers/v3/connector/duckdb/test_query.py
Add duckdb pytest marker; DuckDB conftest and connection_info fixture; integration tests for metadata (tables, constraints) and query endpoints (queries, limits, dry-run).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Router as API Router (/v3/connector)
    participant Factory as MetadataFactory
    participant Metadata as DuckDBMetadata
    participant FS as Filesystem

    Client->>Router: POST /duckdb/metadata/constraints + connectionInfo
    Router->>Factory: build connection_info, get_metadata(data_source=duckdb, connection_info)
    Factory->>Metadata: instantiate DuckDBMetadata(connection_info)
    Metadata->>FS: open resolved duckdb file path
    FS-->>Metadata: file handle / schema info
    Metadata->>Router: constraints list
    Router-->>Client: 200 OK with constraints
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • douenergy
  • wwwy3y3

Poem

🐰 I hopped into code with a curious look,
DuckDB rolled in and opened a book.
Tables and queries, constraints in a line,
Tests chomped the carrots — everything's fine! 🥕🦆

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: adding DuckDB as a new data source with v3 metadata API support. It is concise, specific, and directly reflects the primary objectives of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/enhance-duckdb-usage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Construct the candidate path by joining the trusted allowed_root with the
user-supplied value before resolving, matching the CodeQL-recognised safe
pattern (join trusted base + user input → normalize → startswith check).
Previously the path was resolved directly from user input, so CodeQL's
taint tracker still flagged open(path) even though the startswith guard
was present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
ibis-server/app/routers/v3/connector.py (1)

617-636: Consider consistency with other metadata endpoints regarding filter parameters.

The get_constraints endpoint accepts MetadataDTO but doesn't use dto.filter_info or dto.table_limit, unlike get_table_list (lines 583-588) and get_schema_list (lines 609-614). If the get_constraints method doesn't support filtering, consider either:

  1. Documenting this limitation in the endpoint description.
  2. Using a simpler DTO without unused fields.

Otherwise, if filtering should be supported, pass the filter parameters to execute_get_constraints_with_timeout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ibis-server/app/routers/v3/connector.py` around lines 617 - 636,
get_constraints currently accepts a MetadataDTO but never uses dto.filter_info
or dto.table_limit (unlike get_table_list and get_schema_list), which is
inconsistent; either update get_constraints to pass the filter parameters into
execute_get_constraints_with_timeout (e.g., extract dto.filter_info and
dto.table_limit after MetadataFactory.get_metadata and forward them) or change
the endpoint signature/description to use a slimmer DTO or explicitly document
that filtering/limit are not supported; locate the get_constraints function and
adjust the call to execute_get_constraints_with_timeout (or the DTO
type/endpoint description) accordingly to make behavior consistent with
get_table_list/get_schema_list.
ibis-server/tests/routers/v3/connector/duckdb/conftest.py (1)

10-17: Minor redundancy in marker application.

The pytest_collection_modifyitems hook adds the duckdb marker to tests, but pytestmark = pytest.mark.duckdb at line 5 already marks all tests in this module. The hook may be useful for ensuring markers on dynamically collected tests, but for standard test files in this directory, it's redundant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ibis-server/tests/routers/v3/connector/duckdb/conftest.py` around lines 10 -
17, The pytest_collection_modifyitems hook redundantly reapplies the pytestmark
(pytestmark = pytest.mark.duckdb) to tests already marked; remove the hook
implementation to avoid duplication, or if dynamic marking is required keep only
the logic that targets out-of-module items—specifically delete or comment out
the pytest_collection_modifyitems function (which uses current_file_dir, items,
item.fspath, and item.add_marker(pytestmark)) so tests rely on the module-level
pytestmark (duckdb) instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ibis-server/app/routers/v3/connector.py`:
- Around line 617-636: get_constraints currently accepts a MetadataDTO but never
uses dto.filter_info or dto.table_limit (unlike get_table_list and
get_schema_list), which is inconsistent; either update get_constraints to pass
the filter parameters into execute_get_constraints_with_timeout (e.g., extract
dto.filter_info and dto.table_limit after MetadataFactory.get_metadata and
forward them) or change the endpoint signature/description to use a slimmer DTO
or explicitly document that filtering/limit are not supported; locate the
get_constraints function and adjust the call to
execute_get_constraints_with_timeout (or the DTO type/endpoint description)
accordingly to make behavior consistent with get_table_list/get_schema_list.

In `@ibis-server/tests/routers/v3/connector/duckdb/conftest.py`:
- Around line 10-17: The pytest_collection_modifyitems hook redundantly
reapplies the pytestmark (pytestmark = pytest.mark.duckdb) to tests already
marked; remove the hook implementation to avoid duplication, or if dynamic
marking is required keep only the logic that targets out-of-module
items—specifically delete or comment out the pytest_collection_modifyitems
function (which uses current_file_dir, items, item.fspath, and
item.add_marker(pytestmark)) so tests rely on the module-level pytestmark
(duckdb) instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a6bc87f4-528a-4dba-b5cf-f2817714b6b4

📥 Commits

Reviewing files that changed from the base of the PR and between 7a5438b and 0cb32e1.

📒 Files selected for processing (11)
  • ibis-server/app/model/__init__.py
  • ibis-server/app/model/connector.py
  • ibis-server/app/model/data_source.py
  • ibis-server/app/model/metadata/factory.py
  • ibis-server/app/routers/v3/connector.py
  • ibis-server/pyproject.toml
  • ibis-server/tests/resource/duckdb/jaffle_shop.duckdb
  • ibis-server/tests/routers/v3/connector/duckdb/__init__.py
  • ibis-server/tests/routers/v3/connector/duckdb/conftest.py
  • ibis-server/tests/routers/v3/connector/duckdb/test_metadata.py
  • ibis-server/tests/routers/v3/connector/duckdb/test_query.py

…in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…pliance

Switch from pathlib to os.path module to exactly match the CodeQL-documented
safe pattern: os.path.normpath(os.path.join(trusted_root, user_input)) with a
simple startswith guard. The previous pathlib / operator and relative_to()
check were not recognised as sanitisers by CodeQL's taint tracker. Also wraps
with realpath to prevent symlink escape. Removes now-unused pathlib import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ibis-server/app/util.py (1)

95-106: ⚠️ Potential issue | 🟠 Major

Normalize other file-open failures into INVALID_CONNECTION_INFO.

Line 96 can raise IsADirectoryError or PermissionError, but only FileNotFoundError is explicitly caught. Unhandled exceptions will surface as 500 errors instead of client errors. Catch OSError as a fallback to handle all file-access failures:

Suggested fix
         try:
             with open(fullpath) as f:
                 return _normalize_port(json.load(f))
         except FileNotFoundError:
             raise WrenError(
                 ErrorCode.INVALID_CONNECTION_INFO,
                 f"Connection file not found: {dto.connection_file_path}",
             )
+        except OSError as e:
+            raise WrenError(
+                ErrorCode.INVALID_CONNECTION_INFO,
+                f"Connection file cannot be read: {dto.connection_file_path}",
+            ) from e
         except json.JSONDecodeError as e:
             raise WrenError(
                 ErrorCode.INVALID_CONNECTION_INFO,
                 f"Invalid JSON in connection file: {e}",
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ibis-server/app/util.py` around lines 95 - 106, The file-open block that
calls _normalize_port(json.load(f)) only catches FileNotFoundError and
JSONDecodeError, leaving other OS-level file access errors (e.g.,
IsADirectoryError, PermissionError) to bubble up; update the try/except so that
OSError is handled and normalized into a WrenError with
ErrorCode.INVALID_CONNECTION_INFO (include the underlying error message and
dto.connection_file_path for context). Concretely, modify the excepts around the
open(fullpath) call to either add an except OSError as e that raises
WrenError(ErrorCode.INVALID_CONNECTION_INFO, f"...: {e}") or combine
FileNotFoundError and OSError into a single except, leaving the existing
json.JSONDecodeError handling and still calling _normalize_port as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ibis-server/app/util.py`:
- Around line 87-90: The allowed-root path check incorrectly builds a prefix
with a duplicated separator (causing "/" to become "//") which rejects valid
paths; update the guard in the code that computes fullpath (using
allowed_root_str and dto.connection_file_path) to avoid duplicating os.sep —
either construct the prefix once using something like
os.path.join(allowed_root_str, '') or replace the startswith check with a robust
os.path.commonpath comparison between allowed_root_str and fullpath so the
containment test works for root ("/") and other roots reliably.

---

Outside diff comments:
In `@ibis-server/app/util.py`:
- Around line 95-106: The file-open block that calls
_normalize_port(json.load(f)) only catches FileNotFoundError and
JSONDecodeError, leaving other OS-level file access errors (e.g.,
IsADirectoryError, PermissionError) to bubble up; update the try/except so that
OSError is handled and normalized into a WrenError with
ErrorCode.INVALID_CONNECTION_INFO (include the underlying error message and
dto.connection_file_path for context). Concretely, modify the excepts around the
open(fullpath) call to either add an except OSError as e that raises
WrenError(ErrorCode.INVALID_CONNECTION_INFO, f"...: {e}") or combine
FileNotFoundError and OSError into a single except, leaving the existing
json.JSONDecodeError handling and still calling _normalize_port as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b619707-ed6b-4136-bcfa-c57ca78b1d93

📥 Commits

Reviewing files that changed from the base of the PR and between 6a105ef and b6d58b9.

📒 Files selected for processing (1)
  • ibis-server/app/util.py

goldmedal and others added 2 commits March 10, 2026 13:56
…ystem root

When CONNECTION_FILE_ROOT=/, allowed_root_str is "/" and appending os.sep
produced "//" which would never match any real path. Build the prefix once,
adding the separator only if the root string does not already end with one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ibis-server/app/util.py (1)

87-101: Document that CONNECTION_FILE_ROOT must not be user-writable.

The validation at lines 87–99 uses os.path.realpath() to canonicalize the path and resolve all symlinks before the containment check. While a TOCTOU window between validation and the open() call on line 101 is theoretically possible if CONNECTION_FILE_ROOT is writable, the realpath() approach already provides strong symlink protection at check time. Standard Docker deployments (as shown in compose.yaml) mount /workspace as a controlled volume. To eliminate any ambiguity about the security model, explicitly document in code comments or deployment guidance that CONNECTION_FILE_ROOT must not be user-writable and should be configured as a read-only or privileged directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ibis-server/app/util.py` around lines 87 - 101, Add a brief comment near the
path-validation block (around the use of os.path.realpath(), fullpath,
root_prefix and the subsequent open(fullpath)) documenting that
CONNECTION_FILE_ROOT must not be user-writable; state that realpath() protects
against symlink attacks at check time but TOCTOU remains possible if the root is
writable, and instruct operators to configure CONNECTION_FILE_ROOT as a
read-only or privileged directory (e.g., mount as read-only in Docker/compose)
to eliminate ambiguity about the security model.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ibis-server/app/util.py`:
- Around line 87-101: Add a brief comment near the path-validation block (around
the use of os.path.realpath(), fullpath, root_prefix and the subsequent
open(fullpath)) documenting that CONNECTION_FILE_ROOT must not be user-writable;
state that realpath() protects against symlink attacks at check time but TOCTOU
remains possible if the root is writable, and instruct operators to configure
CONNECTION_FILE_ROOT as a read-only or privileged directory (e.g., mount as
read-only in Docker/compose) to eliminate ambiguity about the security model.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31314cde-28cb-4b03-b5b2-a11792a57c24

📥 Commits

Reviewing files that changed from the base of the PR and between b6d58b9 and 58b5ee4.

📒 Files selected for processing (1)
  • ibis-server/app/util.py

@goldmedal goldmedal requested a review from douenergy March 10, 2026 06:32
Copy link
Copy Markdown
Contributor

@douenergy douenergy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@douenergy douenergy merged commit 7f70dae into main Mar 10, 2026
11 checks passed
@goldmedal goldmedal deleted the fix/enhance-duckdb-usage branch March 10, 2026 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file ibis python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants