feat(ibis): introduce GCS file connector #1053
Walkthrough
This pull request introduces support for Google Cloud Storage (GCS) file handling. The changes add a new data source type (
Changes
Sequence Diagram(s)
sequenceDiagram
participant C as Client
participant R as Rewriter
participant D as DuckDBConnector
participant U as Utils (init_duckdb_gcs)
participant M as MetadataFactory
C->>R: Submit query with GCS connection info
R->>D: Determine write dialect ("duckdb") for DataSource.gcs_file
D->>U: Call init_duckdb_gcs (with GcsFileConnectionInfo)
U-->>D: Return initialized DuckDB connection
D->>M: Request metadata handling for GCS
M-->>D: Return GCS file metadata
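The dialect step in the diagram can be sketched roughly as follows. This is a minimal illustration and not the code in rewriter.py: only DataSource.gcs_file and the "duckdb" dialect come from this PR, while the other enum members, the DUCKDB_BACKED set, and the get_write_dialect helper are hypothetical names chosen for the example.

```python
from enum import Enum


class DataSource(str, Enum):
    # Only gcs_file is confirmed by this PR; the other members are assumed.
    local_file = "local_file"
    minio_file = "minio_file"
    gcs_file = "gcs_file"
    postgres = "postgres"


# Hypothetical set of file-based sources that are queried through DuckDB.
DUCKDB_BACKED = {
    DataSource.local_file,
    DataSource.minio_file,
    DataSource.gcs_file,
}


def get_write_dialect(data_source: DataSource) -> str:
    """Return the SQL dialect used when writing queries for a data source."""
    if data_source in DUCKDB_BACKED:
        return "duckdb"
    # Assumption: every other source uses a dialect named after itself.
    return data_source.value
```

Keeping the file-based sources in one set means a new connector such as GCS only needs one extra entry to be routed to DuckDB.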
Actionable comments posted: 1
🧹 Nitpick comments (1)
ibis-server/app/model/metadata/object_storage.py (1)
256-256: Fix incorrect logging message.
The logging message indicates "Initialized duckdb minio" but this is in the GCS metadata class.
- logger.debug("Initialized duckdb minio")
+ logger.debug("Initialized duckdb gcs")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- ibis-server/app/mdl/rewriter.py (1 hunks)
- ibis-server/app/model/__init__.py (3 hunks)
- ibis-server/app/model/connector.py (3 hunks)
- ibis-server/app/model/data_source.py (3 hunks)
- ibis-server/app/model/metadata/factory.py (2 hunks)
- ibis-server/app/model/metadata/object_storage.py (3 hunks)
- ibis-server/app/model/utils.py (2 hunks)
- ibis-server/pyproject.toml (1 hunks)
- ibis-server/tests/routers/v2/connector/test_gcs_file.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
🔇 Additional comments (8)
ibis-server/app/model/metadata/factory.py (1)
9-9: LGTM! The addition of GcsFileMetadata follows the established pattern for file metadata classes, maintaining consistency with existing implementations.
Also applies to: 30-30
ibis-server/app/model/utils.py (1)
53-68: Verify GCS credentials usage.
The function creates a GCS secret but doesn't utilize the credentials field from GcsFileConnectionInfo. This might be required for certain GCS authentication scenarios.
Please verify if the credentials field should be used in the secret creation by checking DuckDB's GCS integration documentation:
✅ Verification successful
GCS HMAC Authentication is Configured Correctly
- The current implementation follows DuckDB’s documented HMAC authentication method for GCS by creating a secret using KEY_ID and SECRET.
- DuckDB’s configuration for GCS (via CREATE SECRET) does not require the additional credentials field when using HMAC keys.
- If you intend to support alternative authentication methods (e.g., using local gcloud CLI credentials), a different approach would be needed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Web query: How to configure GCS authentication in DuckDB?
Length of output: 5528
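The HMAC-based secret creation discussed in this comment might look something like the sketch below. It is not the PR's init_duckdb_gcs: GcsHmacKeys, build_gcs_secret_sql, and init_duckdb_gcs_sketch are hypothetical names, and the connection argument is assumed to be an already opened DuckDB connection. The CREATE SECRET statement with TYPE GCS, KEY_ID, and SECRET is the form the verification above refers to.

```python
from dataclasses import dataclass


@dataclass
class GcsHmacKeys:
    # Hypothetical stand-in for the HMAC fields of GcsFileConnectionInfo.
    key_id: str
    secret_key: str


def _quote(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_gcs_secret_sql(keys: GcsHmacKeys) -> str:
    """Build a DuckDB CREATE SECRET statement for GCS HMAC keys."""
    return (
        "CREATE SECRET ("
        "TYPE GCS, "
        f"KEY_ID {_quote(keys.key_id)}, "
        f"SECRET {_quote(keys.secret_key)}"
        ")"
    )


def init_duckdb_gcs_sketch(connection, keys: GcsHmacKeys) -> None:
    """Load httpfs and register the secret on an open DuckDB connection."""
    connection.execute("INSTALL httpfs")
    connection.execute("LOAD httpfs")
    connection.execute(build_gcs_secret_sql(keys))
```

Separating the statement builder from the connection call keeps the quoting logic testable without network access or real keys.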
ibis-server/app/mdl/rewriter.py (1)
79-79: LGTM! The addition of DataSource.gcs_file to the DuckDB dialect set follows the established pattern for file-based data sources.
ibis-server/app/model/__init__.py (1)
67-68: LGTM! The QueryGcsFileDTO class follows the established pattern for file-based query DTOs.
ibis-server/app/model/data_source.py (1)
25-25: LGTM! The GCS file data source is properly integrated. The changes follow the established pattern for adding new data sources, maintaining consistency with the existing codebase.
Also applies to: 52-52, 79-79
ibis-server/app/model/connector.py (1)
21-21: LGTM! The GCS connector is properly integrated. The changes follow the established pattern for adding new connectors, maintaining consistency with the existing error handling and initialization patterns.
Also applies to: 28-28, 46-46, 172-173
ibis-server/tests/routers/v2/connector/test_gcs_file.py (1)
1-508: LGTM! Comprehensive test coverage for the GCS connector. The test suite is well-structured and covers all essential aspects:
- Basic query functionality
- Query limits and calculated fields
- Error handling and edge cases
- Metadata operations
- Support for different file formats (parquet, csv, json)
- Type mapping verification
Good use of environment variables for sensitive data and pytest fixtures for test setup.
ibis-server/pyproject.toml (1)
70-70: GCS Test Marker Addition: Confirm and Document
The new marker
"gcs_file: mark a test as a gcs file test",
has been correctly introduced in the [tool.pytest.ini_options] section. This addition meets the PR objective of supporting tests related to the new GCS file connector.
Please ensure that any related documentation or guidelines for writing tests include details about this marker so that developers know when and how to use it.
Description
URL
Connection Info
{
  "url": "/tpch/data",
  "format": "parquet",
  "bucket": "gcs-bucket-name",
  "key_id": "hmackeyid",
  "secret_key": "hmacsecretkey",
  "credentials": "abcdef123456"
}
- url: The root path of the dataset. It does not include the bucket name.
- format: The specific file format.
- bucket: The bucket name.
- credentials: The GCP credentials.
- key_id: The GCP HMAC key ID.
- secret_key: The GCP HMAC secret key.
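As a rough illustration of how these fields relate, the sketch below models the connection info as a plain dataclass. It is not the PR's GcsFileConnectionInfo: the class name GcsConnectionSketch, the format check, and the dataset_root helper are assumptions, and the gs:// path layout is a guess at how bucket and url combine, given that url excludes the bucket name. The accepted formats are the three the test review mentions.

```python
from dataclasses import dataclass

# Formats named in the test review; treating them as the only valid
# values is an assumption made for this sketch.
SUPPORTED_FORMATS = {"parquet", "csv", "json"}


@dataclass
class GcsConnectionSketch:
    url: str          # root path of the dataset, without the bucket name
    format: str       # file format of the dataset
    bucket: str       # GCS bucket name
    key_id: str       # HMAC key ID
    secret_key: str   # HMAC secret key
    credentials: str  # GCP credentials

    def __post_init__(self) -> None:
        if self.format not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported format: {self.format}")

    def dataset_root(self) -> str:
        """Join bucket and url into a gs:// path (assumed layout)."""
        path = self.url.strip("/")
        if not path:
            return f"gs://{self.bucket}"
        return f"gs://{self.bucket}/{path}"
```

With the example values above, dataset_root() would combine the bucket gcs-bucket-name and the url /tpch/data into a single path under the bucket.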
Summary by CodeRabbit
New Features
Tests
Chores