Skip to content

Update builder/server DuckDB dependencies#662

Merged
forstmeier merged 5 commits intomasterfrom
update-data-manager-docker-definition
Jan 8, 2026
Merged

Update builder/server DuckDB dependencies#662
forstmeier merged 5 commits intomasterfrom
update-data-manager-docker-definition

Conversation

@forstmeier
Copy link
Copy Markdown
Collaborator

@forstmeier forstmeier commented Jan 7, 2026

Overview

Changes

  • update COPY statements order
  • update DuckDB dependency handling

Comments

Fixing a breaking build.

Summary by CodeRabbit

  • Chores
    • Embedded native DuckDB runtime into container images for more reliable local DB support.
    • Bumped DuckDB dependency to v1.4.3 for improved runtime behavior.
    • Streamlined multi-stage build, reducing redundant copies and adding dedicated build/planning stages.
    • Refined artifact copying and image contents for more predictable builds.
    • Improved SSL/certificate handling and related environment configuration to strengthen trust.

✏️ Tip: You can customize this high-level summary in your review settings.

@forstmeier forstmeier requested a review from Copilot January 7, 2026 20:26
@forstmeier forstmeier self-assigned this Jan 7, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 7, 2026

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Update builder/server DuckDB dependencies' directly aligns with the main changes: updating DuckDB from version 1.0 to 1.4.3 and restructuring the Dockerfile to integrate DuckDB properly in the build stages.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfa1592 and 7681a48.

📒 Files selected for processing (1)
  • applications/datamanager/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Rust code checks
🔇 Additional comments (3)
applications/datamanager/Dockerfile (3)

3-16: Good cleanup and documentation practices.

The single-layer approach for DuckDB installation correctly minimizes image size by purging wget/unzip in the same RUN command. The note on Line 3 appropriately documents the checksum omission rationale, addressing the concerns from previous reviews.


26-30: Excellent cargo-chef implementation.

The planner stage correctly copies only Cargo files (no src/), which allows cargo-chef to cache dependencies independently of source changes. This addresses the previous feedback about build cache optimization.


51-61: Well-structured server stage with proper library handling.

The server stage correctly:

  • Installs only runtime dependencies (ca-certificates, openssl)
  • Copies libduckdb.so from the chef stage
  • Runs ldconfig after copying to update the dynamic linker cache
  • Cleans up apt lists to reduce image size

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates DuckDB dependency handling in the datamanager Dockerfile to fix a breaking build. The changes install DuckDB v1.1.3 libraries explicitly in both builder and server stages, and reorder COPY statements in the planner stage to be more granular.

Key Changes:

  • Install DuckDB v1.1.3 shared libraries from GitHub releases in both builder and server stages
  • Reorder COPY statements in planner stage to copy source files separately from Cargo files
  • Add wget and unzip dependencies, with cleanup in server stage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread applications/datamanager/Dockerfile
Comment thread applications/datamanager/Dockerfile Outdated
Comment thread applications/datamanager/Dockerfile Outdated
@pulumi
Copy link
Copy Markdown

pulumi Bot commented Jan 7, 2026

🚀 The Update (preview) for forstmeier/pocketsizefund/production (at 7681a48) was successful.

✨ Neo Explanation

Initial production environment deployment for PocketSizeFund, creating a complete ECS-based microservices architecture with three containerized services (datamanager, equitypricemodel, portfoliomanager) behind an Application Load Balancer, running in a new VPC with private subnets and service discovery.

Root Cause Analysis

This deployment was triggered by a code change in the repository. The developer is provisioning a completely new production environment for the PocketSizeFund application from scratch. This appears to be either an initial production deployment or a full infrastructure rebuild, as all 52 resources are being created fresh with no existing state.

Dependency Chain

The infrastructure follows a logical bottom-up build sequence:

  1. Foundation layer: VPC, subnets (public/private across 2 availability zones), and internet/NAT gateways establish the network foundation
  2. Security layer: Security groups and rules control traffic flow between the ALB, ECS tasks, and VPC endpoints
  3. Platform services: IAM roles for task execution and runtime permissions, CloudWatch log groups for observability, and VPC endpoints for private ECR/S3 access
  4. Container orchestration: ECS cluster, task definitions for three services (datamanager, equitypricemodel, portfoliomanager), and service discovery for inter-service communication
  5. Load balancing: Application Load Balancer with target groups and listener rules to route external traffic to the datamanager and portfoliomanager services

Risk analysis

Risk Level: MODERATE

This is a greenfield deployment creating all new resources. While there's no risk of data loss or service disruption (nothing exists yet), the moderate risk stems from the complexity of coordinating 52 interdependent resources. Any misconfiguration in networking, security groups, or IAM permissions could result in a non-functional deployment requiring troubleshooting or rollback.

Resource Changes

    Name                           Type                                                          Operation
+   vpc                            aws:ec2/vpc:Vpc                                               create
+   nat_elastic_ip                 aws:ec2/eip:Eip                                               create
+   alb_sg                         aws:ec2/securityGroup:SecurityGroup                           create
+   private_subnet_1_rta           aws:ec2/routeTableAssociation:RouteTableAssociation           create
+   http_listener                  aws:lb/listener:Listener                                      create
+   execution_role_policy          aws:iam/rolePolicyAttachment:RolePolicyAttachment             create
+   datamanager_logs               aws:cloudwatch/logGroup:LogGroup                              create
+   portfoliomanager_task          aws:ecs/taskDefinition:TaskDefinition                         create
+   ecs_cluster                    aws:ecs/cluster:Cluster                                       create
+   private_route_table            aws:ec2/routeTable:RouteTable                                 create
+   public_subnet_2                aws:ec2/subnet:Subnet                                         create
+   public_internet_route          aws:ec2/route:Route                                           create
+   equitypricemodel_logs          aws:cloudwatch/logGroup:LogGroup                              create
+   portfoliomanager_logs          aws:cloudwatch/logGroup:LogGroup                              create
+   public_subnet_2_rta            aws:ec2/routeTableAssociation:RouteTableAssociation           create
+   datamanager_sd                 aws:servicediscovery/service:Service                          create
+   task_role_s3_policy            aws:iam/rolePolicy:RolePolicy                                 create
+   execution_role                 aws:iam/role:Role                                             create
+   public_route_table             aws:ec2/routeTable:RouteTable                                 create
+   service_discovery              aws:servicediscovery/privateDnsNamespace:PrivateDnsNamespace  create
+   vpc_endpoints_sg               aws:ec2/securityGroup:SecurityGroup                           create
+   datamanager_tg                 aws:lb/targetGroup:TargetGroup                                create
+   vpc_endpoints_ingress          aws:ec2/securityGroupRule:SecurityGroupRule                   create
+   ecs_self_ingress               aws:ec2/securityGroupRule:SecurityGroupRule                   create
+   execution_role_secrets_policy  aws:iam/rolePolicy:RolePolicy                                 create
+   s3_gateway_endpoint            aws:ec2/vpcEndpoint:VpcEndpoint                               create
+   equitypricemodel_sd            aws:servicediscovery/service:Service                          create
+   nat_gateway                    aws:ec2/natGateway:NatGateway                                 create
+   alb                            aws:lb/loadBalancer:LoadBalancer                              create
+   equitypricemodel_task          aws:ecs/taskDefinition:TaskDefinition                         create
+   datamanager_task               aws:ecs/taskDefinition:TaskDefinition                         create
+   equitypricemodel_service       aws:ecs/service:Service                                       create
+   igw                            aws:ec2/internetGateway:InternetGateway                       create
+   private_subnet_1               aws:ec2/subnet:Subnet                                         create
+   public_subnet_1                aws:ec2/subnet:Subnet                                         create
+   ecs_sg                         aws:ec2/securityGroup:SecurityGroup                           create
+   public_subnet_1_rta            aws:ec2/routeTableAssociation:RouteTableAssociation           create
+   ecs_from_alb                   aws:ec2/securityGroupRule:SecurityGroupRule                   create
+   ecr_dkr_endpoint               aws:ec2/vpcEndpoint:VpcEndpoint                               create
+   nat_route                      aws:ec2/route:Route                                           create
+   task_role                      aws:iam/role:Role                                             create
... and 12 other changes

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @applications/datamanager/Dockerfile:
- Around line 11-14: The Dockerfile currently downloads
libduckdb-linux-amd64.zip (v1.1.3) without integrity checking; update the
download to the v1.4.3 release URL (replace the version in the RUN command) and
add SHA256 checksum verification before unzipping: download the zip, verify it
with the official v1.4.3 SHA256 (replace EXPECTED_SHA256 with the checksum from
the DuckDB v1.4.3 release), fail the build if verification fails, then unzip and
clean up as before (affecting the RUN line that references
libduckdb-linux-amd64.zip and ldconfig).
🧹 Nitpick comments (1)
applications/datamanager/Dockerfile (1)

7-9: LGTM! Dependencies added for DuckDB installation.

The addition of wget and unzip is necessary for downloading and extracting the DuckDB binary in the subsequent steps.

Note: These tools remain in the chef stage (which is used as a base for planner and builder). If you want to minimize intermediate image sizes, consider adding cleanup after DuckDB installation, though this is optional.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df47a9c and eedaa45.

📒 Files selected for processing (1)
  • applications/datamanager/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Rust code checks
🔇 Additional comments (4)
applications/datamanager/Dockerfile (4)

24-24: LGTM! Targeted COPY improves build efficiency.

The explicit copying of Cargo.toml, Cargo.lock, and the src directory follows Docker best practices and works well with the cargo-chef pattern for dependency caching.

Also applies to: 28-28


53-58: LGTM! Appropriate packages for runtime stage.

The package installation includes necessary SSL/TLS dependencies and temporary tools (wget, unzip) that will be cleaned up after DuckDB installation.


67-67: LGTM! Ensures SSL certificates are properly configured.

The explicit call to update-ca-certificates is necessary to ensure the certificate store is current and properly configured for SSL/TLS connections.


71-71: LGTM! Standard SSL/TLS configuration.

Adding SSL_CERT_DIR provides better compatibility across different SSL/TLS implementations. Some libraries prefer this environment variable over SSL_CERT_FILE, and setting both ensures maximum compatibility.

If you want to verify that your application or its dependencies specifically benefit from this variable, you can check the SSL/TLS library documentation (e.g., OpenSSL, rustls, reqwest).

Comment thread applications/datamanager/Dockerfile Outdated
Comment thread applications/datamanager/Dockerfile Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @applications/datamanager/Dockerfile:
- Around line 9-10: Update the DuckDB download and checksum in the Dockerfile:
replace the wget URL that references v1.0.0 and the archive name
(libduckdb-linux-amd64.zip) with a current stable release (e.g., v1.4.3) and the
corresponding archive name/URL, and replace the existing sha256sum string with
the official checksum for that release; after updating, verify the checksum and
rebuild to ensure compatibility and successful verification.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eedaa45 and 071d447.

📒 Files selected for processing (1)
  • applications/datamanager/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Rust code checks
🔇 Additional comments (3)
applications/datamanager/Dockerfile (3)

14-16: Good layer hygiene.

Purging temporary tools (wget, unzip) and cleaning apt cache in the same layer as their installation properly minimizes image size.


26-32: LGTM!

The planner stage correctly copies the manifest files and source directory needed for cargo chef prepare to generate the recipe.


53-67: Well-structured server stage.

The multi-stage approach correctly:

  1. Copies only the DuckDB shared library (not build tools) from the chef stage
  2. Runs ldconfig to register the library with the dynamic linker
  3. Configures SSL certificates for secure connections

This keeps the final image minimal while ensuring runtime dependencies are available.

Comment thread applications/datamanager/Dockerfile Outdated
Copilot AI review requested due to automatic review settings January 7, 2026 23:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread applications/datamanager/Dockerfile Outdated
@coderabbitai coderabbitai Bot added the rust label Jan 7, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @applications/datamanager/Dockerfile:
- Around line 26-32: Remove the unnecessary source copy in the planner stage:
delete the COPY applications/datamanager/src/ applications/datamanager/src/ line
so the planner only copies Cargo.toml/Cargo.lock (and the per-crate Cargo.toml
already copied) before running cargo chef prepare --recipe-path recipe.json;
leave the subsequent builder stage that copies src/ intact so source changes
don’t bust the cargo-chef dependency cache.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 071d447 and 065c960.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • applications/datamanager/Cargo.toml
  • applications/datamanager/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Rust code checks
🔇 Additional comments (4)
applications/datamanager/Dockerfile (3)

53-63: Clean runtime stage setup.

The server stage correctly:

  • Uses a minimal debian:trixie-slim base for smaller image size
  • Installs only runtime dependencies (ca-certificates, openssl)
  • Copies the DuckDB shared library from the chef stage (line 61)
  • Runs ldconfig to register the library in the dynamic linker cache (line 63)

This ensures the compiled Rust binary can find and load the DuckDB library at runtime.


65-67: SSL certificate configuration is correct.

The environment variables point to Debian's standard certificate locations, which work correctly with the update-ca-certificates command executed earlier (line 58). This ensures proper SSL/TLS certificate validation for outbound connections.


7-16: Excellent security improvements addressing prior concerns.

The DuckDB installation now includes:

  • SHA256 checksum verification (line 10) - addresses the security concern from past reviews
  • Upgrade to v1.4.3 latest stable release - addresses the outdated version concern
  • Proper cleanup of temporary tools (wget/unzip) and apt cache - addresses image size concerns

The checksum 3c7bb8d586d39ccce56442c3f6bafa97f53d7b6bb5405dac827442fcb31494aa matches the official DuckDB v1.4.3 release. The multi-stage approach correctly installs the library in the chef stage and makes it available for subsequent stages.

applications/datamanager/Cargo.toml (1)

33-33: Version alignment is correct and the update is safe.

The duckdb crate version 1.4.3 correctly matches the native library version (v1.4.3) in the Dockerfile. The codebase uses only standard, stable duckdb-rs APIs (Connection::open_in_memory, execute_batch, prepare, query_map) that have remained unchanged across the 1.x series. The duckdb-rs Rust bindings maintain API stability throughout the 1.x release line, so this version bump introduces no breaking API changes.

Comment thread applications/datamanager/Dockerfile
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @applications/datamanager/Dockerfile:
- Around line 7-16: The Dockerfile currently verifies libduckdb-linux-amd64.zip
using an unverifiable SHA256 value
("3c7bb8d586d39ccce56442c3f6bafa97f53d7b6bb5405dac827442fcb31494aa") which is
not published by DuckDB v1.4.3; either replace that sha256 check with a checksum
obtained from an official DuckDB source and add a comment documenting the
source, or remove the echo | sha256sum -c - step entirely and rely on HTTPS (or
switch to an alternative distribution that publishes verifiable checksums);
update the Dockerfile sequence that references libduckdb-linux-amd64.zip and the
sha256 snippet accordingly so the build no longer validates against an
unverifiable hash.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 065c960 and bfa1592.

📒 Files selected for processing (1)
  • applications/datamanager/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Rust code checks
🔇 Additional comments (2)
applications/datamanager/Dockerfile (2)

26-30: Perfect cargo-chef optimization.

The planner stage now correctly copies only manifest files (Cargo.toml and Cargo.lock), excluding source code. This prevents cache invalidation on every source change and properly leverages cargo-chef's dependency caching, addressing the previous review concern.


51-63: Well-structured server stage with proper DuckDB and SSL integration.

The server stage is correctly configured:

  • Minimal base image with only runtime dependencies
  • SSL certificates properly installed and updated (lines 53-57)
  • DuckDB shared library copied from chef stage (line 59)
  • ldconfig ensures library availability (line 61)
  • SSL_CERT_FILE explicitly set for runtime clarity (line 63)

The reorganization properly separates build-time and runtime dependencies, ensuring a lean production image.

Comment thread applications/datamanager/Dockerfile
Copilot AI review requested due to automatic review settings January 7, 2026 23:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread applications/datamanager/Dockerfile
@forstmeier forstmeier merged commit 6324d90 into master Jan 8, 2026
6 checks passed
@forstmeier forstmeier deleted the update-data-manager-docker-definition branch January 8, 2026 02:28
@coderabbitai coderabbitai Bot mentioned this pull request Mar 21, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants