Codebase reorganization and cleanup by forstmeier · Pull Request #792 · oscmcompany/fund

forstmeier · 2026-03-14T00:44:04Z

Overview

Changes

rename infrastructure resources and variables
refactor infrastructure/ into separate files
reorganize model training resources
consolidate/cleanup Mask commands
rename applications/ sub-packages for readability
various other cleanups

Context

There were some redundancies in model training workflows and I had a backlog of tweaks I wanted to make so this is all of that. Open to any ideas or adjustments you want.

NOTE: I'll need to do an up on the infrastructure to update everything and then I'll either update this pull request or include the removal of the "alias" attributes from the infrastructure/ resources in a subsequent pull request. Additionally, I'll need to rebuild and push images to ECR once the up has been run.

Summary by CodeRabbit

New Features
- Added cost monitoring with email budget alerts and optional HTTPS/TLS support
- Introduced a new "Tide" model training runtime and updated training/deploy workflows
Infrastructure Updates
- Major modular overhaul: VPC, load balancer, ECS services, RDS, Redis, ECR, IAM, Secrets, and storage now provisioned and exposed
- Service and runtime renames consolidated for deployment: data-manager, portfolio-manager, ensemble-manager, training; updated container/image and service routing references

greptile-apps · 2026-03-14T00:44:08Z

Too many files changed for review. (108 files found, 100 file limit)

coderabbitai · 2026-03-14T00:44:21Z

📝 Walkthrough

Walkthrough

Renames services (datamanager → data_manager, portfoliomanager → portfolio_manager, equitypricemodel → ensemble_manager), adds a Tide model project, and replaces a monolithic Pulumi infra entrypoint with modular infra modules (compute, config, iam, networking, storage, secrets, notifications); updates Dockerfiles, CI/workflows, docker-compose, and many test import paths.

Changes

Cohort / File(s)	Summary
Rust Data Manager `Cargo.toml`, `applications/data_manager/...`	Workspace member and crate/bin renamed to `data_manager`; Dockerfile COPY/build and binary name updated; imports and tests updated to `data_manager`; ENV var tests use `FUND_ENVIRONMENT`.
Python Portfolio Manager `applications/portfolio_manager/...`, `applications/portfolio_manager/tests/*`	Package renamed to `portfolio_manager`; Dockerfile PYTHONPATH and ENTRYPOINT updated; tests and patches repointed; ENV -> `FUND_ENVIRONMENT`; replaced equitypricemodel URL with `ENSEMBLE_MANAGER_BASE_URL` (default http://ensemble-manager:8080).
Python Ensemble Manager `applications/ensemble_manager/...`	New `ensemble_manager` package: pyproject, Dockerfile, server changes (imports, _environment, MODEL_VERSION SSM param), tests updated; removed old equitypricemodel Dockerfile.
Tide Model Project `models/tide/...`, `models/tide/src/tide/`, `models/tide/tests/`	Adds `models/tide` project (pyproject, Dockerfile), moves training flows into `tide.*`, updates LOOKBACK->FUND_LOOKBACK_DAYS, adjusts artifact paths, adds tests and fixtures.
Infrastructure (modular) `infrastructure/__main__.py`, `infrastructure/{config,compute,iam,networking,secrets,storage,notifications,parameters}.py`	Replaces monolithic infra with modular modules: config validation/secret helpers, compute (ECS/ALB/services/taskdefs), IAM (OIDC, roles, policies, SES), networking (VPC/subnets/NAT/endpoints/SGs), secrets (Secrets Manager), storage (S3/ECR), notifications (SNS + Budget); renames many Pulumi exports/SSM paths to new service names.
Docker Compose & Workflows `infrastructure/docker-compose.yaml`, `.github/workflows/launch_infrastructure.yaml`, `maskfile.md`	Compose services renamed to `data-manager`, `portfolio-manager`, `ensemble-manager`; env vars updated to FUND_ prefixes; GH Actions matrix entries renamed to `data_manager`, `portfolio_manager`, `ensemble_manager`; maskfile updates dynamic Dockerfile selection and other deploy command renames.
Tools / Libraries / Tests `libraries/python/...`, `tools/*`, `pyproject.toml`	Adds `structlog` dependency; renames combine_data→combine_equity_bars_and_details and removes CLI block; updates many tests to new module paths; adds/removes infra-related tests; workspace members updated to include `models/tide` and rename Python packages.
Service Dockerfiles `applications/*/Dockerfile`, `tools/Dockerfile`, `models/tide/Dockerfile`	Updates Dockerfiles to reflect renamed package directories and binaries; adds `models/tide` Dockerfile; replaces equitypricemodel references with ensemble_manager/tide where applicable.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client
participant ALB as "ALB (Listener / Rules)"
participant Service as "ECS Service (e.g., data-manager)"
participant Secrets as "Secrets Manager / SSM"
participant DB as "RDS (Postgres)"
Client->>ALB: HTTP(S) request (path-based routing)
ALB->>Service: Route to target group (container port 8080)
Service->>Secrets: Retrieve secrets / SSM params
Service->>DB: Query / write data
DB-->>Service: Response
Service-->>ALB: HTTP response
ALB-->>Client: Response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

PR #618: Overlaps the datamanager → data_manager rename and Docker/Cargo updates; likely touches same crate and helpers (AlpacaClient/S3Client).
PR #615: Repository-wide service renames and workspace restructuring that directly overlap package/module renames and CI/workflow updates.
PR #610: Large infrastructure refactor that rewrites the Pulumi entrypoint and resource structure, closely related to the modular infra introduced here.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Codebase reorganization and cleanup' is overly broad and generic, describing categories of changes rather than specific improvements or main changes.	Use a more specific title that highlights the primary change, such as 'Rename services and refactor infrastructure into modular files' or 'Reorganize codebase: rename services, split infrastructure modules, and consolidate training'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch repository-reorganization-phase-four

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR reorganizes the repository structure and renames services/packages to improve readability, while extracting the TiDE training workflow into a dedicated models/tide Python package and renaming the equity price model service to ensemble_manager. It also refactors Pulumi infrastructure into multiple modules and updates automation/scripts to match the new naming.

Changes:

Introduces models/tide as a standalone Python package with workflow/run/deploy code and its own trainer Docker image.
Renames/realigns application packages (e.g., portfolio_manager, ensemble_manager, data_manager) and updates related tests, Dockerfiles, and workspace configuration.
Refactors Pulumi infrastructure/ into modular files (networking/storage/iam/secrets/notifications/etc.) and updates maskfile.md + GitHub workflow automation.

Reviewed changes

Copilot reviewed 70 out of 106 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tools/pyproject.toml	Removes Prefect dependency from `tools` package.
tools/Dockerfile	Updates build context and PYTHONPATH; still runs Prefect worker.
pyproject.toml	Updates uv workspace members; adds `models/tide` and expands pytest testpaths.
models/tide/tests/test_workflow.py	Updates imports/patch targets to `tide.workflow`.
models/tide/tests/test_trainer.py	Updates imports to `tide.trainer`.
models/tide/tests/test_tide_model.py	Updates imports to `tide.tide_model`.
models/tide/tests/test_tide_data.py	Updates imports to `tide.tide_data`.
models/tide/tests/test_tasks.py	Updates imports/patch targets to `tide.tasks`.
models/tide/tests/test_run.py	Updates imports/patch targets to `tide.run`; updates expected artifact naming.
models/tide/tests/test_notifications.py	Updates imports to `tide.notifications`; renames env vars to `FUND_*`.
models/tide/tests/test_deploy.py	Updates imports/patch targets to `tide.deploy`.
models/tide/tests/conftest.py	Adds shared test fixture for generating raw Polars data.
models/tide/tests/init.py	Package marker for tests.
models/tide/src/tide/workflow.py	New Prefect flow/tasks for sync/prepare/train pipeline; updates env var names + artifact path layout.
models/tide/src/tide/trainer.py	Moves trainer into `tide` package; removes old CLI `__main__` runner.
models/tide/src/tide/tide_model.py	Adds TiDE model implementation (tinygrad) including training/validation utilities.
models/tide/src/tide/tasks.py	Moves training-data preparation utilities into `tide.tasks`; removes old CLI `__main__`.
models/tide/src/tide/run.py	Adds CLI entry (`python -m tide.run`) to execute the training pipeline.
models/tide/src/tide/notifications.py	Updates notification env vars to `FUND_TRAINING_NOTIFICATION_*`.
models/tide/src/tide/deploy.py	Adds deployment helper using Prefect `EntrypointType`; updates env var names.
models/tide/src/tide/init.py	Package marker for `tide`.
models/tide/pyproject.toml	Declares `tide` package and dependencies (prefect/tinygrad/etc.) within the workspace.
models/tide/init.py	Package marker at model root.
models/tide/Dockerfile	Adds CUDA-based trainer image that runs `python -m tide.workflow`.
models/init.py	Package marker for models namespace.
maskfile.md	Updates image build logic, Pulumi imports, ECS service naming, and model train/deploy commands.
libraries/python/tests/test_infrastructure_storage.py	Adds tests asserting storage resources exist in `infrastructure/storage.py`.
libraries/python/tests/test_infrastructure_networking.py	Adds tests asserting NAT alarm configuration in `infrastructure/networking.py`.
libraries/python/tests/test_infrastructure_iam.py	Adds tests asserting IAM policy constraints in `infrastructure/iam.py`.
libraries/python/tests/test_infrastructure_configuration.py	Removes monolithic infra entrypoint tests targeting old `infrastructure/__main__.py` layout.
libraries/python/tests/test_infrastructure_config.py	Adds tests targeting new `infrastructure/config.py`.
libraries/python/tests/test_equity_details_schema.py	Updates import path to `internal.equity_details_schema`.
libraries/python/src/internal/equity_details_schema.py	Adds shared Pandera schema for equity details.
libraries/python/src/internal/combine_data.py	Renames combine function and removes script-style `__main__`.
libraries/python/pyproject.toml	Adds `structlog` dependency to shared `internal` library package.
infrastructure/storage.py	New storage/ECR definitions; migrates S3 bucket resources to `BucketV2`; creates ECR repos + image URIs.
infrastructure/secrets.py	New Secrets Manager definitions and versions using serialized secret objects.
infrastructure/parameters.py	Renames SSM parameter paths (e.g., portfolio-manager, ensemble-manager).
infrastructure/notifications.py	Adds SNS topic subscriptions and AWS budget notifications.
infrastructure/networking.py	New VPC/subnets/NAT/endpoints/security groups + NAT traffic alarm.
infrastructure/iam.py	New IAM roles/policies (GitHub Actions + ECS execution/task roles) + SES + SSM params for training notifications.
infrastructure/docker-compose.yaml	Renames local services (data-manager/portfolio-manager/ensemble-manager) and updates env wiring.
infrastructure/config.py	New Pulumi config loader with validations; centralizes constants and secret config helpers.
applications/portfolio_manager/tests/test_statistical_arbitrage.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_risk_management.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_report.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_regime.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_portfolio_server.py	Updates imports/patch targets to `portfolio_manager.server.*`.
applications/portfolio_manager/tests/test_portfolio_schema.py	Updates imports to `portfolio_manager.portfolio_schema`.
applications/portfolio_manager/tests/test_data_client.py	Updates imports/patch targets to `portfolio_manager.data_client.*`.
applications/portfolio_manager/tests/test_consolidation.py	Updates imports to `portfolio_manager.consolidation`.
applications/portfolio_manager/tests/test_beta.py	Updates imports to `portfolio_manager.beta`.
applications/portfolio_manager/tests/test_alpaca_client.py	Updates imports/patch targets to `portfolio_manager.alpaca_client.*`.
applications/portfolio_manager/src/portfolio_manager/statistical_arbitrage.py	Adds pair selection logic using correlations/z-scores + ranking/greedy selection.
applications/portfolio_manager/src/portfolio_manager/server.py	Updates env var names and switches predictions source to ensemble manager endpoint.
applications/portfolio_manager/src/portfolio_manager/risk_management.py	Adds volatility-parity sizing and beta-neutral optimization for pair weights.
applications/portfolio_manager/src/portfolio_manager/report.py	Adds formatted text reports for regime/beta/consolidation/pairs/portfolio.
applications/portfolio_manager/src/portfolio_manager/regime.py	Adds basic regime classifier (mean reversion vs trending) based on SPY stats.
applications/portfolio_manager/src/portfolio_manager/portfolio_schema.py	Adds Pandera schemas and validation helpers for pairs/portfolio outputs.
applications/portfolio_manager/src/portfolio_manager/exceptions.py	Adds domain exception types.
applications/portfolio_manager/src/portfolio_manager/enums.py	Adds enums for position/trade actions/sides.
applications/portfolio_manager/src/portfolio_manager/data_client.py	Adds HTTP client helpers to fetch data/equity details/SPY series.
applications/portfolio_manager/src/portfolio_manager/consolidation.py	Adds ensemble signal blending + realized volatility computation.
applications/portfolio_manager/src/portfolio_manager/beta.py	Adds market beta computation against SPY and portfolio beta calculation.
applications/portfolio_manager/src/portfolio_manager/alpaca_client.py	Adds Alpaca trading client wrapper + error mapping + rate-limit sleeps.
applications/portfolio_manager/pyproject.toml	Renames package to `portfolio_manager`.
applications/portfolio_manager/Dockerfile	Updates build paths, PYTHONPATH, and entrypoint package/module names.
applications/equitypricemodel/Dockerfile	Removes old equity price model Dockerfile (replaced by ensemble manager + tide trainer).
applications/ensemble_manager/tests/test_server.py	Updates imports/patch targets to `ensemble_manager.server`.
applications/ensemble_manager/tests/test_preprocess.py	Updates imports to `ensemble_manager.preprocess`.
applications/ensemble_manager/tests/test_predictions_schema.py	Updates imports to `ensemble_manager.predictions_schema`.
applications/ensemble_manager/tests/conftest.py	Adds shared Polars raw-data fixture for tests.
applications/ensemble_manager/src/ensemble_manager/server.py	Switches to shared internal schema + tide model/data; updates env var names; defines SSM parameter constant.
applications/ensemble_manager/src/ensemble_manager/preprocess.py	Adds equity bar filtering by average price/volume thresholds.
applications/ensemble_manager/src/ensemble_manager/predictions_schema.py	Adds Pandera schema and checks for predictions outputs.
applications/ensemble_manager/pyproject.toml	Renames package to `ensemble_manager`; depends on `tide`; removes tinygrad/numpy direct deps.
applications/ensemble_manager/Dockerfile	Adds Docker build for ensemble manager service including `models/tide`.
applications/data_manager/tests/test_storage.rs	Updates crate path to `data_manager`.
applications/data_manager/tests/test_state_and_health.rs	Updates crate path to `data_manager`.
applications/data_manager/tests/test_handlers.rs	Updates crate path to `data_manager`.
applications/data_manager/tests/test_errors.rs	Updates crate path to `data_manager`.
applications/data_manager/tests/test_data.rs	Updates crate path to `data_manager`.
applications/data_manager/tests/common/mod.rs	Adds shared LocalStack + server harness utilities for integration tests.
applications/data_manager/src/storage.rs	Implements S3 read/write and DuckDB S3 querying utilities with sanitization helpers.
applications/data_manager/src/state.rs	Adds env-based state initialization (AWS clients + Massive API secrets).
applications/data_manager/src/startup.rs	Updates env var naming; updates service log names.
applications/data_manager/src/router.rs	Adds Axum router module for all endpoints + tracing/Sentry layers.
applications/data_manager/src/predictions.rs	Adds endpoints for saving/querying predictions (URL-encoded JSON query param).
applications/data_manager/src/portfolios.rs	Adds endpoints for saving/querying portfolios and serializing to JSON.
applications/data_manager/src/main.rs	Updates crate name imports and env var naming in tests.
applications/data_manager/src/lib.rs	Exposes module structure for the crate.
applications/data_manager/src/health.rs	Adds health endpoint implementation.
applications/data_manager/src/errors.rs	Adds unified error enum for DuckDB/credentials/Polars/etc.
applications/data_manager/src/equity_details.rs	Adds GET equity details CSV and stub sync endpoint (501).
applications/data_manager/src/equity_bars.rs	Adds sync/query endpoints for equity bars, including Massive API integration.
applications/data_manager/src/data.rs	Adds DataFrame builders and normalization for bars/predictions/portfolios/details.
applications/data_manager/bacon.toml	Adds Rust dev tooling config for `bacon`.
applications/data_manager/Dockerfile	Updates build paths and binary naming to `data_manager`.
applications/data_manager/Cargo.toml	Renames crate/lib/bin from `datamanager` to `data_manager`.
Cargo.toml	Updates Rust workspace member path to `applications/data_manager`.
Cargo.lock	Updates crate name entry to `data_manager`.
.github/workflows/launch_infrastructure.yaml	Updates service matrix to new app names/paths and continues to call mask build/push.

Comments suppressed due to low confidence (2)

tools/pyproject.toml:11

tools/Dockerfile runs prefect worker start via uv run --package tools ..., but the tools package no longer declares Prefect as a dependency. This will make the training worker image fail at runtime because the prefect CLI won't be installed. Add prefect>=3,<4 back to tools/pyproject.toml, or update the Dockerfile/entrypoint to run Prefect from a package that depends on it.
applications/ensemble_manager/src/ensemble_manager/server.py:64
MODEL_VERSION_SSM_PARAMETER is set to /fund/ensemble_manager/model_version, but the infrastructure SSM parameter is created at /fund/ensemble-manager/model_version (dash, not underscore). With the current value, the service will never find the parameter and will always fall back to latest discovery. Update this constant to match the actual parameter name.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

applications/portfolio_manager/Dockerfile (1)

9-31: ⚠️ Potential issue | 🟠 Major

Drop root before starting the Python server.

The final stage never switches away from root, so uvicorn and the application keep full container privileges at runtime. There isn't anything after COPY --from=builder /app /app that still needs root.

🔒 Proposed hardening

 COPY --from=builder /app /app
+RUN useradd --system --uid 10001 --create-home appuser && \
+    chown -R appuser:appuser /app
+USER appuser
 
 EXPOSE 8080
 
 ENTRYPOINT ["uv", "run", "--package", "portfolio_manager", "uvicorn", "portfolio_manager.server:application", "--host", "0.0.0.0", "--port", "8080", "--app-dir", "applications/portfolio_manager/src"]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@applications/portfolio_manager/Dockerfile` around lines 9 - 31, The final
stage leaves the container running as root; after COPY --from=builder /app /app
create a non-root user (e.g., appuser), set appropriate ownership on /app (chown
/app and any runtime dirs), and switch to that user with USER before the
ENTRYPOINT so uv and the portfolio_manager app run unprivileged; update ENV/PATH
only if needed and ensure no subsequent steps require root privileges after the
switch.

applications/data_manager/Dockerfile (1)

34-84: ⚠️ Potential issue | 🟠 Major

The path rename is fine, but the runtime image still runs as root.

The final stage never sets USER, so data_manager keeps full container privileges at runtime even though it only binds to port 8080. That leaves avoidable blast radius if the process or one of its native dependencies is compromised.
🔒 Proposed hardening
 COPY --from=builder /tmp/data_manager /usr/local/bin/data_manager
+RUN useradd --system --uid 10001 --create-home appuser && \
+    chown appuser:appuser /usr/local/bin/data_manager /app
+USER appuser
 
 EXPOSE 8080
 
 ENTRYPOINT ["/usr/local/bin/data_manager"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@applications/data_manager/Dockerfile` around lines 34 - 84, The final
Dockerfile stage leaves the process running as root; create a non-root user and
group (e.g. data_manager user), chown the installed binary and any required
runtime dirs to that UID/GID, and add a USER instruction before ENTRYPOINT so
/usr/local/bin/data_manager runs unprivileged; update any file permissions
(binary and /app if necessary) and ensure the chosen user has no sudo/root
privileges and a locked shell.

models/tide/src/tide/notifications.py (1)

19-23: ⚠️ Potential issue | 🟠 Major

Avoid logging raw sender/recipient emails.

This logs email addresses directly in warning/exception paths, which is a privacy/compliance risk. Log booleans/counts instead of PII.

🔧 Suggested fix

     if not sender_email or not recipient_emails_raw:
         logger.warning(
             "Notification emails not configured, skipping",
-            sender_email=sender_email,
-            recipient_emails=recipient_emails_raw,
+            has_sender_email=bool(sender_email),
+            has_recipient_emails=bool(recipient_emails_raw),
         )
         return
@@
     except Exception:
         logger.exception(
             "Failed to send training notification",
-            sender_email=sender_email,
-            recipients=recipient_emails,
+            has_sender_email=bool(sender_email),
+            recipient_count=len(recipient_emails),
             state=state_name,
         )

Also applies to: 81-85

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@models/tide/src/tide/notifications.py` around lines 19 - 23, The warning
currently logs raw PII (sender_email and recipient_emails_raw); change the
logger.warning call(s) (the one shown and the similar call around lines 81-85)
to avoid printing email addresses and instead log non-PII indicators such as
whether sender_email is set (e.g., bool(sender_email)) and the recipient count
(e.g., len(recipient_emails_raw)) or a masked placeholder; update the logger
call signatures to pass these booleans/counts (and a short descriptive message)
rather than the raw sender_email/recipient_emails_raw values.

models/tide/tests/test_notifications.py (1)

41-50: ⚠️ Potential issue | 🟡 Minor

Clear os.environ in these notification tests.

patch.dict keeps the rest of the process environment by default. That can mask a broken env-var rename if the runner already has notification variables set. Pass clear=True so each case exercises only the FUND_TRAINING_NOTIFICATION_* keys it declares.
🧪 Representative fix
         patch.dict(
             "os.environ",
             {
                 "FUND_TRAINING_NOTIFICATION_SENDER_EMAIL": "sender@example.com",
                 "FUND_TRAINING_NOTIFICATION_RECIPIENT_EMAILS": "recipient@example.com",
             },
+            clear=True,
         ),
Apply the same change to the other patch.dict("os.environ", ...) blocks in this file.
Also applies to: 76-85, 104-115, 133-142, 153-162, 176-186
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/tide/tests/test_notifications.py` around lines 41 - 50, The tests in
test_notifications.py use patch.dict("os.environ", {...}) without clearing the
rest of the process env which can hide broken env-var renames; update each
patch.dict(...) call that sets FUND_TRAINING_NOTIFICATION_SENDER_EMAIL and
FUND_TRAINING_NOTIFICATION_RECIPIENT_EMAILS (and the other patches at the
locations that set FUND_TRAINING_NOTIFICATION_* keys) to pass clear=True so only
the supplied keys exist during the test — change patch.dict("os.environ", {...})
to patch.dict("os.environ", {...}, clear=True) for every such block in the file.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@applications/ensemble_manager/Dockerfile`:
- Around line 1-33: The container currently runs as root; add an unprivileged
runtime user and switch to it before starting the app: in the Dockerfile (server
stage) create a non-root user (e.g., "app" or "uvuser") and group, chown /app
and any runtime dirs (referenced by WORKDIR /app) to that user, set HOME if
needed, and add a USER instruction so the ENTRYPOINT (uv run ... defined in
ENTRYPOINT) runs unprivileged; ensure the created user has no shell or sudo and
that file ownership permits reading/writing by that user.

In `@applications/ensemble_manager/src/ensemble_manager/server.py`:
- Around line 62-63: The default DATAMANAGER_BASE_URL environment fallback is
using the wrong hostname; update the default string used by DATAMANAGER_BASE_URL
(and the related env var name FUND_DATAMANAGER_BASE_URL) from
"http://datamanager:8080" to "http://data-manager:8080" so it matches the
docker-compose service name (data-manager) used by the infrastructure; locate
the DATAMANAGER_BASE_URL constant in ensemble_manager/server.py and change only
the default URL string.

In `@infrastructure/compute.py`:
- Around line 1-2: The current import "from secrets import data_manager_secret,
portfolio_manager_secret, shared_secret" shadows Python's stdlib secrets module;
rename the local module (e.g., to secrets_manager or aws_secrets) and update
this import to "from secrets_manager import data_manager_secret,
portfolio_manager_secret, shared_secret" (or the chosen name) and adjust any
other references to that module across the repo so stdlib secrets remains
available for functions like secrets.token_hex().
- Around line 127-128: The hardcoded acm_certificate_arn = None should be
replaced with a configurable value read from your config module; update
infrastructure/compute.py to import and use the config entry (e.g.,
acm_certificate_arn = config.get("acm_certificate_arn") or
config.acm_certificate_arn) instead of the literal None and ensure the config.py
exposes an optional acm_certificate_arn setting with a default of None so HTTPS
can be toggled without code changes; reference the acm_certificate_arn symbol in
compute.py and add the new optional acm_certificate_arn key in config.py with
default None.

In `@infrastructure/config.py`:
- Around line 72-75: The config fetch uses stack_config.require_object(...) to
read budgetAlertEmailAddresses which loses Pulumi secret tracking; change the
call that populates budget_alert_email_addresses to use
stack_config.require_secret_object(...) so the returned value remains a secret
Output and is redacted in state. Locate the assignment to
budget_alert_email_addresses and replace require_object with
require_secret_object while keeping the existing cast and variable name.

In `@infrastructure/docker-compose.yaml`:
- Around line 40-41: The docker-compose service "ensemble-manager" is using the
wrong environment variable name; update the image reference to use the renamed
ARN variable AWS_ECR_ENSEMBLE_MANAGER_SERVER_IMAGE_ARN instead of
AWS_ECR_EQUITY_PRICE_MODEL_SERVER_IMAGE_ARN so it follows the established
AWS_ECR_{SERVICE_NAME}_SERVER_IMAGE_ARN pattern; locate the "ensemble-manager"
service block and replace the image value accordingly.

In `@infrastructure/iam.py`:
- Around line 2-9: The local module name "secrets" shadows Python's stdlib
module; update imports used in this file (data_manager_secret,
data_manager_secret_name, portfolio_manager_secret,
portfolio_manager_secret_name, shared_secret, shared_secret_name) to avoid the
conflict — either rename the local module (e.g., infrastructure_secrets.py) and
import its symbols, or switch to an explicit relative import (use the
package-local module via a relative import or alias the local module) so stdlib
"secrets" is not shadowed and the references to those symbols
(data_manager_secret, portfolio_manager_secret, shared_secret, etc.) still
resolve correctly.
- Around line 92-103: The IAM policy statement with "Sid":
"ManageEC2ECSELBBudgetsAndServiceDiscovery" currently uses broad service
wildcards (ec2:*, ecs:*, elasticloadbalancing:*, budgets:*, servicediscovery:*)
and Resource:"*", so replace the wildcards with a minimal set of required
actions (enumerate create/update/delete and necessary describe/list actions) for
each service (e.g., explicit ElasticLoadBalancing actions instead of
elasticloadbalancing:*) and where possible constrain "Resource" to specific ARN
patterns (load balancer, target group, ECS cluster/service ARNs, budget ARNs,
servicediscovery namespaces) while retaining global/wildcard only for actions
that truly require it; update the policy block identified by the Sid
"ManageEC2ECSELBBudgetsAndServiceDiscovery" in infrastructure/iam.py accordingly
and verify against the AWS service authorization reference for supported
resource-level permissions.

In `@infrastructure/networking.py`:
- Around line 95-101: The NAT Gateway is created only as nat using
subnet_id=public_subnet_1.id which creates a single-AZ availability risk; either
create a second NAT Gateway (e.g., nat_ha) in public_subnet_2 and update route
tables for private subnets to use the AZ-local NAT, or explicitly document and
accept the single-NAT cost trade-off—modify the resource creation to add a
second aws.ec2.NatGateway (reference public_subnet_2.id and a new eip) and
ensure route table entries point private subnets to the appropriate NAT gateway
for HA.

In `@infrastructure/notifications.py`:
- Around line 1-50: The budget notifications are configured to send emails
directly via subscriber_email_addresses on the aws.budgets.Budget resource
instead of using the infrastructure_alerts_topic SNS topic; update the Budget
resource to use subscriber_sns_topic_arns=[infrastructure_alerts_topic.arn] and
add an SNS TopicPolicy on infrastructure_alerts_topic that grants Principal
"budgets.amazonaws.com" permission to Publish so AWS Budgets can post to the
topic; alternatively remove the infrastructure_alerts_topic and its
TopicSubscription loop if you prefer direct-email notifications instead of
SNS-based delivery.

In `@infrastructure/secrets.py`:
- Around line 34-68: Update the secret config key names passed into
serialize_secret_config_object to match the new service naming convention:
replace "datamanagerSecretValue" with a consistent key like
"dataManagerSecretValue" or "data_manager_secret_value" in the
aws.secretsmanager.SecretVersion call that constructs
"data_manager_secret_version", and replace "portfoliomanagerSecretValue" with
"portfolioManagerSecretValue" or "portfolio_manager_secret_value" in the
aws.secretsmanager.SecretVersion call for "portfolio_manager_secret_version";
keep the "sharedSecretValue" key decision consistent as well and then update the
corresponding Pulumi stack configuration entries so the code in
serialize_secret_config_object and any consumers read the new keys.
- Around line 13-32: The three Secret resources data_manager_secret,
portfolio_manager_secret, and shared_secret currently set
recovery_window_in_days=0 which prevents recovery after deletion; update
secrets.py to make recovery_window_in_days a configurable non-zero value for
production (e.g., 7–30 days) by introducing a variable (e.g.,
recovery_window_days or secrets_recovery_window) read from configuration or
environment and use it for these aws.secretsmanager.Secret creations so
production stacks get a safe recovery window while allowing 0 for local/dev via
config.

In `@infrastructure/storage.py`:
- Around line 97-161: Add ECR lifecycle policies to each repository to prevent
unbounded image accumulation; for each repository resource
(data_manager_repository, portfolio_manager_repository,
ensemble_manager_repository, tide_trainer_repository,
training_server_repository, training_worker_repository) create a corresponding
aws.ecr.LifecyclePolicy resource (e.g., data_manager_repository_lifecycle) that
references repository=<repository>.name and uses a JSON policy (via json.dumps)
to expire old images (for example keep last 10 images with rulePriority 1,
selection tagStatus "any", countType "imageCountMoreThan", countNumber 10, and
action type "expire"); repeat this pattern after each repository definition to
manage storage costs.

In `@libraries/python/tests/test_infrastructure_iam.py`:
- Around line 20-24: The assertions in tests/test_infrastructure_iam.py are
brittle because they check exact substrings in the rendered Terraform/JSON
(variable infrastructure_iam) which breaks on harmless formatting changes;
update the two assertions that reference "github_actions_infrastructure_policy"
and the managed_policy_arns check to use whitespace-tolerant regex matching
instead of direct substring checks (e.g. use re.search on infrastructure_iam
with a pattern that matches the quoted policy name and a separate pattern like
r"managed_policy_arns\s*=\s*\[.*github_actions_infrastructure_policy\.arn.*\]"
to allow arbitrary whitespace/formatting), and apply the same change to the
similar assertions at lines 30-32 so the tests validate intent not formatting.

In `@libraries/python/tests/test_infrastructure_storage.py`:
- Around line 7-22: The tests currently grep the raw file text via
load_infrastructure_storage/INFRASTRUCTURE_STORAGE_PATH which is brittle;
instead parse or import the module and assert on actual exported resource
definitions: update test_storage_contains_s3_bucket_encryption_resources and
test_storage_contains_s3_public_access_block_resources to load the module (or
ast.parse INFRASTRUCTURE_STORAGE_PATH.read_text()) and inspect the AST or the
module object for the presence of the resource definitions (e.g. names
"data_bucket_encryption", "model_artifacts_bucket_encryption",
"data_bucket_public_access_block", "model_artifacts_bucket_public_access_block")
rather than asserting on raw quoted strings — use ast to find assignments/dict
keys or importlib to get the exported dict/object and assert those keys exist.

In `@maskfile.md`:
- Around line 71-75: The build chooses dockerfile paths using
applications/${application_name}/Dockerfile which fails for hyphenated option
names (e.g., portfolio-manager) because directories are underscore-based
(portfolio_manager); update the logic that builds dockerfile to normalize
application_name (replace '-' with '_') before composing the path (use the
normalized var wherever dockerfile path is constructed, e.g., the dockerfile
variable assignment and any other uses that reference application_name).

In `@models/tide/Dockerfile`:
- Around line 17-43: The trainer stage in the Dockerfile runs as root; create an
unprivileged runtime user (e.g., "trainer" or "tideuser") inside the trainer
stage, give it a non-zero UID/GID, chown the application files (the /app tree
copied from the builder) and any runtime directories to that user, and add a
USER instruction to switch to that account before the ENTRYPOINT so the
container runs unprivileged; ensure the created user has a home or workable
WORKDIR and that ENTRYPOINT ("uv run --package tide python -m tide.workflow")
remains executable by that user.
- Around line 1-43: The Dockerfile copies a virtual environment built in the
builder stage into the trainer stage which is brittle because uv creates a .venv
with absolute paths; instead rebuild the Python environment in the final trainer
stage: after copying /bin/uv and the source (the COPY --from=builder /app /app
step) run uv sync --no-dev --package tide in the trainer stage (or switch the
trainer base to the identical python:3.12.10-slim image) so the runtime .venv is
created against the trainer's Python install; ensure this happens before
ENTRYPOINT uses uv to run tide.workflow.

---

Outside diff comments:
In `@applications/data_manager/Dockerfile`:
- Around line 34-84: The final Dockerfile stage leaves the process running as
root; create a non-root user and group (e.g. data_manager user), chown the
installed binary and any required runtime dirs to that UID/GID, and add a USER
instruction before ENTRYPOINT so /usr/local/bin/data_manager runs unprivileged;
update any file permissions (binary and /app if necessary) and ensure the chosen
user has no sudo/root privileges and a locked shell.

In `@applications/portfolio_manager/Dockerfile`:
- Around line 9-31: The final stage leaves the container running as root; after
COPY --from=builder /app /app create a non-root user (e.g., appuser), set
appropriate ownership on /app (chown /app and any runtime dirs), and switch to
that user with USER before the ENTRYPOINT so uv and the portfolio_manager app
run unprivileged; update ENV/PATH only if needed and ensure no subsequent steps
require root privileges after the switch.

In `@models/tide/src/tide/notifications.py`:
- Around line 19-23: The warning currently logs raw PII (sender_email and
recipient_emails_raw); change the logger.warning call(s) (the one shown and the
similar call around lines 81-85) to avoid printing email addresses and instead
log non-PII indicators such as whether sender_email is set (e.g.,
bool(sender_email)) and the recipient count (e.g., len(recipient_emails_raw)) or
a masked placeholder; update the logger call signatures to pass these
booleans/counts (and a short descriptive message) rather than the raw
sender_email/recipient_emails_raw values.

In `@models/tide/tests/test_notifications.py`:
- Around line 41-50: The tests in test_notifications.py use
patch.dict("os.environ", {...}) without clearing the rest of the process env
which can hide broken env-var renames; update each patch.dict(...) call that
sets FUND_TRAINING_NOTIFICATION_SENDER_EMAIL and
FUND_TRAINING_NOTIFICATION_RECIPIENT_EMAILS (and the other patches at the
locations that set FUND_TRAINING_NOTIFICATION_* keys) to pass clear=True so only
the supplied keys exist during the test — change patch.dict("os.environ", {...})
to patch.dict("os.environ", {...}, clear=True) for every such block in the file.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8a55260a-86c9-42fc-8aa0-499713b9722a

📥 Commits

Reviewing files that changed from the base of the PR and between eef2f58 and 9fcaddd.

⛔ Files ignored due to path filters (2)

Cargo.lock is excluded by !**/*.lock
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (104)

.github/workflows/launch_infrastructure.yaml
Cargo.toml
applications/data_manager/Cargo.toml
applications/data_manager/Dockerfile
applications/data_manager/bacon.toml
applications/data_manager/src/data.rs
applications/data_manager/src/equity_bars.rs
applications/data_manager/src/equity_details.rs
applications/data_manager/src/errors.rs
applications/data_manager/src/health.rs
applications/data_manager/src/lib.rs
applications/data_manager/src/main.rs
applications/data_manager/src/portfolios.rs
applications/data_manager/src/predictions.rs
applications/data_manager/src/router.rs
applications/data_manager/src/startup.rs
applications/data_manager/src/state.rs
applications/data_manager/src/storage.rs
applications/data_manager/tests/common/mod.rs
applications/data_manager/tests/test_data.rs
applications/data_manager/tests/test_errors.rs
applications/data_manager/tests/test_handlers.rs
applications/data_manager/tests/test_state_and_health.rs
applications/data_manager/tests/test_storage.rs
applications/ensemble_manager/Dockerfile
applications/ensemble_manager/pyproject.toml
applications/ensemble_manager/src/ensemble_manager/predictions_schema.py
applications/ensemble_manager/src/ensemble_manager/preprocess.py
applications/ensemble_manager/src/ensemble_manager/server.py
applications/ensemble_manager/tests/conftest.py
applications/ensemble_manager/tests/test_predictions_schema.py
applications/ensemble_manager/tests/test_preprocess.py
applications/ensemble_manager/tests/test_server.py
applications/equitypricemodel/Dockerfile
applications/portfolio_manager/Dockerfile
applications/portfolio_manager/pyproject.toml
applications/portfolio_manager/src/portfolio_manager/alpaca_client.py
applications/portfolio_manager/src/portfolio_manager/beta.py
applications/portfolio_manager/src/portfolio_manager/consolidation.py
applications/portfolio_manager/src/portfolio_manager/data_client.py
applications/portfolio_manager/src/portfolio_manager/enums.py
applications/portfolio_manager/src/portfolio_manager/exceptions.py
applications/portfolio_manager/src/portfolio_manager/portfolio_schema.py
applications/portfolio_manager/src/portfolio_manager/regime.py
applications/portfolio_manager/src/portfolio_manager/report.py
applications/portfolio_manager/src/portfolio_manager/risk_management.py
applications/portfolio_manager/src/portfolio_manager/server.py
applications/portfolio_manager/src/portfolio_manager/statistical_arbitrage.py
applications/portfolio_manager/tests/test_alpaca_client.py
applications/portfolio_manager/tests/test_beta.py
applications/portfolio_manager/tests/test_consolidation.py
applications/portfolio_manager/tests/test_data_client.py
applications/portfolio_manager/tests/test_portfolio_schema.py
applications/portfolio_manager/tests/test_portfolio_server.py
applications/portfolio_manager/tests/test_regime.py
applications/portfolio_manager/tests/test_report.py
applications/portfolio_manager/tests/test_risk_management.py
applications/portfolio_manager/tests/test_statistical_arbitrage.py
infrastructure/__main__.py
infrastructure/compute.py
infrastructure/config.py
infrastructure/docker-compose.yaml
infrastructure/iam.py
infrastructure/networking.py
infrastructure/notifications.py
infrastructure/parameters.py
infrastructure/secrets.py
infrastructure/storage.py
libraries/python/pyproject.toml
libraries/python/src/internal/combine_data.py
libraries/python/src/internal/equity_details_schema.py
libraries/python/tests/test_equity_details_schema.py
libraries/python/tests/test_infrastructure_config.py
libraries/python/tests/test_infrastructure_configuration.py
libraries/python/tests/test_infrastructure_iam.py
libraries/python/tests/test_infrastructure_networking.py
libraries/python/tests/test_infrastructure_storage.py
maskfile.md
models/__init__.py
models/tide/Dockerfile
models/tide/__init__.py
models/tide/pyproject.toml
models/tide/src/tide/__init__.py
models/tide/src/tide/deploy.py
models/tide/src/tide/notifications.py
models/tide/src/tide/run.py
models/tide/src/tide/tasks.py
models/tide/src/tide/tide_data.py
models/tide/src/tide/tide_model.py
models/tide/src/tide/trainer.py
models/tide/src/tide/workflow.py
models/tide/tests/__init__.py
models/tide/tests/conftest.py
models/tide/tests/test_deploy.py
models/tide/tests/test_notifications.py
models/tide/tests/test_run.py
models/tide/tests/test_tasks.py
models/tide/tests/test_tide_data.py
models/tide/tests/test_tide_model.py
models/tide/tests/test_trainer.py
models/tide/tests/test_workflow.py
pyproject.toml
tools/Dockerfile
tools/pyproject.toml

💤 Files with no reviewable changes (3)

tools/pyproject.toml
applications/equitypricemodel/Dockerfile
libraries/python/tests/test_infrastructure_configuration.py

…t container users - Fix default DATAMANAGER_BASE_URL fallback in ensemble_manager/server.py from "http://datamanager:8080" to "http://data-manager:8080" to match the docker-compose service name after the renaming in this PR - Fix ECR repository names in infrastructure/storage.py to use underscore-prefixed application names (fund/data_manager-server, fund/portfolio_manager-server, fund/ensemble_manager-server) to match the CI matrix application_name values used in the mask push command (fund/${application_name}-${stage_name}) - Fix ensemble-manager docker-compose service to reference AWS_ECR_ENSEMBLE_MANAGER_SERVER_IMAGE_ARN instead of the stale AWS_ECR_EQUITY_PRICE_MODEL_SERVER_IMAGE_ARN - Add non-root appuser (uid 10001) to applications/ensemble_manager/Dockerfile server stage - Add non-root appuser (uid 10001) to models/tide/Dockerfile trainer stage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

models/tide/Dockerfile (1)

39-48: ⚠️ Potential issue | 🟠 Major

Rebuild .venv in trainer instead of copying it from builder.

Line 39 copies /app (including the builder-created environment), and Line 48 runs it via uv run in a different base image. This is brittle and can break due to interpreter/path/ABI mismatch between stages. Recreate the environment in the trainer stage.

Suggested fix

 COPY --from=builder /app /app
+RUN rm -rf /app/.venv && uv sync --no-dev --package tide
 
 ENV PYTHONPATH=/app/models/tide/src

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@models/tide/Dockerfile` around lines 39 - 48, The Dockerfile currently copies
the entire /app from the builder (COPY --from=builder /app /app) which includes
the builder-created virtualenv and can cause interpreter/ABI mismatches at
runtime; instead, remove copying of the builder .venv and rebuild the virtual
environment in the trainer stage: ensure the trainer stage installs dependencies
and creates a fresh venv (or uses system site-packages) prior to setting ENV
PYTHONPATH=/app/models/tide/src and before setting ENTRYPOINT
["uv","run","--package","tide","python","-m","tide.workflow"]; update any
USER/chown steps (useradd appuser) to run after venv creation so the environment
is owned by appuser.

applications/ensemble_manager/Dockerfile (1)

36-38: 🧹 Nitpick | 🔵 Trivial

Add a Docker HEALTHCHECK for liveness/readiness.

This is still missing, so supervisors cannot detect a hung-but-running process.

💡 Proposed patch

 USER appuser
+
+HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
+  CMD python -c "import socket; s = socket.create_connection(('127.0.0.1', 8080), 2); s.close()" || exit 1
+
 EXPOSE 8080

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@applications/ensemble_manager/Dockerfile` around lines 36 - 38, Add a Docker
HEALTHCHECK to the Dockerfile so supervisors can detect hung-but-running
processes; add a HEALTHCHECK instruction after EXPOSE/ENTRYPOINT that
periodically probes the running uvicorn app started by the ENTRYPOINT
(ensemble_manager.server:application) — prefer an HTTP GET against a health
endpoint (e.g., /health or /) on port 8080 and return success only on HTTP 200,
with sensible interval, timeout and retries; if no HTTP health endpoint exists,
use a TCP check against port 8080 instead. Ensure the HEALTHCHECK command is
idempotent and exits with 0 on healthy and non-0 on unhealthy so orchestration
can mark the container accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@applications/ensemble_manager/src/ensemble_manager/server.py`:
- Line 65: The SSM parameter constant MODEL_VERSION_SSM_PARAMETER uses an
underscore but the infra defines a hyphen; update the constant value from
"/fund/ensemble_manager/model_version" to "/fund/ensemble-manager/model_version"
so the SSM lookup (the code that reads MODEL_VERSION_SSM_PARAMETER) will find
the correct parameter and avoid falling back to "latest".

In `@infrastructure/storage.py`:
- Around line 97-160: The ECR repositories (data_manager_repository,
portfolio_manager_repository, ensemble_manager_repository,
tide_trainer_repository, training_server_repository, training_worker_repository)
are configured with mutable tags which allows downstream exports to reference
unstable ":latest" semantics; change the exports to return only the repository
URI (not a full image:tag), stop exporting any deployable image URIs with
mutable tags, and ensure CI injects a concrete image tag or digest at deploy
time; additionally set image_tag_mutability="IMMUTABLE" on these
aws.ecr.Repository resources (and the similar repositories referenced later) so
pushed tags cannot be overwritten.

---

Duplicate comments:
In `@applications/ensemble_manager/Dockerfile`:
- Around line 36-38: Add a Docker HEALTHCHECK to the Dockerfile so supervisors
can detect hung-but-running processes; add a HEALTHCHECK instruction after
EXPOSE/ENTRYPOINT that periodically probes the running uvicorn app started by
the ENTRYPOINT (ensemble_manager.server:application) — prefer an HTTP GET
against a health endpoint (e.g., /health or /) on port 8080 and return success
only on HTTP 200, with sensible interval, timeout and retries; if no HTTP health
endpoint exists, use a TCP check against port 8080 instead. Ensure the
HEALTHCHECK command is idempotent and exits with 0 on healthy and non-0 on
unhealthy so orchestration can mark the container accordingly.

In `@models/tide/Dockerfile`:
- Around line 39-48: The Dockerfile currently copies the entire /app from the
builder (COPY --from=builder /app /app) which includes the builder-created
virtualenv and can cause interpreter/ABI mismatches at runtime; instead, remove
copying of the builder .venv and rebuild the virtual environment in the trainer
stage: ensure the trainer stage installs dependencies and creates a fresh venv
(or uses system site-packages) prior to setting ENV
PYTHONPATH=/app/models/tide/src and before setting ENTRYPOINT
["uv","run","--package","tide","python","-m","tide.workflow"]; update any
USER/chown steps (useradd appuser) to run after venv creation so the environment
is owned by appuser.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 89835f82-e8ef-41fe-9613-c4faac099273

📥 Commits

Reviewing files that changed from the base of the PR and between 9fcaddd and aea25fc.

📒 Files selected for processing (5)

applications/ensemble_manager/Dockerfile
applications/ensemble_manager/src/ensemble_manager/server.py
infrastructure/docker-compose.yaml
infrastructure/storage.py
models/tide/Dockerfile

Copilot

Pull request overview

Large-scale reorganization to improve naming consistency and separate concerns across infrastructure, applications, and model training (introducing the new models/tide package/workflow and renaming services/packages for readability).

Changes:

Restructures Pulumi infrastructure/ into modular files and updates resource/service naming.
Introduces models/tide as a first-class workspace package with its own Docker image, workflow, and tests.
Renames/realigns application packages (e.g., portfolio_manager, ensemble_manager, data_manager) and updates tests/imports accordingly.

Reviewed changes

Copilot reviewed 72 out of 108 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tools/pyproject.toml	Removes Prefect dependency from `tools` package.
tools/Dockerfile	Updates build context/PYTHONPATH for tooling image; still defines Prefect worker stage.
pyproject.toml	Updates uv workspace members; adds Tide tests path.
models/tide/tests/test_workflow.py	Updates workflow import/patch targets to `tide.workflow`.
models/tide/tests/test_trainer.py	Switches trainer import to `tide.trainer`.
models/tide/tests/test_tide_model.py	Switches model import to `tide.tide_model`.
models/tide/tests/test_tide_data.py	Switches data import to `tide.tide_data`.
models/tide/tests/test_tasks.py	Switches task import/patch targets to `tide.tasks`.
models/tide/tests/test_run.py	Switches run import/patch targets to `tide.run`; updates expected artifact naming.
models/tide/tests/test_notifications.py	Switches notifications import and env var names to `FUND_*`.
models/tide/tests/test_deploy.py	Switches deploy import/patch targets to `tide.deploy`.
models/tide/tests/conftest.py	Adds shared test fixture for generating raw data.
models/tide/tests/init.py	Marks Tide tests package.
models/tide/src/tide/workflow.py	Implements Prefect flow/tasks for Tide training pipeline; updates env var names and artifact paths.
models/tide/src/tide/trainer.py	Refactors trainer module; removes `__main__` script behavior.
models/tide/src/tide/tide_model.py	Adds TiDE model implementation, training utilities, and serialization.
models/tide/src/tide/tasks.py	Refactors training-data prep tasks; removes `__main__` script behavior.
models/tide/src/tide/run.py	Updates to call `tide.workflow.training_pipeline`; renames env vars.
models/tide/src/tide/notifications.py	Renames notification env vars to `FUND_*`.
models/tide/src/tide/deploy.py	Adds flow deployment helper using module-path entrypoint.
models/tide/src/tide/init.py	Marks Tide package.
models/tide/pyproject.toml	New Tide package definition and dependencies.
models/tide/init.py	Marks models/tide as a module.
models/tide/Dockerfile	Adds CUDA-based trainer image for Tide workflow.
models/init.py	Marks models as a Python package.
maskfile.md	Updates infra/image commands, imports, service naming, and model commands for Tide.
libraries/python/tests/test_infrastructure_storage.py	Adds tests asserting infra storage module contains key resources.
libraries/python/tests/test_infrastructure_networking.py	Adds tests asserting networking module contains NAT alarm configuration.
libraries/python/tests/test_infrastructure_iam.py	Adds tests asserting IAM module constraints/policy wiring.
libraries/python/tests/test_infrastructure_configuration.py	Removes monolithic infra entrypoint tests (superseded by modular tests).
libraries/python/tests/test_infrastructure_config.py	Adds config-focused infra tests (Pulumi config + OIDC claim constants).
libraries/python/tests/test_equity_details_schema.py	Updates import path to `internal.equity_details_schema`.
libraries/python/src/internal/equity_details_schema.py	Introduces shared Pandera schema for equity details.
libraries/python/src/internal/combine_data.py	Renames combine function and removes CLI entrypoint.
libraries/python/pyproject.toml	Adds `structlog` dependency to shared Python library.
infrastructure/storage.py	Adds storage module (S3 + ECR repos + lifecycle policies).
infrastructure/secrets.py	Adds secrets module (Secrets Manager + versions).
infrastructure/parameters.py	Renames parameters and scopes paths by stack name.
infrastructure/notifications.py	Adds budget/SNS alerting resources.
infrastructure/networking.py	Adds VPC/subnets/SGs/endpoints + NAT alarm.
infrastructure/iam.py	Adds IAM/OIDC/roles/policies + SES identity/SSM params for notifications.
infrastructure/docker-compose.yaml	Renames compose services and updates base URL env vars.
infrastructure/config.py	Centralizes Pulumi config loading/validation and shared constants.
infrastructure/Pulumi.production.yaml	Updates encrypted config values (secret rotations/renames).
applications/portfolio_manager/tests/test_statistical_arbitrage.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_risk_management.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_report.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_regime.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_portfolio_server.py	Updates imports/patch targets to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_portfolio_schema.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_data_client.py	Updates imports/patch targets to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_consolidation.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_beta.py	Updates imports to `portfolio_manager.*`.
applications/portfolio_manager/tests/test_alpaca_client.py	Updates imports/patch targets to `portfolio_manager.*`.
applications/portfolio_manager/src/portfolio_manager/statistical_arbitrage.py	Adds pair selection logic for stat arb based on correlations/z-scores/signals.
applications/portfolio_manager/src/portfolio_manager/server.py	Renames env vars and updates predictions base URL to ensemble manager.
applications/portfolio_manager/src/portfolio_manager/risk_management.py	Adds vol-parity sizing + beta-neutral optimization.
applications/portfolio_manager/src/portfolio_manager/report.py	Adds formatted reporting for regime/beta/signals/pairs/portfolio sizing.
applications/portfolio_manager/src/portfolio_manager/regime.py	Adds regime classifier based on SPY volatility/autocorrelation.
applications/portfolio_manager/src/portfolio_manager/portfolio_schema.py	Adds Pandera schemas and custom checks for pairs/portfolio outputs.
applications/portfolio_manager/src/portfolio_manager/exceptions.py	Adds domain-specific exceptions.
applications/portfolio_manager/src/portfolio_manager/enums.py	Adds enums for sides/actions.
applications/portfolio_manager/src/portfolio_manager/data_client.py	Adds data-manager HTTP client helpers.
applications/portfolio_manager/src/portfolio_manager/consolidation.py	Adds prediction consolidation + realized volatility computation.
applications/portfolio_manager/src/portfolio_manager/beta.py	Adds market beta estimation and portfolio beta computation.
applications/portfolio_manager/src/portfolio_manager/alpaca_client.py	Adds Alpaca client wrapper with error mapping and rate limiting.
applications/portfolio_manager/pyproject.toml	Renames package to `portfolio_manager`.
applications/portfolio_manager/Dockerfile	Updates COPY paths, PYTHONPATH, and uv package entrypoint.
applications/equitypricemodel/Dockerfile	Removes legacy equity price model Dockerfile.
applications/ensemble_manager/tests/test_server.py	Updates imports/patch targets to `ensemble_manager.*`.
applications/ensemble_manager/tests/test_preprocess.py	Updates import to `ensemble_manager.preprocess`.
applications/ensemble_manager/tests/test_predictions_schema.py	Updates import to `ensemble_manager.predictions_schema`.
applications/ensemble_manager/tests/conftest.py	Adds shared test fixture for generating raw data.
applications/ensemble_manager/src/ensemble_manager/server.py	Refactors server to use shared schema + Tide model/data; updates env var naming and SSM path scheme.
applications/ensemble_manager/src/ensemble_manager/preprocess.py	Adds equity bar filtering helper.
applications/ensemble_manager/src/ensemble_manager/predictions_schema.py	Adds Pandera schema + validation checks for predictions.
applications/ensemble_manager/pyproject.toml	Renames package to `ensemble_manager` and depends on `tide`.
applications/ensemble_manager/Dockerfile	Adds Docker build for ensemble manager service.
applications/data_manager/tests/test_storage.rs	Updates crate name imports to `data_manager`.
applications/data_manager/tests/test_state_and_health.rs	Updates crate name imports to `data_manager`.
applications/data_manager/tests/test_handlers.rs	Updates crate name imports to `data_manager`.
applications/data_manager/tests/test_errors.rs	Updates crate name imports to `data_manager`.
applications/data_manager/tests/test_data.rs	Updates crate name imports to `data_manager`.
applications/data_manager/tests/common/mod.rs	Adds shared LocalStack/DuckDB env + server spawning helpers for tests.
applications/data_manager/src/storage.rs	Adds/extends S3 IO + DuckDB query layer for bars/predictions/portfolios/details.
applications/data_manager/src/state.rs	Adds state construction from env (AWS + Massive + S3).
applications/data_manager/src/startup.rs	Updates env var names; updates logging identifiers.
applications/data_manager/src/router.rs	Adds axum router wiring for endpoints.
applications/data_manager/src/predictions.rs	Adds predictions save/query handlers.
applications/data_manager/src/portfolios.rs	Adds portfolios save/get handlers.
applications/data_manager/src/main.rs	Updates crate import name and env var names in tests.
applications/data_manager/src/lib.rs	Exposes modules for the Rust crate.
applications/data_manager/src/health.rs	Adds health endpoint.
applications/data_manager/src/errors.rs	Adds unified error type.
applications/data_manager/src/equity_details.rs	Adds equity details get/sync handlers (sync stub).
applications/data_manager/src/equity_bars.rs	Adds equity bars sync/query handlers.
applications/data_manager/src/data.rs	Adds DataFrame construction helpers for all domain types.
applications/data_manager/bacon.toml	Adds local Rust dev task config.
applications/data_manager/Dockerfile	Updates paths/bin name to `data_manager`.
applications/data_manager/Cargo.toml	Renames crate/bin to `data_manager`.
Cargo.toml	Updates workspace member path.
Cargo.lock	Updates crate name entry.
CLAUDE.md	Documents new `models/` folder purpose.
.github/workflows/launch_infrastructure.yaml	Updates service names/paths to new package directories.

Comments suppressed due to low confidence (2)

tools/pyproject.toml:11

tools no longer declares prefect as a dependency, but the tooling image still starts a Prefect worker (prefect worker start in tools/Dockerfile). As-is, the container will fail at runtime because the prefect CLI/module won’t be installed. Re-add prefect>=3,<4 to tools dependencies (or switch the worker image to install Prefect via another package and ensure it’s present).
tools/Dockerfile:19
The Prefect worker image doesn’t include the models/tide package, but deployments are registered with EntrypointType.MODULE_PATH pointing at tide.workflow (see models/tide/src/tide/deploy.py). The worker won’t be able to import/execute the flow code without tide installed. Copy models/tide/ into this image and ensure uv sync installs the tide package (which will also pull in Prefect).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@applications/ensemble_manager/src/ensemble_manager/server.py`:
- Around line 65-66: The code currently defaults _environment to "development"
which causes MODEL_VERSION_SSM_PARAMETER to point to the wrong SSM namespace;
change behavior so FUND_ENVIRONMENT is required instead of silently defaulting:
replace the defaulting logic for _environment (the variable named _environment
and the derived MODEL_VERSION_SSM_PARAMETER) with a check that reads
FUND_ENVIRONMENT and raises a clear error (or logs and aborts) if it is unset,
so SSM lookups fail fast and never silently use the wrong namespace.

In `@infrastructure/storage.py`:
- Around line 117-166: Repository names use a mixed underscore/hyphen convention
(e.g., resource data_manager_repository uses name "fund/data_manager-server"
while other repos use hyphens like "fund/tide-trainer"); standardize naming by
updating the name fields on the ECR resources (data_manager_repository,
portfolio_manager_repository, ensemble_manager_repository and any other
aws.ecr.Repository instances) to use a single convention (preferably hyphen-only
or underscore-only) and ensure lifecycle resources reference the updated
repository.name values consistently.
- Around line 185-200: The training_server ECR resources are created but not
imported into the stack entrypoint or exported as outputs; import the symbol
training_server_repository in the module that aggregates repositories (same
place other repos are imported) and add a stack export for
training_server_repository (the repository name or id) and
training_server_image_uri (the full image URI) alongside the existing exports
for data_manager, portfolio_manager, ensemble_manager, tide_trainer, and
training_worker so the resource is accessible to other stacks and consumers;
locate where other repositories are imported and exported and mirror their
pattern using the names training_server_repository and
training_server_image_uri.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 713235a0-88ef-48a6-b959-f7c61e4be03d

📥 Commits

Reviewing files that changed from the base of the PR and between aea25fc and 98e2e51.

📒 Files selected for processing (8)

CLAUDE.md
applications/ensemble_manager/src/ensemble_manager/server.py
infrastructure/Pulumi.production.yaml
infrastructure/iam.py
infrastructure/parameters.py
infrastructure/storage.py
libraries/python/tests/test_infrastructure_storage.py
maskfile.md

chrisaddy

Overall this is a clean reorganization. A few items worth addressing below.

forstmeier added 4 commits March 13, 2026 11:01

Broad repository reorganization

5e2e57e

Update infrastructure resource/variable names

c466b94

Refactor Mask commands

f797fcb

Refactor infrastructure definitions into separate files

9fcaddd

Copilot AI review requested due to automatic review settings March 14, 2026 00:44

github-project-automation Bot added this to Overview Mar 14, 2026

github-project-automation Bot moved this to In Progress in Overview Mar 14, 2026

github-actions Bot requested a review from chrisaddy March 14, 2026 00:44

github-actions Bot assigned forstmeier Mar 14, 2026

github-actions Bot added rust Rust code updates python Python code updates markdown Markdown code updates yaml YAML code updates labels Mar 14, 2026

Copilot started reviewing on behalf of forstmeier March 14, 2026 00:44 View session

Copilot AI reviewed Mar 14, 2026

View reviewed changes

Comment thread infrastructure/storage.py

Comment thread infrastructure/docker-compose.yaml

coderabbitai Bot reviewed Mar 14, 2026

View reviewed changes

coderabbitai Bot previously approved these changes Mar 14, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 14, 2026

View reviewed changes

Comment thread applications/ensemble_manager/src/ensemble_manager/server.py Outdated

Comment thread infrastructure/storage.py

Add bot pull request feedback

98e2e51

Copilot AI review requested due to automatic review settings March 14, 2026 03:28

forstmeier dismissed coderabbitai[bot]’s stale review via 98e2e51 March 14, 2026 03:28

Copilot started reviewing on behalf of forstmeier March 14, 2026 03:29 View session

Copilot AI reviewed Mar 14, 2026

View reviewed changes

Comment thread maskfile.md

Comment thread applications/ensemble_manager/Dockerfile

Comment thread infrastructure/config.py

coderabbitai Bot reviewed Mar 14, 2026

View reviewed changes

Comment thread applications/ensemble_manager/src/ensemble_manager/server.py

Comment thread infrastructure/storage.py

Comment thread infrastructure/storage.py

chrisaddy reviewed Mar 14, 2026

View reviewed changes

Comment thread applications/portfolio_manager/src/portfolio_manager/server.py

Comment thread infrastructure/storage.py

Comment thread infrastructure/iam.py

Comment thread applications/ensemble_manager/pyproject.toml

chrisaddy approved these changes Mar 14, 2026

View reviewed changes

coderabbitai Bot approved these changes Mar 14, 2026

View reviewed changes

chrisaddy merged commit 9418e89 into master Mar 14, 2026
11 checks passed

chrisaddy deleted the repository-reorganization-phase-four branch March 14, 2026 05:43

github-project-automation Bot moved this from In Progress to Done in Overview Mar 14, 2026

coderabbitai Bot mentioned this pull request Mar 16, 2026

Cleanup Docker image handling, ECS service management, and model training resources #793

Merged

This was referenced Mar 23, 2026

Add internal scheduler to portfolio manager #803

Merged

Migrate to Prefect Cloud with optimized TiDE model #796

Merged

coderabbitai Bot mentioned this pull request Apr 5, 2026

Cleanup ECR resources and Docker image handling #813

Merged

coderabbitai Bot mentioned this pull request Apr 14, 2026

Update data/portfolio manager scheduled invocations #823

Merged

Conversation

forstmeier commented Mar 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Changes

Context

Summary by CodeRabbit

Uh oh!

greptile-apps Bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chrisaddy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

forstmeier commented Mar 14, 2026 •

edited by coderabbitai Bot

Loading

greptile-apps Bot commented Mar 14, 2026 •

edited

Loading

coderabbitai Bot commented Mar 14, 2026 •

edited

Loading