Skip to content

Fix infrastructure/update data manager Docker definition#826

Merged
forstmeier merged 4 commits intomasterfrom
various-fixes
Apr 16, 2026
Merged

Fix infrastructure/update data manager Docker definition#826
forstmeier merged 4 commits intomasterfrom
various-fixes

Conversation

@forstmeier
Copy link
Copy Markdown
Collaborator

@forstmeier forstmeier commented Apr 16, 2026

Overview

Changes

  • fix incorrect billing infrastructure resource definition
  • adjust data manager Dockerfile configuration

Context

Some broken stuff.

Summary by CodeRabbit

  • Chores

    • Reduced Docker build parallelism and lowered release optimization to reduce resource use.
    • Bumped Pulumi AWS dependency constraint.
  • Refactor

    • Migrated cost anomaly subscription to a structured configuration, switched from email to SNS subscriber, and added an SNS topic policy to allow publishing.
  • New Features

    • Updated service log group names, added an S3 model artifact path env var, adjusted task family name, and increased ECS health-check grace period.

Copilot AI review requested due to automatic review settings April 16, 2026 00:38
@github-project-automation github-project-automation Bot moved this to In Progress in Overview Apr 16, 2026
@github-actions github-actions Bot added the python Python code updates label Apr 16, 2026
@github-actions github-actions Bot requested a review from chrisaddy April 16, 2026 00:38
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 16, 2026

Warning

Rate limit exceeded

@forstmeier has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 41 minutes and 54 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 41 minutes and 54 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 65af61ca-f5c5-4d29-a6d6-d274ea98b0c5

📥 Commits

Reviewing files that changed from the base of the PR and between 6270721 and edb20d7.

📒 Files selected for processing (2)
  • applications/data_manager/Dockerfile
  • infrastructure/training.py
📝 Walkthrough

Walkthrough

Docker build flags changed for data_manager; Pulumi notifications refactored to use SNS + typed AnomalySubscription args and topic policy; ECS log group names, task family, container logging, env var, and ensemble service health-check grace updated.

Changes

Cohort / File(s) Summary
Build config
applications/data_manager/Dockerfile
Reduced CARGO_BUILD_JOBS from 4 to 1; added CARGO_PROFILE_RELEASE_OPT_LEVEL=1 in the builder stage.
Notifications / Cost Explorer
infrastructure/notifications.py
Added infrastructure_alerts_topic_policy granting SNS:Publish to costalerts.amazonaws.com; replaced JSON threshold_expression with AnomalySubscriptionThresholdExpressionArgs (explicit dimension); changed subscribers from computed EMAIL list to a single SNS subscriber using infrastructure_alerts_topic.arn; added depends_on to ensure topic policy ordering.
Compute / ECS task & service
infrastructure/compute.py
Renamed CloudWatch log groups from "/ecs/fund/<service>-server" to "/ecs/fund/applications-<service>-server"; added AWS_S3_MODEL_ARTIFACT_PATH=artifacts/tide/ env var to ensemble-manager-server container; set health_check_grace_period_seconds=180 on ensemble_manager_service.
Training / TaskDefinition & logging
infrastructure/training.py
Added import pulumi; changed ECS task family to tide-runner; built container_definitions via pulumi.Output.all(models_log_group.name, tide_runner_image_uri).apply(...) to inject dynamic awslogs-group and image URI; enabled json.dumps(..., sort_keys=True).
Packaging
infrastructure/pyproject.toml
Bumped pulumi-aws constraint from >=7.4.0 to >=7.7.0.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: fixing infrastructure/billing resources and updating the data manager Docker configuration, both of which are central to the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch various-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR resolves two previously flagged infrastructure bugs — the Dockerfile opt-level cache-invalidation issue and the missing SNS topic policy for Cost Anomaly Detection — and makes several related improvements: log group name alignment, a new AWS_S3_MODEL_ARTIFACT_PATH env var on the ensemble manager, an ECS health-check grace period extension, and a switch to typed Pulumi args for the anomaly subscription threshold. All previous review concerns have been addressed correctly.

Confidence Score: 5/5

Safe to merge — both previously flagged P1 issues are resolved and no new issues were found.

All findings from the prior review round are addressed: CARGO_PROFILE_RELEASE_OPT_LEVEL is now an ENV directive consistent across cook and build, and the SNS topic policy with the correct costalerts.amazonaws.com principal and depends_on guard is in place. No new P0/P1 issues were identified in this pass.

No files require special attention.

Important Files Changed

Filename Overview
applications/data_manager/Dockerfile Reduces parallel build jobs to 1 and promotes CARGO_PROFILE_RELEASE_OPT_LEVEL=1 to an ENV directive, fixing the previously flagged cache invalidation issue between cook and build steps.
infrastructure/notifications.py Adds SNS topic policy granting costalerts.amazonaws.com publish rights, migrates anomaly subscription from raw JSON + email to typed args + SNS, with correct depends_on ordering.
infrastructure/compute.py Renames CloudWatch log groups to include the applications prefix, adds AWS_S3_MODEL_ARTIFACT_PATH env var to the ensemble manager task definition, and extends the ECS health-check grace period to 180s.
infrastructure/training.py Renames task family to tide-runner, replaces hardcoded log group name and region strings with dynamic Pulumi outputs, adds sort_keys for deterministic JSON serialization.
infrastructure/pyproject.toml Bumps pulumi-aws minimum version to 7.7.0, required for the typed AnomalySubscriptionThresholdExpressionArgs used in notifications.py.
uv.lock Lock file updated to reflect the pulumi-aws>=7.7.0 constraint change.

Reviews (4): Last reviewed commit: "Address pull request #826 feedback: use ..." | Re-trigger Greptile

Comment thread applications/data_manager/Dockerfile Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates infrastructure and build configuration to address broken billing alert resource definition and adjust the data manager container build behavior.

Changes:

  • Fix Cost Explorer anomaly subscription threshold_expression definition to use Pulumi AWS typed args instead of JSON.
  • Adjust applications/data_manager Docker build settings (cargo build parallelism and release opt-level during cargo chef cook).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
infrastructure/notifications.py Updates anomaly subscription threshold expression definition for AWS Cost Explorer.
applications/data_manager/Dockerfile Tweaks Rust build configuration in the builder stage (jobs + release opt-level for dependency cook).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread applications/data_manager/Dockerfile Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infrastructure/notifications.py`:
- Around line 50-55: The project uses typed Pulumi classes like
AnomalySubscriptionThresholdExpressionArgs and
AnomalySubscriptionThresholdExpressionDimensionArgs (used in
threshold_expression) which require pulumi-aws>=7.7.0; update the dependency
constraint in pyproject.toml from pulumi-aws>=7.4.0 to pulumi-aws>=7.7.0 and
then reinstall or lock dependencies (poetry install/poetry lock or pip tooling)
so the runtime has the Pulumi AWS v7.7.0+ classes available.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4b17d2a9-be0b-4290-9295-137289811fc3

📥 Commits

Reviewing files that changed from the base of the PR and between 2608402 and 1aa03f8.

📒 Files selected for processing (2)
  • applications/data_manager/Dockerfile
  • infrastructure/notifications.py

Comment thread infrastructure/notifications.py
@coveralls
Copy link
Copy Markdown
Collaborator

coveralls commented Apr 16, 2026

Coverage Status

coverage: 77.342% (-0.06%) from 77.402% — various-fixes into master

Comment thread infrastructure/notifications.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infrastructure/notifications.py`:
- Around line 56-61: Create and attach an SNS topic policy resource that grants
"sns:Publish" to the service principal "costalerts.amazonaws.com" for the
infrastructure_alerts_topic ARN, and ensure the cost anomaly subscription
depends on that policy; specifically, add an aws.sns.TopicPolicy (or equivalent)
referencing infrastructure_alerts_topic.arn with a statement allowing Action
"sns:Publish" for Principal {"Service":"costalerts.amazonaws.com"}, then add a
depends_on linking the aws.costexplorer.AnomalySubscriptionSubscriberArgs (the
subscribers creation) to that TopicPolicy so the policy exists before the
subscription is created.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 26da53a6-c0cc-4ce8-9f0a-2c327a2d7a7d

📥 Commits

Reviewing files that changed from the base of the PR and between 1aa03f8 and dfe2e13.

📒 Files selected for processing (3)
  • infrastructure/compute.py
  • infrastructure/notifications.py
  • infrastructure/training.py

Comment thread infrastructure/notifications.py
…and SNS topic policy

- Move CARGO_PROFILE_RELEASE_OPT_LEVEL=1 to an ENV directive in the builder
  stage so both cargo chef cook and cargo build --release use the same opt-level.
  Previously the inline env var only applied to the cook step, causing Cargo to
  invalidate the chef cache and recompile dependencies at opt-level 3 during the
  final build.

- Add aws.sns.TopicPolicy granting costalerts.amazonaws.com SNS:Publish on the
  infrastructure_alerts_topic. Without this policy, Cost Anomaly Detection
  deployments succeed but alerts fail to deliver at runtime.

- Update pulumi-aws minimum version from 7.4.0 to 7.7.0 to match the
  AnomalySubscriptionThresholdExpressionArgs and
  AnomalySubscriptionThresholdExpressionDimensionArgs classes introduced in
  that release.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 16, 2026 17:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread infrastructure/training.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@applications/data_manager/Dockerfile`:
- Line 42: The Dockerfile currently sets CARGO_PROFILE_RELEASE_OPT_LEVEL=1 which
lowers Rust release optimizations; decide whether to restore the default
(opt-level 3) for production or explicitly document and gate the opt-level
change. Either remove or change the CARGO_PROFILE_RELEASE_OPT_LEVEL assignment
to the higher optimization (or make it configurable via build-time ARG) to
preserve runtime performance, or add a clear comment and a README note
describing the trade-off and monitoring expectations so maintainers know why
opt-level=1 was chosen; reference the CARGO_PROFILE_RELEASE_OPT_LEVEL
environment var in the Dockerfile and any build scripts that consume it.
- Line 40: Add a short explanatory comment immediately above or on the same line
as the ENV CARGO_BUILD_JOBS=1 declaration in the Dockerfile explaining why the
build jobs are limited to 1 (e.g., to avoid high parallel memory/CPU usage in CI
or on constrained builder nodes, to ensure deterministic build ordering, or
because of known cargo/rustc concurrency issues), and state whether it’s for
resource constraints, build stability, or compatibility with the build
environment so reviewers understand the intent and can reevaluate later.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 04670c0b-df2c-468e-8257-cfd67249946a

📥 Commits

Reviewing files that changed from the base of the PR and between dfe2e13 and 6270721.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • applications/data_manager/Dockerfile
  • infrastructure/notifications.py
  • infrastructure/pyproject.toml

Comment thread applications/data_manager/Dockerfile
Comment thread applications/data_manager/Dockerfile
…ckerfile build vars

- infrastructure/training.py: import region from config and use it for
  awslogs-region instead of hardcoded "us-east-1", consistent with all
  other ECS task definitions in compute.py
- applications/data_manager/Dockerfile: add comment explaining
  CARGO_BUILD_JOBS=1 (OOM prevention on constrained builder nodes) and
  CARGO_PROFILE_RELEASE_OPT_LEVEL=1 (cache consistency trade-off)

Co-Authored-By: Claude <noreply@anthropic.com>
@forstmeier forstmeier merged commit 9df855f into master Apr 16, 2026
14 checks passed
@forstmeier forstmeier deleted the various-fixes branch April 16, 2026 17:40
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Overview Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Python code updates

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants