
Merge Zappa Feature Branch #3827

Draft

rudransh-shrivastava wants to merge 47 commits into main from feature/nest-zappa-migration

Conversation

@rudransh-shrivastava
Collaborator

Proposed change

Resolves #3826

Merge feature/nest-zappa-migration to main

Checklist

  • Required: I followed the contributing workflow
  • Required: I verified that my code works as intended and resolves the issue as described
  • Required: I ran make check-test locally: all warnings addressed, tests passed
  • I used AI for code, documentation, tests, or communication related to this PR

rudransh-shrivastava and others added 30 commits November 23, 2025 15:50
* Install zappa and initial example configuration

* add temporary IaC code for testing

* Update code

* Remove IAM and let Zappa manage it

* add README

* add RDS proxy, pre-commit hooks, clean-up code

* feat: add ecs tasks

* refactor/clean cache module

* refactor/clean database module

* refactor/clean ecs module

* refactor/clean networking module

* address Sonar Qube bot suggestions

* keep some Sonar Qube bot suggestions but add #NOSONAR

* add terraform dictionary

* refactor/clean security module

* fix pre-commit hooks and add terraform_validate

* add SHA hash and remove terraform_validate

* refactor/clean storage module

* Update docs

* add S3 support for load-data task

* Update README and necessary examples

* coderabbit suggestions

* Update code: install awscli before invoking aws in task

* update README

* NOSONAR

* update README

* fix spell check

* Update README

* fix: typo

* remove immutable tagged images

* fix: load-data task fail due to no permissions

* fix spell check: add PYTHONPATH

* add AWS CLI

* add set -e

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
…2551)

* Use AWS SSM Parameter Store to handle environment variables

* Use focused policy for read access

* Update documentation

* Add flag for create_rds_proxy

* set default value of create_rds_proxy to false

* Populate Zappa/Lambda environment variables from ssm/parameter store

* Update documentation

* Update example

* add default configurations

* add security group db from lambda

* fix load-data task by adding a --fixture-path flag

* fix ecs tasks by introducing ecs-* make targets

* change ecs run steps

* remove ecs-* and clean code

* add --no-cache

* use call_command

* add test for --fixture-path

* Update code

* Update backend/wsgi.py

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
* add remote state management

* refactor staging into its own directory

* change steps to account for staging

* bot suggestions

* sort alphabetically

* Update code

* Update documentation

* update formatting

* update code
* refactor s3 buckets into a module

* add vpc flow logs, cache logs, and update ecs logs

* update code
* add `random_id` to bucket name

* Update documentation

* change default force_destroy_bucket

* make bucket names configurable and prevent_destroy

* remove specifies

* remove `force_destroy` flag

* add prevent_destroy

* add note

* update docs

* update docs
* add NACLs and VPC endpoints

* remove random

* update code
* improve `database` module

* update code

* fix spell check

* remove default recovery window

* make deletion_protection a variable
* delete .terraform.lock.hcl files

* use stricter versioning

* fix: rule cycling

* sort alphabetically

* add tflint config

* make README.md formatting consistent

* make descriptions consistent

* add tags

* add periods

* update description and add validation

* tighten egress

* remove secret outputs and replace with ARNs
* allow more origins

* add frontend and alb

* update readme

* fix spell check

* sort alphabetically

* fix issues

* redirect http to https
…-migration-sync-main

Sync feature/zappa-migration with main
* Add conditional creation to NAT and VPC endpoints

* Add conditional FARGATE SPOT to ECS

* Update security module

* Add conditional FARGATE SPOT to frontend

* Update staging configuration

* Simplify frontend deployment steps

* Create VPC endpoints conditionally

* Simplify README

* Refactor alb to be a module

* Add backend to ALB

* split routes

* Update routes and remove code

* fix spell check

* Fix domain example

* Update example

* Update code

* change assert to precondition

* drop invalid headers

* never use Fargate SPOT for migrate task

* Delete frontend alb module

* remove create_nat_gateway flag

* rename use_fargate_spot to ecs_use_fargate_spot

* never use fargate spot for index and load data tasks

* Update code

* fix indentation
* Add tests for cache module

* Add validations for cache module

* Add tests for database module

* Update code
…-migration-sync-main

Sync feature/nest-zappa-migration with main
* improve cold starts

* add xray and snap_start

* Update zappa

* Add aliasing to Lambda function version

* Add zip callback to cleanup zappa package

* add alias update zappa callback

* add alias to cleanup older lambda function versions

* Update readme

* add NOSONAR

* Update code

* Update code

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
…-migration-sync-main

Sync feature/nest-zappa-migration with main
* Add CI/CD to run terraform plan

* Add CI/CD to run terraform apply and zappa deploy

* Update zappa settings

* update code

* cleanup old staging setup

* update terraform outputs and variables and use terraform output as source of ids

* append _NAME to environment secrets
…-migration-sync-main

Sync feature/nest-zappa-migration with main
@github-actions bot added the docs (Improvements or additions to documentation), backend, backend-tests, deploy, makefile, and ci labels on Feb 7, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 7, 2026

Summary by CodeRabbit

  • New Features

    • Terraform-driven staging infrastructure and AWS integration (ALB, ECS, RDS, Redis, ECR, S3); Zappa deployment template and helpers; Django management command to load fixture data.
  • Infrastructure

    • New VPC, subnets, NAT/IGW, NACLs, security groups, VPC endpoints, storage and parameter management; CI updated with Terraform plan/apply, ECR tagging, and infrastructure test job.
  • Tests

    • Extensive Terraform module test suites and unit tests for the new management command.

Walkthrough

Removed Ansible/Compose staging deploy; added comprehensive Terraform staging infrastructure and tests; updated CI to run Terraform/ECR/ECS flows; added Zappa/SSM integration and helpers; introduced backend fixture loader and tests; minor backend, Docker, Makefile, pre-commit and spelling/tooling updates.

Changes

  • Ansible & Compose removal
    Files: .github/ansible/inventory.yaml, .github/ansible/staging/nest.yaml, .github/ansible/staging/proxy.yaml, docker-compose/staging/compose.yaml, cron/staging
    Summary: Removed staging Ansible hosts/plays and the Docker Compose staging stack (deployment/secret/cron steps deleted).
  • CI / GitHub Actions
    Files: .github/workflows/run-ci-cd.yaml
    Summary: Reworked workflow: added run-infrastructure-tests job, Terraform init/plan/apply steps, TFLint/pre-commit Terraform tooling, AWS/ECR login and ECR tagging, capture/upload of Terraform plan/outputs, and ECS task orchestration replacing SSH/Ansible steps.
  • Terraform root & tooling
    Files: infrastructure/*, infrastructure/Makefile, infrastructure/.tflint.hcl, .pre-commit-config.yaml
    Summary: Added staging backend and infra test tooling, TFLint config, Makefile test target, pre-commit Terraform hooks, and extensive infra README.
  • Terraform modules (complete set)
    Files: infrastructure/modules/... (alb, cache, database, ecs, frontend, kms, networking, parameters, security, storage, ...)
    Summary: Added full suite of modules with main.tf, variables, outputs, provider lockfiles, and many TFTest suites implementing ALB, ECS/ECR, RDS (+proxy), ElastiCache, VPC/NACLs/flow-logs/vpc-endpoints, KMS, S3, SSM parameters, security groups, storage and related modules.
  • Terraform staging environment
    Files: infrastructure/staging/*
    Summary: Added staging composition wiring modules together, staging backend example, variables, outputs and provider configs.
  • Terraform tests & lockfiles
    Files: infrastructure/modules/**/tests/*.tftest.hcl, **/.terraform.lock.hcl
    Summary: Added extensive TFTest suites across modules and numerous .terraform.lock.hcl files for deterministic providers.
  • Backend: WSGI, Zappa & zappa settings
    Files: backend/wsgi.py, backend/zappa_callback.py, backend/zappa_settings.template.json
    Summary: Added SSM parameter population in wsgi, AWS X-Ray wiring for Lambda, Zappa helpers (clean_package/update_alias/cleanup_versions/post), and a Zappa staging template.
  • Backend: management command + tests
    Files: backend/apps/common/management/commands/load_data.py, backend/tests/.../load_data_test.py
    Summary: New transactional Django management command to load fixture data with indexing disabled and unit tests covering normal and error paths.
  • Backend: minor changes & deps
    Files: backend/apps/github/models/..., backend/settings/*.py, backend/pyproject.toml
    Summary: Scoped some imports to function scope, switched Redis scheme to rediss:// when auth enabled, removed hardcoded AWS creds in staging settings, made ALLOWED_ORIGINS env-driven, added aws-xray-sdk and zappa deps.
  • Docker / Makefile
    Files: docker/backend/Dockerfile, backend/Makefile, Makefile
    Summary: Dockerfile copy/symlink/make install changes; backend Makefile gains EXEC_MODE branching to allow direct execution; top-level Makefile includes infrastructure/Makefile.
  • Tooling / spelling / gitignore
    Files: .gitignore, cspell/*
    Summary: Added Terraform ignore patterns and backend artifacts, added terraform dictionary and dependency for cspell.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

frontend, frontend-tests, docker

Suggested reviewers

  • kasya
🚥 Pre-merge checks: ✅ 5 passed | ❌ 1 failed

❌ Failed checks (1 warning)

Merge Conflict Detection: ⚠️ Warning. Merge conflicts detected (21 files):

⚔️ .github/ansible/inventory.yaml (content)
⚔️ .github/workflows/run-ci-cd.yaml (content)
⚔️ .gitignore (content)
⚔️ .pre-commit-config.yaml (content)
⚔️ Makefile (content)
⚔️ backend/Makefile (content)
⚔️ backend/apps/github/models/common.py (content)
⚔️ backend/apps/github/models/repository.py (content)
⚔️ backend/poetry.lock (content)
⚔️ backend/pyproject.toml (content)
⚔️ backend/settings/base.py (content)
⚔️ backend/settings/staging.py (content)
⚔️ backend/wsgi.py (content)
⚔️ cspell/cspell.json (content)
⚔️ cspell/custom-dict.txt (content)
⚔️ cspell/package.json (content)
⚔️ cspell/pnpm-lock.yaml (content)
⚔️ docker/backend/Dockerfile (content)
⚔️ docs/poetry.lock (content)
⚔️ frontend/package.json (content)
⚔️ frontend/pnpm-lock.yaml (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (5 passed)

  • Title check: ✅ Passed. The PR title 'Merge Zappa Feature Branch' directly and clearly summarizes the main objective of merging the feature/nest-zappa-migration branch into main.
  • Description check: ✅ Passed. The PR description is related to the changeset, referencing issue #3826 and the specific branch merge (feature/nest-zappa-migration to main) that this PR accomplishes.
  • Linked Issues check: ✅ Passed. The linked issue #3826 objective to merge feature/nest-zappa-migration is fully met; all code changes implement infrastructure, backend, and frontend Zappa migration requirements.
  • Out of Scope Changes check: ✅ Passed. All changes are within scope of the Zappa migration feature: infrastructure modules, backend Zappa integration, frontend support, CI/CD pipeline updates, and Ansible removal related to the feature.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.


@codecov

codecov bot commented Feb 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.32%. Comparing base (df271b1) to head (fc6363b).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #3827   +/-   ##
=======================================
  Coverage   93.31%   93.32%           
=======================================
  Files         512      513    +1     
  Lines       15816    15827   +11     
  Branches     2178     2178           
=======================================
+ Hits        14759    14770   +11     
  Misses        695      695           
  Partials      362      362           
| Flag | Coverage Δ |
|------|------------|
| backend | 92.88% <100.00%> (+<0.01%) ⬆️ |
| frontend | 94.64% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| ...ckend/apps/common/management/commands/load_data.py | 100.00% <100.00%> (ø) |
| backend/apps/github/models/common.py | 100.00% <100.00%> (ø) |
| backend/apps/github/models/repository.py | 95.20% <100.00%> (ø) |

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df271b1...fc6363b.


@rudransh-shrivastava changed the title from "Merge Zappa Migration Feature Branch" to "Merge Zappa Feature Branch" on Feb 7, 2026
Contributor

@cubic-dev-ai bot left a comment

6 issues found across 113 files

Confidence score: 3/5

  • Risk is moderate because backend/wsgi.py lacks error handling around AWS SSM calls, so startup can fail on permissions/network/rate-limit errors.
  • backend/zappa_callback.py does not paginate list_versions_by_function, which can miss versions beyond 50 and undermine cleanup behavior in larger deployments.
  • Several infrastructure issues (manual ARN reconstruction, missing KMS key_id, README path confusion, and inconsistent actions/setup-python pin) add configuration drift and ops friction but are less likely to break runtime behavior immediately.
  • Pay close attention to backend/wsgi.py, backend/zappa_callback.py, infrastructure/modules/cache/main.tf, infrastructure/modules/storage/main.tf, infrastructure/README.md, .github/workflows/run-ci-cd.yaml - startup resilience, Lambda cleanup completeness, and config consistency.

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/wsgi.py">

<violation number="1" location="backend/wsgi.py:19">
P1: Missing error handling for AWS SSM API calls. If any boto3 call fails (permissions, network, rate limiting), the application will fail to start. Consider wrapping the AWS calls in a try/except block with appropriate logging.</violation>
</file>

<file name="infrastructure/modules/storage/main.tf">

<violation number="1" location="infrastructure/modules/storage/main.tf:36">
P2: Use module output `module.fixtures_bucket.arn` instead of reconstructing the bucket ARN manually. This duplicates the bucket name logic and could lead to mismatches if the module's naming convention ever changes.</violation>
</file>

<file name="infrastructure/README.md">

<violation number="1" location="infrastructure/README.md:169">
P2: Missing directory change instruction before 'Populate ECR Repositories' section. After step 6 (Configure ALB Routing), the user is in `infrastructure/staging/`, but the Docker commands use paths relative to the repository root. Users following the documentation sequentially will encounter command failures.

Consider adding a directory change instruction at the start of this section, e.g., `cd ../../` or `cd` to repository root.</violation>
</file>

<file name="infrastructure/modules/cache/main.tf">

<violation number="1" location="infrastructure/modules/cache/main.tf:88">
P2: The SSM parameter storing the Redis password should use the same custom KMS key specified for other resources in this module. Without `key_id`, it uses AWS's default managed key, which is inconsistent with the encryption strategy applied to CloudWatch logs and may not meet your key management policies.

Consider adding: `key_id = var.kms_key_arn`</violation>
</file>

<file name="backend/zappa_callback.py">

<violation number="1" location="backend/zappa_callback.py:56">
P2: Missing pagination when listing Lambda versions. AWS `list_versions_by_function` returns max 50 versions per call. With more than 50 versions, this function won't see all versions and may fail to clean up old ones properly. Use a paginator:
```python
paginator = client.get_paginator('list_versions_by_function')
versions = []
for page in paginator.paginate(FunctionName=zappa.lambda_name):
    versions.extend(page['Versions'])
```</violation>
</file>

<file name=".github/workflows/run-ci-cd.yaml">

<violation number="1" location=".github/workflows/run-ci-cd.yaml:847">
P2: Inconsistent `actions/setup-python` version. This workflow uses `@a309ff8b426b58ec0e2a45f0f869d46889d02405` in the pre-commit job but `@83679a892e2d95755f2dac6acb0bfd1e9ac5d548` here. Using different versions of the same action can lead to subtle inconsistencies and makes maintenance harder. Consider using the same commit SHA across all instances.</violation>
</file>


Comment on lines +19 to +25
    client = boto3.client("ssm")
    paginator = client.get_paginator("get_parameters_by_path")
    response_iterator = paginator.paginate(Path=ssm_param_path, WithDecryption=True)

    for page in response_iterator:
        for param in page["Parameters"]:
            os.environ[Path(param["Name"]).name] = param["Value"]

P1: Missing error handling for AWS SSM API calls. If any boto3 call fails (permissions, network, rate limiting), the application will fail to start. Consider wrapping the AWS calls in a try/except block with appropriate logging.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/wsgi.py, line 19:

<comment>Missing error handling for AWS SSM API calls. If any boto3 call fails (permissions, network, rate limiting), the application will fail to start. Consider wrapping the AWS calls in a try/except block with appropriate logging.</comment>

<file context>
@@ -1,10 +1,35 @@
+    if not (ssm_param_path := os.getenv("AWS_SYSTEMS_MANAGER_PARAM_STORE_PATH")):
+        return
+
+    client = boto3.client("ssm")
+    paginator = client.get_paginator("get_parameters_by_path")
+    response_iterator = paginator.paginate(Path=ssm_param_path, WithDecryption=True)
</file context>
Suggested change
-    client = boto3.client("ssm")
-    paginator = client.get_paginator("get_parameters_by_path")
-    response_iterator = paginator.paginate(Path=ssm_param_path, WithDecryption=True)
-    for page in response_iterator:
-        for param in page["Parameters"]:
-            os.environ[Path(param["Name"]).name] = param["Value"]
+    try:
+        client = boto3.client("ssm")
+        paginator = client.get_paginator("get_parameters_by_path")
+        response_iterator = paginator.paginate(Path=ssm_param_path, WithDecryption=True)
+        for page in response_iterator:
+            for param in page["Parameters"]:
+                os.environ[Path(param["Name"]).name] = param["Value"]
+    except Exception as exc:
+        import logging
+        logging.getLogger(__name__).warning(
+            "Failed to populate environment from SSM: %s", exc
+        )

    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = ["arn:aws:s3:::${var.fixtures_bucket_name}-${random_id.suffix.hex}/*"]

P2: Use module output module.fixtures_bucket.arn instead of reconstructing the bucket ARN manually. This duplicates the bucket name logic and could lead to mismatches if the module's naming convention ever changes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At infrastructure/modules/storage/main.tf, line 36:

<comment>Use module output `module.fixtures_bucket.arn` instead of reconstructing the bucket ARN manually. This duplicates the bucket name logic and could lead to mismatches if the module's naming convention ever changes.</comment>

<file context>
@@ -0,0 +1,49 @@
+    Statement = [{
+      Effect   = "Allow"
+      Action   = ["s3:GetObject"]
+      Resource = ["arn:aws:s3:::${var.fixtures_bucket_name}-${random_id.suffix.hex}/*"]
+    }]
+  })
</file context>

terraform apply
```

## Populate ECR Repositories

P2: Missing directory change instruction before 'Populate ECR Repositories' section. After step 6 (Configure ALB Routing), the user is in infrastructure/staging/, but the Docker commands use paths relative to the repository root. Users following the documentation sequentially will encounter command failures.

Consider adding a directory change instruction at the start of this section, e.g., cd ../../ or cd to repository root.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At infrastructure/README.md, line 169:

<comment>Missing directory change instruction before 'Populate ECR Repositories' section. After step 6 (Configure ALB Routing), the user is in `infrastructure/staging/`, but the Docker commands use paths relative to the repository root. Users following the documentation sequentially will encounter command failures.

Consider adding a directory change instruction at the start of this section, e.g., `cd ../../` or `cd` to repository root.</comment>

<file context>
@@ -0,0 +1,339 @@
+    terraform apply
+    ```
+
+## Populate ECR Repositories
+ECR Repositories are used to store images used by ECS (Frontend + Backend Tasks)
+
</file context>

})
}

resource "aws_ssm_parameter" "django_redis_password" {

P2: The SSM parameter storing the Redis password should use the same custom KMS key specified for other resources in this module. Without key_id, it uses AWS's default managed key, which is inconsistent with the encryption strategy applied to CloudWatch logs and may not meet your key management policies.

Consider adding: key_id = var.kms_key_arn

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At infrastructure/modules/cache/main.tf, line 88:

<comment>The SSM parameter storing the Redis password should use the same custom KMS key specified for other resources in this module. Without `key_id`, it uses AWS's default managed key, which is inconsistent with the encryption strategy applied to CloudWatch logs and may not meet your key management policies.

Consider adding: `key_id = var.kms_key_arn`</comment>

<file context>
@@ -0,0 +1,94 @@
+  })
+}
+
+resource "aws_ssm_parameter" "django_redis_password" {
+  description = "The password of Redis cache (Required by Django)."
+  name        = "/${var.project_name}/${var.environment}/DJANGO_REDIS_PASSWORD"
</file context>

    print("Updating Lambda alias...")

    client = boto3.client("lambda")
    versions = client.list_versions_by_function(FunctionName=zappa.lambda_name)["Versions"]

P2: Missing pagination when listing Lambda versions. AWS list_versions_by_function returns max 50 versions per call. With more than 50 versions, this function won't see all versions and may fail to clean up old ones properly. Use a paginator:

paginator = client.get_paginator('list_versions_by_function')
versions = []
for page in paginator.paginate(FunctionName=zappa.lambda_name):
    versions.extend(page['Versions'])
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/zappa_callback.py, line 56:

<comment>Missing pagination when listing Lambda versions. AWS `list_versions_by_function` returns max 50 versions per call. With more than 50 versions, this function won't see all versions and may fail to clean up old ones properly. Use a paginator:
```python
paginator = client.get_paginator('list_versions_by_function')
versions = []
for page in paginator.paginate(FunctionName=zappa.lambda_name):
    versions.extend(page['Versions'])
```</comment>

<file context>
@@ -0,0 +1,93 @@
+    print("Updating Lambda alias...")
+
+    client = boto3.client("lambda")
+    versions = client.list_versions_by_function(FunctionName=zappa.lambda_name)["Versions"]
+
+    if not (published := [v["Version"] for v in versions if v["Version"] != "$LATEST"]):
</file context>

        run: pipx install poetry

      - name: Set up Python
        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548

P2: Inconsistent actions/setup-python version. This workflow uses @a309ff8b426b58ec0e2a45f0f869d46889d02405 in the pre-commit job but @83679a892e2d95755f2dac6acb0bfd1e9ac5d548 here. Using different versions of the same action can lead to subtle inconsistencies and makes maintenance harder. Consider using the same commit SHA across all instances.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/run-ci-cd.yaml, line 847:

<comment>Inconsistent `actions/setup-python` version. This workflow uses `@a309ff8b426b58ec0e2a45f0f869d46889d02405` in the pre-commit job but `@83679a892e2d95755f2dac6acb0bfd1e9ac5d548` here. Using different versions of the same action can lead to subtle inconsistencies and makes maintenance harder. Consider using the same commit SHA across all instances.</comment>

<file context>
@@ -644,144 +689,264 @@ jobs:
+        run: pipx install poetry
+
+      - name: Set up Python
+        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548
+        with:
+          cache: 'poetry'
</file context>
Suggested change
-        uses: actions/setup-python@83679a892e2d95755f2dac6acb0bfd1e9ac5d548
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 17

🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Around line 755-761: The "Show plan in summary" step currently writes the raw
terraform show -no-color tfplan into $GITHUB_STEP_SUMMARY (working-directory:
infrastructure/staging), which may leak sensitive infra details; modify this by
either removing the step entirely, gating it so it only runs for
private/internal runs, or replacing the direct dump with a filtered output: run
terraform show -json tfplan and pipe through jq to extract only non-sensitive
summary fields (or redact VPC/subnet/security group IDs) before appending to
$GITHUB_STEP_SUMMARY, or alternatively upload the full tfplan as a protected
artifact instead of writing it to $GITHUB_STEP_SUMMARY.

In `@backend/tests/apps/common/management/commands/load_data_test.py`:
- Around line 19-21: Remove the unused MagicMock instances `mock_model` and
`mock_app_config` (and the `mock_app_config.get_models.return_value =
[mock_model]` line) from the tests `test_handle` and
`test_handle_with_custom_fixture_path`; locate the variable assignments to
`mock_model` and `mock_app_config` in those test functions and delete them so
the tests no longer define dead variables.
- Around line 74-94: The test test_handle_with_exception_during_call_command
incorrectly patches contextlib.suppress (making the suppression accidental);
instead remove the patch of contextlib.suppress and replace the with-context
using pytest.raises(Exception): call call_command("load_data") inside
pytest.raises to assert the exception is raised, keep
mock_call_command.side_effect = Exception(...), add import pytest at the top and
remove the now-unused import contextlib, and retain the existing assertions that
mock_unregister.assert_called_once(), mock_register.assert_called_once(), and
mock_atomic.assert_called_once().

In `@backend/zappa_callback.py`:
- Around line 83-85: The deletion loop can fail when an alias (notably any alias
other than "live") points to a version in to_delete; update the code in the
to_delete loop to skip or handle versions that have alias associations by either
(A) calling client.list_aliases(FunctionName=<function name>) / list_aliases
with the version/ARN to detect any aliases pointing to that version and only
call client.delete_function(FunctionName=version["FunctionArn"]) when no aliases
exist, or (B) wrap the client.delete_function call in a try/except (catching
botocore.exceptions.ClientError) and skip/log the version when the error
indicates a resource conflict due to an alias; use the existing to_delete
variable and version["FunctionArn"] identifier and ensure "live" alias is not
assumed to be the only alias.
- Around line 39-43: The regex excludes are being tested against the full
filesystem path (str(filepath)) instead of the archive-relative path, so update
the exclude check in the loop to compute the relative path once (e.g., rel =
filepath.relative_to(temp_path) and use str(rel) or rel.as_posix()) and run the
any(re.search(p, str(rel)) for p in excludes) against that relative path before
calling tf.add(filepath, rel); keep the existing new_archive_path comparison
as-is.
- Around line 55-60: The current single call to
boto3.client("lambda").list_versions_by_function may miss versions due to AWS
pagination; replace it with a paginator
(client.get_paginator("list_versions_by_function")) to iterate all pages and
accumulate Versions before computing the published list (the code that builds
published := [v["Version"] ...] used by update_alias) and apply the same
paginated collection pattern inside cleanup_versions so both update_alias and
cleanup_versions operate on the full set of versions rather than only the first
page.

In `@infrastructure/Makefile`:
- Around line 1-12: The find invocation in the test-infrastructure Makefile
target prints entries before filtering for directories, so move the -type d
predicate before -print0 to ensure only directories named "tests" are returned;
also add a .PHONY: test-infrastructure declaration so the Makefile target is
always executed rather than treated as a file. Update the test-infrastructure
target (and add a .PHONY entry) accordingly.

In `@infrastructure/modules/cache/main.tf`:
- Around line 88-94: The aws_ssm_parameter resource "django_redis_password" is
using SecureString but not specifying a KMS CMK; update the resource to include
the KMS key by setting the key_id attribute to the module variable used
elsewhere (var.kms_key_arn) so the parameter is encrypted with the same CMK as
other resources (ensure var.kms_key_arn is available in this module and used in
aws_ssm_parameter "django_redis_password" alongside value =
local.redis_auth_token).

In `@infrastructure/modules/database/main.tf`:
- Around line 81-87: The aws_ssm_parameter resource "django_db_password" is
creating a SecureString but not specifying a KMS key; change the resource to use
the customer-managed KMS key by setting key_id = var.kms_key_arn so it uses the
same key as the module's Secrets Manager usage (reference
aws_ssm_parameter.django_db_password and var.kms_key_arn when updating the
resource).
- Around line 37-61: The aws_db_instance.main resource sets storage_encrypted =
true but doesn't supply a CMK: update aws_db_instance.main to reference the
supplied KMS key (var.kms_key_arn or a derived kms_key_id) by setting kms_key_id
= var.kms_key_arn so RDS uses the customer-managed key; additionally enable
Performance Insights for production by setting performance_insights_enabled =
true and performance_insights_kms_key_id = var.kms_key_arn (or a dedicated ARN
variable) so Performance Insights is encrypted with the same CMK; ensure names
match the resource aws_db_instance.main and variables var.kms_key_arn (or add a
new var if needed) and validate the ARN vs key ID format expected by RDS.
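
A minimal sketch combining both points, showing only the arguments in question; var.kms_key_arn and the resource names follow the comments above and are assumptions about the module's interface:

```hcl
resource "aws_db_instance" "main" {
  # ... existing engine/instance/network arguments ...

  storage_encrypted               = true
  kms_key_id                      = var.kms_key_arn # customer-managed CMK instead of the AWS default
  performance_insights_enabled    = true
  performance_insights_kms_key_id = var.kms_key_arn
}

resource "aws_ssm_parameter" "django_db_password" {
  # ... existing name/value arguments ...

  type   = "SecureString"
  key_id = var.kms_key_arn
}
```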

In `@infrastructure/modules/ecs/main.tf`:
- Around line 84-100: The IAM policy resource
aws_iam_policy.ecs_tasks_execution_role_ssm_policy currently only allows
"ssm:GetParameters"; update its policy document (the Action array in
ecs_tasks_execution_role_ssm_policy) to include "ssm:GetParameter" as well so
tasks can call the singular GetParameter API (matching the frontend policy);
ensure the jsonencode block lists both actions and keep tags/metadata unchanged.
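
A minimal sketch of the widened policy document; the policy name and resource list are placeholders, and only the Action list is the point here:

```hcl
resource "aws_iam_policy" "ecs_tasks_execution_role_ssm_policy" {
  name = "${var.project_name}-${var.environment}-ecs-tasks-ssm" # assumed naming scheme

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ssm:GetParameter", "ssm:GetParameters"] # singular and plural APIs
      Resource = var.ssm_parameter_arns # assumed variable listing the parameter ARNs
    }]
  })
}
```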

In `@infrastructure/modules/parameters/main.tf`:
- Around line 129-135: The aws_ssm_parameter resources for the generated secrets
(e.g., aws_ssm_parameter.django_secret_key and
aws_ssm_parameter.nextauth_secret) currently track the value from
random_string/random_password and will be overwritten if those resources are
recreated; add a lifecycle block to each SSM parameter with ignore_changes =
[value] so Terraform will not update the stored secret after initial creation
(allowing deliberate rotation instead). Locate the aws_ssm_parameter resource
blocks for django_secret_key and nextauth_secret and add lifecycle {
ignore_changes = [value] } inside each resource.
- Around line 258-266: Replace the two random_string resources with
random_password resources so the secrets are marked sensitive: change resource
"random_string" "django_secret_key" and resource "random_string"
"nextauth_secret" to resource "random_password" "django_secret_key" and resource
"random_password" "nextauth_secret" (keeping length and special attributes), and
update any references elsewhere that use random_string.django_secret_key or
random_string.nextauth_secret to random_password.django_secret_key and
random_password.nextauth_secret so the .result values remain the same but are
treated as sensitive.
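
A minimal sketch showing both changes for one of the secrets, assuming the /{project}/{environment}/NAME path convention used by this module's other parameters:

```hcl
resource "random_password" "django_secret_key" {
  length  = 50 # assumed; keep whatever length the current random_string uses
  special = false
}

resource "aws_ssm_parameter" "django_secret_key" {
  name  = "/${var.project_name}/${var.environment}/DJANGO_SECRET_KEY"
  type  = "SecureString"
  value = random_password.django_secret_key.result

  lifecycle {
    # Keep the initially generated value; rotate deliberately (e.g. terraform apply -replace)
    # instead of letting provider recreation silently overwrite the secret.
    ignore_changes = [value]
  }
}
```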

In `@infrastructure/modules/security/main.tf`:
- Around line 77-86: Add an egress security group rule allowing HTTPS from the
RDS Proxy security group to the VPC endpoint SG: create a new resource named
aws_security_group_rule.rds_proxy_to_vpc_endpoints that is conditional on
var.create_rds_proxy && var.create_vpc_endpoint_rules, with description "Allow
HTTPS to VPC endpoints", type "egress", protocol "tcp", from_port/to_port 443,
security_group_id set to aws_security_group.rds_proxy[0].id and
source_security_group_id set to var.vpc_endpoint_sg_id so the RDS Proxy can
reach Secrets Manager via the VPC endpoint.
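
A minimal sketch of that rule; the flag and variable names are taken from the comment above and assumed to exist in the module:

```hcl
resource "aws_security_group_rule" "rds_proxy_to_vpc_endpoints" {
  count = var.create_rds_proxy && var.create_vpc_endpoint_rules ? 1 : 0

  description              = "Allow HTTPS to VPC endpoints"
  type                     = "egress"
  protocol                 = "tcp"
  from_port                = 443
  to_port                  = 443
  security_group_id        = aws_security_group.rds_proxy[0].id
  source_security_group_id = var.vpc_endpoint_sg_id
}
```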

In `@infrastructure/modules/storage/main.tf`:
- Around line 28-40: The IAM policy resource aws_iam_policy.fixtures_read_only
is reconstructing the bucket ARN using
${var.fixtures_bucket_name}-${random_id.suffix.hex} which duplicates naming
logic and can drift; replace the hardcoded Resource value with the S3 module's
ARN output (e.g., use module.<s3_module_name>.arn or module.fixtures_bucket.arn)
so the policy references the canonical bucket ARN output from the s3-bucket
submodule instead of rebuilding it from var.fixtures_bucket_name and
random_id.suffix.hex.
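
A minimal sketch referencing the submodule output instead of rebuilding the ARN; module.fixtures_bucket.arn and the policy name are assumptions about the module's actual identifiers:

```hcl
resource "aws_iam_policy" "fixtures_read_only" {
  name = "${var.project_name}-${var.environment}-fixtures-read-only" # assumed naming scheme

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = ["${module.fixtures_bucket.arn}/*"]
    }]
  })
}
```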

In `@infrastructure/README.md`:
- Around line 41-46: Two adjacent blockquote notes ("Copy the state bucket name
from the output." and "It is recommended to not destroy the backend resources
unless absolutely necessary.") are separated by a blank line which triggers
MD028; fix by either merging the consecutive blockquotes into a single
blockquote block (combine their lines so they appear in one contiguous >
paragraph) or add a comment directive <!-- markdownlint-disable MD028 --> above
the first blockquote and re-enable it after if preferred; update the README.md
sections containing those quoted lines accordingly.
🧹 Nitpick comments (38)
backend/apps/common/management/commands/load_data.py (1)

28-32: _options is accessed — drop the underscore prefix.

The leading underscore conventionally signals an unused parameter, but _options["fixture_path"] is referenced on Line 32. This is misleading to readers and linters.

Proposed fix
-    def handle(self, *_args, **_options) -> None:
+    def handle(self, *_args, **options) -> None:
         """Load data into the OWASP Nest application."""
-        with index.disable_indexing(), transaction.atomic():
-            # Run loaddata
-            call_command("loaddata", _options["fixture_path"], "-v", "3")
+        with index.disable_indexing(), transaction.atomic():
+            call_command("loaddata", options["fixture_path"], "-v", "3")
backend/tests/apps/common/management/commands/load_data_test.py (1)

23-24: Consider using MagicMock(spec=contextlib.AbstractContextManager) for mock_atomic.

Manually wiring __enter__ and __exit__ on the mock is verbose. Using MagicMock() as a context manager already works in modern unittest.mock; MagicMock supports the context-manager protocol out of the box, so these lines can simply be removed.

Also applies to: 53-54

infrastructure/staging/terraform.tfbackend.example (1)

1-3: Consider adding key and encrypt fields to the backend example.

Without a key field, Terraform defaults the state path to terraform.tfstate at the bucket root, which will collide if multiple environments or stacks share the same bucket. Adding encrypt = true is also standard practice for S3 backends.

Suggested additions
 bucket         = "${STATE_BUCKET_NAME}"
 dynamodb_table = "nest-terraform-state-lock"
+encrypt        = true
+key            = "staging/terraform.tfstate"
 region         = "us-east-2"
docker/backend/Dockerfile (1)

52-54: Minor: apk update writes an index cache that --no-cache on add won't clean up.

Consider either using --no-cache consistently (which skips the index fetch) or adding && rm -rf /var/cache/apk/* at the end. The current form works but leaves a few hundred KB of stale index data in the final image.

Suggested fix
-RUN apk update && apk --no-cache add make && \
+RUN apk --no-cache add make && \
     addgroup -S -g ${OWASP_GID} owasp && \
     adduser -S -h /home/owasp -u ${OWASP_UID} -G owasp owasp
docker-compose/local/compose.yaml (1)

144-150: Consider removing the -zappa suffix before merging to main.

Per learnings, the -zappa suffix was introduced on the feature branch to prevent volume clashes during parallel development. Once this merges to main, the isolation rationale no longer applies, and the suffix becomes a permanent naming oddity. Reverting to the original volume names (e.g., backend-venv, cache-data, db-data) would keep naming clean on the main branch.

Note: developers will need to recreate volumes either way after the merge, so there's no data-loss concern specific to renaming.

backend/wsgi.py (1)

14-25: populate_environ_from_ssm — no error handling around SSM calls.

If SSM is unreachable or the IAM role lacks permissions, the unhandled ClientError / BotoCoreError will crash the Lambda cold start with no clear diagnostic message. For Lambda, fail-fast is reasonable, but a targeted except with a clear log message would aid debugging.

This is a minor observation since fail-fast is a valid strategy here — just noting it could improve operational debuggability.

infrastructure/README.md (1)

97-100: Step 3 numbering: "Populate Secrets" could clarify which secrets need manual entry.

The instruction to populate DJANGO_* secrets with to-be-set-in-aws-console values is helpful. Consider listing the specific parameter names (or referencing the Terraform module that creates them) so operators don't have to hunt through the Parameter Store for which ones need attention.

infrastructure/modules/storage/modules/s3-bucket/main.tf (1)

20-41: Minor: Use "*" instead of { AWS = "*" } for the Principal in deny policies.

AWS documentation recommends using Principal = "*" (a plain string) rather than { AWS = "*" } for bucket policies that deny access. While functionally equivalent in deny statements, the canonical form is cleaner and avoids potential confusion.

Suggested diff
     Statement = [{
       Sid       = "HTTPSOnly"
       Effect    = "Deny"
-      Principal = { AWS = "*" }
+      Principal = "*"
       Action    = "s3:*"
infrastructure/modules/cache/main.tf (1)

44-50: Unnecessary count = 1 on random_password.

The count = 1 serves no purpose here since there's no conditional — it always creates exactly one resource. Removing it simplifies the code and eliminates the need for [0] indexing in local.redis_auth_token.

Suggested diff
 resource "random_password" "redis_auth_token" {
-  count  = 1
   length = 32
   # Redis auth token has specific requirements for special characters.
   override_special = "!$^-"
   special          = true
 }

Then update the local:

 locals {
   parameter_group_name = "default.redis${local.redis_major_version}"
-  redis_auth_token     = random_password.redis_auth_token[0].result
+  redis_auth_token     = random_password.redis_auth_token.result
   redis_major_version  = split(".", var.redis_engine_version)[0]
 }
infrastructure/modules/storage/modules/s3-bucket/outputs.tf (1)

6-9: Consider exposing specific attributes instead of the entire resource object.

Outputting the full aws_s3_bucket.this resource object couples consumers to the internal resource schema. If consumers only need id, arn, bucket, etc., exposing those individually is more intentional and resilient to provider changes.

That said, for an internal module this is acceptable if it's convenient for your use case.

infrastructure/modules/networking/modules/nacl/variables.tf (1)

1-4: Inconsistent: common_tags lacks a default here but has default = {} in sibling modules.

Other modules (e.g., security/variables.tf, s3-bucket/variables.tf) define default = {} for common_tags. Missing the default here means callers must always explicitly pass it, even if empty. Consider adding default = {} for consistency, or deliberately omit it if tags are required for this module.

infrastructure/modules/ecs/variables.tf (1)

44-48: Consider whether "latest" is a safe default for image_tag.

Defaulting to "latest" can cause non-deterministic deployments — Terraform won't detect image changes since the tag string doesn't change, and rollbacks become harder. Consider requiring this variable explicitly (no default) to force callers to pin a specific tag.

Proposed change
 variable "image_tag" {
   description = "The Docker image tag to use for ECS tasks."
   type        = string
-  default     = "latest"
 }
infrastructure/modules/database/main.tf (1)

29-34: Some special characters in override_special may cause issues in connection strings.

Characters like #, %, and & can break database connection URIs if not properly URL-encoded. Consider limiting to safer special characters.

Proposed change
 resource "random_password" "db_password" {
   count  = local.generate_db_password ? 1 : 0
   length = 32
-  # Avoid special characters that might cause issues
-  override_special = "!#$%&*()-_=+[]{}<>:?"
+  # Use only URL-safe special characters to avoid connection string issues
+  override_special = "!_-"
   special          = true
 }
infrastructure/modules/networking/.terraform.lock.hcl (1)

1-25: Lockfile is consistent with the project's exact-pinning convention.

The latest AWS provider releases go up to 6.31.0, and there's also a 6.22.1 patch release. While the team deliberately uses exact version pinning for reproducibility, consider planning a coordinated version bump across all modules when convenient — you're 9 minor versions behind. Based on learnings, the team prefers exact version pinning for maximum reproducibility.

infrastructure/modules/kms/tests/kms.tftest.hcl (2)

3-8: Policy tests validate mock data, not actual module logic.

The override_data block on Line 3-8 replaces data.aws_iam_policy_document.key_policy with a hardcoded JSON string. The test_cloudwatch_logs_policy_statement_exists and test_iam_root_policy_statement_exists runs then regex-match against this same hardcoded string. These tests will always pass regardless of what policy statements the module actually constructs, so they don't validate the real KMS key policy logic.

Consider removing the override_data block or, if it's needed for plan-mode execution, acknowledging in comments that these two tests are smoke-checking the test harness rather than the policy content.

Also applies to: 25-32, 56-63


74-81: Replace customer_master_key_spec with key_spec.

The customer_master_key_spec attribute is deprecated in the AWS API and Terraform provider; use key_spec instead. Both behave identically and accept the same values, but AWS recommends transitioning to key_spec going forward.

Suggested fix
   assert {
-    condition     = aws_kms_key.main.customer_master_key_spec == "SYMMETRIC_DEFAULT"
+    condition     = aws_kms_key.main.key_spec == "SYMMETRIC_DEFAULT"
     error_message = "KMS key must use SYMMETRIC_DEFAULT key spec."
   }
infrastructure/modules/parameters/tests/parameters.tftest.hcl (1)

23-421: Consider consolidating path-format and type assertions into single run blocks per parameter.

Every parameter currently uses two separate run blocks (one for path format, one for type). Since a run block supports multiple assert blocks, you could combine each pair into one run, cutting the number of test runs roughly in half and improving readability without losing any coverage. This mirrors how test_iam_role_policy_attachments_link in the frontend test file uses multiple asserts in one run.

Example consolidation for one parameter
-run "test_django_algolia_application_id_path_format" {
-  command = plan
-  assert {
-    condition     = aws_ssm_parameter.django_algolia_application_id.name == "/${var.project_name}/${var.environment}/DJANGO_ALGOLIA_APPLICATION_ID"
-    error_message = "DJANGO_ALGOLIA_APPLICATION_ID must follow path: /{project}/{environment}/DJANGO_ALGOLIA_APPLICATION_ID."
-  }
-}
-
-run "test_django_algolia_application_id_is_secure_string" {
-  command = plan
-  assert {
-    condition     = aws_ssm_parameter.django_algolia_application_id.type == "SecureString"
-    error_message = "DJANGO_ALGOLIA_APPLICATION_ID must be stored as SecureString."
-  }
-}
+run "test_django_algolia_application_id" {
+  command = plan
+  assert {
+    condition     = aws_ssm_parameter.django_algolia_application_id.name == "/${var.project_name}/${var.environment}/DJANGO_ALGOLIA_APPLICATION_ID"
+    error_message = "DJANGO_ALGOLIA_APPLICATION_ID must follow path: /{project}/{environment}/DJANGO_ALGOLIA_APPLICATION_ID."
+  }
+  assert {
+    condition     = aws_ssm_parameter.django_algolia_application_id.type == "SecureString"
+    error_message = "DJANGO_ALGOLIA_APPLICATION_ID must be stored as SecureString."
+  }
+}
infrastructure/modules/database/outputs.tf (1)

6-9: Description is misleading when create_rds_proxy is false.

When var.create_rds_proxy is false, this output returns the RDS instance address, not a proxy endpoint. Consider updating the description to reflect the conditional behavior.

Suggested fix
 output "db_proxy_endpoint" {
-  description = "The endpoint of the RDS proxy."
+  description = "The database connection endpoint (RDS Proxy when enabled, otherwise RDS instance address)."
   value       = var.create_rds_proxy ? aws_db_proxy.main[0].endpoint : aws_db_instance.main.address
 }
infrastructure/modules/cache/outputs.tf (1)

1-4: Minor grammar nit in description.

Suggested fix
 output "redis_password_arn" {
-  description = "The SSM Parameter ARN of password of Redis."
+  description = "The SSM Parameter ARN of the Redis password."
   value       = aws_ssm_parameter.django_redis_password.arn
 }
infrastructure/modules/ecs/outputs.tf (1)

11-14: Minor description inconsistency: "ECS tasks cluster" vs "ECS cluster".

The frontend module's equivalent output (Line 12 of frontend/outputs.tf) uses "The name of the ECS cluster." while this says "The name of the ECS tasks cluster." Consider aligning the wording for consistency across modules.

Suggested fix
 output "ecs_cluster_name" {
-  description = "The name of the ECS tasks cluster."
+  description = "The name of the ECS cluster."
   value       = aws_ecs_cluster.main.name
 }
infrastructure/modules/security/main.tf (1)

48-56: Frontend port 3000 is hardcoded in security group rules.

The ALB module receives frontend_port as a variable (set to 3000 in staging), but the security module hardcodes port 3000 in both alb_to_frontend egress (Line 50/54) and frontend_from_alb ingress (Line 99/103). If frontend_port ever changes, these rules would silently become stale.

Consider passing frontend_port as a variable to the security module to keep these in sync.

♻️ Suggested approach

Add a frontend_port variable to the security module and use it in both rules:

 resource "aws_security_group_rule" "alb_to_frontend" {
   description              = "Allow traffic to frontend ECS tasks"
-  from_port                = 3000
+  from_port                = var.frontend_port
   protocol                 = "tcp"
   security_group_id        = aws_security_group.alb.id
   source_security_group_id = aws_security_group.frontend.id
-  to_port                  = 3000
+  to_port                  = var.frontend_port
   type                     = "egress"
 }

Apply the same to frontend_from_alb.

Also applies to: 97-105
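
For completeness, a sketch of the variable declaration this approach assumes (the default mirrors the staging value mentioned above):

```hcl
variable "frontend_port" {
  description = "The port the frontend ECS tasks listen on."
  type        = number
  default     = 3000
}
```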

infrastructure/modules/networking/tests/networking.tftest.hcl (1)

244-263: Good positive and negative testing for VPC endpoint module creation.

The default-disabled test ensures no unnecessary resources are created, while the create_vpc_s3_endpoint = true override validates that enabling any single endpoint flag triggers module creation. Consider adding a test that enables multiple endpoints simultaneously to verify there's still only one module instance.
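
A sketch of what such a test could look like, assuming the child module is labeled vpc_endpoint, is instantiated with count, and that create_vpc_ecr_endpoint is one of the other endpoint flags:

```hcl
run "test_multiple_endpoint_flags_single_module_instance" {
  command = plan

  variables {
    create_vpc_ecr_endpoint = true # assumed flag name
    create_vpc_s3_endpoint  = true
  }

  assert {
    condition     = length(module.vpc_endpoint) == 1
    error_message = "Enabling several endpoint flags must still create exactly one vpc-endpoint module instance."
  }
}
```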

infrastructure/backend/main.tf (1)

195-211: Consider using KMS encryption for the state bucket instead of AES256.

The DynamoDB lock table uses the customer-managed KMS key from module.kms (line 93), but both S3 buckets fall back to AES256 (SSE-S3). Since the state bucket holds sensitive Terraform state (potentially containing secrets and resource identifiers), using the same KMS key would provide consistent encryption posture and key management controls.

♻️ Proposed fix for state bucket encryption
 resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
   bucket = aws_s3_bucket.state.id
   rule {
     apply_server_side_encryption_by_default {
-      sse_algorithm = "AES256"
+      sse_algorithm     = "aws:kms"
+      kms_master_key_id = module.kms.key_arn
     }
+    bucket_key_enabled = true
   }
 }
infrastructure/modules/ecs/modules/task/main.tf (1)

37-37: Consider parameterizing the container name.

The container name is hardcoded to "backend", but this task module is reused for multiple task types (sync_data, migrate, index_data, etc.). Using var.task_name (or a dedicated variable) would make the module more self-describing and easier to target for container overrides in RunTask API calls.

♻️ Suggested change
-      name = "backend"
+      name = var.task_name
infrastructure/modules/networking/modules/nacl/main.tf (1)

38-46: Private ephemeral inbound from 0.0.0.0/0 is intentionally broad — confirm this is expected.

Rule 100 allows inbound TCP 1024–65535 from 0.0.0.0/0 on the private NACL. This is standard for accepting return traffic through a NAT Gateway, but it also admits unsolicited traffic on ephemeral ports from any source that somehow reaches the private subnet. Since NACLs are stateless, this is the typical trade-off. Just flagging for awareness — security groups provide the stateful second layer.

infrastructure/modules/alb/outputs.tf (1)

31-35: depends_on on output block — valid but worth a brief inline comment.

The depends_on = [aws_lb_listener.https] ensures the HTTPS listener exists before downstream modules consume the target group ARN. This is a correct ordering safeguard. Consider adding a brief comment explaining why this dependency exists, since depends_on on outputs is uncommon and might surprise future maintainers.

infrastructure/modules/frontend/variables.tf (1)

51-55: image_tag defaults to "latest" — acceptable for testing but worth noting.

For a testing environment this is fine. If this module is ever promoted to production, pinning to a specific tag would be important to ensure reproducible deployments.

infrastructure/modules/alb/variables.tf (1)

45-49: Minor naming inconsistency: log_retention_days vs log_retention_in_days in other modules.

The networking module and frontend module use log_retention_in_days, while this ALB module uses log_retention_days. Consider aligning the naming for consistency across the infrastructure modules.

infrastructure/modules/networking/modules/vpc-endpoint/main.tf (1)

37-122: Interface endpoints are consistently structured — well-factored repetitive resources.

All five interface endpoints follow the same pattern (private DNS, shared SG, private subnets, consistent tagging). If the number of endpoints grows further, you could consider using for_each over a map of service names to reduce repetition, but at five endpoints the current approach is clear and maintainable.

infrastructure/modules/frontend/main.tf (1)

109-136: Consider adding deployment_circuit_breaker to the ECS service.

Without a circuit breaker, a bad deployment can leave the service in a degraded state with ECS continuously trying to place failing tasks. Enabling it with rollback provides automatic recovery.

♻️ Suggested addition
 resource "aws_ecs_service" "frontend" {
   cluster                           = aws_ecs_cluster.frontend.id
   desired_count                     = var.desired_count
   health_check_grace_period_seconds = 60
   name                              = "${var.project_name}-${var.environment}-frontend-service"
   ...
   task_definition = aws_ecs_task_definition.frontend.arn
 
+  deployment_circuit_breaker {
+    enable   = true
+    rollback = true
+  }
+
   capacity_provider_strategy {
infrastructure/modules/alb/main.tf (2)

83-101: enable_deletion_protection is hardcoded to false.

If this module is intended for reuse across staging and production, consider parameterizing this. Accidental deletion of a production ALB could cause an outage.

♻️ Suggested change
-  enable_deletion_protection = false
+  enable_deletion_protection = var.enable_deletion_protection

Add to variables.tf:

variable "enable_deletion_protection" {
  description = "Whether to enable deletion protection on the ALB."
  type        = bool
  default     = false
}

134-149: Listener rule priority uses each.key which is a string, not an integer.

The for_each map uses string keys (from for idx, chunk). While Terraform will implicitly convert the string index to a number in 100 + each.key, this relies on implicit type coercion. The expression works correctly because idx produces numeric strings ("0", "1", "2"), but it would be clearer to use tonumber(each.key).

♻️ Minor clarification
-  priority     = 100 + each.key
+  priority     = 100 + tonumber(each.key)
.github/workflows/run-ci-cd.yaml (3)

537-545: Long-lived AWS access keys used instead of OIDC.

The workflow already uses OIDC (id-token: write) for Codecov in other jobs. Using OIDC with aws-actions/configure-aws-credentials for AWS access would eliminate the need for long-lived AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY secrets, reducing the blast radius if credentials are compromised. This applies to all three jobs that configure AWS credentials (build-staging-images, plan-staging-nest, deploy-staging-nest).


818-826: Plan artifact may become stale between plan and apply jobs.

If infrastructure state changes between the plan-staging-nest and deploy-staging-nest jobs (e.g., manual changes, concurrent runs), terraform apply tfplan will fail because the saved plan's state serial won't match. This is actually the correct behavior — Terraform will refuse to apply a stale plan. Just be aware that this can cause deploy failures requiring a re-run of the pipeline.

Also, note that -auto-approve is redundant when applying a saved plan file.
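
A trimmed sketch of the apply step with the redundant flag dropped; the step name and working directory are assumptions mirroring the plan job:

```yaml
- name: Terraform Apply
  working-directory: infrastructure/staging
  run: terraform apply -input=false tfplan
```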


857-870: Zappa deployment depends on Terraform outputs but has no rollback if zappa update fails.

If zappa update staging (line 870) fails mid-deploy, the Terraform infrastructure will already be applied but the Lambda function may be in an inconsistent state. Consider adding error handling or a rollback step, or at minimum ensuring zappa update is idempotent and safe to re-run.

infrastructure/modules/networking/main.tf (1)

60-67: Single NAT gateway — acceptable for staging, worth noting for production.

The NAT gateway is pinned to aws_subnet.public[0], meaning all private-subnet egress routes through a single AZ. This is cost-effective for staging but creates a single point of failure. If this module is reused for production, consider provisioning one NAT gateway per AZ for high availability.
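
If high availability ever becomes a requirement, a per-AZ layout could look roughly like this; the resource names and the per-AZ private route tables are assumptions, not the module's current shape:

```hcl
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"
  tags   = var.common_tags
}

resource "aws_nat_gateway" "main" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  tags          = var.common_tags
}

# One private route table per AZ, each defaulting to the NAT gateway in its own AZ.
resource "aws_route" "private_nat" {
  count                  = length(var.availability_zones)
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[count.index].id
}
```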

infrastructure/modules/parameters/main.tf (1)

16-26: SecureString parameters use AWS-managed KMS — track CMK migration.

Checkov flags CKV_AWS_337 on all SecureString parameters because they rely on the default AWS-managed aws/ssm key rather than a customer-managed KMS CMK. Based on learnings, KMS CMK encryption is planned after S3 state management is completed. Leaving this as a tracking note — no action needed now, but ensure this is on the roadmap.

Also applies to: 109-119, 149-159, 162-172, 174-184, 218-228
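
When the CMK migration lands, the change is mostly a key reference on each parameter. A sketch, assuming the module will receive the key ARN as a variable (var.kms_key_arn is hypothetical):

```hcl
resource "aws_ssm_parameter" "django_secret_key" {
  description = "Django Secret Key generated by Terraform."
  key_id      = var.kms_key_arn # customer-managed CMK instead of the default aws/ssm key
  name        = "/${var.project_name}/${var.environment}/DJANGO_SECRET_KEY"
  tags        = var.common_tags
  type        = "SecureString"
  value       = random_string.django_secret_key.result
}
```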

infrastructure/staging/variables.tf (1)

1-5: No validation that AZ count matches subnet CIDR counts.

The networking module indexes into availability_zones, public_subnet_cidrs, and private_subnet_cidrs by count.index (e.g., var.availability_zones[count.index]). If the lists have mismatched lengths, Terraform will fail with an index-out-of-range error at plan time. Consider adding a cross-variable validation.

🛡️ Example validation (add to any one of the three variables)
variable "availability_zones" {
  description = "A list of availability zones for the VPC."
  type        = list(string)
  default     = ["us-east-2a", "us-east-2b", "us-east-2c"]

  validation {
    condition     = length(var.availability_zones) > 0
    error_message = "At least one availability zone must be specified."
  }
}

Note: before Terraform 1.9, variable-level validations could not reference other variables. A version-agnostic place for a cross-variable check is a check block or a precondition in the module call in main.tf:

locals {
  az_count = length(var.availability_zones)
}

check "subnet_az_count_match" {
  assert {
    condition     = length(var.public_subnet_cidrs) == local.az_count && length(var.private_subnet_cidrs) == local.az_count
    error_message = "The number of public/private subnet CIDRs must match the number of availability zones."
  }
}

Also applies to: 201-231

Comment on lines +755 to +761
- name: Show plan in summary
working-directory: infrastructure/staging
run: |
echo "## Terraform Plan Output" >> $GITHUB_STEP_SUMMARY
echo '```' >> $GITHUB_STEP_SUMMARY
terraform show -no-color tfplan >> $GITHUB_STEP_SUMMARY
echo '```' >> $GITHUB_STEP_SUMMARY

⚠️ Potential issue | 🟠 Major

Terraform plan output in step summary may expose sensitive infrastructure details.

$GITHUB_STEP_SUMMARY is visible to anyone who can view the workflow run. For a public repository like OWASP/Nest, this could leak VPC IDs, subnet CIDRs, security group IDs, ARNs, and other infrastructure details. While Terraform masks values marked sensitive, many resource attributes are not marked sensitive yet could aid reconnaissance.

Consider either removing this step, restricting it to private visibility, or filtering the output.

🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 755 - 761, The "Show plan in
summary" step currently writes the raw terraform show -no-color tfplan into
$GITHUB_STEP_SUMMARY (working-directory: infrastructure/staging), which may leak
sensitive infra details; modify this by either removing the step entirely,
gating it so it only runs for private/internal runs, or replace the direct dump
with a filtered output: run terraform show -json tfplan and pipe through jq to
extract only non-sensitive summary fields (or redact VPC/subnet/security group
IDs) before appending to $GITHUB_STEP_SUMMARY, or alternatively upload the full
tfplan as a protected artifact instead of writing it to $GITHUB_STEP_SUMMARY.
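
One possible filtered variant of that step, using terraform show -json plus jq against the Terraform JSON plan format to list only resource addresses and planned actions; adjust the filter if even addresses are considered too revealing:

```yaml
- name: Show plan in summary
  working-directory: infrastructure/staging
  run: |
    echo "## Terraform Plan Summary" >> "$GITHUB_STEP_SUMMARY"
    terraform show -json tfplan \
      | jq -r '.resource_changes[] | select(.change.actions != ["no-op"]) | "- \(.address): \(.change.actions | join(","))"' \
      >> "$GITHUB_STEP_SUMMARY"
```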

Comment on lines +19 to +21
mock_model = MagicMock()
mock_app_config = MagicMock()
mock_app_config.get_models.return_value = [mock_model]

🛠️ Refactor suggestion | 🟠 Major

Dead code: mock_model and mock_app_config are created but never used.

These mocks appear in test_handle (Lines 19-21) and test_handle_with_custom_fixture_path (Lines 49-51). Remove them to avoid confusion.

Proposed fix (test_handle)
     ):
-        mock_model = MagicMock()
-        mock_app_config = MagicMock()
-        mock_app_config.get_models.return_value = [mock_model]
-
         mock_atomic.return_value.__enter__ = MagicMock()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
mock_model = MagicMock()
mock_app_config = MagicMock()
mock_app_config.get_models.return_value = [mock_model]
🤖 Prompt for AI Agents
In `@backend/tests/apps/common/management/commands/load_data_test.py` around lines
19 - 21, Remove the unused MagicMock instances `mock_model` and
`mock_app_config` (and the `mock_app_config.get_models.return_value =
[mock_model]` line) from the tests `test_handle` and
`test_handle_with_custom_fixture_path`; locate the variable assignments to
`mock_model` and `mock_app_config` in those test functions and delete them so
the tests no longer define dead variables.

Comment on lines +74 to +94
def test_handle_with_exception_during_call_command(
self,
mock_atomic,
mock_call_command,
mock_register,
mock_unregister,
):
"""Test that indexing is re-enabled even if call_command fails."""
mock_call_command.side_effect = Exception("Call command failed")

with patch("contextlib.suppress") as mock_suppress:
mock_suppress.return_value.__enter__ = MagicMock()
mock_suppress.return_value.__exit__ = MagicMock()

with contextlib.suppress(Exception):
call_command("load_data")

mock_unregister.assert_called_once()
mock_register.assert_called_once()

mock_atomic.assert_called_once()

⚠️ Potential issue | 🟠 Major

Exception test is fragile — patching contextlib.suppress undermines its own usage.

Line 84 replaces contextlib.suppress with a mock, then Line 88 calls contextlib.suppress(Exception) — which now invokes the mock, not the real suppressor. The exception is only suppressed because MagicMock().__exit__() returns a truthy MagicMock, which accidentally signals "suppress the exception." This is brittle and misleading.

Use pytest.raises instead for a straightforward, intention-revealing test:

Proposed fix
     def test_handle_with_exception_during_call_command(
         self,
         mock_atomic,
         mock_call_command,
         mock_register,
         mock_unregister,
     ):
         """Test that indexing is re-enabled even if call_command fails."""
+        mock_atomic.return_value.__enter__ = MagicMock()
+        mock_atomic.return_value.__exit__ = MagicMock()
         mock_call_command.side_effect = Exception("Call command failed")
 
-        with patch("contextlib.suppress") as mock_suppress:
-            mock_suppress.return_value.__enter__ = MagicMock()
-            mock_suppress.return_value.__exit__ = MagicMock()
-
-            with contextlib.suppress(Exception):
-                call_command("load_data")
+        with pytest.raises(Exception, match="Call command failed"):
+            call_command("load_data")
 
         mock_unregister.assert_called_once()
         mock_register.assert_called_once()
-
         mock_atomic.assert_called_once()

Note: you'll also need import pytest at the top of the file, and can remove import contextlib since it would no longer be used.

🤖 Prompt for AI Agents
In `@backend/tests/apps/common/management/commands/load_data_test.py` around lines
74 - 94, The test test_handle_with_exception_during_call_command incorrectly
patches contextlib.suppress (making the suppression accidental); instead remove
the patch of contextlib.suppress and replace the with-context using
pytest.raises(Exception): call call_command("load_data") inside pytest.raises to
assert the exception is raised, keep mock_call_command.side_effect =
Exception(...), add import pytest at the top and remove the now-unused import
contextlib, and retain the existing assertions that
mock_unregister.assert_called_once(), mock_register.assert_called_once(), and
mock_atomic.assert_called_once().

Comment on lines +39 to +43
for filepath in temp_path.rglob("*"):
if filepath == new_archive_path:
continue
if filepath.is_file() and not any(re.search(p, str(filepath)) for p in excludes):
tf.add(filepath, filepath.relative_to(temp_path))

⚠️ Potential issue | 🔴 Critical

Bug: Regex patterns are matched against the full temp directory path, not the archive-relative path.

str(filepath) includes the temp directory prefix (e.g., /tmp/abc123/some/file.py), but regex_excludes patterns are presumably written for archive-relative paths. This will cause patterns to either fail to match or match unintended paths.

Use the relative path instead:

Suggested fix
             for filepath in temp_path.rglob("*"):
                 if filepath == new_archive_path:
                     continue
-                if filepath.is_file() and not any(re.search(p, str(filepath)) for p in excludes):
+                relative = str(filepath.relative_to(temp_path))
+                if filepath.is_file() and not any(re.search(p, relative) for p in excludes):
                     tf.add(filepath, filepath.relative_to(temp_path))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for filepath in temp_path.rglob("*"):
if filepath == new_archive_path:
continue
if filepath.is_file() and not any(re.search(p, str(filepath)) for p in excludes):
tf.add(filepath, filepath.relative_to(temp_path))
for filepath in temp_path.rglob("*"):
if filepath == new_archive_path:
continue
relative = str(filepath.relative_to(temp_path))
if filepath.is_file() and not any(re.search(p, relative) for p in excludes):
tf.add(filepath, filepath.relative_to(temp_path))
🤖 Prompt for AI Agents
In `@backend/zappa_callback.py` around lines 39 - 43, The regex excludes are being
tested against the full filesystem path (str(filepath)) instead of the
archive-relative path, so update the exclude check in the loop to compute the
relative path once (e.g., rel = filepath.relative_to(temp_path) and use str(rel)
or rel.as_posix()) and run the any(re.search(p, str(rel)) for p in excludes)
against that relative path before calling tf.add(filepath, rel); keep the
existing new_archive_path comparison as-is.

Comment on lines +55 to +60
client = boto3.client("lambda")
versions = client.list_versions_by_function(FunctionName=zappa.lambda_name)["Versions"]

if not (published := [v["Version"] for v in versions if v["Version"] != "$LATEST"]):
print("No published versions found, skipping alias update")
return

⚠️ Potential issue | 🟠 Major

list_versions_by_function is paginated — this may miss versions.

The AWS Lambda list_versions_by_function API returns at most 50 versions per call. Both update_alias (Line 56) and cleanup_versions (Line 76) use a single call without handling pagination. If the function accumulates more than 50 versions, update_alias may not find the actual latest version, and cleanup_versions will not delete all old versions.

Consider using a paginator:

Suggested approach
-    versions = client.list_versions_by_function(FunctionName=zappa.lambda_name)["Versions"]
+    paginator = client.get_paginator("list_versions_by_function")
+    versions = [
+        v
+        for page in paginator.paginate(FunctionName=zappa.lambda_name)
+        for v in page["Versions"]
+    ]

Apply the same pattern in cleanup_versions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
client = boto3.client("lambda")
versions = client.list_versions_by_function(FunctionName=zappa.lambda_name)["Versions"]
if not (published := [v["Version"] for v in versions if v["Version"] != "$LATEST"]):
print("No published versions found, skipping alias update")
return
client = boto3.client("lambda")
paginator = client.get_paginator("list_versions_by_function")
versions = [
v
for page in paginator.paginate(FunctionName=zappa.lambda_name)
for v in page["Versions"]
]
if not (published := [v["Version"] for v in versions if v["Version"] != "$LATEST"]):
print("No published versions found, skipping alias update")
return
🤖 Prompt for AI Agents
In `@backend/zappa_callback.py` around lines 55 - 60, The current single call to
boto3.client("lambda").list_versions_by_function may miss versions due to AWS
pagination; replace it with a paginator
(client.get_paginator("list_versions_by_function")) to iterate all pages and
accumulate Versions before computing the published list (the code that builds
published := [v["Version"] ...] used by update_alias) and apply the same
paginated collection pattern inside cleanup_versions so both update_alias and
cleanup_versions operate on the full set of versions rather than only the first
page.

Comment on lines +129 to +135
resource "aws_ssm_parameter" "django_secret_key" {
description = "Django Secret Key generated by Terraform."
name = "/${var.project_name}/${var.environment}/DJANGO_SECRET_KEY"
tags = var.common_tags
type = "SecureString"
value = random_string.django_secret_key.result
}

⚠️ Potential issue | 🟠 Major

Generated secrets lack lifecycle { ignore_changes } — recreation risk.

Unlike the manually-set parameters, django_secret_key and nextauth_secret SSM parameters are bound directly to random_password/random_string resources. If those resources are ever tainted or recreated (e.g., state manipulation, provider upgrade, terraform taint), the secret values will change silently, invalidating all active Django sessions and NextAuth tokens.

Consider adding lifecycle { ignore_changes = [value] } to these two SSM parameters so that once the initial secret is seeded, Terraform won't overwrite it. Rotation can then be handled deliberately.

🛡️ Proposed fix
 resource "aws_ssm_parameter" "django_secret_key" {
   description = "Django Secret Key generated by Terraform."
   name        = "/${var.project_name}/${var.environment}/DJANGO_SECRET_KEY"
   tags        = var.common_tags
   type        = "SecureString"
   value       = random_string.django_secret_key.result
+
+  lifecycle {
+    ignore_changes = [value]
+  }
 }
 resource "aws_ssm_parameter" "nextauth_secret" {
   description = "NextAuth secret key generated by Terraform."
   name        = "/${var.project_name}/${var.environment}/NEXTAUTH_SECRET"
   tags        = var.common_tags
   type        = "SecureString"
   value       = random_string.nextauth_secret.result
+
+  lifecycle {
+    ignore_changes = [value]
+  }
 }

Also applies to: 238-244

🧰 Tools
🪛 Checkov (3.2.334)

[high] 129-135: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

🤖 Prompt for AI Agents
In `@infrastructure/modules/parameters/main.tf` around lines 129 - 135, The
aws_ssm_parameter resources for the generated secrets (e.g.,
aws_ssm_parameter.django_secret_key and aws_ssm_parameter.nextauth_secret)
currently track the value from random_string/random_password and will be
overwritten if those resources are recreated; add a lifecycle block to each SSM
parameter with ignore_changes = [value] so Terraform will not update the stored
secret after initial creation (allowing deliberate rotation instead). Locate the
aws_ssm_parameter resource blocks for django_secret_key and nextauth_secret and
add lifecycle { ignore_changes = [value] } inside each resource.

Comment on lines +258 to +266
resource "random_string" "django_secret_key" {
length = 50
special = true
}

resource "random_string" "nextauth_secret" {
length = 32
special = true
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Terraform random_password result sensitive attribute vs random_string

💡 Result:

• random_password vs random_string: They generate values the same way (both use a cryptographic RNG), but random_password.result is marked sensitive, while random_string.result is not. In practice that means random_password won’t be shown in plan/apply output, whereas random_string will. [1]
  • What “sensitive” actually does: It only affects UI/CLI redaction (e.g., terraform output name shows <sensitive>). The real value is still stored in Terraform state. [2]
  • How to retrieve a sensitive random_password value when needed: If you expose it via an output, terraform output -raw <name> or terraform output -json will print the plaintext value. [2]
  • If your goal is “don’t store it in state”: Prefer ephemeral patterns (e.g., ephemeral outputs / the ephemeral random_password variant when possible) because “sensitive” alone does not prevent state storage. [1] [2]
  • Debug-only unredaction: You can wrap a sensitive value with nonsensitive(...) to display it in plan/output, but this is for troubleshooting and doesn’t make it safer. [3]

Sources:
[1] terraform-provider-random random_password doc/code (sensitive result; identical to random_string) (third-party-mirror.googlesource.com)
[2] HashiCorp Terraform docs: sensitive outputs, state storage, -json/-raw, and ephemeral outputs (developer.hashicorp.com)
[3] HashiCorp support article on using nonsensitive (support.hashicorp.com)


🏁 Script executed:

# Check the actual file at the specified lines and find references
head -300 infrastructure/modules/parameters/main.tf | tail -50

Repository: OWASP/Nest

Length of output: 1548


🏁 Script executed:

# Search for references to the random_string/random_password resources
grep -n "random_string\|random_password" infrastructure/modules/parameters/main.tf

Repository: OWASP/Nest

Length of output: 269


Use random_password instead of random_string for secret generation.

random_string outputs are not marked as sensitive, so they appear in plaintext in terraform plan, terraform apply output, and logs. random_password behaves identically but marks its result attribute as sensitive, preventing accidental exposure in CLI output and logs.

Update lines 258–266:

Proposed fix
-resource "random_string" "django_secret_key" {
+resource "random_password" "django_secret_key" {
   length  = 50
   special = true
 }

-resource "random_string" "nextauth_secret" {
+resource "random_password" "nextauth_secret" {
   length  = 32
   special = true
 }

Also update the references on line 134 and line 243:

-  value       = random_string.django_secret_key.result
+  value       = random_password.django_secret_key.result
-  value       = random_string.nextauth_secret.result
+  value       = random_password.nextauth_secret.result
🤖 Prompt for AI Agents
In `@infrastructure/modules/parameters/main.tf` around lines 258 - 266, Replace
the two random_string resources with random_password resources so the secrets
are marked sensitive: change resource "random_string" "django_secret_key" and
resource "random_string" "nextauth_secret" to resource "random_password"
"django_secret_key" and resource "random_password" "nextauth_secret" (keeping
length and special attributes), and update any references elsewhere that use
random_string.django_secret_key or random_string.nextauth_secret to
random_password.django_secret_key and random_password.nextauth_secret so the
.result values remain the same but are treated as sensitive.

Comment on lines +77 to +86
resource "aws_security_group_rule" "ecs_to_vpc_endpoints" {
count = var.create_vpc_endpoint_rules ? 1 : 0
description = "Allow HTTPS to VPC endpoints"
from_port = 443
protocol = "tcp"
security_group_id = aws_security_group.ecs.id
source_security_group_id = var.vpc_endpoint_sg_id
to_port = 443
type = "egress"
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Explore the database module structure
fd "main.tf" --full-path --glob '**/modules/database/**' --type f

Repository: OWASP/Nest

Length of output: 163


🏁 Script executed:

# Check for RDS Proxy configuration and Secrets Manager references
rg -i "rds.*proxy|db.*proxy" infrastructure/modules/database/ -A 5

Repository: OWASP/Nest

Length of output: 9730


🏁 Script executed:

# Search for Secrets Manager usage in the codebase
rg -i "secrets.*manager|secretsmanager" infrastructure/ -l

Repository: OWASP/Nest

Length of output: 572


🏁 Script executed:

cat -n infrastructure/modules/security/main.tf | head -200

Repository: OWASP/Nest

Length of output: 8202


🏁 Script executed:

# Check if rds_proxy security group is created in security module
rg -i "rds_proxy\|rds.*proxy" infrastructure/modules/security/

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Look for VPC endpoint configuration and Secrets Manager endpoint
cat -n infrastructure/modules/networking/modules/vpc-endpoint/main.tf

Repository: OWASP/Nest

Length of output: 5074


🏁 Script executed:

cat -n infrastructure/modules/security/main.tf | tail -100

Repository: OWASP/Nest

Length of output: 4605


🏁 Script executed:

# Search for all egress rules related to rds_proxy
rg "rds_proxy" infrastructure/modules/security/main.tf -A 5 -B 5

Repository: OWASP/Nest

Length of output: 3152


Add VPC endpoint egress rule for RDS Proxy security group.

When RDS Proxy is enabled with Secrets Manager VPC endpoint, it needs HTTPS egress to the VPC endpoint security group to fetch DB credentials. Currently, only the ECS, frontend, and Lambda security groups have this rule. Add a matching egress rule for the RDS Proxy security group:

Example
resource "aws_security_group_rule" "rds_proxy_to_vpc_endpoints" {
  count                    = var.create_rds_proxy && var.create_vpc_endpoint_rules ? 1 : 0
  description              = "Allow HTTPS to VPC endpoints"
  from_port                = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds_proxy[0].id
  source_security_group_id = var.vpc_endpoint_sg_id
  to_port                  = 443
  type                     = "egress"
}

Without this rule, RDS Proxy authentication will fail when trying to retrieve secrets from Secrets Manager via VPC endpoint.

🤖 Prompt for AI Agents
In `@infrastructure/modules/security/main.tf` around lines 77 - 86, Add an egress
security group rule allowing HTTPS from the RDS Proxy security group to the VPC
endpoint SG: create a new resource named
aws_security_group_rule.rds_proxy_to_vpc_endpoints that is conditional on
var.create_rds_proxy && var.create_vpc_endpoint_rules, with description "Allow
HTTPS to VPC endpoints", type "egress", protocol "tcp", from_port/to_port 443,
security_group_id set to aws_security_group.rds_proxy[0].id and
source_security_group_id set to var.vpc_endpoint_sg_id so the RDS Proxy can
reach Secrets Manager via the VPC endpoint.

Comment on lines +28 to +40
resource "aws_iam_policy" "fixtures_read_only" {
name = "${var.project_name}-${var.environment}-fixtures-read-only"
description = "Allows read-only access to the fixtures S3 bucket"
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject"]
Resource = ["arn:aws:s3:::${var.fixtures_bucket_name}-${random_id.suffix.hex}/*"]
}]
})
tags = var.common_tags
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

fd 'outputs.tf' --full-path --glob '**/storage/modules/s3-bucket/**'

Repository: OWASP/Nest

Length of output: 172


🏁 Script executed:

# Find the s3-bucket module and check its structure
find . -type d -path '*/storage/modules/s3-bucket' 2>/dev/null

Repository: OWASP/Nest

Length of output: 104


🏁 Script executed:

# List the main storage module to understand its structure
ls -la infrastructure/modules/storage/ 2>/dev/null || ls -la .

Repository: OWASP/Nest

Length of output: 533


🏁 Script executed:

cat infrastructure/modules/storage/modules/s3-bucket/outputs.tf

Repository: OWASP/Nest

Length of output: 251


🏁 Script executed:

cat infrastructure/modules/storage/modules/s3-bucket/main.tf

Repository: OWASP/Nest

Length of output: 1803


🏁 Script executed:

cat infrastructure/modules/storage/main.tf

Repository: OWASP/Nest

Length of output: 1253


Consider referencing the module output for the bucket ARN instead of reconstructing it.

Line 36 manually rebuilds the bucket name (${var.fixtures_bucket_name}-${random_id.suffix.hex}) to construct the ARN, duplicating the logic from line 22. If the s3-bucket submodule ever changes its naming, this policy resource will silently drift. Use the module's arn output instead.

♻️ Proposed fix
   policy = jsonencode({
     Version = "2012-10-17"
     Statement = [{
       Effect   = "Allow"
       Action   = ["s3:GetObject"]
-      Resource = ["arn:aws:s3:::${var.fixtures_bucket_name}-${random_id.suffix.hex}/*"]
+      Resource = ["${module.fixtures_bucket.arn}/*"]
     }]
   })
🤖 Prompt for AI Agents
In `@infrastructure/modules/storage/main.tf` around lines 28 - 40, The IAM policy
resource aws_iam_policy.fixtures_read_only is reconstructing the bucket ARN
using ${var.fixtures_bucket_name}-${random_id.suffix.hex} which duplicates
naming logic and can drift; replace the hardcoded Resource value with the S3
module's ARN output (e.g., use module.<s3_module_name>.arn or
module.fixtures_bucket.arn) so the policy references the canonical bucket ARN
output from the s3-bucket submodule instead of rebuilding it from
var.fixtures_bucket_name and random_id.suffix.hex.

Comment on lines +41 to +46
> [!NOTE]
> Copy the state bucket name from the output.

> [!NOTE]
> It is recommended to not destroy the backend resources unless absolutely necessary.


⚠️ Potential issue | 🟡 Minor

Consecutive blockquotes separated by blank lines trigger MD028.

Markdownlint flags blank lines between adjacent blockquotes (Lines 43, 81, 178). This is cosmetic and only affects strict linting — GitHub renders these correctly. If you want to silence the warning, combine adjacent notes into a single blockquote block or add an <!-- markdownlint-disable MD028 --> directive.

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 43-43: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🤖 Prompt for AI Agents
In `@infrastructure/README.md` around lines 41 - 46, Two adjacent blockquote notes
("Copy the state bucket name from the output." and "It is recommended to not
destroy the backend resources unless absolutely necessary.") are separated by a
blank line which triggers MD028; fix by either merging the consecutive
blockquotes into a single blockquote block (combine their lines so they appear
in one contiguous > paragraph) or add a comment directive <!--
markdownlint-disable MD028 --> above the first blockquote and re-enable it after
if preferred; update the README.md sections containing those quoted lines
accordingly.

@rudransh-shrivastava rudransh-shrivastava marked this pull request as ready for review February 7, 2026 13:31
"""Load data into the OWASP Nest application."""
with index.disable_indexing(), transaction.atomic():
# Run loaddata
call_command("loaddata", _options["fixture_path"], "-v", "3")
Collaborator Author

this seems to have been migrated to a psql based command. I will update this and the infrastructure to use that in a new PR.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/settings/base.py (1)

156-164: ⚠️ Potential issue | 🟠 Major

RQ_QUEUES missing TLS configuration will cause connection failures to Redis if TLS is enforced.

The cache LOCATION uses rediss:// (TLS-enabled) when REDIS_AUTH_ENABLED is true, but RQ_QUEUES connects to the same REDIS_HOST via plain HOST/PORT without SSL settings. When the Redis instance enforces encryption in transit, RQ worker connections will be rejected.

Update RQ_QUEUES to mirror the TLS logic applied to the cache configuration:

Proposed fix
     RQ_QUEUES = {
         "ai": {
-            "HOST": REDIS_HOST,
-            "PORT": 6379,
-            "PASSWORD": REDIS_PASSWORD,
-            "DB": 1,
-            "DEFAULT_TIMEOUT": 300,
+            "URL": f"rediss://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/1"
+            if REDIS_AUTH_ENABLED
+            else f"redis://{REDIS_HOST}:6379/1",
+            "DEFAULT_TIMEOUT": 300,
         }
     }
🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Around line 765-767: The current echo lines hardcode indices for
private_subnet_a/private_subnet_b/private_subnet_c using jq -r
'.[0]'/'[1]'/'[2]' which fails if terraform output has fewer than 3 subnets;
change this to either (a) validate the count from terraform output (e.g.,
capture terraform output -json private_subnet_ids into a variable, use jq
'length' and fail with a clear error if < 3) or (b) dynamically iterate the
array and emit outputs for each entry (e.g., use jq -r '.[]' and a shell loop to
index and write "private_subnet_<index>=<id>" to $GITHUB_OUTPUT) so downstream
ECS run-task and Zappa receive only valid subnet IDs; update places that expect
fixed names if you choose dynamic naming.
- Around line 700-873: The deploy-staging-nest job lacks a timeout-minutes
setting which can let long-running steps (Terraform Apply, Zappa update, ECS
task waits such as Run ECS migrate task and Wait for frontend service stability)
consume runner time; add a timeout-minutes entry to the deploy-staging-nest job
(e.g. timeout-minutes: 60 or an appropriate value for staging) directly under
the job definition (deploy-staging-nest) after the job-level keys and before
steps to ensure the entire job is bounded.
- Around line 797-799: The CI step named "Update Zappa deployment" is using
`zappa update staging`, which fails on initial deployments because the Lambda
doesn't exist yet; change the step to run `zappa deploy staging` (or add
conditional logic to check if the Zappa-managed Lambda exists and run `deploy`
for first-time/absent functions and `update` otherwise) — locate the workflow
step by the step name "Update Zappa deployment" (working-directory: backend) and
replace the command accordingly.

In `@infrastructure/backend/main.tf`:
- Around line 136-150: The lifecycle for the S3 logs bucket (resource
aws_s3_bucket_lifecycle_configuration "logs" targeting aws_s3_bucket.logs)
currently only uses expiration for current objects, so noncurrent object
versions will never be removed; update the lifecycle to add a
noncurrent_version_expiration block (e.g., days = var.noncurrent_expire_days or
a new variable) and/or noncurrent_version_transition rules as appropriate to
ensure older versions are purged, or else remove/disable versioning on
aws_s3_bucket.logs if retaining versions is not required; ensure you reference
the existing abort_incomplete_multipart_upload, expiration and variable names
(var.expire_log_days) when adding the noncurrent rules so the configuration is
consistent.
- Around line 73-95: The DynamoDB state_lock resource
(aws_dynamodb_table.state_lock) needs AWS-level deletion protection and the KMS
key must allow DynamoDB to use it: add deletion_protection_enabled = true to the
aws_dynamodb_table.state_lock resource, and update the KMS module key policy (in
infrastructure/modules/kms main policy referenced by module.kms.key_arn) to
include a statement granting DynamoDB the actions kms:Encrypt, kms:Decrypt,
kms:DescribeKey, kms:GenerateDataKey*, kms:ReEncrypt*, kms:CreateGrant with a
Condition kms:ViaService set to dynamodb.<region>.amazonaws.com (or wildcard for
region) so DynamoDB can use the key for server_side_encryption; ensure the
principal and resource fields match the key policy structure and include the
account root as needed.

In `@infrastructure/README.md`:
- Around line 245-251: The README's S3 upload step uses an unexplained
placeholder `owasp-nest-fixtures-<id>`; add a short NOTE near that command
clarifying that `<id>` is the unique bucket suffix from the Terraform output
`fixtures_bucket_name` (or other source) and instruct the user to replace it
with that exact value; reference the command that mentions `data/nest.json.gz`
and the bucket name `owasp-nest-fixtures-<id>` so reviewers can find where to
insert the NOTE.
🧹 Nitpick comments (12)
infrastructure/README.md (3)

61-65: Prefer cp over cat > for copying files.

The command cat terraform.tfvars.example > terraform.tfvars works but is less idiomatic than cp terraform.tfvars.example terraform.tfvars. The standard copy command is clearer and more direct.

📝 Suggested simplification
-  - Copy the contents from the example file:
-
-    ```bash
-    cat terraform.tfvars.example > terraform.tfvars
-    ```
+  - Copy the example file:
+
+    ```bash
+    cp terraform.tfvars.example terraform.tfvars
+    ```

73-77: Prefer cp over cat > for copying files.

Same as the previous comment—using cp is more idiomatic than cat > for file copying.

📝 Suggested simplification
-  - Copy the contents from the example file:
-
-    ```bash
-    cat terraform.tfbackend.example > terraform.tfbackend
-    ```
+  - Copy the example file:
+
+    ```bash
+    cp terraform.tfbackend.example terraform.tfbackend
+    ```

132-136: Prefer cp over cat > for copying files.

Same pattern as earlier—cp is more idiomatic than cat > for file copying.

📝 Suggested simplification
-  - Copy the contents from the template file into your new local environment file:
-
-    ```bash
-    cat zappa_settings.template.json > zappa_settings.json
-    ```
+  - Copy the template file:
+
+    ```bash
+    cp zappa_settings.template.json zappa_settings.json
+    ```
.github/workflows/run-ci-cd.yaml (1)

621-626: Consider using OIDC for AWS authentication instead of static access keys.

Both plan-staging-nest and deploy-staging-nest use long-lived AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY static credentials. GitHub Actions supports OIDC federation with AWS via role-to-assume, which eliminates the need for stored secrets and provides short-lived credentials. This is a security best practice, especially for a public repository.

Example OIDC configuration
       - name: Configure AWS credentials
         uses: aws-actions/configure-aws-credentials@61815dcd50bd041e203e49132bacad1fd04d2708
         with:
-          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
           aws-region: ${{ vars.AWS_REGION }}
-          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}

This requires setting up an IAM OIDC provider and role in AWS, and adding id-token: write to the job's permissions.

Also applies to: 718-723

infrastructure/backend/main.tf (2)

23-35: Consider adding an HTTPS-only policy for the logs bucket as well.

The state bucket correctly enforces HTTPS-only access via state_https_only, but the logs bucket has no equivalent deny policy. IAM principals with access to the logs bucket could potentially interact with it over plain HTTP. For consistency and defense-in-depth, apply the same SecureTransport deny condition to the logs bucket.

♻️ Suggested addition
+data "aws_iam_policy_document" "logs_https_only" {
+  policy_id = "ForceHTTPS"
+
+  statement {
+    actions = ["s3:*"]
+    sid     = "HTTPSOnly"
+    effect  = "Deny"
+
+    condition {
+      test     = "Bool"
+      values   = ["false"]
+      variable = "aws:SecureTransport"
+    }
+    principals {
+      identifiers = ["*"]
+      type        = "AWS"
+    }
+    resources = [
+      aws_s3_bucket.logs.arn,
+      "${aws_s3_bucket.logs.arn}/*",
+    ]
+  }
+}

Then combine both statements in the logs bucket policy (or use source_policy_documents / override_policy_documents to merge them):

 resource "aws_s3_bucket_policy" "logs" {
   bucket = aws_s3_bucket.logs.id
-  policy = data.aws_iam_policy_document.logs.json
+  policy = data.aws_iam_policy_document.logs_combined.json
 }

195-213: AES256 encryption is acknowledged via trivy:ignore — consider upgrading the state bucket to KMS-CMK.

The Trivy suppressions indicate this is intentional. AES256 for the logs bucket is reasonable since S3 server access logging can have compatibility issues with KMS-encrypted target buckets. However, the state bucket holds sensitive infrastructure state and would benefit from CMK encryption (matching the DynamoDB table's approach), with key rotation and audit trail via CloudTrail. This could be deferred if the KMS module is still being stabilized.
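
If the state bucket does move to a CMK later, the change is a drop-in swap of the encryption configuration. A sketch, assuming the bucket resource is named "state" and the KMS module exposes key_arn as in the DynamoDB table:

```hcl
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    bucket_key_enabled = true # reduces KMS request volume for frequently accessed objects

    apply_server_side_encryption_by_default {
      kms_master_key_id = module.kms.key_arn
      sse_algorithm     = "aws:kms"
    }
  }
}
```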

infrastructure/modules/ecs/main.tf (1)

14-28: ECS tasks cluster is missing containerInsights.

The frontend module's cluster (Line 64-67 of frontend/main.tf) explicitly enables containerInsights, but this tasks cluster does not. Without it, you lose ECS-level CPU/memory/task metrics in CloudWatch, making it harder to troubleshoot scheduled task failures or resource pressure.

♻️ Proposed fix to enable Container Insights
 resource "aws_ecs_cluster" "main" {
   name = "${var.project_name}-${var.environment}-tasks-cluster"
   tags = var.common_tags
+
+  setting {
+    name  = "containerInsights"
+    value = "enabled"
+  }
 }
infrastructure/modules/frontend/main.tf (1)

109-136: Consider enabling a deployment circuit breaker for the frontend ECS service.

The aws_ecs_service resource does not configure deployment_circuit_breaker. Without it, a broken image deployment can leave the service stuck in a rollout loop with failing tasks draining capacity. Enabling the circuit breaker with rollback provides a safety net.

♻️ Suggested addition
 resource "aws_ecs_service" "frontend" {
   cluster                           = aws_ecs_cluster.frontend.id
   desired_count                     = var.desired_count
   health_check_grace_period_seconds = 60
   name                              = "${var.project_name}-${var.environment}-frontend-service"
   ...
   task_definition = aws_ecs_task_definition.frontend.arn
 
+  deployment_circuit_breaker {
+    enable   = true
+    rollback = true
+  }
+
   capacity_provider_strategy {
infrastructure/modules/networking/main.tf (2)

60-67: Single NAT gateway creates a single point of failure across AZs.

All private subnets route through one NAT gateway in public[0]. If that AZ becomes unavailable, all private subnet egress (ECR pulls, DB connections via public endpoints, external API calls) will fail. For a staging environment this is a reasonable cost trade-off, but for production consider deploying one NAT gateway per AZ.


22-43: Subnet creation assumes matching array lengths for AZs and CIDRs.

count = length(var.public_subnet_cidrs) (Line 25) indexes into var.availability_zones[count.index] (Line 23). If the AZ list is shorter than the CIDR list, this will fail at plan time with an index-out-of-range error. Consider adding a validation block in the variables file to ensure length(var.availability_zones) >= length(var.public_subnet_cidrs) and similarly for private subnets.

#!/bin/bash
# Check if there's any validation in the networking module's variables.tf
fd -p 'modules/networking/variables.tf' --exec grep -A 10 'validation' {} \;
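
On Terraform 1.9+, where validation conditions may reference other variables, a direct cross-check is possible in the variables file itself (a sketch; older versions need the check block shown in the staging variables comment above):

```hcl
variable "public_subnet_cidrs" {
  description = "A list of CIDR blocks for the public subnets."
  type        = list(string)

  validation {
    condition     = length(var.public_subnet_cidrs) == length(var.availability_zones)
    error_message = "public_subnet_cidrs must provide exactly one CIDR per availability zone."
  }
}
```
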
infrastructure/modules/alb/main.tf (2)

135-150: Listener rule priority relies on implicit string-to-number conversion.

each.key from the for_each map is a string (e.g., "0", "1", "2"), so 100 + each.key relies on Terraform's implicit type coercion. While this works, an explicit tonumber() call would improve clarity.

♻️ Explicit type conversion
-  priority     = 100 + each.key
+  priority     = 100 + tonumber(each.key)

83-102: ALB has enable_deletion_protection = false.

This is fine for staging/dev, but for production the ALB should have deletion protection enabled to prevent accidental terraform destroy. Consider making this a variable so it can be toggled per environment.

Comment on lines +765 to +767
echo "private_subnet_a=$(terraform output -json private_subnet_ids | jq -r '.[0]')" >> $GITHUB_OUTPUT
echo "private_subnet_b=$(terraform output -json private_subnet_ids | jq -r '.[1]')" >> $GITHUB_OUTPUT
echo "private_subnet_c=$(terraform output -json private_subnet_ids | jq -r '.[2]')" >> $GITHUB_OUTPUT

⚠️ Potential issue | 🟡 Minor

Hardcoded subnet indices assume exactly 3 availability zones.

Lines 765–767 use jq -r '.[0]', '.[1]', '.[2]' to extract private subnets. If the Terraform output returns fewer than 3 subnets, these will resolve to null, causing downstream ECS run-task and Zappa configuration to receive invalid subnet IDs.

Consider either validating the count or dynamically iterating instead of hardcoding indices.

🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 765 - 767, The current echo
lines hardcode indices for private_subnet_a/private_subnet_b/private_subnet_c
using jq -r '.[0]'/'[1]'/'[2]' which fails if terraform output has fewer than 3
subnets; change this to either (a) validate the count from terraform output
(e.g., capture terraform output -json private_subnet_ids into a variable, use jq
'length' and fail with a clear error if < 3) or (b) dynamically iterate the
array and emit outputs for each entry (e.g., use jq -r '.[]' and a shell loop to
index and write "private_subnet_<index>=<id>" to $GITHUB_OUTPUT) so downstream
ECS run-task and Zappa receive only valid subnet IDs; update places that expect
fixed names if you choose dynamic naming.
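
A sketch of option (a), validating the count before emitting the existing fixed output names; everything here mirrors the step shown above and is illustrative:

```bash
subnet_ids=$(terraform output -json private_subnet_ids)
count=$(echo "$subnet_ids" | jq 'length')
if [ "$count" -lt 3 ]; then
  echo "Expected at least 3 private subnets, got $count" >&2
  exit 1
fi
echo "private_subnet_a=$(echo "$subnet_ids" | jq -r '.[0]')" >> "$GITHUB_OUTPUT"
echo "private_subnet_b=$(echo "$subnet_ids" | jq -r '.[1]')" >> "$GITHUB_OUTPUT"
echo "private_subnet_c=$(echo "$subnet_ids" | jq -r '.[2]')" >> "$GITHUB_OUTPUT"
```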

Comment on lines +797 to +799
- name: Update Zappa deployment
working-directory: backend
run: poetry run zappa update staging

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Get the full workflow file to understand the deployment context around lines 797-799
wc -l .github/workflows/run-ci-cd.yaml

Repository: OWASP/Nest

Length of output: 91


🏁 Script executed:

# Read the workflow around the mentioned lines to understand context
sed -n '700,850p' .github/workflows/run-ci-cd.yaml

Repository: OWASP/Nest

Length of output: 6512


🏁 Script executed:

# Search for zappa deploy/update patterns in the workflow
rg -n "zappa deploy|zappa update" .github/workflows/run-ci-cd.yaml

Repository: OWASP/Nest

Length of output: 102


🏁 Script executed:

# Check if there's any zappa configuration or initialization logic
find . -name "zappa*" -o -name ".zappa*" | head -20

Repository: OWASP/Nest

Length of output: 120


🏁 Script executed:

# Look for any deploy vs update conditional logic in the repo
rg -n "if.*deploy\|if.*update" --type yaml .github/workflows/ -A 2 -B 2

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Check if Terraform creates any Lambda resources
find infrastructure -name "*.tf" | head -20

Repository: OWASP/Nest

Length of output: 855


🏁 Script executed:

# Search for Lambda function definition in Terraform
rg -l "aws_lambda" infrastructure/

Repository: OWASP/Nest

Length of output: 136


🏁 Script executed:

# Read Zappa settings template to understand configuration
cat backend/zappa_settings.template.json

Repository: OWASP/Nest

Length of output: 1529


🏁 Script executed:

# Check if there's any zappa deploy in the entire workflow or infrastructure
rg -n "zappa deploy" .github/workflows/

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Check if Terraform actually creates any Lambda functions
rg -n "aws_lambda_function\|aws_lambda" infrastructure/

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Get a clearer picture of what's in the backend Terraform module
cat infrastructure/backend/main.tf | head -100

Repository: OWASP/Nest

Length of output: 1999


🏁 Script executed:

# Check if there's any documentation or comments about deployment strategy
rg -n "zappa\|lambda\|deploy" README.md CONTRIBUTING.md 2>/dev/null | head -30

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Look for any other CI/CD workflows that might have initial deployment logic
ls -la .github/workflows/ | grep -i deploy

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Check the staging infrastructure configuration
ls -la infrastructure/staging/

Repository: OWASP/Nest

Length of output: 686


🏁 Script executed:

# Look at the main staging Terraform configuration
cat infrastructure/staging/main.tf | head -150

Repository: OWASP/Nest

Length of output: 5935


🏁 Script executed:

# Check if any Terraform modules create Lambda functions
rg -r "aws_lambda_function" infrastructure/modules/

Repository: OWASP/Nest

Length of output: 36


🏁 Script executed:

# Check git history to see if there's ever been a zappa deploy step
git log --oneline -n 50 -- .github/workflows/run-ci-cd.yaml 2>/dev/null | head -20

Repository: OWASP/Nest

Length of output: 147


🏁 Script executed:

# Check how the ALB module uses the lambda_function_name
cat infrastructure/modules/alb/main.tf | grep -A 5 -B 5 "lambda_function_name"

Repository: OWASP/Nest

Length of output: 2028


🏁 Script executed:

# Check if the ALB module creates a Lambda resource
rg -n "aws_lambda" infrastructure/modules/alb/main.tf

Repository: OWASP/Nest

Length of output: 405


🏁 Script executed:

# Look at the recent commit that mentioned zappa migration
git show 7eea02fd --stat 2>/dev/null | head -50

Repository: OWASP/Nest

Length of output: 2957


🏁 Script executed:

# Check the workflow history to see if there was ever a zappa deploy step before
git log --all --oneline -- .github/workflows/run-ci-cd.yaml 2>/dev/null | head -30

Repository: OWASP/Nest

Length of output: 1804


zappa update will fail on initial deployment—use zappa deploy first.

The Lambda function is managed by Zappa, not Terraform. On first deployment (or if the function is deleted), zappa update staging assumes the Lambda already exists and will fail. Replace with zappa deploy staging, or add conditional logic to detect and handle the initial deployment scenario.

🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 797 - 799, The CI step named
"Update Zappa deployment" is using `zappa update staging`, which fails on
initial deployments because the Lambda doesn't exist yet; change the step to run
`zappa deploy staging` (or add conditional logic to check if the Zappa-managed
Lambda exists and run `deploy` for first-time/absent functions and `update`
otherwise) — locate the workflow step by the step name "Update Zappa deployment"
(working-directory: backend) and replace the command accordingly.
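
A sketch of the conditional variant; the Lambda function name is hypothetical (Zappa derives it from project_name and the stage in zappa_settings.json), and the existence check simply relies on aws lambda get-function exiting non-zero when the function is absent:

```yaml
- name: Deploy or update Zappa
  working-directory: backend
  run: |
    # Hypothetical function name; replace with the value Zappa actually creates.
    if aws lambda get-function --function-name nest-backend-staging >/dev/null 2>&1; then
      poetry run zappa update staging
    else
      poetry run zappa deploy staging
    fi
```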

Comment on lines +73 to +95
resource "aws_dynamodb_table" "state_lock" {
name = "${var.project_name}-terraform-state-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
tags = merge(local.common_tags, {
Name = "${var.project_name}-terraform-state-lock"
})

attribute {
name = "LockID"
type = "S"
}
lifecycle {
prevent_destroy = true
}
point_in_time_recovery {
enabled = true
}
server_side_encryption {
enabled = true
kms_key_arn = module.kms.key_arn
}
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Locate the KMS module referenced in the DynamoDB config
find . -type f -name "*.tf" | grep -i kms | head -20

Repository: OWASP/Nest

Length of output: 172


🏁 Script executed:

# Find the modules directory structure
find . -type d -name "modules" -o -type d -name "kms"

Repository: OWASP/Nest

Length of output: 352


🏁 Script executed:

# Look for the KMS module more broadly
fd '\.tf$' --path '*kms*'

Repository: OWASP/Nest

Length of output: 269


🏁 Script executed:

# Read the KMS module main.tf to check the key policy
cat -n ./infrastructure/modules/kms/main.tf

Repository: OWASP/Nest

Length of output: 2419


🏁 Script executed:

# Also check if there are any other files in the KMS module
ls -la ./infrastructure/modules/kms/

Repository: OWASP/Nest

Length of output: 475


Add deletion_protection_enabled = true and fix the KMS key policy for DynamoDB encryption.

prevent_destroy only guards against terraform destroy; it does not prevent deletion via the AWS console or API. For a critical resource like the state lock table, setting deletion_protection_enabled = true adds AWS-level protection against accidental deletion.

More critically, the KMS module's key policy is missing the required statement to allow DynamoDB to use the key for encryption. The key policy currently grants permissions only to the root account and CloudWatch Logs, but the DynamoDB table specifies kms_key_arn = module.kms.key_arn without corresponding KMS grants. Add a statement to the KMS key policy in infrastructure/modules/kms/main.tf that allows the DynamoDB service to use the key with the required actions (kms:Encrypt, kms:Decrypt, kms:DescribeKey, kms:GenerateDataKey*, kms:ReEncrypt*, kms:CreateGrant) and a kms:ViaService condition set to dynamodb.*.amazonaws.com. Without this, encryption will fail at runtime.

🛡️ Proposed DynamoDB table fix
 resource "aws_dynamodb_table" "state_lock" {
   name         = "${var.project_name}-terraform-state-lock"
   billing_mode = "PAY_PER_REQUEST"
+  deletion_protection_enabled = true
   hash_key     = "LockID"
🤖 Prompt for AI Agents
In `@infrastructure/backend/main.tf` around lines 73 - 95, The DynamoDB state_lock
resource (aws_dynamodb_table.state_lock) needs AWS-level deletion protection and
the KMS key must allow DynamoDB to use it: add deletion_protection_enabled =
true to the aws_dynamodb_table.state_lock resource, and update the KMS module
key policy (in infrastructure/modules/kms main policy referenced by
module.kms.key_arn) to include a statement granting DynamoDB the actions
kms:Encrypt, kms:Decrypt, kms:DescribeKey, kms:GenerateDataKey*, kms:ReEncrypt*,
kms:CreateGrant with a Condition kms:ViaService set to
dynamodb.<region>.amazonaws.com (or wildcard for region) so DynamoDB can use the
key for server_side_encryption; ensure the principal and resource fields match
the key policy structure and include the account root as needed.
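
A sketch of the extra key-policy statement; the policy document name, the data sources, and the exact module layout are assumptions about how infrastructure/modules/kms is structured:

```hcl
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

data "aws_iam_policy_document" "key_policy" {
  # ... existing root-account and CloudWatch Logs statements stay as-is ...

  # New statement: lets principals in this account use the key when the request
  # arrives via DynamoDB (required for the table's server-side encryption).
  statement {
    sid    = "AllowDynamoDBUseOfTheKey"
    effect = "Allow"
    actions = [
      "kms:CreateGrant",
      "kms:Decrypt",
      "kms:DescribeKey",
      "kms:Encrypt",
      "kms:GenerateDataKey*",
      "kms:ReEncrypt*",
    ]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }

    condition {
      test     = "StringEquals"
      variable = "kms:ViaService"
      values   = ["dynamodb.${data.aws_region.current.name}.amazonaws.com"]
    }
  }
}
```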

Comment on lines +136 to +150
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
bucket = aws_s3_bucket.logs.id

rule {
id = "expire-logs"
status = "Enabled"

abort_incomplete_multipart_upload {
days_after_initiation = var.abort_incomplete_multipart_upload_days
}
expiration {
days = var.expire_log_days
}
}
}

⚠️ Potential issue | 🟠 Major

Logs bucket: noncurrent versions will accumulate indefinitely.

Versioning is enabled on the logs bucket (Line 225), but the logs lifecycle configuration (Lines 136–150) only defines an expiration rule for current objects. With versioning enabled, expiration merely adds a delete marker — noncurrent versions are never removed. Add a noncurrent_version_expiration rule to prevent unbounded storage growth.

♻️ Proposed fix
 resource "aws_s3_bucket_lifecycle_configuration" "logs" {
   bucket = aws_s3_bucket.logs.id

   rule {
     id     = "expire-logs"
     status = "Enabled"

     abort_incomplete_multipart_upload {
       days_after_initiation = var.abort_incomplete_multipart_upload_days
     }
     expiration {
       days = var.expire_log_days
     }
+    noncurrent_version_expiration {
+      noncurrent_days = var.expire_log_days
+    }
   }
 }

Alternatively, if retaining old log versions is not needed, consider whether versioning on the logs bucket is necessary at all — the state bucket is where versioning provides critical value.

Also applies to: 222-227

🤖 Prompt for AI Agents
In `@infrastructure/backend/main.tf` around lines 136 - 150, The lifecycle for the
S3 logs bucket (resource aws_s3_bucket_lifecycle_configuration "logs" targeting
aws_s3_bucket.logs) currently only uses expiration for current objects, so
noncurrent object versions will never be removed; update the lifecycle to add a
noncurrent_version_expiration block (e.g., days = var.noncurrent_expire_days or
a new variable) and/or noncurrent_version_transition rules as appropriate to
ensure older versions are purged, or else remove/disable versioning on
aws_s3_bucket.logs if retaining versions is not required; ensure you reference
the existing abort_incomplete_multipart_upload, expiration and variable names
(var.expire_log_days) when adding the noncurrent rules so the configuration is
consistent.

Comment on lines +245 to +251
1. **Upload Fixture to S3**:

- Upload the fixture present in `backend/data` to `nest-fixtures` bucket using the following command:

```bash
aws s3 cp data/nest.json.gz s3://owasp-nest-fixtures-<id>/
```

⚠️ Potential issue | 🟡 Minor

Clarify the <id> placeholder in the S3 bucket name.

The bucket name includes an <id> placeholder that isn't explained. Unlike the region and account ID placeholders elsewhere, there's no NOTE clarifying what this represents (e.g., random suffix from Terraform output, AWS account ID, etc.).

💡 Suggested clarification

Add a NOTE block before or after the command:

  > [!NOTE]
  > Replace `<id>` with the unique identifier from the Terraform output `fixtures_bucket_name`.
🤖 Prompt for AI Agents
In `@infrastructure/README.md` around lines 245-251: the README's S3 upload step
uses an unexplained placeholder `owasp-nest-fixtures-<id>`. Add a short NOTE near
that command clarifying that `<id>` is the unique bucket suffix from the
Terraform output `fixtures_bucket_name` (or another documented source) and
instructing the user to replace it with that exact value; reference the command
that mentions `data/nest.json.gz` and the bucket name `owasp-nest-fixtures-<id>`
so reviewers can find where to insert the NOTE.

@rudransh-shrivastava rudransh-shrivastava marked this pull request as draft February 9, 2026 18:30
rudransh-shrivastava and others added 2 commits February 10, 2026 15:55
* Run make update

* Clean up snapshot generated videos

* Update backend/data/nest.dump

* feat(ui): revamp corporate supporters carousel (Infinite Marquee + Dark Mode fix) (#3837)

* feat(ui): revamp corporate supporters carousel (Infinite Marquee + Dark Mode fix)

* fix: resolve failing test case

* fix: add fallback text for unnamed sponsors

* docs: add docstrings to satisfy coverage requirements

* Run make check and fix tests.

---------

Co-authored-by: Kate <kate@kgthreads.com>

* Fix/redundant typescript assertion (#3834)

* Fix Sonar S4325 by narrowing session user fields instead of casting

* Fix unused ExtendedSession in mentorship page

* fix: redundant-typescript-assertion

* Fix stale latest date displayed in Project Health Dashboard metrics (#3842)

* Fixed latest date in proejct health dashboard

* updated order

* Update code

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* feat: improve backend test coverage to 96% (#3840)

* feat: improve backend test coverage to 96%

* fix comments

* fix issues

* fix issue

* fix cubic-dev-ai comments

* Update code

* Fix tests

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

* Fix: merge consecutive RUN instructions in frontend Dockerfile (#3644)

* Fix: merge consecutive RUN instructions in frontend Dockerfile

* fix: comment Dockerfile note to prevent syntax error

* Update code

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

* Fix 'is_merged' not being available on the Issue (#3843)

* Fix 'is_merged' not being available on the Issue

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* CI:  Add ansible-lint workflow for Ansible playbooks (#3796)

* ci: add ansible-lint workflow

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* Update .github/workflows/lint-ansible.yaml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* ci: add ansible-lint make target and workflow

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* ci: add ansible-lint pre-commit hook

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* fix: whitespace & version

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* Update Makefile

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>

* ci: enable ansible-lint scanning and add requirements.yml

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* chore(ansible):align linting and module usage

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* ci(ansible): install collections before deploy playbooks

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* Update code

* Update code

* Update .github/workflows/run-ci-cd.yaml

---------

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* Fix ElevenLabs API error (#3861)

* use default liam voice

* bump speed by 0.10

---------

Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

* Add Ime Iyonsi to MENTORS.md (#3866)

* Add mentor profile for Ime Iyonsi

Added Ime Iyonsi's mentor profile.

* Fix GitHub link for Ime Iyonsi

Corrected GitHub link for Ime Iyonsi.

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>

* Update MENTORS.md

* Enabled Strict Mode (#3776)

* Enabled Strict Mode

* fixed ai review

* fix

* fixed review

* fix

* update test

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

* Resolve case-sensitivity in QueryParser to support Chapters/Members search (#3844)

* resolve query parser blocker

* use case_sensitive flag in QueryParser

* feat: add case_sensitive option to QueryParser and update tests

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

* Update dependencies (#3874)

* Update dependencies

* Bump django-ninja version

* fix(proxy): pin nginx and certbot images (#3848)

* fix(proxy): pin nginx and certbot images

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

* fix stable verssions

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>

---------

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

* Update docker-compose/proxy/compose.yaml

* Update backend/pyproject.toml

* Update ansible lint configuration (#3880)

* Update .github/ansible/.ansible-lint.yaml

* Improve frontend test coverage above 80% and add missing test files (#3864)

* Imrove test coverage to 80% and added test

* Fixed coderabbit review

* update code

* fixed coderabbit ai

* fixed soanrqube warning

* fixed review

* update

* fixed aloglia cache_key (#3825)

* fixed aloglia cache_key

* change separator val to be semicolon (;)

* Update code

* add tests + use json filters

* add trailing newline

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

* fix: remove unused className prop from AnchorTitle component (#3822)

* fix: remove unused className prop from AnchorTitle component

Fixes #3805

The className prop was defined in AnchorTitleProps but never used
in the component implementation. Removing it resolves Sonar rule
typescript:S6767 and improves code maintainability.

* fix: use className prop instead of removing it

- Added className back to AnchorTitleProps interface
- Accept className parameter in component
- Apply className to root div element
- Resolves reviewer feedback on PR #3822

* Update code

---------

Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>

---------

Signed-off-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>
Co-authored-by: Arkadii Yakovets <arkadii.yakovets@owasp.org>
Co-authored-by: Yashraj Pahuja <yashrajpahuja9999@gmail.com>
Co-authored-by: Kate <kate@kgthreads.com>
Co-authored-by: CodeAritraDhank <aritradhank21@gmail.com>
Co-authored-by: Anurag Yadav <143180737+anurag2787@users.noreply.github.com>
Co-authored-by: Harshit Verma <harshit1092004@gmail.com>
Co-authored-by: Arkadii Yakovets <2201626+arkid15r@users.noreply.github.com>
Co-authored-by: Shuban Mutagi <shubanmutagi55@gmail.com>
Co-authored-by: Muhammad Hassaan Saleem <iamhassaans@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: emaybu <152900874+emaybu@users.noreply.github.com>
Co-authored-by: sai chethana <saichethanavesireddy@gmail.com>
Co-authored-by: Rahul Paul <179798584+Mr-Rahul-Paul@users.noreply.github.com>
Co-authored-by: Lavanya <lavanyayadawad30@gmail.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Around lines 420-437: the run-infrastructure-tests job lacks a timeout-minutes
setting, which can allow Terraform to hang and waste runner minutes. Add
timeout-minutes (e.g., 15) to the run-infrastructure-tests job definition so the
entire job is killed after that duration; update the job block named
"run-infrastructure-tests" (the job with name: Run Infrastructure tests and the
step that runs "make test-infrastructure") to include timeout-minutes: 15.

In `@backend/pyproject.toml`:
- Line 14: the dependency entry dependencies.aws-xray-sdk = "^2.15.0" is fine for
now, but the AWS X-Ray SDKs enter maintenance mode on 2026-02-25. Update the repo
by (A) keeping the current version pin as-is (dependencies.aws-xray-sdk =
"^2.15.0") to avoid breaking changes, (B) adding a short note in the project
docs/CHANGELOG mentioning the maintenance-mode date and its impact, and (C)
creating a tracking issue/task to evaluate migrating to OpenTelemetry (including
migration criteria and a target date before 2026-02-25) so the team can plan a
replacement before end-of-support.
🧹 Nitpick comments (2)
.github/workflows/run-ci-cd.yaml (2)

808-816: Guard against empty task ARN from run-task.

If ECS fails to place the task (e.g., due to capacity issues), tasks[0].taskArn resolves to the literal string None. The subsequent wait and describe-tasks steps would then fail with a confusing error. Add a guard:

Proposed fix
          TASK_ARN=$(aws ecs run-task \
            --cluster "$CLUSTER_NAME" \
            --task-definition "$TASK_DEFINITION" \
            --launch-type FARGATE \
            --network-configuration "awsvpcConfiguration={subnets=[$PRIVATE_SUBNET_IDS],securityGroups=[$ECS_SECURITY_GROUP_ID],assignPublicIp=DISABLED}" \
            --query 'tasks[0].taskArn' \
            --output text)
+
+         if [ -z "$TASK_ARN" ] || [ "$TASK_ARN" = "None" ]; then
+           echo "::error::Failed to start migrate task. Check ECS capacity and task definition."
+           exit 1
+         fi
          echo "task_arn=$TASK_ARN" >> $GITHUB_OUTPUT

499-501: Consider adding a version-specific ECR tag alongside latest.

Both backend and frontend ECR images are tagged only as :latest. Adding a version tag (e.g., using RELEASE_VERSION) would enable rollbacks and make it easier to trace which commit is deployed. This is optional for staging but becomes important if this pattern is replicated for production.
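
A minimal sketch of the dual-tag idea, assuming the job already authenticates to ECR and that REGISTRY and RELEASE_VERSION are available as environment variables (both names are placeholders, not values from this workflow):

```bash
# Hypothetical step body; REGISTRY and RELEASE_VERSION are assumed to be set earlier.
docker build -t "$REGISTRY/nest-backend:latest" backend/
docker tag "$REGISTRY/nest-backend:latest" "$REGISTRY/nest-backend:$RELEASE_VERSION"
docker push "$REGISTRY/nest-backend:latest"
docker push "$REGISTRY/nest-backend:$RELEASE_VERSION"
```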

Comment on lines +420 to +437
run-infrastructure-tests:
  name: Run Infrastructure tests
  needs:
    - run-security-scan-code
  permissions:
    contents: read
  runs-on: ubuntu-latest
  steps:
    - name: Check out repository
      uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd

    - name: Install Terraform
      uses: hashicorp/setup-terraform@b9cd54a3c349d3f38e8881555d616ced269862dd
      with:
        terraform_version: 1.14.0

    - name: Run Infrastructure tests
      run: make test-infrastructure

⚠️ Potential issue | 🟡 Minor

Missing timeout-minutes on run-infrastructure-tests.

Other test jobs in this workflow consistently set timeout-minutes. Without one, the default 360-minute limit applies, which could waste runner minutes if Terraform hangs. Consider adding a reasonable timeout (e.g., 10–15 minutes).

Proposed fix
       - name: Run Infrastructure tests
         run: make test-infrastructure
+    timeout-minutes: 15
🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 420-437: the
run-infrastructure-tests job lacks a timeout-minutes setting, which can allow
Terraform to hang and waste runner minutes. Add timeout-minutes (e.g., 15) to the
run-infrastructure-tests job definition so the entire job is killed after that
duration; update the job block named "run-infrastructure-tests" (the job with
name: Run Infrastructure tests and the step that runs "make test-infrastructure")
to include timeout-minutes: 15.

packages = [ { include = "apps" } ]
dependencies.algoliasearch = "^4.13.2"
dependencies.algoliasearch-django = "^4.0.0"
dependencies.aws-xray-sdk = "^2.15.0"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

aws-xray-sdk PyPI latest version 2026

💡 Result:

As of February 10, 2026, the latest PyPI release of aws-xray-sdk is 2.15.0, published Oct 29, 2025. [1]

Install/update:

pip install -U aws-xray-sdk

Sources: [1]


🌐 Web query:

aws-xray-sdk security vulnerabilities CVE 2025 2026

💡 Result:

Findings (CVE 2025/2026 for “aws-xray-sdk”)

  • No CVE-2025 or CVE-2026 entries found for the official npm package aws-xray-sdk in Snyk’s database (it reports no direct vulnerabilities for the latest release). [1]
  • No CVE-2025 or CVE-2026 entries found for the official PyPI package aws-xray-sdk in Safety’s database (it reports no known security vulnerabilities for latest version 2.15.0, released Oct 29, 2025). [2]

Important related 2025 issue: malicious/typosquat packages (not CVEs)

  • OSV flags malicious npm packages with confusingly similar names, e.g. aws-xray-sdk-node (OSV MAL-2025-15232) as containing malicious code. [3]
  • Another example: test-aws-xray-sdk-express (OSV MAL-2025-34766) is also flagged as malicious. [4]

Lifecycle note (affects patch cadence)

AWS announced the AWS X-Ray SDKs/Daemon enter maintenance mode on Feb 25, 2026, with releases limited to critical bug fixes and security updates, and end-of-support on Feb 25, 2027. [5]

What to do

  • Ensure you’re using the official package names (aws-xray-sdk on PyPI / aws-xray-sdk on npm), and block/alert on lookalike packages (like aws-xray-sdk-node). [3][4]
  • Keep dependency scanning enabled (SCA), because “no direct CVE” doesn’t rule out indirect dependency issues. [1][2]
  • Plan migration to OpenTelemetry given the maintenance-mode timeline. [5]

Sources: Snyk package page [1], Safety CLI DB [2], OSV malicious package entries [3][4], AWS announcement [5].


aws-xray-sdk 2.15.0 is the current latest release (Oct 29, 2025) with no known security vulnerabilities, but AWS X-Ray SDKs enter maintenance mode on Feb 25, 2026.

Version ^2.15.0 is valid and secure. However, note that AWS announced the X-Ray SDKs enter maintenance mode on Feb 25, 2026 (11 days away), limiting releases to critical bug fixes and security updates only, with end-of-support on Feb 25, 2027. Consider evaluating OpenTelemetry as a long-term alternative given this timeline.

🤖 Prompt for AI Agents
In `@backend/pyproject.toml` at line 14: the dependency entry
dependencies.aws-xray-sdk = "^2.15.0" is fine for now, but the AWS X-Ray SDKs
enter maintenance mode on 2026-02-25. Update the repo by (A) keeping the current
version pin as-is (dependencies.aws-xray-sdk = "^2.15.0") to avoid breaking
changes, (B) adding a short note in the project docs/CHANGELOG mentioning the
maintenance-mode date and its impact, and (C) creating a tracking issue/task to
evaluate migrating to OpenTelemetry (including migration criteria and a target
date before 2026-02-25) so the team can plan a replacement before end-of-support.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/run-ci-cd.yaml:
- Around lines 861-873: the ECS "index-data" run-task step is fire-and-forget and
lacks failure detection. Modify the Run ECS index-data task step
(TASK_DEFINITION: nest-staging-index-data) to capture the started task ARN from
the aws ecs run-task JSON output, log that ARN (along with the CLUSTER_NAME
context), then poll or use aws ecs wait/describe-tasks against that task ARN to
wait for completion and inspect containers[].exitCode and lastStatus, failing the
job (non-zero exit) or printing detailed failure info if the task fails; mirror
the migrate task's pattern for capturing the ARN, waiting, and checking exit
codes.
🧹 Nitpick comments (3)
.github/workflows/run-ci-cd.yaml (3)

473-478: Consider OIDC-based AWS authentication instead of static access keys.

Three jobs (build-staging-images, plan-staging-nest, deploy-staging-nest) use aws-access-key-id / aws-secret-access-key for authentication. For a public repository, using GitHub OIDC with role-to-assume is more secure — it eliminates long-lived credentials and reduces blast radius if secrets are compromised. This would require adding id-token: write to the job permissions and configuring an IAM OIDC identity provider in AWS.

Also applies to: 640-645, 737-742
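
For the jobs listed above, a rough sketch of the OIDC variant, assuming an IAM role that already trusts GitHub's OIDC provider (the role ARN, region, and job name below are placeholders):

```yaml
# Hypothetical job fragment; only the credential-related pieces are shown.
plan-staging-nest:
  permissions:
    contents: read
    id-token: write # required to request the GitHub OIDC token
  runs-on: ubuntu-latest
  steps:
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789012:role/nest-staging-deploy
        aws-region: us-east-1
```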


652-663: Terraform backend configuration is duplicated between plan and deploy jobs.

The Prepare terraform backend step is identical in plan-staging-nest (Lines 652–663) and deploy-staging-nest (Lines 749–760). Consider extracting this into a composite action to reduce duplication and maintenance burden.

Also applies to: 749-760
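
One way to factor the duplicated step out is a local composite action that both jobs call with `uses: ./.github/actions/prepare-terraform-backend`. The path, input, and step body below are illustrative only; the actual backend-preparation commands would move over from the existing workflow step unchanged:

```yaml
# Hypothetical .github/actions/prepare-terraform-backend/action.yaml
name: Prepare terraform backend
description: Shared Terraform backend preparation for the plan and deploy jobs
inputs:
  state-bucket:
    description: Name of the Terraform state bucket
    required: true
runs:
  using: composite
  steps:
    - name: Prepare terraform backend
      shell: bash
      env:
        STATE_BUCKET: ${{ inputs.state-bucket }}
      run: |
        # Move the existing backend-preparation commands here unchanged.
        echo "Preparing Terraform backend in $STATE_BUCKET"
```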


58-58: Terraform 1.14.0 is valid but not the latest patch.

Terraform 1.14.5 is available as of now. Consider updating to the latest patch release for bug fixes, or pin to ~1.14 to automatically pick up patches.

Also applies to: 434-434, 650-650, 747-747
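
If automatic patch pickup is preferred, setup-terraform accepts a semver constraint instead of an exact version, so the pinned step could look roughly like this (a sketch; the action SHA shown is the one already used in the workflow):

```yaml
- name: Install Terraform
  uses: hashicorp/setup-terraform@b9cd54a3c349d3f38e8881555d616ced269862dd
  with:
    terraform_version: "~1.14.0"
```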

Comment on lines +861 to +873
- name: Run ECS index-data task
  env:
    CLUSTER_NAME: ${{ steps.tf-outputs.outputs.tasks_cluster_name }}
    ECS_SECURITY_GROUP_ID: ${{ steps.tf-outputs.outputs.ecs_security_group_id }}
    PRIVATE_SUBNET_IDS: ${{ steps.tf-outputs.outputs.private_subnet_ids }}
    TASK_DEFINITION: nest-staging-index-data
  run: |
    aws ecs run-task \
      --cluster "$CLUSTER_NAME" \
      --task-definition "$TASK_DEFINITION" \
      --launch-type FARGATE \
      --network-configuration "awsvpcConfiguration={subnets=[$PRIVATE_SUBNET_IDS],securityGroups=[$ECS_SECURITY_GROUP_ID],assignPublicIp=DISABLED}"
    echo "Index-data task started (runs async, ~30 min)"

⚠️ Potential issue | 🟡 Minor

Fire-and-forget index-data task has no failure detection.

Unlike the migrate task (Lines 837–858), the index-data ECS task is started without capturing its task ARN, waiting for completion, or checking the exit code. A silent failure here means stale search indexes with no CI signal. Consider, at a minimum, capturing the task ARN and logging it for observability, or adding a non-blocking check step.

🤖 Prompt for AI Agents
In @.github/workflows/run-ci-cd.yaml around lines 861-873: the ECS "index-data"
run-task step is fire-and-forget and lacks failure detection. Modify the Run ECS
index-data task step (TASK_DEFINITION: nest-staging-index-data) to capture the
started task ARN from the aws ecs run-task JSON output, log that ARN (along with
the CLUSTER_NAME context), then poll or use aws ecs wait/describe-tasks against
that task ARN to wait for completion and inspect containers[].exitCode and
lastStatus, failing the job (non-zero exit) or printing detailed failure info if
the task fails; mirror the migrate task's pattern for capturing the ARN, waiting,
and checking exit codes.
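
A sketch of what mirroring the migrate pattern could look like inside that run block. The environment variables are the ones already defined on the step; the polling loop stands in for `aws ecs wait tasks-stopped`, whose default waiter gives up after roughly ten minutes while this task is expected to run about thirty:

```bash
# Hypothetical hardening of the index-data step; ARN capture, waiting, and
# exit-code checks mirror the existing migrate task.
TASK_ARN=$(aws ecs run-task \
  --cluster "$CLUSTER_NAME" \
  --task-definition "$TASK_DEFINITION" \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[$PRIVATE_SUBNET_IDS],securityGroups=[$ECS_SECURITY_GROUP_ID],assignPublicIp=DISABLED}" \
  --query 'tasks[0].taskArn' \
  --output text)

if [ -z "$TASK_ARN" ] || [ "$TASK_ARN" = "None" ]; then
  echo "::error::Failed to start index-data task."
  exit 1
fi
echo "Started index-data task: $TASK_ARN"

# Poll up to ~45 minutes; each iteration checks the task status once a minute.
for _ in $(seq 1 45); do
  STATUS=$(aws ecs describe-tasks --cluster "$CLUSTER_NAME" --tasks "$TASK_ARN" \
    --query 'tasks[0].lastStatus' --output text)
  [ "$STATUS" = "STOPPED" ] && break
  sleep 60
done

EXIT_CODE=$(aws ecs describe-tasks --cluster "$CLUSTER_NAME" --tasks "$TASK_ARN" \
  --query 'tasks[0].containers[0].exitCode' --output text)
if [ "$STATUS" != "STOPPED" ] || [ "$EXIT_CODE" != "0" ]; then
  echo "::error::index-data task did not complete successfully (status: $STATUS, exit code: $EXIT_CODE)"
  exit 1
fi
```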

Labels

backend, backend-tests, ci, deploy, docs, makefile
