@rudransh-shrivastava
Collaborator

@rudransh-shrivastava rudransh-shrivastava commented Nov 13, 2025

Resolves #2595

Proposed change

Fix and ensure that all ECS tasks are functional.

Note: PR will be rebased once #2551 is merged.

Checklist

  • I've read and followed the contributing guidelines.
  • I've run make check-test locally; all checks and tests passed.

@rudransh-shrivastava rudransh-shrivastava changed the title from "Feature/nest zappa migration ecs tasks" to "Fix long-running ECS/Fargate Tasks" Nov 13, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 13, 2025

Summary by CodeRabbit

  • New Features

    • Added support for direct local execution mode in addition to Docker-based task execution.
    • Added configurable fixture path option for data loading commands.
    • Integrated AWS Parameter Store for secure environment variable management in deployments.
  • Documentation

    • Updated deployment instructions with Parameter Store secret configuration guidance.
    • Added helpful commands reference for monitoring deployments.
  • Improvements

    • Reduced default memory allocations for background tasks.


Walkthrough

Adds an EXEC_MODE=direct execution path to the Makefile and ECS tasks so commands can run locally instead of through docker exec, exposes a --fixture-path option for load_data, populates environment variables from AWS SSM at import time, updates the Dockerfile to include make, adjusts ECS IAM policies and task configs to use SSM parameter ARNs, lowers several ECS task memory defaults, and updates the infrastructure docs.

Changes

Cohort / File(s) Summary
Makefile & Dockerfile
backend/Makefile, backend/docker/Dockerfile
Makefile targets exec-backend-command and exec-backend-command-it gain EXEC_MODE branching to run commands locally (direct) or via docker exec; Dockerfile copies the Makefile into image, creates backend symlink, and installs make in builder stage.
SSM env population (wsgi)
backend/wsgi.py
Replaced prior internal helpers with a single public populate_environ_from_ssm() that paginates AWS SSM parameters (WithDecryption) and sets env vars using parameter basenames; invoked at import time.
Load data command & tests
backend/apps/common/management/commands/load_data.py, backend/tests/apps/common/management/commands/load_data_test.py
Added add_arguments() to accept --fixture-path (default data/nest.json.gz) and updated handle() to use the provided path; tests refactored to use django.core.management.call_command() and validate fixture-path handling.
ECS module: task commands & IAM
infrastructure/modules/ecs/main.tf
Added aws_caller_identity data, new SSM IAM policy and attachments for task execution role, corrected several role attachments, and updated task definitions to use container_parameters_arns and shell /bin/sh -c sequences invoking Make targets with EXEC_MODE=direct instead of direct Python manage.py calls.
ECS variables & memory defaults
infrastructure/modules/ecs/variables.tf
Renamed django_environment_variables → container_parameters_arns (map(string), removed sensitive), and reduced default memory for migrate, sync_data, update_project_health_metrics, and update_project_health_scores from 2048 to 1024.
Infrastructure documentation
infrastructure/README.md
Reworked Step 4 into "Populate Secrets" using AWS SSM Parameter Store, removed Terraform re-apply flow, added guidance on task-definition revision selection and subnet behavior, updated deployment notes and helpful commands (e.g., zappa tail).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Review SSM population: ensure parameter name-to-env mapping and pagination/WithDecryption usage are correct and safe to run at import time.
  • Validate EXEC_MODE branching in Makefile and corresponding Dockerfile changes to avoid breaking CI/container workflows.
  • Verify IAM policy attachments and role references in main.tf are correct and non-duplicative.
  • Confirm module input migration from container_environment to container_parameters_arns is consistently applied across all module calls.
  • Check shell command quoting/escaping in /bin/sh -c sequences and that Make targets referenced exist and behave as expected.
  • Assess impact of reduced task memory defaults on task stability and runtime behavior.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective of the PR: fixing long-running ECS/Fargate tasks that were getting stuck, which aligns with the changeset modifications to ECS task configurations and Makefile execution modes.
Description check ✅ Passed The description is related to the changeset, referencing issue #2595 about fixing ECS/Fargate tasks and confirming that checks and tests passed locally.
Linked Issues check ✅ Passed The PR comprehensively addresses issue #2595 by implementing multiple changes: adding EXEC_MODE-based execution support in Makefile [backend/Makefile], updating load_data command with fixture-path argument [backend/apps/common/management/commands/load_data.py], introducing SSM Parameter Store integration [backend/wsgi.py], updating ECS task definitions with container_parameters_arns and shell-based commands, and adding IAM policies for SSM access [infrastructure/modules/ecs/main.tf]. All changes work together to ensure ECS tasks function properly.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing ECS/Fargate task functionality: Makefile execution modes, ECS task configuration, SSM parameter integration, IAM policies, and supporting documentation updates are all within scope of resolving the stuck long-running tasks issue.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8c7f4d and fb72cdb.

📒 Files selected for processing (1)
  • backend/wsgi.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.
📚 Learning: 2025-11-23T11:52:15.447Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2699
File: backend/wsgi.py:13-13
Timestamp: 2025-11-23T11:52:15.447Z
Learning: In the OWASP Nest project, the SSM parameter store setup in backend/wsgi.py (using boto3 to fetch parameters from AWS Systems Manager) is designed for staging and production environments, not just for testing purposes.

Applied to files:

  • backend/wsgi.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend e2e tests


@github-actions github-actions bot added the docs, backend, and makefile labels Nov 13, 2025
@rudransh-shrivastava rudransh-shrivastava changed the base branch from feature/nest-zappa-migration to main November 13, 2025 14:49
@github-actions

Test: This PR must be linked to an issue.

@rudransh-shrivastava rudransh-shrivastava changed the base branch from main to feature/nest-zappa-migration November 13, 2025 14:50
@rudransh-shrivastava rudransh-shrivastava changed the base branch from feature/nest-zappa-migration to main November 13, 2025 14:50
@rudransh-shrivastava rudransh-shrivastava changed the base branch from main to feature/nest-zappa-migration November 13, 2025 14:51
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
infrastructure/modules/ecs/main.tf (1)

119-256: OWASP Make targets missing; load_data_task pattern inconsistent; image URL tagging requires alignment.

Critical issues confirmed:

  1. Missing OWASP Make targets: The tasks reference owasp-update-project-health-requirements, owasp-update-project-health-metrics, and owasp-update-project-health-scores, but these targets do not exist in backend/Makefile. These tasks will fail at runtime.

  2. load_data_task pattern inconsistency (lines 216–242): This task directly executes Python commands rather than wrapping in EXEC_MODE=direct make load-data. While the Make target exists, the task diverges from the pattern used by other tasks. Align with the Make infrastructure or document why this task requires direct execution.

  3. Image URL tagging inconsistency confirmed (lines 131, 160, 181 vs. 201, 230, 250): Three tasks append :latest to the ECR repository URL while others do not. Standardize the tagging strategy across all tasks.

  4. Task module compatibility verified: container_parameters_arns is properly supported in the task module variables and configuration.

🧹 Nitpick comments (4)
infrastructure/README.md (3)

56-60: Clarify which parameters need updating and where to find them.

The instruction references parameters with to-be-set-in-aws-console values but doesn't explain how users identify which DJANGO_* parameters require updating or what values to populate. Consider adding a reference to the parameters module documentation or more explicit guidance (e.g., "Look for parameters with the value to-be-set-in-aws-console and replace with your actual secret values").


106-108: Clarify what causes deployment failures and what "invalid" secrets mean.

The warning states that deployment might fail if DJANGO_* secrets like DJANGO_SLACK_BOT_TOKEN are "invalid," but it doesn't clarify what makes a secret invalid (e.g., missing value, wrong format, wrong key name) or describe the actual failure mode (beyond "no logs"). Consider expanding this note with concrete examples or a troubleshooting reference.


166-166: Verify AWS subnet auto-selection behavior and consider adding explicit subnet guidance.

The instruction assumes that subnets will auto-select when a VPC is chosen in the AWS console. This behavior may not be guaranteed across all AWS console versions or configurations. Consider either:

  1. Verifying this is consistent AWS behavior, or
  2. Adding explicit subnet selection guidance if users may need to manually select them.

This would prevent user confusion if the auto-selection doesn't occur as expected.

backend/apps/common/management/commands/load_data.py (1)

13-26: Consider improving type hint and removing redundant parameter.

The implementation is correct and maintains backward compatibility with the default fixture path. Two minor improvements:

  1. The parser parameter type hint could be more specific: parser: argparse.ArgumentParser
  2. The required=False parameter is redundant since it's the default for add_argument

Apply this diff:

-    def add_arguments(self, parser) -> None:
+    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
         """Add command-line arguments to the parser.
 
         Args:
-            parser (argparse.ArgumentParser): The argument parser instance.
+            parser: The argument parser instance.
 
         """
         parser.add_argument(
             "--fixture-path",
             default="data/nest.json.gz",
-            required=False,
             type=str,
             help="Path to the fixture file",
         )

You'll also need to import argparse at the top of the file:

import argparse
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2175602 and c6e42aa.

📒 Files selected for processing (26)
  • backend/Makefile (1 hunks)
  • backend/apps/common/management/commands/load_data.py (1 hunks)
  • backend/docker/Dockerfile (1 hunks)
  • backend/settings/staging.py (0 hunks)
  • backend/wsgi.py (1 hunks)
  • backend/zappa_settings.example.json (1 hunks)
  • infrastructure/README.md (4 hunks)
  • infrastructure/main.tf (4 hunks)
  • infrastructure/modules/cache/main.tf (2 hunks)
  • infrastructure/modules/cache/variables.tf (0 hunks)
  • infrastructure/modules/database/main.tf (4 hunks)
  • infrastructure/modules/database/outputs.tf (1 hunks)
  • infrastructure/modules/database/variables.tf (2 hunks)
  • infrastructure/modules/ecs/main.tf (9 hunks)
  • infrastructure/modules/ecs/modules/task/main.tf (1 hunks)
  • infrastructure/modules/ecs/modules/task/variables.tf (1 hunks)
  • infrastructure/modules/ecs/variables.tf (5 hunks)
  • infrastructure/modules/parameters/main.tf (1 hunks)
  • infrastructure/modules/parameters/outputs.tf (1 hunks)
  • infrastructure/modules/parameters/variables.tf (1 hunks)
  • infrastructure/modules/security/main.tf (3 hunks)
  • infrastructure/modules/security/outputs.tf (1 hunks)
  • infrastructure/modules/security/variables.tf (1 hunks)
  • infrastructure/outputs.tf (0 hunks)
  • infrastructure/terraform.tfvars.example (1 hunks)
  • infrastructure/variables.tf (2 hunks)
💤 Files with no reviewable changes (3)
  • backend/settings/staging.py
  • infrastructure/outputs.tf
  • infrastructure/modules/cache/variables.tf
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.
📚 Learning: 2025-11-08T11:16:25.725Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.

Applied to files:

  • backend/zappa_settings.example.json
  • infrastructure/main.tf
  • infrastructure/modules/parameters/variables.tf
  • infrastructure/modules/parameters/main.tf
  • infrastructure/terraform.tfvars.example
  • infrastructure/modules/ecs/modules/task/variables.tf
📚 Learning: 2025-10-23T19:22:23.811Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.

Applied to files:

  • backend/zappa_settings.example.json
  • infrastructure/README.md
📚 Learning: 2025-10-17T15:25:53.713Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/modules/database/main.tf:22-60
Timestamp: 2025-10-17T15:25:53.713Z
Learning: The infrastructure code in the `infrastructure/` directory is intended for quick testing purposes only, not production-grade deployment. Production-grade security hardening controls (such as IAM database authentication, deletion protection, Performance Insights KMS encryption) are not required for this testing infrastructure.

Applied to files:

  • infrastructure/README.md
  • infrastructure/terraform.tfvars.example
📚 Learning: 2025-10-26T12:50:50.512Z
Learnt from: ahmedxgouda
Repo: OWASP/Nest PR: 2429
File: backend/Makefile:30-32
Timestamp: 2025-10-26T12:50:50.512Z
Learning: The `exec-backend-e2e-command` and `exec-db-e2e-command` Makefile targets in the backend/Makefile are intended for local development and debugging only, not for CI/CD execution, so the `-it` flags are appropriate.

Applied to files:

  • backend/Makefile
📚 Learning: 2025-11-08T11:43:19.276Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:16-26
Timestamp: 2025-11-08T11:43:19.276Z
Learning: KMS CMK encryption for SSM SecureString parameters in infrastructure/modules/parameters/ is planned to be implemented after S3 state management is completed. Currently using AWS-managed keys for the testing infrastructure.

Applied to files:

  • infrastructure/modules/parameters/main.tf
🧬 Code graph analysis (1)
backend/apps/common/management/commands/load_data.py (2)
backend/apps/common/management/commands/restore_backup.py (1)
  • handle (13-17)
backend/apps/core/utils/index.py (1)
  • disable_indexing (74-92)
🪛 Checkov (3.2.334)
infrastructure/modules/parameters/main.tf

[high] 16-26: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 28-38: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 80-86: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 104-114: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 124-130: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 132-138: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 152-162: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 164-174: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)


[high] 176-186: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
  • GitHub Check: CodeQL (python)
🔇 Additional comments (38)
infrastructure/README.md (1)

189-193: Good addition for observability.

The new "Helpful Commands" section with the zappa tail command is a useful addition for users to troubleshoot deployments and monitor logs.


Summary: The documentation updates align well with the infrastructure migration to AWS SSM Parameter Store and ECS task management. The main areas for improvement are:

  1. Clarity on secret population (lines 56–60): Users need explicit guidance on identifying which parameters to update and with what values.
  2. Deployment failure messaging (lines 106–108): The phrasing about "invalid" secrets and "no logs" could be more specific to help users troubleshoot.
  3. AWS behavior assumption (line 166): Verify or document the subnet auto-selection behavior to avoid user confusion.

The dynamic references (task revisions, Zappa-provided URLs) and new helpful commands are good improvements. Based on learnings, this documentation should ideally reference the parameters module setup to help users understand where the to-be-set-in-aws-console placeholder values come from.

backend/Makefile (2)

24-36: EXEC_MODE conditional pattern is clean and consistent.

Both targets properly branch execution: when EXEC_MODE=direct, commands run locally; otherwise they use Docker as before. The pattern is idiomatic Makefile and maintains backward compatibility (Docker mode is the implicit default when EXEC_MODE is unset).
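
As a rough illustration of the branching described here, the selection logic can be sketched in Python. The helper name `build_exec_command` and the container name `nest-backend` are assumptions made for this sketch; neither is taken from the Makefile.

```python
import shlex


def build_exec_command(
    cmd: str,
    exec_mode: str = "",
    container: str = "nest-backend",  # hypothetical container name
) -> list[str]:
    """Return the argv that would run `cmd` under the given EXEC_MODE."""
    if exec_mode == "direct":
        # Direct mode: run the command in the current environment,
        # for example inside an ECS task container.
        return shlex.split(cmd)
    # Default mode (EXEC_MODE unset or any other value): wrap in `docker exec`.
    return ["docker", "exec", container, *shlex.split(cmd)]
```

With `exec_mode="direct"` the command is returned unchanged as an argv list; otherwise it is prefixed with `docker exec <container>`, which mirrors the implicit Docker default mentioned above.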


22-22: Integration pattern is well-established throughout the Makefile.

Calling targets consistently set CMD before invoking exec-backend-command or exec-backend-command-it. This pattern holds across all invocations (create-superuser, migrations, data operations, etc.), ensuring predictable behavior.

Also applies to: 42-48, 72-95

backend/docker/Dockerfile (3)

41-41: Dockerfile changes properly enable Make target execution in both build and runtime contexts.

The Makefile copy + symlink approach cleanly solves path resolution: ln -s . backend converts backend/apps/*/Makefile includes to apps/*/Makefile without duplicating directory structure. Installing make in the final stage (line 54) ensures Make is available when ECS tasks execute inside the container.

Also applies to: 46-47, 54-54


41-41: Build stage structure correctly propagates Make infrastructure from builder to runtime.

The Makefile and symlink are created in the builder stage and inherited by the final stage via the wholesale directory copy (line 64), eliminating duplication and keeping the runtime image lean.

Also applies to: 46-47, 64-64


54-54: Package installation approach is efficient.

The make package is installed in the final stage with appropriate system update sequencing, consolidated into a single RUN layer for image efficiency.

infrastructure/modules/database/variables.tf (2)

7-11: LGTM!

The create_rds_proxy variable is well-defined with an appropriate default value of false, ensuring opt-in behavior for RDS proxy creation.


87-90: Variable rename is complete and consistent across all references.

Verification confirms all references to db_username have been removed and successfully replaced with db_user throughout the codebase. The variable is properly declared and used in all necessary locations.

infrastructure/modules/security/variables.tf (1)

7-11: LGTM!

The create_rds_proxy variable is well-defined and consistent with the same variable in the database module.

backend/apps/common/management/commands/load_data.py (1)

28-34: LGTM!

The parameterization of the fixture path enables flexible configuration for ECS tasks while maintaining backward compatibility through the default value.
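
The default and override behavior can be checked with a standalone sketch that uses plain `argparse` rather than Django's command parser. The `build_parser` helper and the `/tmp/fixture.json.gz` path are illustrative assumptions; only the option name and default come from the review.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the --fixture-path option described above."""
    parser = argparse.ArgumentParser(prog="load_data")
    parser.add_argument(
        "--fixture-path",
        default="data/nest.json.gz",  # same default as in the review
        type=str,
        help="Path to the fixture file",
    )
    return parser


# No argument given: the default path is used.
default_args = build_parser().parse_args([])

# Explicit argument: the custom (hypothetical) path overrides the default.
custom_args = build_parser().parse_args(["--fixture-path", "/tmp/fixture.json.gz"])
```

Because argparse converts `--fixture-path` to the attribute `fixture_path`, a command's `handle()` would typically read it from its options under that key.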

backend/wsgi.py (1)

29-29: LGTM!

The noqa: E402 comment is justified since the environment must be populated from SSM before importing Django's WSGI application.
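
For context, a minimal sketch of what such an SSM population step might look like is shown below. It is not the project's implementation: the `path`, `client`, and `environ` parameters are assumptions added so the sketch can be exercised without AWS access, and only the pagination, `WithDecryption`, and basename mapping come from the walkthrough.

```python
import os
from typing import Any, MutableMapping, Optional


def populate_environ_from_ssm(
    path: str,
    client: Optional[Any] = None,
    environ: MutableMapping[str, str] = os.environ,
) -> int:
    """Copy every parameter under `path` into `environ`, keyed by its basename.

    Returns the number of variables set.
    """
    if client is None:
        # Imported lazily so the sketch can be run with a stub client.
        import boto3

        client = boto3.client("ssm")

    paginator = client.get_paginator("get_parameters_by_path")
    count = 0
    for page in paginator.paginate(Path=path, Recursive=True, WithDecryption=True):
        for parameter in page["Parameters"]:
            # e.g. "/<project>/<environment>/DJANGO_SECRET_KEY" -> "DJANGO_SECRET_KEY"
            name = parameter["Name"].rsplit("/", 1)[-1]
            environ[name] = parameter["Value"]
            count += 1
    return count
```

Running this before the Django import is what makes the `noqa: E402` necessary: settings that read `os.environ` at import time would otherwise see missing values.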

infrastructure/modules/security/outputs.tf (1)

6-9: LGTM!

The conditional output correctly returns the security group ID when the RDS proxy is created, or null otherwise. The indexed access [0] properly references the count-based resource.

infrastructure/modules/ecs/modules/task/main.tf (1)

46-49: LGTM - security improvement!

The migration from environment to secrets is a significant security improvement. Sensitive configuration values are now resolved from SSM Parameter Store when the task starts rather than being stored in plain text in the task definition.

infrastructure/terraform.tfvars.example (1)

1-10: LGTM - appropriate for testing infrastructure.

The configuration values are appropriate for a testing/staging environment. The db_backup_retention_period = 0 and force_destroy_bucket = true settings are acceptable since this infrastructure is intended for quick testing purposes only, not production-grade deployment.

Based on learnings.

infrastructure/modules/ecs/modules/task/variables.tf (1)

17-21: LGTM!

The variable rename from container_environment to container_parameters_arns clearly reflects the shift to SSM-based secrets. The updated description accurately documents that it maps environment variable names to SSM parameter ARNs.
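
To make the shape of that mapping concrete, the sketch below converts a name-to-ARN map into the `name`/`valueFrom` entries that an ECS container definition expects under `secrets`. The helper name `to_ecs_secrets` is hypothetical, and the Terraform module may build this list differently.

```python
def to_ecs_secrets(container_parameters_arns: dict[str, str]) -> list[dict[str, str]]:
    """Convert {ENV_VAR_NAME: ssm_parameter_arn} into ECS `secrets` entries."""
    return [
        {"name": name, "valueFrom": arn}
        # Sorted so the generated list is stable between runs.
        for name, arn in sorted(container_parameters_arns.items())
    ]
```

Each entry tells ECS to expose the referenced parameter's value to the container as the environment variable given in `name`.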

infrastructure/modules/cache/main.tf (1)

16-20: Verify unconditional Redis auth token generation doesn't break existing deployments.

Line 18 now unconditionally references random_password.redis_auth_token[0].result, meaning Redis auth is always enabled. This changes from prior conditional behavior. While this aligns with security best practices, if deployments existed without auth tokens, this could cause issues.

Please verify:

  1. Whether the parameters module still receives redis_auth_token from the cache module output (since infrastructure/main.tf line 96 shows redis_password = module.cache.redis_auth_token)
  2. Whether existing deployments can handle the mandatory auth token, or if any manual Terraform state interventions are needed
backend/zappa_settings.example.json (1)

4-20: Clarify production environment configuration.

The example shows staging environment with SSM parameter store configuration. Based on the learnings note that the parameters module is "currently configured for staging environment only," verify whether this example needs production-equivalent configuration, or if it's intentionally staging-only for this PR iteration.

infrastructure/modules/database/outputs.tf (1)

7-10: Conditional endpoint selection is correct.

The output properly switches between RDS Proxy and direct database endpoints based on the feature flag, with correct indexing for the conditional resource.

infrastructure/variables.tf (3)

13-17: Good addition of RDS proxy feature flag.

The create_rds_proxy variable with safe default (false) enables optional RDS proxy provisioning without breaking existing deployments.


68-72: Variable rename is clear and consistent.

The db_username → db_user rename improves naming consistency and aligns with usage throughout the codebase.


74-82: Environment validation improves deployment safety.

The validation block restricting environment to staging or production prevents misconfiguration and aligns with the parameters module's environment-driven design.

infrastructure/modules/ecs/variables.tf (2)

12-16: SSM parameter ARN mapping variable rename is appropriate.

The django_environment_variables → container_parameters_arns rename accurately reflects the shift from direct environment variables to SSM Parameter Store ARN references. Removal of sensitive=true is correct since ARNs are not sensitive.


68-118: Verify that 50% memory reduction won't cause task failures.

Multiple ECS tasks have memory defaults reduced by 50% (2048 MB → 1024 MB):

  • migrate_task_memory (line 71)
  • sync_data_task_memory (line 93)
  • update_project_health_metrics_task_memory (line 105)
  • update_project_health_scores_task_memory (line 117)

Given the PR objective to "fix long-running ECS tasks," verify whether these reductions are intentional optimizations or if they relate to addressing task failures. If tasks were previously hitting memory limits or timeouts, reducing memory could exacerbate issues.

Please confirm:

  1. Whether memory reductions were tested with actual workloads
  2. Whether the original 2048 MB allocation was based on measured peak usage or arbitrary defaults
  3. Any correlation between memory and the long-running task issues mentioned in #2595
infrastructure/main.tf (2)

84-97: Parameters module wiring is well-positioned and correctly ordered.

The new parameters module is properly placed after database and cache modules to consume their outputs, establishing correct dependency ordering. The use of module.database.db_proxy_endpoint correctly adapts to the conditional proxy endpoint logic.


24-36: Verify cache module still exposes redis_auth_token as output.

All module inputs are consistently updated. However, line 96 references module.cache.redis_auth_token, which needs to be confirmed as an available cache module output (not visible in provided cache/main.tf review).

Please confirm that infrastructure/modules/cache/outputs.tf exposes the redis_auth_token value needed by the parameters module.

Also applies to: 38-70, 99-109

infrastructure/modules/security/main.tf (2)

33-48: Clear separation of RDS and RDS Proxy security groups.

The introduction of a dedicated, non-conditional aws_security_group "rds" resource, with the proxy security group now properly conditional, provides clean infrastructure for optional RDS proxy deployment.


93-124: Security group rule count logic correctly implements proxy feature flag.

The three security group rules implement correct conditional ingress:

  • rds_from_lambda (count = 0 when proxy enabled): Direct Lambda→RDS access without proxy
  • rds_from_proxy (count = 1 when proxy enabled): Proxy→RDS access when proxy is active
  • rds_proxy_from_lambda (count = 1 when proxy enabled): Lambda→Proxy access when proxy is active

This cleanly handles both proxy and non-proxy configurations with mutually exclusive rule sets.
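
The mutually exclusive rule sets can be summarized in a small Python sketch. The counts for the proxy-enabled case come from the list above; the proxy-disabled values are inferred as the complement and are an assumption of this sketch, as is the helper name `rds_rule_counts`.

```python
def rds_rule_counts(create_rds_proxy: bool) -> dict[str, int]:
    """Return the `count` each security group rule would get for the flag value."""
    return {
        # Direct Lambda -> RDS access is only needed without the proxy.
        "rds_from_lambda": 0 if create_rds_proxy else 1,
        # Proxy -> RDS and Lambda -> Proxy only exist when the proxy is enabled.
        "rds_from_proxy": 1 if create_rds_proxy else 0,
        "rds_proxy_from_lambda": 1 if create_rds_proxy else 0,
    }
```

In both configurations exactly one path from Lambda to the database is open: directly, or through the proxy.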

infrastructure/modules/database/main.tf (3)

58-58: Variable rename is consistently applied.

Both the aws_db_instance resource and secrets manager secret version correctly use var.db_user instead of the previous var.db_username.

Also applies to: 74-74


79-158: RDS proxy conditional resource creation uses correct count pattern throughout.

All RDS proxy-related resources (IAM role, policy, proxy, target group, proxy target) correctly use count = var.create_rds_proxy ? 1 : 0, with all dependent references properly indexed using [0]. The least-privilege IAM policy grants only secretsmanager:GetSecretValue for secure secrets access.


127-131: RDS proxy configuration appropriately secures connections.

The proxy requires TLS (require_tls = true), uses Secrets Manager for credentials, and sets a 30-minute idle client timeout. Verify whether this timeout is appropriate for the long-running background tasks mentioned in issue #2595, as overly aggressive timeout settings could contribute to task interruptions.

infrastructure/modules/parameters/outputs.tf (1)

1-22: Output structure correctly exposes all SSM parameter ARNs.

The map cleanly exposes the ARNs of all 17 SSM parameters needed by ECS tasks for secret/config injection.

infrastructure/modules/parameters/main.tf (4)

1-14: Terraform provider versions are appropriate.

Requirements look good: Terraform ≥ 1.0, AWS ≥ 6.0, and random ≥ 3.5 are all recent stable versions.


40-50: Parameter naming and lifecycle strategy is sound.

The /${var.project_name}/${var.environment}/* naming convention provides good scoping and environment isolation. The lifecycle.ignore_changes blocks on non-credential parameters (allowed_hosts, configuration, settings_module) and Algolia/OpenAI/Sentry/Slack parameters correctly allow manual AWS console updates without Terraform drift. Credentials sourced from Terraform state (db_password, redis_password) appropriately lack lifecycle blocks.


188-191: Django secret key generation is solid.

Using random_string with 50 characters and special characters enabled ensures a strong, cryptographically suitable secret key. No concerns here.


16-186: KMS CMK encryption for SecureString parameters—already planned post-S3 state management.

All nine SecureString parameters (django_algolia_application_id, django_algolia_write_api_key, django_db_password, django_open_ai_secret_key, django_redis_password, django_secret_key, django_sentry_dsn, django_slack_bot_token, django_slack_signing_secret) currently lack kms_key_id and use AWS-managed keys. Per existing roadmap, KMS CMK encryption is slated for implementation after S3 state management is completed, making AWS-managed keys acceptable for the current testing infrastructure. Add kms_key_id to all SecureString parameters when KMS CMK implementation begins.

infrastructure/modules/parameters/variables.tf (1)

1-70: Variable schema is well-designed and environment-aware.

All 13 variables align with main.tf resource definitions. Sensitive flags on db_password and redis_password are correct. Defaults for configuration ("Staging") and settings_module ("settings.staging") reflect the current staging-focused scope noted in learnings; consider parameterizing these per environment in future iterations. No immediate concerns.

infrastructure/modules/ecs/main.tf (2)

12-12: Caller identity data source enables dynamic account reference.

Using data.aws_caller_identity.current to fetch account ID for the SSM policy resource pattern is a best practice that avoids hardcoding and ensures correctness across environments.


46-71: IAM policy correctly scopes SSM parameter read access.

The new policy permits ssm:GetParameters on the scoped resource pattern arn:aws:ssm:${region}:${account}:parameter/${project}/${environment}/*, correctly limiting ECS tasks to read parameters within their environment path. The policy attachment to the execution role (not task role) is appropriate for secret injection during task startup. Approved.
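
As an illustration of the resource pattern described above, the ARN shape and the policy document can be written out as plain Python data. This is only a sketch: the function names and the example region, account ID, project, and environment values are hypothetical, and the real policy is defined in Terraform rather than generated like this.

```python
import json


def ssm_parameter_arn_pattern(region: str, account_id: str, project: str, environment: str) -> str:
    """Build the wildcard ARN covering every parameter under /<project>/<environment>/."""
    return f"arn:aws:ssm:{region}:{account_id}:parameter/{project}/{environment}/*"


def ssm_read_policy(resource_arn: str) -> dict:
    """Return an IAM policy document that allows only ssm:GetParameters on one ARN pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameters"],
                "Resource": resource_arn,
            }
        ],
    }


# Hypothetical values, used purely for illustration.
arn = ssm_parameter_arn_pattern("us-east-1", "123456789012", "nest", "staging")
policy = ssm_read_policy(arn)
print(json.dumps(policy, indent=2))
```

Because the environment is part of the path, a staging task holding this policy could not read parameters stored under a different environment prefix.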


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/tests/apps/common/management/commands/load_data_test.py (1)

12-36: Add test coverage for the new --fixture-path argument.

The test correctly verifies the default fixture path behavior. However, since the load_data command now accepts a --fixture-path CLI argument, consider adding a test case that verifies custom fixture paths work correctly.

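Example test to add (a method for the existing test class; it relies on MagicMock and patch from unittest.mock and call_command from django.core.management being imported at module level):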

@patch("apps.core.utils.index.DisableIndexing.unregister_indexes")
@patch("apps.core.utils.index.DisableIndexing.register_indexes")
@patch("apps.common.management.commands.load_data.call_command")
@patch("apps.common.management.commands.load_data.transaction.atomic")
def test_handle_with_custom_fixture_path(
    self,
    mock_atomic,
    mock_call_command,
    mock_register,
    mock_unregister,
):
    mock_atomic.return_value.__enter__ = MagicMock()
    mock_atomic.return_value.__exit__ = MagicMock()

    call_command("load_data", fixture_path="custom/path.json.gz")

    mock_call_command.assert_called_once_with("loaddata", "custom/path.json.gz", "-v", "3")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6e42aa and 2162352.

📒 Files selected for processing (2)
  • backend/docker/Dockerfile (1 hunks)
  • backend/tests/apps/common/management/commands/load_data_test.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/docker/Dockerfile
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
backend/tests/apps/common/management/commands/load_data_test.py (2)

4-4: LGTM: Correct Django testing pattern.

Switching to call_command aligns with Django's recommended approach for testing management commands.


42-61: LGTM: Exception handling test is correct.

The test properly verifies that indexing is re-enabled even when call_command raises an exception, which is critical for maintaining data integrity.
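
The guarantee this test checks is the usual try/finally shape. The sketch below is not the project's implementation: IndexingToggle and run_with_indexing_disabled are hypothetical stand-ins for the unregister_indexes/register_indexes helpers, used only to show why indexing comes back on even when the wrapped call raises.

```python
import contextlib


class IndexingToggle:
    """Hypothetical stand-in for the project's index registration helpers."""

    def __init__(self) -> None:
        self.enabled = True
        self.history = []

    def unregister_indexes(self) -> None:
        self.enabled = False
        self.history.append("unregistered")

    def register_indexes(self) -> None:
        self.enabled = True
        self.history.append("registered")


def run_with_indexing_disabled(toggle: IndexingToggle, action) -> None:
    """Disable indexing, run the action, and always re-enable indexing afterwards."""
    toggle.unregister_indexes()
    try:
        action()
    finally:
        # Runs on success and on failure, so indexing is never left disabled.
        toggle.register_indexes()


def failing_action() -> None:
    raise Exception("Call command failed")


toggle = IndexingToggle()
# Swallow the exception here, as the test does, so the final state can be inspected.
with contextlib.suppress(Exception):
    run_with_indexing_disabled(toggle, failing_action)
print(toggle.enabled, toggle.history)
```

The exception still propagates out of run_with_indexing_disabled; the finally block only guarantees the cleanup, which is what the test asserts.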


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/tests/apps/common/management/commands/load_data_test.py (2)

19-21: Remove unused mock objects.

The mock_model and mock_app_config objects are created but never used in the tests. They're not patched into any location and no assertions reference them.

Apply this diff to remove the unused mocks from the first test:

-        mock_model = MagicMock()
-        mock_app_config = MagicMock()
-        mock_app_config.get_models.return_value = [mock_model]
-
         mock_atomic.return_value.__enter__ = MagicMock()

Apply similar cleanup to the second test (lines 49-51) and verify other test methods as well.

Also applies to: 49-51


84-86: Remove unused contextlib.suppress mock.

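The patch of contextlib.suppress creates a mock that is never asserted. While the patch is active, the contextlib.suppress(Exception) call on line 88 resolves to that mock rather than the standard library implementation, so the exception is swallowed only because the mocked __exit__ returns a truthy MagicMock. Using the real contextlib.suppress directly is simpler and states the intent: prevent the test itself from failing due to the mocked exception.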

Apply this diff:

         """Test that indexing is re-enabled even if call_command fails."""
         mock_call_command.side_effect = Exception("Call command failed")
 
-        with patch("contextlib.suppress") as mock_suppress:
-            mock_suppress.return_value.__enter__ = MagicMock()
-            mock_suppress.return_value.__exit__ = MagicMock()
-
-            with contextlib.suppress(Exception):
-                call_command("load_data")
+        with contextlib.suppress(Exception):
+            call_command("load_data")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2162352 and 7fe6726.

📒 Files selected for processing (1)
  • backend/tests/apps/common/management/commands/load_data_test.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:16-26
Timestamp: 2025-11-08T11:43:19.276Z
Learning: KMS CMK encryption for SSM SecureString parameters in infrastructure/modules/parameters/ is planned to be implemented after S3 state management is completed. Currently using AWS-managed keys for the testing infrastructure.
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (2)
backend/tests/apps/common/management/commands/load_data_test.py (2)

4-4: LGTM! Better Django testing practice.

The switch from directly instantiating Command() to using call_command("load_data") is the idiomatic way to test Django management commands and better simulates how the command is invoked in production.

Also applies to: 29-29, 89-89


38-68: Excellent test coverage for the new fixture_path parameter.

The new test case properly validates that custom fixture paths are correctly passed through to the underlying loaddata command. The test structure is consistent with existing tests and includes all necessary assertions.
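
For readers unfamiliar with how the flag reaches the command, here is a minimal argparse sketch, assuming the command declares the option in the usual way. DEFAULT_FIXTURE_PATH, build_parser, and loaddata_args are hypothetical names, and the default path shown is a placeholder rather than the project's real one. The part that matters is that argparse turns --fixture-path into the attribute fixture_path, which is why the test passes fixture_path=... as a keyword to call_command.

```python
import argparse

# Placeholder default; the real command's default fixture path is not shown in this thread.
DEFAULT_FIXTURE_PATH = "data/default.json.gz"


def build_parser() -> argparse.ArgumentParser:
    """Declare --fixture-path the way a management command's add_arguments would."""
    parser = argparse.ArgumentParser(prog="load_data")
    parser.add_argument(
        "--fixture-path",
        default=DEFAULT_FIXTURE_PATH,
        help="Path to the fixture file to load",
    )
    return parser


def loaddata_args(options: argparse.Namespace) -> tuple:
    """Build the positional arguments forwarded to the underlying loaddata call."""
    return ("loaddata", options.fixture_path, "-v", "3")


parser = build_parser()
default_call = loaddata_args(parser.parse_args([]))
custom_call = loaddata_args(parser.parse_args(["--fixture-path", "custom/path.json.gz"]))
print(default_call)
print(custom_call)
```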

arkid15r previously approved these changes Nov 23, 2025

@arkid15r arkid15r left a comment


Awesome 👍

@arkid15r arkid15r marked this pull request as ready for review November 23, 2025 20:57
@arkid15r arkid15r requested a review from kasya as a code owner November 23, 2025 20:57
@arkid15r arkid15r enabled auto-merge (squash) November 23, 2025 21:02

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/wsgi.py (1)

9-23: (Repeat) Consider adding basic error handling/logging around SSM fetch at import time

This repeats an earlier comment: any SSM, network, or permission issue during boto3.client("ssm") or pagination raises at import time and fails WSGI startup. That may be intentional (fail‑fast on missing secrets), but even in that model, a small amount of logging and a clear exception path would help debug ECS/Fargate issues and distinguish SSM failures from Django misconfiguration.

Non‑blocking for this PR, but you may want to:

  • Log how many parameters were loaded and from which path.
  • Log and re‑raise ClientError/BotoCoreError with a clearer message if you still prefer fail‑fast semantics.

Also applies to: 26-44
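
The two suggestions above could look roughly like the sketch below. It is not the code in backend/wsgi.py: load_ssm_parameters is a hypothetical helper, the client is passed in so the example runs without AWS access, and using the last path segment as the environment variable name is an assumption. The except clause catches Exception to stay free of third-party imports; in real code it would be narrowed to botocore's ClientError and BotoCoreError as the comment suggests.

```python
import logging
import os

logger = logging.getLogger(__name__)


def load_ssm_parameters(client, path, environ=None):
    """Copy every parameter under `path` into `environ` and return how many were loaded."""
    environ = os.environ if environ is None else environ
    loaded = 0
    try:
        paginator = client.get_paginator("get_parameters_by_path")
        for page in paginator.paginate(Path=path, WithDecryption=True):
            for param in page["Parameters"]:
                # Assumption: the variable name is the last segment of the parameter path.
                name = param["Name"].rsplit("/", 1)[-1]
                environ[name] = param["Value"]
                loaded += 1
    except Exception as exc:
        # Still fail fast, but say clearly that SSM (not Django) was the problem.
        logger.exception("Failed to load SSM parameters from %s", path)
        raise RuntimeError(f"Could not load SSM parameters from {path}") from exc
    logger.info("Loaded %d SSM parameters from %s", loaded, path)
    return loaded


class FakePaginator:
    """Test double that mimics two pages of get_parameters_by_path results."""

    def paginate(self, **kwargs):
        yield {"Parameters": [{"Name": "/nest/staging/A", "Value": "1"}]}
        yield {"Parameters": [{"Name": "/nest/staging/B", "Value": "2"}]}


class FakeClient:
    def get_paginator(self, operation_name):
        return FakePaginator()


class BrokenClient:
    def get_paginator(self, operation_name):
        raise ConnectionError("simulated network failure")


env = {}
count = load_ssm_parameters(FakeClient(), "/nest/staging/", env)

try:
    load_ssm_parameters(BrokenClient(), "/nest/staging/", {})
    failure_message = None
except RuntimeError as error:
    failure_message = str(error)
```

In a deployment the caller would pass boto3.client("ssm") instead of the fake client; the log line then records both the count and the path, which covers the first bullet above.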

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4e711f and b8c7f4d.

📒 Files selected for processing (1)
  • backend/wsgi.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:16-26
Timestamp: 2025-11-08T11:43:19.276Z
Learning: KMS CMK encryption for SSM SecureString parameters in infrastructure/modules/parameters/ is planned to be implemented after S3 state management is completed. Currently using AWS-managed keys for the testing infrastructure.
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2699
File: backend/wsgi.py:13-13
Timestamp: 2025-11-23T11:52:15.447Z
Learning: In the OWASP Nest project, the SSM parameter store setup in backend/wsgi.py (using boto3 to fetch parameters from AWS Systems Manager) is designed for staging and production environments, not just for testing purposes.
📚 Learning: 2025-11-23T11:52:15.447Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2699
File: backend/wsgi.py:13-13
Timestamp: 2025-11-23T11:52:15.447Z
Learning: In the OWASP Nest project, the SSM parameter store setup in backend/wsgi.py (using boto3 to fetch parameters from AWS Systems Manager) is designed for staging and production environments, not just for testing purposes.

Applied to files:

  • backend/wsgi.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: CodeQL (javascript-typescript)

@arkid15r arkid15r disabled auto-merge November 23, 2025 21:09
@sonarqubecloud

@arkid15r arkid15r enabled auto-merge (squash) November 23, 2025 22:03
@arkid15r arkid15r merged commit 5e7f073 into OWASP:feature/nest-zappa-migration Nov 23, 2025
26 checks passed
arkid15r added a commit to rudransh-shrivastava/Nest that referenced this pull request Nov 23, 2025
* Use AWS SSM Parameter Store to handle environment variables

* Use focused policy for read access

* Update documentation

* Add flag for create_rds_proxy

* set default value of create_rds_proxy to false

* Populate Zappa/Lambda environment variables from ssm/parameter store

* Update documentation

* Update example

* add default configurations

* add security group db from lambda

* fix load-data task by adding a --fixture-path flag

* fix ecs tasks by introducing ecs-* make targets

* change ecs run steps

* remove ecs-* and clean code

* add --no-cache

* use call_command

* add test for --fixture-path

* Update code

* Update backend/wsgi.py

---------

Co-authored-by: Arkadii Yakovets <[email protected]>
Co-authored-by: Arkadii Yakovets <[email protected]>
arkid15r added a commit that referenced this pull request Nov 23, 2025
* Populate Zappa/Lambda environment variables from ssm/parameter store

* Update documentation

* fix load-data task by adding a --fixture-path flag

* fix ecs tasks by introducing ecs-* make targets

* remove ecs-* and clean code

* add security group for ecs tasks

* update docs

* Fix long-running ECS/Fargate Tasks (#2620)

* Use AWS SSM Parameter Store to handle environment variables

* Use focused policy for read access

* Update documentation

* Add flag for create_rds_proxy

* set default value of create_rds_proxy to false

* Populate Zappa/Lambda environment variables from ssm/parameter store

* Update documentation

* Update example

* add default configurations

* add security group db from lambda

* fix load-data task by adding a --fixture-path flag

* fix ecs tasks by introducing ecs-* make targets

* change ecs run steps

* remove ecs-* and clean code

* add --no-cache

* use call_command

* add test for --fixture-path

* Update code

* Update backend/wsgi.py

---------

Co-authored-by: Arkadii Yakovets <[email protected]>
Co-authored-by: Arkadii Yakovets <[email protected]>

* Populate Zappa/Lambda environment variables from ssm/parameter store

* Update documentation

* fix ecs tasks by introducing ecs-* make targets

* remove ecs-* and clean code

* Update backend/wsgi.py

---------

Co-authored-by: Arkadii Yakovets <[email protected]>
Co-authored-by: Arkadii Yakovets <[email protected]>
arkid15r added a commit that referenced this pull request Nov 23, 2025
* Use AWS SSM Parameter Store to handle environment variables

* Use focused policy for read access

* Update documentation

* Add flag for create_rds_proxy

* set default value of create_rds_proxy to false

* Populate Zappa/Lambda environment variables from ssm/parameter store

* Update documentation

* Update example

* add default configurations

* add security group db from lambda

* fix load-data task by adding a --fixture-path flag

* fix ecs tasks by introducing ecs-* make targets

* change ecs run steps

* remove ecs-* and clean code

* add --no-cache

* use call_command

* add test for --fixture-path

* Update code

* Update backend/wsgi.py

---------

Co-authored-by: Arkadii Yakovets <[email protected]>
Co-authored-by: Arkadii Yakovets <[email protected]>
This was referenced Dec 2, 2025

Labels

backend backend-tests docs Improvements or additions to documentation makefile

Development

Successfully merging this pull request may close these issues.

Fix long-running ECS/Fargate Tasks

2 participants