fix: deployment failures in terraform AMI lookup, rolling restart, and elasticsearch (devnet/testnet) #725
base: v1.0-dev
Walkthrough
Three configuration files updated: dashmate restart command is rebuilt as a multiline command and now ignores non‑zero exits; elastic_stack adds
Changes
Sequence Diagrams
sequenceDiagram
participant Task as Dashmate Restart Task
participant Cond as Eval: dashmate_safe_restart
participant Cmd as Build Command
participant Exec as Execute Restart
participant Handle as Result Handler
Task->>Cond: Check dashmate_safe_restart
alt safe = true
Cond->>Cmd: include --safe
else safe = false
Cond->>Cmd: omit --safe
end
Cmd->>Cmd: ensure --verbose and conditional --platform
Cmd->>Exec: run restart command
Exec->>Handle: return exit code
Note over Handle: failed_when: false -> ignore non-zero exit
sequenceDiagram
participant Task as Elastic Stack Tasks
participant StatCA as Stat: ca.zip
participant StatCert as Stat: certs.zip
participant FetchCA as Fetch CA (if missing)
participant FetchCert as Fetch certs (if missing)
participant Unarchive as Unarchive / Install
Task->>StatCA: stat ca.zip
StatCA-->>Task: exists? true/false
Task->>StatCert: stat certs.zip
StatCert-->>Task: exists? true/false
alt ca.zip not exists
Task->>FetchCA: fetch ca.zip
end
alt certs.zip not exists
Task->>FetchCert: fetch certs.zip
end
FetchCA->>Unarchive: pass files
FetchCert->>Unarchive: pass files
Unarchive->>Unarchive: proceed with existing logic
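The second diagram's check-then-fetch flow could be sketched in Ansible roughly as follows. This is a minimal illustration, not the role's actual tasks: the file paths, the `register` names, the use of `ansible.builtin.fetch`, and running the `stat` on the controller are all assumptions made for the example.

```yaml
# Sketch only: paths, variable names, and module choice are hypothetical.
- name: Check whether ca.zip already exists
  ansible.builtin.stat:
    path: /tmp/elastic/ca.zip            # hypothetical local path
  register: ca_zip_stat                  # hypothetical variable name
  delegate_to: localhost                 # assumption: archive is cached on the controller

- name: Check whether certs.zip already exists
  ansible.builtin.stat:
    path: /tmp/elastic/certs.zip         # hypothetical local path
  register: certs_zip_stat               # hypothetical variable name
  delegate_to: localhost

- name: Fetch ca.zip only if it is missing
  ansible.builtin.fetch:
    src: /usr/share/elasticsearch/ca.zip # hypothetical remote path
    dest: /tmp/elastic/ca.zip
    flat: true
  when: not ca_zip_stat.stat.exists

- name: Fetch certs.zip only if it is missing
  ansible.builtin.fetch:
    src: /usr/share/elasticsearch/certs.zip  # hypothetical remote path
    dest: /tmp/elastic/certs.zip
    flat: true
  when: not certs_zip_stat.stat.exists
```

Guarding each fetch with its own `stat` result keeps a rerun from failing or overwriting an archive that an earlier run already produced.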
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
ansible/roles/dashmate/tasks/rolling_restart.yml (1)
6-7: Use `is defined` checks instead of `| default()` per coding guidelines.
The coding guidelines specify using `is defined` checks for variables like `dashmate_safe_restart` and `needs_core_restart` to prevent undefined variable errors, rather than relying on `| default()` filters.
Apply this diff to use proper conditional checks:
     ansible.builtin.command: >-
       {{ dashmate_cmd }} restart
    -  {{ '--safe' if dashmate_safe_restart | default(false) else '' }}
    -  --verbose{{ '' if needs_core_restart | default(false) else ' --platform' }}
    +  {% if dashmate_safe_restart is defined and dashmate_safe_restart %}--safe{% endif %}
    +  --verbose{% if needs_core_restart is not defined or not needs_core_restart %} --platform{% endif %}
Alternatively, keep the inline Jinja2 approach but add explicit `is defined` checks:
     ansible.builtin.command: >-
       {{ dashmate_cmd }} restart
    -  {{ '--safe' if dashmate_safe_restart | default(false) else '' }}
    -  --verbose{{ '' if needs_core_restart | default(false) else ' --platform' }}
    +  {{ '--safe' if dashmate_safe_restart is defined and dashmate_safe_restart else '' }}
    +  --verbose{{ '' if needs_core_restart is defined and needs_core_restart else ' --platform' }}
Based on coding guidelines.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ansible/roles/dashmate/tasks/rolling_restart.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
ansible/roles/dashmate/tasks/*.yml
📄 CodeRabbit inference engine (CLAUDE.md)
ansible/roles/dashmate/tasks/*.yml: Use `is defined` checks and conditional execution for variables such as `dashmate_group_check`, `dashmate_user_check`, `dash_conf_stat`, `dash_conf_changed`, `logrotate_config_stat`, `dashmate_update`, `dashmate_start_all`, `dashmate_restart_all`, `dashmate_install_result`, and `template_result` in Ansible tasks to prevent undefined variable errors
Implement conditional restart logic in Ansible tasks: in fast mode, each node restarts itself independently; in regular mode, coordinate chunked restarts to prevent network disruption
Check current vs required dashmate version in Ansible tasks and only install/update if versions differ, then restart services after version changes
Add `meta: flush_handlers` before coordinated operations in regular mode to ensure all nodes reach synchronization points
Files:
ansible/roles/dashmate/tasks/rolling_restart.yml
ansible/**/*.yml
📄 CodeRabbit inference engine (CLAUDE.md)
Use force flags (`force_dashmate_rebuild`, `force_dashmate_reinstall`, `force_ssl_regenerate`, `force_logs_config`, `skip_dashmate_image_update`) as manual overrides in Ansible playbooks and tasks when needed
Files:
ansible/roles/dashmate/tasks/rolling_restart.yml
🧠 Learnings (5)
📚 Learning: 2025-08-06T08:09:00.292Z
Learnt from: CR
PR: dashpay/dash-network-deploy#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-06T08:09:00.292Z
Learning: Applies to ansible/roles/dashmate/tasks/*.yml : Implement conditional restart logic in Ansible tasks: in fast mode, each node restarts itself independently; in regular mode, coordinate chunked restarts to prevent network disruption
Applied to files:
ansible/roles/dashmate/tasks/rolling_restart.yml
📚 Learning: 2025-08-06T08:09:00.292Z
Learnt from: CR
PR: dashpay/dash-network-deploy#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-06T08:09:00.292Z
Learning: Applies to ansible/roles/dashmate/tasks/*.yml : Check current vs required dashmate version in Ansible tasks and only install/update if versions differ, then restart services after version changes
Applied to files:
ansible/roles/dashmate/tasks/rolling_restart.yml
📚 Learning: 2025-08-06T08:09:00.292Z
Learnt from: CR
PR: dashpay/dash-network-deploy#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-06T08:09:00.292Z
Learning: Applies to ansible/roles/dashmate/tasks/quick_update.yml : Create a streamlined update-only task in `ansible/roles/dashmate/tasks/quick_update.yml` for fast updates
Applied to files:
ansible/roles/dashmate/tasks/rolling_restart.yml
📚 Learning: 2025-08-06T08:09:00.292Z
Learnt from: CR
PR: dashpay/dash-network-deploy#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-06T08:09:00.292Z
Learning: Applies to ansible/roles/dashmate/tasks/main.yml : Implement conditional execution for user/group checks, dash.conf generation, log configuration, and Docker image updates in `ansible/roles/dashmate/tasks/main.yml` based on fast mode flags
Applied to files:
ansible/roles/dashmate/tasks/rolling_restart.yml
📚 Learning: 2025-08-06T08:09:00.292Z
Learnt from: CR
PR: dashpay/dash-network-deploy#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-06T08:09:00.292Z
Learning: Applies to ansible/roles/dashmate/tasks/*.yml : Use `is defined` checks and conditional execution for variables such as `dashmate_group_check`, `dashmate_user_check`, `dash_conf_stat`, `dash_conf_changed`, `logrotate_config_stat`, `dashmate_update`, `dashmate_start_all`, `dashmate_restart_all`, `dashmate_install_result`, and `template_result` in Ansible tasks to prevent undefined variable errors
Applied to files:
ansible/roles/dashmate/tasks/rolling_restart.yml
🔇 Additional comments (2)
ansible/roles/dashmate/tasks/rolling_restart.yml (2)
4-7: Verify the inverted logic of the `--platform` flag condition.
Line 7 conditionally includes `--platform` when `needs_core_restart` is false, which appears semantically inverted. When the variable name suggests we need a core restart, the flag is excluded. Conversely, when we don't need a core restart, the flag is included.
Confirm that this logic is intentional and clarify the semantics of `--platform` vs. `needs_core_restart`.
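To make the condition under discussion concrete, the following throwaway playbook renders the same inline Jinja2 expression for both values of the flag. It is only an illustration: the play itself, the loop, and the use of `item` in place of `needs_core_restart` are assumptions for the demo, and it says nothing about what `dashmate restart --platform` actually does.

```yaml
# Demo playbook (hypothetical): prints the flags produced by the expression
# from rolling_restart.yml, with `item` standing in for needs_core_restart.
- name: Show how the --platform condition renders
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Render restart flags for each value
      ansible.builtin.debug:
        msg: "needs_core_restart={{ item }} -> restart --verbose{{ '' if item else ' --platform' }}"
      loop:
        - true
        - false
```

With `true` the expression appends nothing after `--verbose`; with `false` it appends ` --platform`, which is the behavior the comment asks the author to confirm.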
13-14: Review error-masking implications of `failed_when: false`.
Setting `failed_when: false` causes the task to always succeed, even on non-zero exit codes. While the PR objectives note this is intentional to avoid DKG timeout failures on fresh deployments, this will also mask legitimate restart failures that should be surfaced for investigation.
Consider whether a more granular approach might be better, such as:
- Checking the exit code and only ignoring specific timeout-related failures.
- Adding validation in a follow-up task to confirm services are actually running after the "failed" restart.
- Documenting why fresh deployments are expected to fail and what operators should monitor.
Per PR objectives: This is an intentional trade-off for fresh deployments; however, ensure the downstream tasks and monitoring account for this permissive error handling.
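The first suggestion, ignoring only timeout-related failures, might look something like the sketch below. The `register` name is hypothetical, and matching on the substring `timeout` in stderr is an assumption: the actual DKG error text is not shown in this PR, so the pattern would need to be adjusted to the real message.

```yaml
# Sketch only: register name and the 'timeout' match are assumptions.
- name: Restart dashmate services, tolerating only timeout failures
  ansible.builtin.command: >-
    {{ dashmate_cmd }} restart --verbose
  register: dashmate_restart_result      # hypothetical variable name
  changed_when: dashmate_restart_result.rc == 0
  # List items are AND-ed: fail only when the exit code is non-zero
  # and stderr does not mention a timeout.
  failed_when:
    - dashmate_restart_result.rc != 0
    - "'timeout' not in (dashmate_restart_result.stderr | lower)"
```

Compared with `failed_when: false`, this keeps the tolerance for the expected fresh-deployment case while still surfacing any other non-zero exit.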
Summary
Fixes deployment failures that occur during initial network deployments.
Changes
Terraform AMI lookup fix (`terraform/aws/main.tf`)
- `server*` to `server-*` for both AMD64 and ARM64
- `ubuntu-jammy-22.04-amd64-server-20240927`
Rolling restart improvements (`ansible/roles/dashmate/tasks/rolling_restart.yml`)
- `--safe` flag conditional via `dashmate_safe_restart` variable
- `failed_when: false` to prevent DKG timeout failures on fresh deployments
Elasticsearch certificate handling (`ansible/roles/elastic_stack/tasks/main.yml`)
Testing
Tested on devnet-mahua deployment - all services deployed successfully without errors and chain progressing.
Summary by CodeRabbit