|  | 
|  | 1 | +# ADR: Provisioning Strategy - Minimal Cloud-init + Ansible | 
|  | 2 | + | 
|  | 3 | +## Status | 
|  | 4 | + | 
|  | 5 | +**Proposed** - Based on comprehensive analysis of current PoC limitations and production requirements | 
|  | 6 | + | 
|  | 7 | +## Context | 
|  | 8 | + | 
|  | 9 | +The current PoC uses a cloud-init + shell script approach for VM provisioning and application | 
|  | 10 | +deployment. While this approach works for demonstration purposes, it presents significant | 
|  | 11 | +challenges for production use and testing automation: | 
|  | 12 | + | 
|  | 13 | +### Current Approach Limitations | 
|  | 14 | + | 
|  | 15 | +**Cloud-init Heavy Approach**: | 
|  | 16 | + | 
|  | 17 | +- Complex debugging when provisioning fails | 
|  | 18 | +- Limited conditional logic capabilities | 
|  | 19 | +- Difficult to test without full VM lifecycle | 
|  | 20 | +- Shell script brittleness and maintenance overhead | 
|  | 21 | +- Poor CI/CD integration due to VM dependencies | 
|  | 22 | + | 
|  | 23 | +**Testing Challenges**: | 
|  | 24 | + | 
|  | 25 | +- 8-12 minute test cycles including VM provisioning | 
|  | 26 | +- Requires KVM/libvirt support for testing | 
|  | 27 | +- Many hosted CI runners lack nested virtualization, or offer it only on select runner types | 
|  | 28 | +- Infrastructure failures obscure application issues | 
|  | 29 | +- High resource requirements (CPU, memory, storage) | 
|  | 30 | + | 
|  | 31 | +**Technology Stack Complexity**: | 
|  | 32 | + | 
|  | 33 | +- 4-technology stack: Terraform + Cloud-init + Docker + Shell scripts | 
|  | 34 | +- Complex orchestration between different tooling approaches | 
|  | 35 | +- Inconsistent error handling and logging across tools | 
|  | 36 | + | 
|  | 37 | +## Decision | 
|  | 38 | + | 
|  | 39 | +**Adopt a minimal cloud-init + Ansible hybrid approach** for the production redesign: | 
|  | 40 | + | 
|  | 41 | +### Cloud-init Role (Minimal) | 
|  | 42 | + | 
|  | 43 | +Cloud-init will handle only essential system initialization: | 
|  | 44 | + | 
|  | 45 | +- Basic system setup (users, SSH keys, network) | 
|  | 46 | +- Package manager configuration and essential packages | 
|  | 47 | +- Docker installation and daemon configuration | 
|  | 48 | +- Security configuration (firewall, fail2ban, SSH hardening) | 
|  | 49 | +- Ansible prerequisites (Python, pip, ansible-core) | 
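|  |  | + | 
|  |  | +A minimal sketch of such a user-data file, assuming an Ubuntu or Debian image. The `deploy` user, the key placeholder, and the package names are illustrative assumptions, not the project's actual configuration: | 
|  |  | + | 
|  |  | +```yaml | 
|  |  | +#cloud-config | 
|  |  | +# Illustrative only: user name, SSH key, and package list are placeholders. | 
|  |  | +users: | 
|  |  | +  - name: deploy | 
|  |  | +    groups: [sudo] | 
|  |  | +    shell: /bin/bash | 
|  |  | +    sudo: "ALL=(ALL) NOPASSWD:ALL" | 
|  |  | +    ssh_authorized_keys: | 
|  |  | +      - ssh-ed25519 PLACEHOLDER_PUBLIC_KEY deploy@example | 
|  |  | + | 
|  |  | +# Basic SSH hardening: no password logins, no direct root login. | 
|  |  | +ssh_pwauth: false | 
|  |  | +disable_root: true | 
|  |  | + | 
|  |  | +package_update: true | 
|  |  | +packages: | 
|  |  | +  - python3        # required on the target by Ansible modules | 
|  |  | +  - python3-pip | 
|  |  | +  - ansible-core   # assumed package name; only needed if Ansible runs on the VM itself | 
|  |  | +  - docker.io      # Debian/Ubuntu package name (assumption) | 
|  |  | +  - ufw | 
|  |  | +  - fail2ban | 
|  |  | + | 
|  |  | +runcmd: | 
|  |  | +  - usermod -aG docker deploy | 
|  |  | +  - ufw allow 22/tcp | 
|  |  | +  - ufw --force enable | 
|  |  | +  - systemctl enable --now docker fail2ban | 
|  |  | +``` | 
|  |  | + | 
|  |  | +Everything beyond this bootstrap (templates, services, health checks) would live in Ansible, which keeps the cloud-init file short enough to review at a glance. | 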
|  | 50 | + | 
|  | 51 | +### Ansible Role (Primary) | 
|  | 52 | + | 
|  | 53 | +Ansible will handle all application-level configuration and deployment: | 
|  | 54 | + | 
|  | 55 | +- Application configuration management | 
|  | 56 | +- Service deployment and orchestration | 
|  | 57 | +- Health checks and validation | 
|  | 58 | +- Environment-specific customization | 
|  | 59 | +- Operational procedures (backups, monitoring, updates) | 
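|  |  | + | 
|  |  | +As a rough sketch of the Ansible side, the playbook below renders a configuration file, starts a Compose project, and polls a health endpoint. The host group, paths, template name, and health URL are hypothetical, and the `community.docker` collection is assumed to be installed: | 
|  |  | + | 
|  |  | +```yaml | 
|  |  | +# Illustrative playbook: group name, paths, and URL are placeholders. | 
|  |  | +- name: Deploy application stack | 
|  |  | +  hosts: app_servers | 
|  |  | +  become: true | 
|  |  | +  vars: | 
|  |  | +    app_dir: /opt/app | 
|  |  | +    health_url: "http://localhost:8080/health" | 
|  |  | +  tasks: | 
|  |  | +    - name: Render the environment file from a Jinja2 template | 
|  |  | +      ansible.builtin.template: | 
|  |  | +        src: templates/app.env.j2 | 
|  |  | +        dest: "{{ app_dir }}/.env" | 
|  |  | +        mode: "0600" | 
|  |  | + | 
|  |  | +    - name: Start the services defined in the Compose file | 
|  |  | +      community.docker.docker_compose_v2: | 
|  |  | +        project_src: "{{ app_dir }}" | 
|  |  | +        state: present | 
|  |  | + | 
|  |  | +    - name: Wait until the health endpoint responds | 
|  |  | +      ansible.builtin.uri: | 
|  |  | +        url: "{{ health_url }}" | 
|  |  | +        status_code: 200 | 
|  |  | +      register: health | 
|  |  | +      until: health.status == 200 | 
|  |  | +      retries: 10 | 
|  |  | +      delay: 6 | 
|  |  | +``` | 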
|  | 60 | + | 
|  | 61 | +### Technology Stack Simplification | 
|  | 62 | + | 
|  | 63 | +**Target Stack**: | 
|  | 64 | + | 
|  | 65 | +- **Infrastructure**: OpenTofu/Terraform | 
|  | 66 | +- **Configuration Management**: Ansible | 
|  | 67 | +- **Services**: Docker Compose | 
|  | 68 | +- **Testing**: Container-first with minimal VM validation | 
|  | 69 | + | 
|  | 70 | +## Rationale | 
|  | 71 | + | 
|  | 72 | +### 1. Improved Testability | 
|  | 73 | + | 
|  | 74 | +**Container-Based Testing**: Ansible roles and playbooks can be tested using Molecule with the Docker driver, | 
|  | 75 | +eliminating VM dependencies for most test scenarios: | 
|  | 76 | + | 
|  | 77 | +- **Speed**: Container startup in seconds vs. minutes for VMs | 
|  | 78 | +- **CI/CD Native**: Standard CI platforms support Docker containers | 
|  | 79 | +- **Resource Efficiency**: Lower CPU, memory, and storage requirements | 
|  | 80 | +- **Debugging**: Direct access to application logs and state | 
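|  |  | + | 
|  |  | +A Molecule scenario for this could be as small as the following `molecule/default/molecule.yml`. This is a sketch under assumptions: the Docker driver plugin is installed, and the image shown is a community image presumed to ship Python and systemd; any image meeting those needs would do: | 
|  |  | + | 
|  |  | +```yaml | 
|  |  | +# molecule/default/molecule.yml (illustrative) | 
|  |  | +driver: | 
|  |  | +  name: docker | 
|  |  | +platforms: | 
|  |  | +  - name: instance | 
|  |  | +    image: geerlingguy/docker-ubuntu2204-ansible   # assumed test image | 
|  |  | +    pre_build_image: true | 
|  |  | +provisioner: | 
|  |  | +  name: ansible | 
|  |  | +verifier: | 
|  |  | +  name: ansible | 
|  |  | +``` | 
|  |  | + | 
|  |  | +Running `molecule test` against such a scenario creates the container, applies the role, checks idempotence, runs the verifier, and destroys the container. | 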
|  | 81 | + | 
|  | 82 | +### 2. Enhanced Maintainability | 
|  | 83 | + | 
|  | 84 | +**Declarative Configuration**: Ansible's YAML-based declarative syntax is more maintainable | 
|  | 85 | +than shell scripts: | 
|  | 86 | + | 
|  | 87 | +- Clear, readable configuration management | 
|  | 88 | +- Idempotent behavior from most built-in modules (`command` and `shell` tasks need explicit guards such as `creates`) | 
|  | 89 | +- Comprehensive error handling and logging | 
|  | 90 | +- Large ecosystem of community modules | 
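|  |  | + | 
|  |  | +To illustrate the idempotency point, the tasks below can be run repeatedly without changing an already-converged host. The paths and the `init.sh` script are hypothetical: | 
|  |  | + | 
|  |  | +```yaml | 
|  |  | +# Declarative module: reports "changed" only when the directory is missing or differs. | 
|  |  | +- name: Ensure the application directory exists | 
|  |  | +  ansible.builtin.file: | 
|  |  | +    path: /opt/app | 
|  |  | +    state: directory | 
|  |  | +    mode: "0755" | 
|  |  | + | 
|  |  | +# Imperative command: the `creates` guard skips the task once the marker file exists. | 
|  |  | +- name: Run one-time initialization | 
|  |  | +  ansible.builtin.command: | 
|  |  | +    cmd: /opt/app/bin/init.sh | 
|  |  | +    creates: /opt/app/.initialized | 
|  |  | +``` | 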
|  | 91 | + | 
|  | 92 | +### 3. Production Readiness | 
|  | 93 | + | 
|  | 94 | +**Operational Excellence**: Ansible provides production-grade capabilities: | 
|  | 95 | + | 
|  | 96 | +- Role-based organization for reusability | 
|  | 97 | +- Inventory management for multi-environment deployments | 
|  | 98 | +- Ansible Vault for encrypting secrets stored alongside playbooks | 
|  | 99 | +- Comprehensive logging and audit trails | 
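|  |  | + | 
|  |  | +For the multi-environment point, a YAML inventory along these lines would let one playbook target staging or production via `--limit`. The hostnames and the `app_env` variable are placeholders, not real infrastructure: | 
|  |  | + | 
|  |  | +```yaml | 
|  |  | +# inventory.yml (illustrative) | 
|  |  | +all: | 
|  |  | +  children: | 
|  |  | +    staging: | 
|  |  | +      hosts: | 
|  |  | +        staging-1.example.internal: | 
|  |  | +      vars: | 
|  |  | +        app_env: staging | 
|  |  | +    production: | 
|  |  | +      hosts: | 
|  |  | +        prod-1.example.internal: | 
|  |  | +      vars: | 
|  |  | +        app_env: production | 
|  |  | +``` | 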
|  | 100 | + | 
|  | 101 | +### 4. CI/CD Compatibility | 
|  | 102 | + | 
|  | 103 | +**Testing Strategy**: Container-first approach enables efficient CI/CD pipelines: | 
|  | 104 | + | 
|  | 105 | +- Unit tests: Individual components in containers (seconds) | 
|  | 106 | +- Integration tests: Multi-service Docker Compose (1-3 minutes) | 
|  | 107 | +- E2E tests: Reserved for critical scenarios with real infrastructure (5-10 minutes) | 
|  | 108 | + | 
|  | 109 | +## Implementation Strategy | 
|  | 110 | + | 
|  | 111 | +### Phase 1: Core Infrastructure | 
|  | 112 | + | 
|  | 113 | +1. **Minimal Cloud-init Templates**: Create lean cloud-init configurations focused on system initialization | 
|  | 114 | +2. **Ansible Playbook Structure**: Develop role-based playbooks for application deployment | 
|  | 115 | +3. **Container Testing**: Implement Molecule-based testing for Ansible roles | 
|  | 116 | + | 
|  | 117 | +### Phase 2: Application Integration | 
|  | 118 | + | 
|  | 119 | +1. **Service Orchestration**: Migrate Docker Compose management to Ansible | 
|  | 120 | +2. **Configuration Management**: Replace envsubst templating with Ansible Jinja2 | 
|  | 121 | +3. **Health Checks**: Implement comprehensive service validation | 
|  | 122 | + | 
|  | 123 | +### Phase 3: Testing and Validation | 
|  | 124 | + | 
|  | 125 | +1. **Container Test Suite**: Comprehensive Docker-based testing | 
|  | 126 | +2. **Integration Validation**: Multi-service container testing | 
|  | 127 | +3. **Minimal E2E**: Strategic VM testing for infrastructure validation | 
|  | 128 | + | 
|  | 129 | +## Consequences | 
|  | 130 | + | 
|  | 131 | +### Positive | 
|  | 132 | + | 
|  | 133 | +- **Faster Development Cycles**: Container-based testing reduces feedback loops | 
|  | 134 | +- **Better CI/CD Integration**: Standard CI platforms support Docker natively | 
|  | 135 | +- **Improved Debugging**: Clear error messages and logging from Ansible | 
|  | 136 | +- **Enhanced Maintainability**: Declarative configuration over imperative scripts | 
|  | 137 | +- **Production Readiness**: Industry-standard configuration management practices | 
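|  | 138 | +- **Reduced Complexity**: Three primary technologies (OpenTofu/Terraform, Ansible, Docker Compose) plus a minimal cloud-init bootstrap, vs. the current four-technology approach built around shell scripts | 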
|  | 139 | + | 
|  | 140 | +### Negative | 
|  | 141 | + | 
|  | 142 | +- **Learning Curve**: Team needs Ansible expertise | 
|  | 143 | +- **Migration Effort**: Requires refactoring existing shell script logic | 
|  | 144 | +- **Initial Complexity**: Setting up the Molecule testing framework | 
|  | 145 | + | 
|  | 146 | +### Risks and Mitigation | 
|  | 147 | + | 
|  | 148 | +**Risk**: Ansible playbook complexity could become unwieldy | 
|  | 149 | +**Mitigation**: Use role-based organization and follow Ansible best practices | 
|  | 150 | + | 
|  | 151 | +**Risk**: Container testing might miss infrastructure-specific issues | 
|  | 152 | +**Mitigation**: Maintain strategic E2E testing for critical infrastructure scenarios | 
|  | 153 | + | 
|  | 154 | +## Alternative Approaches Considered | 
|  | 155 | + | 
|  | 156 | +### 1. Pure Cloud-init Approach | 
|  | 157 | + | 
|  | 158 | +**Rejected**: Maintains testing challenges and limited flexibility for complex logic | 
|  | 159 | + | 
|  | 160 | +### 2. Ansible-Only (No Cloud-init) | 
|  | 161 | + | 
|  | 162 | +**Rejected**: Requires more complex initial connectivity setup and provider-specific handling | 
|  | 163 | + | 
|  | 164 | +### 3. Shell Script Enhancement | 
|  | 165 | + | 
|  | 166 | +**Rejected**: Doesn't address fundamental testing and maintainability issues | 
|  | 167 | + | 
|  | 168 | +## References | 
|  | 169 | + | 
|  | 170 | +- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html) | 
|  | 171 | +- [Molecule Testing Framework](https://molecule.readthedocs.io/) | 
|  | 172 | +- [Testcontainers Documentation](https://www.testcontainers.org/) | 
|  | 173 | +- [Docker Compose Testing Strategies](https://docs.docker.com/compose/) | 
|  | 174 | + | 
|  | 175 | +## Related Decisions | 
|  | 176 | + | 
|  | 177 | +- **Testing Strategy**: Three-layer architecture with container-first approach | 
|  | 178 | +- **Configuration Management**: Ansible Jinja2 templating over envsubst | 
|  | 179 | +- **Technology Stack**: Simplified 3-component architecture | 