dahui commented Sep 21, 2025

Major Helm Chart Enhancement: External etcd Support and Production Features

Overview

This pull request extends the Omni Helm chart with external etcd support, ingress templates, pod disruption budgets, and additional service configuration options, while remaining fully backwards compatible with existing deployments. The chart now supports both embedded and external etcd configurations.

🚨 Breaking Changes

None. This release is fully backwards compatible; existing deployments continue to work without any changes.

✨ Major Features Added

1. External etcd Support

  • New Configuration: etcd.external: true enables external etcd cluster usage
  • Horizontal Scaling: Multiple replicas supported when using external etcd
  • Authentication: Both direct credentials and Kubernetes secret-based authentication
  • TLS Support: Complete TLS configuration with file paths or secret-based certificates
  • Automatic Detection: Chart automatically detects existing deployments and maintains compatibility

2. Intelligent Resource Management

  • StatefulSet for New Deployments: Automatic PVC provisioning for embedded etcd
  • Deployment for External etcd: Stateless deployment for external etcd configurations
  • Backwards Compatibility: Existing Deployment-based installations continue unchanged
  • Resource Detection: Uses Helm's lookup function to detect existing resources

3. Comprehensive Ingress Templates

  • Four Ingress Types: API, UI, Siderolink, and Kubernetes Proxy
  • Cert-Manager Integration: Automatic certificate provisioning
  • Wildcard Support: Kubernetes proxy includes wildcard DNS for ArgoCD compatibility
  • Flexible Configuration: Per-ingress customization with annotations and TLS
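
To illustrate the cert-manager and wildcard items above, a rendered Kubernetes proxy Ingress might look roughly like the following. This is a sketch under assumptions, not the chart's actual output: the resource name, hostnames, issuer name, ingress class, backend service name, and port are all placeholders. Only the `networking.k8s.io/v1` schema and the `cert-manager.io/cluster-issuer` annotation are standard.

```yaml
# Hypothetical rendered output; every name, host, and port is a placeholder.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: omni-k8s-proxy
  annotations:
    # Asks cert-manager to issue the certificate stored in spec.tls[].secretName.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - kube.example.com
        - "*.kube.example.com"
      secretName: omni-k8s-proxy-tls
  rules:
    - host: kube.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: omni-k8s-proxy
                port:
                  number: 8095
    # Wildcard rule: matches exactly one extra DNS label, e.g. a.kube.example.com.
    - host: "*.kube.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: omni-k8s-proxy
                port:
                  number: 8095
```

Note that a wildcard certificate from an ACME issuer such as Let's Encrypt generally requires a DNS-01 solver on the issuer; HTTP-01 cannot validate wildcard names.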

4. Enhanced Service Configuration

  • WireGuard Service Types: NodePort, LoadBalancer, or ClusterIP support
  • Traffic Policy Control: externalTrafficPolicy configuration for load balancing
  • Per-Service Annotations: Global and service-specific annotation support
  • Automatic DNS Resolution: WireGuard address auto-resolves to cluster DNS when not specified

5. Production-Ready Features

  • Pod Disruption Budgets: Configurable availability guarantees
  • Advanced Security: Proper capability management and device plugin support
  • Flexible Storage: Support for storage classes and automatic PVC provisioning
  • Comprehensive Documentation: Detailed configuration examples and troubleshooting
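
For reference, a pod disruption budget of the kind described above could render to something like this. It is a minimal sketch: the resource name and selector labels are assumptions, and the chart's real template may differ.

```yaml
# Hypothetical rendered output; name and labels are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: omni
spec:
  # Keep at least one pod running during voluntary disruptions (e.g. node drains).
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: omni
```

With a single replica (the embedded etcd case), `minAvailable: 1` blocks voluntary evictions entirely, so a budget like this is mainly useful for multi-replica deployments backed by external etcd.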

🔧 Technical Implementation

Resource Selection Logic

1. Existing Deployment detected → Continue using Deployment (backwards compatibility)
2. Existing StatefulSet detected → Continue using StatefulSet (backwards compatibility)
3. New installation + etcd.external: false → Use StatefulSet with embedded etcd
4. New installation + etcd.external: true → Use Deployment with external etcd
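
The four rules above could be expressed as a named template along these lines. This is only a sketch of the technique, not the helper shipped in this PR: the template name `omni.workloadKind` and the `omni.fullname` helper are hypothetical. `lookup` is Helm's built-in function and `etcd.external` is the value described above.

```yaml
{{- /* Hypothetical helper; returns "Deployment" or "StatefulSet". */ -}}
{{- define "omni.workloadKind" -}}
{{- /* "omni.fullname" is an assumed helper that yields the workload name. */ -}}
{{- $name := include "omni.fullname" . -}}
{{- if lookup "apps/v1" "Deployment" .Release.Namespace $name -}}
Deployment
{{- else if lookup "apps/v1" "StatefulSet" .Release.Namespace $name -}}
StatefulSet
{{- else if .Values.etcd.external -}}
Deployment
{{- else -}}
StatefulSet
{{- end -}}
{{- end -}}
```

A template could then guard its body with `{{- if eq (include "omni.workloadKind" .) "StatefulSet" }}`. One caveat with this approach: `lookup` returns an empty result when Helm does not contact the cluster (for example under `helm template`), so offline rendering always follows the "new installation" branches.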

WireGuard Address Resolution

  • Default: Resolves automatically to the in-cluster service DNS name, wireguard.<namespace>.svc.cluster.local
  • Override: Explicit IP/FQDN configuration for external connectivity
  • Load Balancer: Support for preserving client IP with externalTrafficPolicy: Local
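
As an illustration of the load balancer case, the WireGuard Service might render roughly as follows. The service name `wireguard` is taken from the DNS name above; the selector labels and the UDP port number are assumptions and should be checked against the chart's values.

```yaml
# Hypothetical rendered output; labels and port are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: wireguard
spec:
  type: LoadBalancer
  # Route only to pods on the receiving node, which preserves the client source IP.
  externalTrafficPolicy: Local
  selector:
    app.kubernetes.io/name: omni
  ports:
    - name: wireguard
      protocol: UDP
      port: 50180        # assumed WireGuard port
      targetPort: 50180
```

`externalTrafficPolicy` applies only to NodePort and LoadBalancer services; with `Local`, nodes that run no matching pod drop the traffic instead of forwarding it.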

External etcd Configuration Examples

Basic External etcd:

etcd:
  external: true
  endpoints:
    - "https://etcd-1.example.com:2379"
    - "https://etcd-2.example.com:2379"

With Secret-based Authentication:

etcd:
  external: true
  auth:
    secretName: "etcd-auth"
    usernameKey: "username"
    passwordKey: "password"
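
The Secret referenced by the example above would need to exist before installation. A minimal sketch, assuming it is created in the release namespace and using placeholder credentials:

```yaml
# Matches secretName, usernameKey, and passwordKey from the values above.
apiVersion: v1
kind: Secret
metadata:
  name: etcd-auth
type: Opaque
stringData:
  username: omni        # placeholder
  password: change-me   # placeholder
```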

With TLS from Secrets:

etcd:
  external: true
  tls:
    enabled: true
    secretName: "etcd-tls"
    certKey: "client.crt"
    keyKey: "client.key"
    caKey: "ca.crt"
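
A matching Secret for the TLS example above might look like this. It is a sketch with placeholder PEM bodies; the key names mirror certKey, keyKey, and caKey from the values.

```yaml
# Opaque rather than kubernetes.io/tls, because that type requires
# the fixed key names tls.crt and tls.key.
apiVersion: v1
kind: Secret
metadata:
  name: etcd-tls
type: Opaque
stringData:
  client.crt: |
    -----BEGIN CERTIFICATE-----
    <client certificate PEM>
    -----END CERTIFICATE-----
  client.key: |
    -----BEGIN PRIVATE KEY-----
    <client private key PEM>
    -----END PRIVATE KEY-----
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <CA certificate PEM>
    -----END CERTIFICATE-----
```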

📋 Files Modified

New Templates

  • templates/statefulset.yaml - StatefulSet for embedded etcd deployments
  • templates/ingress.yaml - Comprehensive ingress configurations
  • templates/poddisruptionbudget.yaml - Pod disruption budget support

Enhanced Templates

  • templates/deployment.yaml - External etcd support with conditional rendering
  • templates/service.yaml - Per-service annotations and WireGuard configuration
  • templates/_helpers.tpl - Updated helper functions

Configuration

  • values.yaml - Extensive new configuration options
  • README.md - Comprehensive documentation update

🔄 Migration Path

Existing Users

  • No Action Required: Existing deployments continue working unchanged
  • Upgrade Safe: helm upgrade maintains current configuration
  • Storage Preserved: PVCs and data remain intact

New Users

  • Default Behavior: StatefulSet with embedded etcd and automatic PVC provisioning
  • External etcd: Set etcd.external: true for horizontal scaling
  • Production Features: Enable ingress, PDB, and advanced service configuration as needed
  • Ready-to-Use Examples: Four complete values files for different scenarios

Migration Support

  • Data Migration: Complete procedures for preserving etcd data during transitions
  • Resource Type Changes: Safe migration between Deployment and StatefulSet
  • External etcd Migration: Production-ready migration to external etcd clusters
  • Rollback Procedures: Clear instructions for reverting changes if needed

🧪 Testing Recommendations

Backwards Compatibility Testing

  1. Deploy chart with previous version
  2. Upgrade to new version with --reuse-values
  3. Verify no resource changes or disruptions

New Feature Testing

  1. Test StatefulSet deployment with embedded etcd
  2. Test Deployment with external etcd configuration
  3. Verify ingress templates with different configurations
  4. Test service annotations and WireGuard service types

📚 Documentation

The README has been completely rewritten with:

  • Table of Contents: Easy navigation with all new sections
  • Architecture Decisions: Explanation of design choices
  • Backwards Compatibility: Clear migration guidance
  • Migration Guide: Step-by-step instructions for data preservation
  • Configuration Examples: Four complete deployment scenarios
  • Troubleshooting: Common issues and solutions

Migration Documentation

  • Deployment to StatefulSet: Complete migration with etcd data backup/restore
  • Embedded to External etcd: Production migration path with HA scaling
  • Data Preservation: Safe migration procedures with zero data loss
  • Downtime Management: Clear expectations and mitigation strategies

Configuration Examples

  • Minimal Embedded etcd: Basic StatefulSet with automatic PVC provisioning
  • Minimal External etcd: Multi-replica deployment with external etcd cluster
  • Production Setup: Enterprise configuration with ingress, PDB, and HA
  • Development/Testing: Lightweight configuration for dev environments

🎯 Benefits

For Existing Users

  • Zero Disruption: Seamless upgrades with no configuration changes
  • Enhanced Features: Access to new capabilities without migration
  • Improved Documentation: Better understanding of chart capabilities

For New Users

  • Production Ready: StatefulSet with proper storage management
  • Scalability: External etcd support for high availability
  • Enterprise Features: Comprehensive ingress and service configuration
  • Operational Excellence: Pod disruption budgets and advanced monitoring

🔒 Security Considerations

  • Capability Management: Proper NET_ADMIN capability for WireGuard
  • Secret Management: Support for Kubernetes secrets for credentials
  • TLS Configuration: Comprehensive TLS support for external etcd
  • Network Policies: Documentation for network security
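
For the capability item above, the relevant part of the pod spec would typically take the following shape. This is a fragment only, and the container name is an assumption.

```yaml
# Fragment of a pod spec; the container name is a placeholder.
containers:
  - name: omni
    securityContext:
      capabilities:
        add:
          - NET_ADMIN   # needed to create and configure the WireGuard interface
```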

This enhancement transforms the Omni Helm chart from a basic deployment tool into a production-ready, enterprise-grade solution while maintaining complete backwards compatibility for existing users.

@github-project-automation github-project-automation bot moved this to To Do in Planning Sep 21, 2025
@talos-bot talos-bot moved this from To Do to In Review in Planning Sep 21, 2025

dahui commented Sep 21, 2025

Testing final behavior of chart before submitting for review.

@shanduur shanduur self-requested a review September 22, 2025 13:45
@dahui dahui marked this pull request as draft September 24, 2025 06:22
@smira smira moved this from In Review to On Hold in Planning Sep 29, 2025