Update production deploy configurations #1386
Conversation
- Increase resource requests and limits for headless-lms
- Add rolling update strategy with 50% max unavailable and surge
- Add Pod Disruption Budget with minimum 2 available pods
- Increase replica count from 2 to 4 in production
Walkthrough

The changes update multiple Kubernetes deployment configurations by adding two topology spread constraints (zone and hostname) to each pod spec; the headless-lms deployment additionally gains a rolling update strategy, increased resource requests and limits, a PodDisruptionBudget, and a higher production replica count.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant API as Kubernetes API Server
    participant Sched as Scheduler
    participant Node as Node
    Dev->>API: Submit updated Deployment YAML
    API->>Sched: Process new Pod specs with topologySpreadConstraints
    Sched->>Node: Evaluate available capacity across zones/hostnames
    Node-->>Sched: Return node availability data
    Sched-->>API: Schedule pods ("ScheduleAnyway" if constraints not fully met)
    API-->>Dev: Updated pods deployed and running
```
Actionable comments posted: 0
🧹 Nitpick comments (2)
kubernetes/base/cms/deployment.yml (1)
34-46: LGTM! Topology spread constraints are well configured.

The topology spread constraints are properly configured to distribute pods across both zones and hostnames, which enhances availability. The `maxSkew: 1` setting ensures balanced distribution while `ScheduleAnyway` prevents scheduling blocks.

Consider these additional availability enhancements (see the sketch after this list):
- Set `minReplicas` in the HPA to ensure minimum pod distribution
- Add anti-affinity rules for critical workloads
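As an illustration of the second suggestion, a minimal pod-spec fragment is sketched below; the `app: cms` label is an assumption and should match whatever labels the cms Deployment actually uses.

```yaml
# Hypothetical fragment of the cms pod spec: prefer placing replicas on
# different hostnames, complementing the topology spread constraints.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: cms  # assumed label; use the Deployment's real pod labels
          topologyKey: kubernetes.io/hostname
```

For the first suggestion, if the service has a HorizontalPodAutoscaler, setting `minReplicas: 2` on it would keep at least two pods available for the scheduler to spread.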
kubernetes/base/course-material/deployment.yml (1)
34-46: LGTM! Topology spread constraints follow the established pattern.

The configuration is consistent with other deployments while correctly targeting course-material service pods.

Since topology spread constraints are being added consistently across multiple deployments, consider (see the sketch after this list):
- Creating a common base configuration using Kustomize bases
- Using Helm charts with configurable values
This would reduce duplication and make future changes easier to manage.
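As a rough sketch of the Helm option, a shared template fragment could parameterize only the label; `.Values.appName` is a hypothetical value name, not something that exists in this repository:

```yaml
# Hypothetical shared Helm template fragment: one copy of the constraints,
# with the pod label supplied per service from values.yaml.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: {{ .Values.appName }}
  # (an analogous entry for kubernetes.io/hostname would follow)
```

A Kustomize base or component could achieve the same effect by keeping one patch file and applying it to every Deployment overlay.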
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- kubernetes/base/cms/deployment.yml (1 hunks)
- kubernetes/base/course-material/deployment.yml (1 hunks)
- kubernetes/base/example-exercise/deployment.yml (1 hunks)
- kubernetes/base/headless-lms/deployment.yml (3 hunks)
- kubernetes/base/main-frontend/deployment.yml (2 hunks)
- kubernetes/base/quizzes/deployment.yml (1 hunks)
- kubernetes/base/tmc/deployment.yml (1 hunks)
- kubernetes/production/headless-lms/pdb.yml (1 hunks)
- kubernetes/production/kustomization.yaml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build
- GitHub Check: build-and-deploy
- GitHub Check: headless_lms
🔇 Additional comments (11)
kubernetes/production/headless-lms/pdb.yml (1)
1-10: ✅ Verification successful
LGTM! PDB configuration looks good.
The PodDisruptionBudget configuration is well-structured and ensures high availability by maintaining at least 2 pods during voluntary disruptions. This aligns well with the increased replica count of 4 in production.
Let's verify that the label selector matches all related resources:
🏁 Script executed:
```bash
#!/bin/bash
# Check for consistent label usage across related resources
rg -l "app: headless-lms" kubernetes/
```

Length of output: 126
Verified PDB Configuration and Label Consistency
The PodDisruptionBudget is correctly configured with a `minAvailable` of 2, ensuring high availability during voluntary disruptions. Our verification confirms that both the PDB (kubernetes/production/headless-lms/pdb.yml) and the Deployment (kubernetes/base/headless-lms/deployment.yml) consistently use the label `app: headless-lms`.

LGTM!
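For reference, a PodDisruptionBudget with the properties described above would look roughly like this; the metadata name is assumed, and the actual pdb.yml in the PR is authoritative:

```yaml
# Sketch: keep at least 2 headless-lms pods running during voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: headless-lms  # assumed name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: headless-lms
```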
kubernetes/base/tmc/deployment.yml (1)
34-46: LGTM! Topology spread constraints match the pattern.

The configuration maintains consistency with other deployments while properly targeting the tmc service pods.
kubernetes/base/quizzes/deployment.yml (1)
34-46: LGTM! Topology spread constraints are consistent.

The configuration maintains the established pattern while correctly targeting quizzes service pods.
kubernetes/base/headless-lms/deployment.yml (3)
25-29: LGTM! Deployment strategy aligns with high availability goals.

The RollingUpdate strategy with 50% maxUnavailable and maxSurge provides a good balance between update speed and availability, ensuring at least 2 pods remain available during updates when running with 4 replicas in production.
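In Deployment terms, the strategy the review describes corresponds to a block of roughly this shape (a sketch, not a copy of the file):

```yaml
# Sketch: with 4 replicas, maxUnavailable: 50% keeps at least 2 pods serving
# traffic during a rollout, while maxSurge: 50% allows up to 2 extra pods.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 50%
    maxSurge: 50%
```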
87-99: LGTM! Topology spread constraints enhance availability.

The constraints will help distribute pods across zones and hosts, improving fault tolerance. The `maxSkew: 1` with `ScheduleAnyway` provides a good balance between optimal distribution and scheduling flexibility.
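Constraints of roughly this shape produce that behavior; the topology keys shown are the standard well-known node labels, and the selector is assumed to match the headless-lms pod labels mentioned earlier in this review:

```yaml
# Sketch: spread headless-lms pods across zones and hostnames with a skew of
# at most 1, but still schedule if the constraint cannot be satisfied.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: headless-lms
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: headless-lms
```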
47-52: ❓ Verification inconclusive
Verify resource requirement increases.
Significant increases in resource requests and limits:
- Memory: 100Mi → 200Mi (requests), 600Mi → 800Mi (limits)
- CPU: 20m → 100m (requests), 200m → 500m (limits)
Please confirm:
- The justification for these increases (e.g., observed OOM issues, CPU throttling)
- That the cluster has sufficient capacity for these increased requirements across 4 replicas
🏁 Script executed:
```bash
#!/bin/bash
# Check cluster capacity and current resource usage

# Get nodes capacity and allocatable resources
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU_CAPACITY:.status.capacity.cpu,MEMORY_CAPACITY:.status.capacity.memory,CPU_ALLOCATABLE:.status.allocatable.cpu,MEMORY_ALLOCATABLE:.status.allocatable.memory

# Get current resource usage across nodes
kubectl top nodes

# Calculate total resource requirements for 4 replicas
echo "Total resource requirements for 4 replicas:"
echo "CPU requests: 400m (4 x 100m)"
echo "CPU limits: 2000m (4 x 500m)"
echo "Memory requests: 800Mi (4 x 200Mi)"
echo "Memory limits: 3200Mi (4 x 800Mi)"
```

Length of output: 822
ACTION: Verify Increased Resource Requirements
The deployment configuration now increases resource allocations considerably:
- Memory: requests from 100Mi to 200Mi; limits from 600Mi to 800Mi.
- CPU: requests from 20m to 100m; limits from 200m to 500m.
For a 4-replica deployment, these changes require:
- CPU: 400m (requests) and 2000m (limits)
- Memory: 800Mi (requests) and 3200Mi (limits)
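Per container, the figures above correspond to a resources block of roughly this shape (a sketch of the values cited in this review; the deployment file itself is authoritative):

```yaml
# Sketch of the increased per-container requests and limits.
resources:
  requests:
    memory: "200Mi"
    cpu: "100m"
  limits:
    memory: "800Mi"
    cpu: "500m"
```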
Please verify:
- The rationale behind these increased resource values (e.g., observed memory pressure or CPU throttling).
- That your cluster has sufficient capacity to support these aggregated requirements. Note that the provided resource-check script did not return node data (indicating that `kubectl` was not available in the current environment); ensure these checks are re-run in a properly configured cluster context.

kubernetes/production/kustomization.yaml (2)
4-4: LGTM! PDB addition enhances availability guarantees.

Adding a PodDisruptionBudget for headless-lms is a good practice to maintain service availability during voluntary disruptions like node drains.
57-58: LGTM! Replica count increase strengthens reliability.

Increasing to 4 replicas provides better fault tolerance and aligns well with (a combined sketch follows this list):
- The new topology spread constraints for zone/host distribution
- The RollingUpdate strategy (50% maxUnavailable = 2 pods minimum)
- The PodDisruptionBudget requirements
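Taken together, the production kustomization changes amount to something like the following sketch; the `../base` path and resource file names are assumptions, not copied from the repository:

```yaml
# Sketch: register the new PDB and pin headless-lms to 4 replicas in production.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base              # assumed base path
  - headless-lms/pdb.yml
replicas:
  - name: headless-lms
    count: 4
```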
kubernetes/base/example-exercise/deployment.yml (1)
34-46: LGTM! Topology constraints follow established pattern.

The topology spread constraints are consistent with other deployments, promoting better pod distribution across zones and hosts.
kubernetes/base/main-frontend/deployment.yml (2)
34-46: LGTM! Topology constraints follow established pattern.

The topology spread constraints are consistent with other deployments, promoting better pod distribution across zones and hosts.
54-55: ❓ Verification inconclusive
Verify the impact of base path change.
Changing `NEXT_PUBLIC_BASE_PATH` from `null` to `""` might affect routing behavior. Please confirm this change won't break any existing routes or client-side navigation.
🏁 Script executed:
```bash
#!/bin/bash
# Search for usage of NEXT_PUBLIC_BASE_PATH in the codebase
echo "Searching for NEXT_PUBLIC_BASE_PATH usage:"
rg "NEXT_PUBLIC_BASE_PATH" -A 3
```

Length of output: 5813
Attention: Please verify routing behavior with the updated base path configuration.
In the file kubernetes/base/main-frontend/deployment.yml (lines 54–55), `NEXT_PUBLIC_BASE_PATH` is now explicitly set to an empty string (`""`). This contrasts with other services where a non-empty base path (like `"/cms"`, `"/org"`, etc.) is used. The application's Next.js configuration typically sets `config.basePath` only when a truthy value is provided, so with an empty string the base path won't be applied, which might be intentional if the main frontend is meant to operate at the root path.

Please confirm that (the relevant env entry is sketched below):
- This change won't inadvertently disrupt client-side navigation or route resolution.
- Any logic that concatenates `NEXT_PUBLIC_BASE_PATH` (e.g., in shared-module/packages/common/src/utils/redirectBackAfterLoginOrSignup.tsx) handles an empty string as expected.
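For context, the deployment change under discussion amounts to an env entry of this shape (a sketch; surrounding container fields are omitted):

```yaml
# Sketch: an explicitly empty base path means the main frontend is served from
# the root, unlike services that use e.g. "/cms" or "/org".
env:
  - name: NEXT_PUBLIC_BASE_PATH
    value: ""
```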