Update production deploy configurations #1386

Merged: 2 commits merged into master from fixes-1112320099 on Feb 18, 2025

Conversation

nygrenh (Member) commented Feb 14, 2025

Summary by CodeRabbit

  • New Features
    • Enhanced workload distribution across multiple services to improve reliability and resilience.
    • Updated deployment strategies and increased resource allocations for the learning service for better performance.
    • Improved scalability and availability with higher replica counts and new safeguards during maintenance.
    • Streamlined environment configuration for the main interface to optimize operations.

- Increase resource requests and limits for headless-lms
- Add rolling update strategy with 50% max unavailable and 50% max surge
- Add Pod Disruption Budget with minimum 2 available pods
- Increase replica count from 2 to 4 in production
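
Taken together, the changes listed above would look roughly like the condensed Deployment sketch below. The values come from this PR's description and review comments; the image name is a placeholder, and in practice the production replica count is set in kubernetes/production/kustomization.yaml rather than in the base Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: headless-lms
spec:
  replicas: 4                        # production value, raised from 2 (applied via kustomization in practice)
  selector:
    matchLabels:
      app: headless-lms
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%            # with 4 replicas, at most 2 pods go down during a rollout
      maxSurge: 50%                  # up to 2 extra pods may be created while updating
  template:
    metadata:
      labels:
        app: headless-lms
    spec:
      containers:
        - name: headless-lms
          image: headless-lms:example    # placeholder image, not the real one
          resources:
            requests:
              cpu: 100m                  # raised from 20m
              memory: 200Mi              # raised from 100Mi
            limits:
              cpu: 500m                  # raised from 200m
              memory: 800Mi              # raised from 600Mi
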
coderabbitai bot (Contributor) commented Feb 14, 2025

Walkthrough

The changes update multiple Kubernetes deployment configurations by adding two topologySpreadConstraints (one for topology.kubernetes.io/zone and one for kubernetes.io/hostname) with a maxSkew of 1 and whenUnsatisfiable: ScheduleAnyway. Several deployments now apply a labelSelector to target specific apps. The headless-lms deployment also introduces a new RollingUpdate strategy with increased resource requests and limits, while the main-frontend deployment updates an environment variable. Additionally, a new PodDisruptionBudget for headless-lms is added and its replica count in production is increased.
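
To make the pattern concrete, a constraint block of the kind described sits under each deployment's pod template spec and looks roughly like the sketch below; the app: cms label is just one example, and each service targets its own label:

topologySpreadConstraints:
  - maxSkew: 1                                  # allow at most one pod of imbalance between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway           # prefer spreading, but never block scheduling
    labelSelector:
      matchLabels:
        app: cms
  - maxSkew: 1                                  # same rule applied per node
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: cms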

Changes

File(s) — Change Summary

  • kubernetes/base/{cms, course-material, example-exercise, quizzes, tmc}/deployment.yml — Added two topologySpreadConstraints (for zone and hostname) with maxSkew: 1 and whenUnsatisfiable: ScheduleAnyway; applicable label selectors added where needed.
  • kubernetes/base/headless-lms/deployment.yml — Introduced a RollingUpdate strategy with maxUnavailable and maxSurge set to 50%; updated CPU and memory resource requests and limits; added two topology spread constraints with a matching label selector.
  • kubernetes/base/main-frontend/deployment.yml — Added topology spread constraints for zone and hostname; updated the NEXT_PUBLIC_BASE_PATH environment variable from null to an empty string.
  • kubernetes/production/headless-lms/pdb.yml — New PodDisruptionBudget resource ensuring a minimum of 2 pods remain available.
  • kubernetes/production/kustomization.yaml — Added the headless-lms PDB resource and increased the headless-lms replica count from 2 to 4.

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant API as Kubernetes API Server
    participant Sched as Scheduler
    participant Node as Node

    Dev->>API: Submit updated Deployment YAML
    API->>Sched: Process new Pod specs with topologySpreadConstraints
    Sched->>Node: Evaluate available capacity across zones/hostnames
    Node-->>Sched: Return node availability data
    Sched-->>API: Schedule pods ("ScheduleAnyway" if constraints not fully met)
    API-->>Dev: Updated pods deployed and running

Poem

I'm a bunny, hopping through these YAML streams,
Spreading pods like magic across zone and node dreams.
With constraints in place and updates so bright,
Each new config makes my heart take flight.
Leaping code and carrots—oh, what a delight!


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
kubernetes/base/cms/deployment.yml (1)

34-46: LGTM! Topology spread constraints are well configured.

The topology spread constraints are properly configured to distribute pods across both zones and hostnames, which enhances availability. The maxSkew: 1 setting ensures balanced distribution while ScheduleAnyway prevents scheduling blocks.

Consider these additional availability enhancements:

  1. Set minReplicas in HPA to ensure minimum pod distribution
  2. Add anti-affinity rules for critical workloads
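
If the second suggestion above were adopted, a soft pod anti-affinity rule under the pod template spec might look like the sketch below; the weight and the app: cms label are illustrative and not part of this PR:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                              # strong preference, but not a hard requirement
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname    # avoid co-locating replicas on the same node
          labelSelector:
            matchLabels:
              app: cms
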
kubernetes/base/course-material/deployment.yml (1)

34-46: LGTM! Topology spread constraints follow the established pattern.

The configuration is consistent with other deployments while correctly targeting course-material service pods.

Since topology spread constraints are being added consistently across multiple deployments, consider:

  1. Creating a common base configuration using Kustomize bases
  2. Using Helm charts with configurable values

Either approach would reduce duplication and make future changes easier to manage.
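
As one possible shape for the first suggestion, the shared constraints could be wired up as a Kustomize patch roughly as sketched below; the file names are hypothetical, and the per-service labelSelector is the part that would still need per-deployment handling:

# kubernetes/base/kustomization.yaml (sketch; file names are hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - cms/deployment.yml
  - course-material/deployment.yml
  # ...remaining services...
patches:
  - path: topology-spread-patch.yml   # hypothetical patch file holding the shared constraints
    target:
      kind: Deployment                # applied to every Deployment in this kustomization
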
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 957b47a and c72a7d9.

📒 Files selected for processing (9)
  • kubernetes/base/cms/deployment.yml (1 hunks)
  • kubernetes/base/course-material/deployment.yml (1 hunks)
  • kubernetes/base/example-exercise/deployment.yml (1 hunks)
  • kubernetes/base/headless-lms/deployment.yml (3 hunks)
  • kubernetes/base/main-frontend/deployment.yml (2 hunks)
  • kubernetes/base/quizzes/deployment.yml (1 hunks)
  • kubernetes/base/tmc/deployment.yml (1 hunks)
  • kubernetes/production/headless-lms/pdb.yml (1 hunks)
  • kubernetes/production/kustomization.yaml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build
  • GitHub Check: build-and-deploy
  • GitHub Check: headless_lms
🔇 Additional comments (11)
kubernetes/production/headless-lms/pdb.yml (1)

1-10:

✅ Verification successful

LGTM! PDB configuration looks good.

The PodDisruptionBudget configuration is well-structured and ensures high availability by maintaining at least 2 pods during voluntary disruptions. This aligns well with the increased replica count of 4 in production.
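
For reference, a PodDisruptionBudget with these properties would look roughly like the sketch below; only minAvailable: 2 and the app: headless-lms selector are confirmed here, the rest of the metadata is assumed:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: headless-lms
spec:
  minAvailable: 2                 # keep at least 2 pods up during voluntary disruptions (e.g. node drains)
  selector:
    matchLabels:
      app: headless-lms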

Let's verify that the label selector matches all related resources:


🏁 Script executed:

#!/bin/bash
# Check for consistent label usage across related resources
rg -l "app: headless-lms" kubernetes/

Length of output: 126


Verified PDB Configuration and Label Consistency

The PodDisruptionBudget is correctly configured with a minAvailable of 2, ensuring high availability during voluntary disruptions. Our verification confirms that both the PDB (kubernetes/production/headless-lms/pdb.yml) and the Deployment (kubernetes/base/headless-lms/deployment.yml) consistently use the label app: headless-lms.

LGTM!

kubernetes/base/tmc/deployment.yml (1)

34-46: LGTM! Topology spread constraints match the pattern.

The configuration maintains consistency with other deployments while properly targeting the tmc service pods.

kubernetes/base/quizzes/deployment.yml (1)

34-46: LGTM! Topology spread constraints are consistent.

The configuration maintains the established pattern while correctly targeting quizzes service pods.

kubernetes/base/headless-lms/deployment.yml (3)

25-29: LGTM! Deployment strategy aligns with high availability goals.

The RollingUpdate strategy with 50% maxUnavailable and maxSurge provides a good balance between update speed and availability, ensuring at least 2 pods remain available during updates when running with 4 replicas in production.


87-99: LGTM! Topology spread constraints enhance availability.

The constraints will help distribute pods across zones and hosts, improving fault tolerance. The maxSkew: 1 with ScheduleAnyway provides a good balance between optimal distribution and scheduling flexibility.


47-52:

❓ Verification inconclusive

Verify resource requirement increases.

Significant increases in resource requests and limits:

  • Memory: 100Mi → 200Mi (requests), 600Mi → 800Mi (limits)
  • CPU: 20m → 100m (requests), 200m → 500m (limits)

Please confirm:

  1. The justification for these increases (e.g., observed OOM issues, CPU throttling)
  2. That the cluster has sufficient capacity for these increased requirements across 4 replicas

🏁 Script executed:

#!/bin/bash
# Check cluster capacity and current resource usage

# Get nodes capacity and allocatable resources
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU_CAPACITY:.status.capacity.cpu,MEMORY_CAPACITY:.status.capacity.memory,CPU_ALLOCATABLE:.status.allocatable.cpu,MEMORY_ALLOCATABLE:.status.allocatable.memory

# Get current resource usage across nodes
kubectl top nodes

# Calculate total resource requirements for 4 replicas
echo "Total resource requirements for 4 replicas:"
echo "CPU requests: 400m (4 x 100m)"
echo "CPU limits: 2000m (4 x 500m)"
echo "Memory requests: 800Mi (4 x 200Mi)"
echo "Memory limits: 3200Mi (4 x 800Mi)"

Length of output: 822


ACTION: Verify Increased Resource Requirements

The deployment configuration now increases resource allocations considerably:

  • Memory: requests from 100Mi to 200Mi; limits from 600Mi to 800Mi.
  • CPU: requests from 20m to 100m; limits from 200m to 500m.

For a 4-replica deployment, these changes require:

  • CPU: 400m (requests) and 2000m (limits)
  • Memory: 800Mi (requests) and 3200Mi (limits)

Please verify:

  1. The rationale behind these increased resource values (e.g., observed memory pressure or CPU throttling).
  2. That your cluster has sufficient capacity to support these aggregated requirements. Note that the provided resource-check script did not return node data (indicating that kubectl was not available in the current environment); ensure these checks are re-run in a properly configured cluster context.
kubernetes/production/kustomization.yaml (2)

4-4: LGTM! PDB addition enhances availability guarantees.

Adding a PodDisruptionBudget for headless-lms is a good practice to maintain service availability during voluntary disruptions like node drains.


57-58: LGTM! Replica count increase strengthens reliability.

Increasing to 4 replicas provides better fault tolerance and aligns well with:

  • The new topology spread constraints for zone/host distribution
  • The RollingUpdate strategy (50% maxUnavailable = 2 pods minimum)
  • The PodDisruptionBudget requirements
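
A sketch of the corresponding production kustomization entries, assuming the replicas field is used (the repository may instead apply a patch), could look like this:

# kubernetes/production/kustomization.yaml (partial sketch)
resources:
  - headless-lms/pdb.yml          # the new PodDisruptionBudget
  # ...existing production resources...
replicas:
  - name: headless-lms
    count: 4                      # raised from 2
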
kubernetes/base/example-exercise/deployment.yml (1)

34-46: LGTM! Topology constraints follow established pattern.

The topology spread constraints are consistent with other deployments, promoting better pod distribution across zones and hosts.

kubernetes/base/main-frontend/deployment.yml (2)

34-46: LGTM! Topology constraints follow established pattern.

The topology spread constraints are consistent with other deployments, promoting better pod distribution across zones and hosts.


54-55:

❓ Verification inconclusive

Verify the impact of base path change.

Changing NEXT_PUBLIC_BASE_PATH from null to "" might affect routing behavior. Please confirm this change won't break any existing routes or client-side navigation.


🏁 Script executed:

#!/bin/bash
# Search for usage of NEXT_PUBLIC_BASE_PATH in the codebase
echo "Searching for NEXT_PUBLIC_BASE_PATH usage:"
rg "NEXT_PUBLIC_BASE_PATH" -A 3

Length of output: 5813


Attention: Please verify routing behavior with the updated base path configuration.

In the file kubernetes/base/main-frontend/deployment.yml (lines 54–55), NEXT_PUBLIC_BASE_PATH is now explicitly set to an empty string (""). This contrasts with other services where a non-empty base path (like "/cms", "/org", etc.) is used. The application’s Next.js configuration typically sets config.basePath only when a truthy value is provided, so with an empty string the base path won’t be applied—which might be intentional if the main frontend is meant to operate at the root path.

Please confirm that:

  • This change won’t inadvertently disrupt client-side navigation or route resolution.
  • Any logic that concatenates NEXT_PUBLIC_BASE_PATH (e.g., in shared-module/packages/common/src/utils/redirectBackAfterLoginOrSignup.tsx) handles an empty string as expected.

@nygrenh nygrenh merged commit 86ce6d8 into master Feb 18, 2025
18 checks passed
@nygrenh nygrenh deleted the fixes-1112320099 branch February 18, 2025 15:44