fix(base-cluster/loki): adjust retention settings for loki logs #1745
marvinWolff merged 2 commits into main from
Conversation
Walkthrough
Replaces the retention configuration in the Loki HelmRelease.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Helm as Helm chart
  participant Values as Loki values
  participant Loki as Loki components
  participant Compactor as Compactor process
  Helm->>Values: apply loki.loki values
  Values->>Loki: set limits_config.retention_period = 45d
  Values->>Compactor: configure compactor settings
  note over Compactor: compaction_interval: 15m<br/>retention_delete_delay: 24h<br/>retention_delete_worker_count: 150
  Compactor->>Loki: perform compaction & retention deletion runs
  note right of Loki: tableManager retention config removed
```
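The values flow in the diagram above can be sketched as a Helm values fragment. This is an illustrative reconstruction, not a quote from the chart; the exact nesting under the HelmRelease and the `retention_enabled` flag are assumptions:

```yaml
# Sketch of the applied Loki values (structure assumed, not copied
# from the chart). Field names are standard Loki configuration options.
loki:
  loki:
    limits_config:
      retention_period: 45d   # replaces the legacy tableManager retention
    compactor:
      retention_enabled: true          # assumed: required for retention deletion
      compaction_interval: 15m
      retention_delete_delay: 24h
      retention_delete_worker_count: 150
```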
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
charts/base-cluster/templates/monitoring/logs/loki.yaml (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Update release-please config file for a possibly new chart
- GitHub Check: check licenses
- GitHub Check: lint helm chart (base-cluster)
🔇 Additional comments (2)
charts/base-cluster/templates/monitoring/logs/loki.yaml (2)
80-81: Retention configuration migrated to limits_config.

The shift from the legacy `tableManager` to `limits_config` with an explicit `retention_period` is the correct modern approach for Loki 2.9+. The 45-day retention (`45d`) is a reasonable increase from the previous 6-week (`42d`) setting.

Please confirm:
- What is the target Loki Helm chart version? (Ensure it supports `limits_config` and compactor retention settings.)
- Is there a migration plan for existing Loki instances or data that may be affected by the retention change?
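For context, the legacy form and its modern equivalent look roughly like this. The old block is reconstructed for illustration (the `1008h` value is simply `42d` expressed in hours), not quoted from the chart:

```yaml
# Legacy (Table Manager, deprecated) -- reconstructed for illustration:
# table_manager:
#   retention_deletes_enabled: true
#   retention_period: 1008h   # 6 weeks (42d)

# Modern (Loki 2.9+): retention is enforced by the compactor,
# driven by limits_config.
limits_config:
  retention_period: 45d
```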
85-87: Verify the aggressive retention_delete_delay setting.

The default `retention_delete_delay` is 2 hours, but this configuration sets it to `1m`. The 2-hour delay buffers permanent deletion: it gives index gateways time to pull updated index files that no longer reference marked chunks (so queries don't fail against deleted chunks), and it leaves a short window to cancel chunk deletion after a configuration mistake. A 1-minute window is unusually aggressive and may be too tight for safety.

In contrast, `retention_delete_worker_count: 150` matches the default value of 150, which is appropriate.

Please confirm whether the 1-minute delay is intentional for your retention strategy, or whether it should be raised to a more conservative value such as 2 hours for a better safety margin.
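A conservative compactor block along the lines suggested above might look like the following sketch (it assumes `retention_enabled: true` is set elsewhere in the values):

```yaml
compactor:
  compaction_interval: 15m
  # Default is 2h: gives index gateways time to pick up updated
  # indexes and leaves a cancellation window before chunks are deleted.
  retention_delete_delay: 2h
  retention_delete_worker_count: 150   # matches the default
```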
1434d09 to 0f1fd37 (Compare)
0f1fd37 to 74666fe (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (1)
charts/base-cluster/templates/monitoring/logs/loki.yaml (1)
80-81: Consider adding additional limits configuration parameters.

The `limits_config` currently only sets `retention_period`. Depending on your cluster's log ingestion profile and SingleBinary resource constraints, you may want to add parameters such as:
- `ingestion_rate_mb`: controls the ingestion rate per tenant (default: 3)
- `max_streams_per_user`: limits the number of log streams per tenant
- `max_cache_freshness_period`: controls cache freshness for queries

This is not a blocker, as the current minimal configuration is valid, but consider whether rate limits or cache settings are needed for your environment.
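An extended `limits_config` covering these suggestions could be sketched as follows. The parameter names are standard Loki limits, but the values shown are illustrative assumptions, not recommendations for this cluster:

```yaml
limits_config:
  retention_period: 45d
  # Illustrative additions -- tune to the cluster's ingestion profile:
  ingestion_rate_mb: 8            # per-tenant ingestion rate (default: 3)
  ingestion_burst_size_mb: 16     # assumed companion burst limit
  max_streams_per_user: 10000     # cap on active log streams per tenant
```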
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
charts/base-cluster/templates/monitoring/logs/loki.yaml (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check licenses
- GitHub Check: Update release-please config file for a possibly new chart
- GitHub Check: lint helm chart (base-cluster)
🔇 Additional comments (2)
charts/base-cluster/templates/monitoring/logs/loki.yaml (2)
80-87: ✅ Retention settings appropriately corrected.

The changes address the previous critical review comment: `retention_delete_delay` has been increased from `1m` to `24h`, a safe threshold for production Loki deployments. This prevents premature data loss and aligns with the official Loki documentation recommendations. The `retention_delete_worker_count: 150` and `compaction_interval: 15m` settings are appropriate for SingleBinary deployments.

80-87: ⚠️ Summary mentions tableManager removal, but it's not present in the provided code.

The AI summary states the PR "removes the prior `tableManager` block," but the full file context does not show a `tableManager` section. Either it was removed in a prior commit, or the summary refers to a previous iteration. This is a minor inconsistency, but it doesn't affect the correctness of the current changes.
🤖 I have created a release *beep* *boop*

## [10.0.1](base-cluster-v10.0.0...base-cluster-v10.0.1) (2025-10-27)

### Bug Fixes

* **base-cluster/descheduler:** don't remove pods with too many restarts ([#1744](#1744)) ([9c1ed51](9c1ed51))
* **base-cluster/ingress:** add missing `prometheus` block 🙄 ([#1767](#1767)) ([a329e1a](a329e1a))
* **base-cluster/loki:** adjust retention settings for loki logs ([#1745](#1745)) ([1985d34](1985d34))
* **base-cluster/monitoring:** use the correct prometheus datasource id ([#1764](#1764)) ([511cc84](511cc84))

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

## Summary by CodeRabbit

* **Bug Fixes**
  * Fixed descheduler to prevent removal of pods with excessive restart counts
  * Added missing Prometheus monitoring configuration to ingress
  * Adjusted log retention settings in Loki
  * Corrected Prometheus datasource ID in monitoring

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>