Improve compactor alerts and playbooks #334
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -272,11 +272,21 @@ Same as [`CortexCompactorHasNotSuccessfullyCleanedUpBlocks`](#CortexCompactorHas | |
| This alert fires when a Cortex compactor has not uploaded any compacted blocks to the storage for a long time. | ||
|
|
||
| How to **investigate**: | ||
| - If the alert `CortexCompactorHasNotSuccessfullyRun` or `CortexCompactorHasNotSuccessfullyRunSinceStart` has fired as well, then investigate that issue first | ||
| - If the alert `CortexCompactorHasNotSuccessfullyRunCompaction` has fired as well, then investigate that issue first | ||
| - If the alert `CortexIngesterHasNotShippedBlocks` or `CortexIngesterHasNotShippedBlocksSinceStart` has fired as well, then investigate that issue first | ||
| - Ensure ingesters are successfully shipping blocks to the storage | ||
| - Look for any error in the compactor logs | ||
|
|
||
| ### CortexCompactorHasNotSuccessfullyRunCompaction | ||
|
|
||
| This alert fires if the compactor is not able to successfully compact all discovered compactable blocks (across all tenants). | ||
|
|
||
| When this alert fires, the compactor may still have successfully compacted some blocks, but the compaction of other blocks is consistently failing. A common case is a corrupted block belonging to a single tenant: compaction of blocks for other tenants keeps working, but compaction for the affected tenant is blocked by the corrupted block. | ||
|
|
||
| How to **investigate**: | ||
| - Look for any error in the compactor logs | ||
| - Corruption: [`not healthy index found`](#compactor-is-failing-because-of-not-healthy-index-found) | ||
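The log check in the list above can be sketched as a small shell helper. This is only an illustration: it assumes the compactor logs are available as plain text on standard input (how you fetch them depends on your deployment), and the function name `count_unhealthy_index` is hypothetical. The only string taken from the playbook is the error message `not healthy index found`.

```shell
#!/bin/sh
# Hypothetical helper: count log lines read from stdin that contain the
# "not healthy index found" error mentioned in the playbook.
count_unhealthy_index() {
  # -F matches the text literally, -c prints the number of matching lines.
  # grep exits non-zero when nothing matches, so "|| true" keeps the helper
  # from failing while still printing 0.
  grep -c -F 'not healthy index found' || true
}
```

A non-zero count suggests following the corruption procedure referenced in the list; a count of zero means the failure likely has a different cause.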
|
|
||
| #### Compactor is failing because of `not healthy index found` | ||
|
|
||
| The compactor may fail to compact blocks due to a corrupted block index found in one of the source blocks: | ||
|
|
@@ -301,18 +311,6 @@ To rename a block stored on GCS you can use the `gsutil` CLI: | |
| gsutil mv gs://BUCKET/TENANT/BLOCK gs://BUCKET/TENANT/corrupted-BLOCK | ||
| ``` | ||
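As a hedged sketch of the rename step above, the helper below only prints the `gsutil mv` command for a given bucket, tenant, and block, so the command can be reviewed before it is run. The function name `print_quarantine_cmd` is hypothetical; the path layout and the `corrupted-` prefix are taken from the command shown above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the gsutil command that renames a
# corrupted block by prefixing its name with "corrupted-".
print_quarantine_cmd() {
  bucket="$1"
  tenant="$2"
  block="$3"
  # Refuse to build a command from incomplete input.
  if [ -z "$bucket" ] || [ -z "$tenant" ] || [ -z "$block" ]; then
    echo "usage: print_quarantine_cmd BUCKET TENANT BLOCK" >&2
    return 1
  fi
  echo "gsutil mv gs://${bucket}/${tenant}/${block} gs://${bucket}/${tenant}/corrupted-${block}"
}
```

Printing instead of executing keeps the sketch free of side effects; the printed line can be pasted into a shell once the bucket, tenant, and block ID have been double-checked.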
|
|
||
| ### CortexCompactorHasNotUploadedBlocksSinceStart | ||
|
After the renaming it's covered by |
||
|
|
||
| Same as [`CortexCompactorHasNotUploadedBlocks`](#CortexCompactorHasNotUploadedBlocks). | ||
|
|
||
| ### CortexCompactorHasNotSuccessfullyRunCompaction | ||
|
Moved above, to have "Compactor is failing because of |
||
|
|
||
| _TODO: this playbook has not been written yet._ | ||
|
|
||
| ### CortexCompactorRunFailed | ||
|
After the renaming it's covered by |
||
|
|
||
| _TODO: this playbook has not been written yet._ | ||
|
|
||
| ### CortexBucketIndexNotUpdated | ||
|
|
||
| This alert fires when the bucket index for a given tenant has not been updated for a long time. The bucket index is expected to be updated periodically by the compactor and is used by queriers and store-gateways to get an almost-updated view over the bucket store. | ||
|
|
||