
[8.19] Fix that gap can be stuck "in-progress" (#221473) #224177

Merged
nkhristinin merged 1 commit into elastic:8.19 from
kibanamachine:backport/8.19/commit-dfd783e1
Jun 17, 2025

@kibanamachine

Backport

This will backport the following commits from main to 8.19:

Questions?

Please refer to the Backport tool documentation

## Summary

[Issue](https://github.com/elastic/kibana/issues/221111)

A gap can get stuck in the `in-progress` state when a backfill
execution for the rule fails.

### Current behavior

Let's say we have a gap from `12:00–13:00`.

When the gap is initially detected, it has the following state:

```
filled_intervals: []
unfilled_intervals: [12:00–13:00]
in_progress_intervals: []
```

When a backfill starts, we set `in_progress_intervals` to the range that
overlaps with the backfill. We also remove that range from
`unfilled_intervals`:

```
filled_intervals: []
unfilled_intervals: []
in_progress_intervals: [12:00–13:00]
```

After the backfill is successfully executed, we move the range to
`filled_intervals` and clear `in_progress_intervals`:

```
filled_intervals: [12:00–13:00]
unfilled_intervals: []
in_progress_intervals: []
```
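The three states above can be sketched as a small interval model. This is only an illustration under stated assumptions: `GapState`, `startBackfill`, `completeBackfill`, and the helper functions are hypothetical names, not Kibana's actual types or functions, and times are simplified to minutes since midnight.

```typescript
// Hypothetical model, not Kibana's actual types. Times are minutes since midnight.
type Interval = { start: number; end: number };

interface GapState {
  filled: Interval[];
  unfilled: Interval[];
  inProgress: Interval[];
}

// Returns the part of `a` that overlaps `b`, or null when they do not overlap.
function intersect(a: Interval, b: Interval): Interval | null {
  const start = Math.max(a.start, b.start);
  const end = Math.min(a.end, b.end);
  return start < end ? { start, end } : null;
}

// Returns the parts of `a` left over after removing `b`.
function subtract(a: Interval, b: Interval): Interval[] {
  if (intersect(a, b) === null) return [a];
  const rest: Interval[] = [];
  if (a.start < b.start) rest.push({ start: a.start, end: b.start });
  if (b.end < a.end) rest.push({ start: b.end, end: a.end });
  return rest;
}

// Splits `source` into the pieces inside `range` and the pieces outside it.
function split(source: Interval[], range: Interval) {
  const inside: Interval[] = [];
  const outside: Interval[] = [];
  for (const item of source) {
    const overlap = intersect(item, range);
    if (overlap !== null) inside.push(overlap);
    for (const piece of subtract(item, range)) outside.push(piece);
  }
  return { inside, outside };
}

// Backfill starts: the overlapping range moves from unfilled to in-progress.
function startBackfill(gap: GapState, backfill: Interval): GapState {
  const { inside, outside } = split(gap.unfilled, backfill);
  return { filled: gap.filled, unfilled: outside, inProgress: gap.inProgress.concat(inside) };
}

// Backfill succeeds: the overlapping range moves from in-progress to filled.
function completeBackfill(gap: GapState, backfill: Interval): GapState {
  const { inside, outside } = split(gap.inProgress, backfill);
  return { filled: gap.filled.concat(inside), unfilled: gap.unfilled, inProgress: outside };
}

const noon = 12 * 60; // 12:00
const one = 13 * 60; // 13:00
const detected: GapState = { filled: [], unfilled: [{ start: noon, end: one }], inProgress: [] };
const started = startBackfill(detected, { start: noon, end: one });
const finished = completeBackfill(started, { start: noon, end: one });
console.log(started, finished);
```

Here `started` corresponds to the second state shown above and `finished` to the third.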

If the backfill fails, the range should leave `in_progress_intervals`
and return to `unfilled_intervals`. We cannot simply move it back,
though, because other overlapping backfills may still be in progress
for the same gap. A successful execution does not have this problem,
because the range simply moves to `filled_intervals`.

When a backfill fails, we therefore refetch all overlapping backfills
for the gap and recalculate `in_progress_intervals` from them.
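As a rough sketch of that recalculation, assume two backfills overlapped the `12:00–13:00` gap: A covering 12:00 to 12:40 (now failed) and B covering 12:30 to 13:00 (still running). The names `recalcInProgress` and `remainder` are hypothetical, times are minutes since midnight, and the example assumes nothing has been filled yet.

```typescript
// Hypothetical sketch, not Kibana's actual code. Times are minutes since midnight.
type Interval = { start: number; end: number };

// Clips each active backfill to the gap and merges overlaps into a sorted, disjoint list.
function recalcInProgress(gap: Interval, activeBackfills: Interval[]): Interval[] {
  const clipped: Interval[] = [];
  for (const b of activeBackfills) {
    const start = Math.max(gap.start, b.start);
    const end = Math.min(gap.end, b.end);
    if (start < end) clipped.push({ start, end });
  }
  clipped.sort((a, b) => a.start - b.start);
  const merged: Interval[] = [];
  for (const c of clipped) {
    const last = merged[merged.length - 1];
    if (last && c.start <= last.end) last.end = Math.max(last.end, c.end);
    else merged.push({ start: c.start, end: c.end });
  }
  return merged;
}

// Returns the part of the gap not covered by `covered` (sorted and disjoint).
// Assumption: nothing is filled yet, so the remainder is the unfilled part.
function remainder(gap: Interval, covered: Interval[]): Interval[] {
  const rest: Interval[] = [];
  let cursor = gap.start;
  for (const c of covered) {
    if (cursor < c.start) rest.push({ start: cursor, end: c.start });
    cursor = Math.max(cursor, c.end);
  }
  if (cursor < gap.end) rest.push({ start: cursor, end: gap.end });
  return rest;
}

const gapRange: Interval = { start: 720, end: 780 }; // 12:00 to 13:00
const stillRunning: Interval[] = [{ start: 750, end: 780 }]; // only backfill B remains
const inProgressAfterFailure = recalcInProgress(gapRange, stillRunning);
const unfilledAfterFailure = remainder(gapRange, inProgressAfterFailure);
console.log(inProgressAfterFailure, unfilledAfterFailure);
```

In this example only 12:00 to 12:30 returns to the unfilled list. The part of A's range that B also covers (12:30 to 12:40) stays in progress, which is why the range cannot simply be moved back as a whole.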

### Problem

The current implementation updates the gap **before** deleting the
failed backfill. The refetch therefore still returns the failed
backfill, so the recalculated `in_progress_intervals` still includes
its range and the gap is left in a stale `in-progress` state.

### Fix

We should **first delete** the failed backfill, and **then** update the
gap. This ensures that the recalculated `in_progress_intervals` reflect
only the remaining active backfills.
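The effect of the ordering can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: `BackfillStore`, `updateThenDelete`, and `deleteThenUpdate` are hypothetical names, the real code is asynchronous and works against persisted objects, and overlapping results are not merged.

```typescript
// Hypothetical sketch, not Kibana's actual code. Times are minutes since midnight.
type Interval = { start: number; end: number };
type Backfill = { id: string; range: Interval };

// In-memory stand-in for wherever backfills are persisted.
class BackfillStore {
  private items: Backfill[] = [];
  add(backfill: Backfill): void {
    this.items.push(backfill);
  }
  remove(id: string): void {
    this.items = this.items.filter((b) => b.id !== id);
  }
  findOverlapping(gap: Interval): Backfill[] {
    return this.items.filter((b) => b.range.start < gap.end && gap.start < b.range.end);
  }
}

// Recalculates in-progress intervals from whatever backfills the store still holds.
function recalcInProgress(store: BackfillStore, gap: Interval): Interval[] {
  return store.findOverlapping(gap).map((b) => ({
    start: Math.max(gap.start, b.range.start),
    end: Math.min(gap.end, b.range.end),
  }));
}

// Order before the fix: the failed backfill is still in the store during the refetch.
function updateThenDelete(store: BackfillStore, gap: Interval, failedId: string): Interval[] {
  const inProgress = recalcInProgress(store, gap);
  store.remove(failedId);
  return inProgress;
}

// Order after the fix: the failed backfill is gone before the refetch.
function deleteThenUpdate(store: BackfillStore, gap: Interval, failedId: string): Interval[] {
  store.remove(failedId);
  return recalcInProgress(store, gap);
}

type FailureHandler = (store: BackfillStore, gap: Interval, failedId: string) => Interval[];

// Runs a handler against a fresh store holding one failed backfill over the whole gap.
function simulateFailure(handler: FailureHandler): Interval[] {
  const gap: Interval = { start: 720, end: 780 }; // 12:00 to 13:00
  const store = new BackfillStore();
  store.add({ id: "A", range: { start: 720, end: 780 } });
  return handler(store, gap, "A");
}

const staleResult = simulateFailure(updateThenDelete);
const fixedResult = simulateFailure(deleteThenUpdate);
console.log(staleResult, fixedResult);
```

With the old order, `staleResult` still contains the failed backfill's range even though no backfill remains, so the gap would stay in progress. With the new order, `fixedResult` is empty and the range can return to `unfilled_intervals`.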

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit dfd783e)
@kibanamachine added the `backport` label (This PR is a backport of another PR) Jun 17, 2025
@kibanamachine enabled auto-merge (squash) June 17, 2025 06:54
auto-merge was automatically disabled June 17, 2025 06:57

Pull Request is not mergeable

@elasticmachine

💚 Build Succeeded

Metrics [docs]

✅ unchanged

cc @nkhristinin

@nkhristinin merged commit 2df08eb into elastic:8.19 Jun 17, 2025
13 checks passed
