Fix that gap can be stuck "in-progress" #221473

Merged: nkhristinin merged 6 commits into elastic:main from nkhristinin:gaps-stuck on Jun 17, 2025

@nkhristinin (Contributor) commented May 26, 2025

Summary

[Issue](#221111)

Gaps can get stuck in the in-progress state if a rule is backfill-executed with failures.

Current behavior:

Let's say we have a gap from 12:00–13:00.

When the gap is initially detected, it has the following state:

filled_intervals: []
unfilled_intervals: [12:00–13:00]
in_progress_intervals: []

When a backfill starts, we set in_progress_intervals to the range that overlaps with the backfill. We also remove that range from unfilled_intervals:

filled_intervals: []
unfilled_intervals: []
in_progress_intervals: [12:00–13:00]

After the backfill is successfully executed, we move the range to filled_intervals and clear in_progress_intervals:

filled_intervals: [12:00–13:00]
unfilled_intervals: []
in_progress_intervals: []
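
The three states above can be sketched as transitions on a small data structure. This is only an illustration: the `Interval` and `GapState` types, the helper names, and the minutes-since-midnight encoding are assumptions made for the example, not Kibana's actual implementation.

```typescript
// Hypothetical types and helpers for illustration only (not Kibana's code).
// Times are minutes since midnight; intervals are half-open [start, end).
interface Interval { start: number; end: number }

interface GapState {
  filled_intervals: Interval[];
  unfilled_intervals: Interval[];
  in_progress_intervals: Interval[];
}

// Part of `a` that overlaps `b`, or null when they do not overlap.
function intersect(a: Interval, b: Interval): Interval | null {
  const start = Math.max(a.start, b.start);
  const end = Math.min(a.end, b.end);
  return start < end ? { start, end } : null;
}

// Remove the range `cut` from every interval in `intervals`.
function subtract(intervals: Interval[], cut: Interval): Interval[] {
  const out: Interval[] = [];
  for (const i of intervals) {
    if (!intersect(i, cut)) { out.push(i); continue; }
    if (i.start < cut.start) out.push({ start: i.start, end: cut.start });
    if (cut.end < i.end) out.push({ start: cut.end, end: i.end });
  }
  return out;
}

// Backfill starts: the overlapping range moves from unfilled to in-progress.
function startBackfill(gap: GapState, backfill: Interval): GapState {
  const claimed = gap.unfilled_intervals
    .map((i) => intersect(i, backfill))
    .filter((i): i is Interval => i !== null);
  return {
    filled_intervals: gap.filled_intervals,
    unfilled_intervals: subtract(gap.unfilled_intervals, backfill),
    in_progress_intervals: [...gap.in_progress_intervals, ...claimed],
  };
}

// Backfill succeeds: the overlapping range moves from in-progress to filled.
function completeBackfill(gap: GapState, backfill: Interval): GapState {
  const done = gap.in_progress_intervals
    .map((i) => intersect(i, backfill))
    .filter((i): i is Interval => i !== null);
  return {
    filled_intervals: [...gap.filled_intervals, ...done],
    unfilled_intervals: gap.unfilled_intervals,
    in_progress_intervals: subtract(gap.in_progress_intervals, backfill),
  };
}

// The 12:00–13:00 gap from the description.
const detected: GapState = {
  filled_intervals: [],
  unfilled_intervals: [{ start: 720, end: 780 }],
  in_progress_intervals: [],
};
const started = startBackfill(detected, { start: 720, end: 780 });
const completed = completeBackfill(started, { start: 720, end: 780 });
```

Here `started` corresponds to the second state shown above and `completed` to the third.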

If the backfill fails, however, the range has to leave in_progress_intervals and return to unfilled_intervals. We cannot simply subtract the failed backfill's range, because other overlapping backfills may still be in progress for the same gap. A successful execution does not have this problem, because its range moves to filled_intervals.

When a backfill fails, we refetch all overlapping backfills for the gap to recalculate the in_progress_intervals.
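
A minimal sketch of that recalculation, assuming the in-progress ranges are derived purely from the backfills that are still active. The function names and the two example backfills are hypothetical and chosen only to show why subtracting the failed range is not enough.

```typescript
// Hypothetical helpers for illustration only (not Kibana's code).
// Times are minutes since midnight; intervals are half-open [start, end).
interface Interval { start: number; end: number }

// Merge overlapping or touching intervals into a sorted, disjoint list.
function mergeIntervals(intervals: Interval[]): Interval[] {
  const sorted = [...intervals].sort((a, b) => a.start - b.start);
  const merged: Interval[] = [];
  for (const cur of sorted) {
    const last = merged[merged.length - 1];
    if (last && cur.start <= last.end) {
      last.end = Math.max(last.end, cur.end);
    } else {
      merged.push({ ...cur });
    }
  }
  return merged;
}

// Rebuild in_progress_intervals from the backfills that are still active,
// clipping each backfill to the boundaries of the gap.
function recalculateInProgress(gap: Interval, activeBackfills: Interval[]): Interval[] {
  const clipped = activeBackfills
    .map((b) => ({ start: Math.max(b.start, gap.start), end: Math.min(b.end, gap.end) }))
    .filter((i) => i.start < i.end);
  return mergeIntervals(clipped);
}

const gap: Interval = { start: 720, end: 780 };       // 12:00–13:00
const backfillA: Interval = { start: 720, end: 760 }; // 12:00–12:40, fails
const backfillB: Interval = { start: 750, end: 780 }; // 12:30–13:00, still running

// While both are active, the whole gap is in progress.
const before = recalculateInProgress(gap, [backfillA, backfillB]);
// After A fails, only B's range stays in progress.
const afterFailure = recalculateInProgress(gap, [backfillB]);
```

In this example, subtracting A's range directly would also drop 12:30–12:40, which B still covers. Recomputing from the remaining backfills keeps 12:30–13:00 in progress and frees only 12:00–12:30.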

Problem

In the current implementation, we're updating the gaps before deleting the failed backfill. This causes the recalculated in_progress_intervals to still include the failed backfill’s range, resulting in a stale state.

Fix

We should first delete the failed backfill, and then update the gap. This ensures that the recalculated in_progress_intervals reflect only the remaining active backfills.
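
The effect of the ordering can be reproduced with a small in-memory model. The `BackfillStore` class and both handler functions below are hypothetical stand-ins written for this illustration; they are not the functions changed in this PR.

```typescript
// Hypothetical in-memory model for illustration only (not Kibana's code).
interface Interval { start: number; end: number }
interface Backfill { id: string; range: Interval }

class BackfillStore {
  private backfills = new Map<string, Backfill>();

  add(backfill: Backfill): void {
    this.backfills.set(backfill.id, backfill);
  }

  delete(id: string): void {
    this.backfills.delete(id);
  }

  // All stored backfills whose range overlaps the gap.
  findOverlapping(gap: Interval): Backfill[] {
    return Array.from(this.backfills.values()).filter(
      (b) => b.range.start < gap.end && gap.start < b.range.end
    );
  }
}

// Recompute the gap's in-progress ranges from whatever is in the store now.
function refreshInProgress(store: BackfillStore, gap: Interval): Interval[] {
  return store.findOverlapping(gap).map((b) => ({
    start: Math.max(b.range.start, gap.start),
    end: Math.min(b.range.end, gap.end),
  }));
}

// Old order: the gap is updated while the failed backfill still exists.
function onFailureUpdateFirst(store: BackfillStore, gap: Interval, failedId: string): Interval[] {
  const inProgress = refreshInProgress(store, gap);
  store.delete(failedId);
  return inProgress;
}

// Fixed order: delete the failed backfill, then update the gap.
function onFailureDeleteFirst(store: BackfillStore, gap: Interval, failedId: string): Interval[] {
  store.delete(failedId);
  return refreshInProgress(store, gap);
}

const gapRange: Interval = { start: 720, end: 780 }; // 12:00–13:00

const storeOld = new BackfillStore();
storeOld.add({ id: "failed", range: { start: 720, end: 780 } });
const stale = onFailureUpdateFirst(storeOld, gapRange, "failed");

const storeNew = new BackfillStore();
storeNew.add({ id: "failed", range: { start: 720, end: 780 } });
const fresh = onFailureDeleteFirst(storeNew, gapRange, "failed");
```

With the old order, `stale` still contains the failed backfill's range, so the gap stays "in-progress" even though no backfill remains. With the fixed order, `fresh` is empty and the range can return to unfilled_intervals.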

@nkhristinin changed the title from `Change order of gap update` to `Fix that gap can be stuck "in-progress"` on Jun 11, 2025
@nkhristinin marked this pull request as ready for review on Jun 11, 2025, 08:29
@nkhristinin requested a review from a team as a code owner on Jun 11, 2025, 08:29
@nkhristinin added the labels backport:version, v8.19.0, v9.0.3, v8.18.3, and release_note:skip on Jun 11, 2025

@elasticmachine (Contributor) commented

💚 Build Succeeded

Metrics: ✅ unchanged

@ymao1 (Contributor) left a comment

Response Ops change LGTM. Code review only.

@nkhristinin merged commit dfd783e into elastic:main on Jun 17, 2025 (10 checks passed)
@kibanamachine (Contributor) commented

Starting backport for target branches: 8.18, 8.19, 9.0

https://github.com/elastic/kibana/actions/runs/15700178150

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jun 17, 2025
## Summary

[Issue](https://github.com/elastic/kibana/issues/221111)

Gaps can get stuck in the `in-progress` state if a rule is
backfill-executed with failures.

### Current behavior:

Let's say we have a gap from `12:00–13:00`.

When the gap is initially detected, it has the following state:

```
filled_intervals: []
unfilled_intervals: [12:00–13:00]
in_progress_intervals: []
```

When a backfill starts, we set `in_progress_intervals` to the range that
overlaps with the backfill. We also remove that range from
`unfilled_intervals`:

```
filled_intervals: []
unfilled_intervals: []
in_progress_intervals: [12:00–13:00]
```

After the backfill is successfully executed, we move the range to
`filled_intervals` and clear `in_progress_intervals`:

```
filled_intervals: [12:00–13:00]
unfilled_intervals: []
in_progress_intervals: []
```

However, if the backfill fails, we want to remove the range from
`in_progress_intervals` and move it back to `unfilled_intervals`. The
problem is that we cannot simply do this because there might be other
overlapping backfills still in progress for the same gap. In the case of
a successful execution, this isn’t an issue, as the range is moved to
`filled_intervals`.

When a backfill fails, we refetch all overlapping backfills for the gap
to recalculate the `in_progress_intervals`.

### Problem

In the current implementation, we're updating the gaps **before**
deleting the failed backfill. This causes the recalculated
`in_progress_intervals` to still include the failed backfill’s range,
resulting in a stale state.

### Fix

We should **first delete** the failed backfill, and **then** update the
gap. This ensures that the recalculated `in_progress_intervals` reflect
only the remaining active backfills.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit dfd783e)
nkhristinin added a commit that referenced this pull request Jun 17, 2025
# Backport

This will backport the following commits from `main` to `9.0`:
 - Fix that gap can be stuck "in-progress" (#221473) (dfd783e)

<!--- Backport version: 9.6.6 -->

### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Khristinin Nikita <nikita.khristinin@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
nkhristinin added a commit that referenced this pull request Jun 17, 2025
# Backport

This will backport the following commits from `main` to `8.18`:
 - Fix that gap can be stuck "in-progress" (#221473) (dfd783e)

nkhristinin added a commit that referenced this pull request Jun 17, 2025
# Backport

This will backport the following commits from `main` to `8.19`:
 - Fix that gap can be stuck "in-progress" (#221473) (dfd783e)

Labels

backport:version, release_note:skip, v8.18.3, v8.19.0, v9.0.3, v9.1.0
