
Retry on "all shards failed" from ES #246533

Merged
gsoldevila merged 1 commit into elastic:main from gsoldevila:retry-on-all-shards-failed
Dec 17, 2025

Conversation


@gsoldevila gsoldevila commented Dec 16, 2025

Summary

We are observing Pod restarts on Serverless during the Kibana bootstrap (migration) process, with the following error:

Reason: Unable to complete saved object migrations for the [.kibana] index. Error: pickupUpdatedMappings task failed with the following error:
{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":".kibana_1","node":"iu4ANm6gTQO8eQyvnWGdkw","reason":{"type":"no_shard_available_action_exception","reason":"[es-es-search-6b8cf96d57-rvnz4][100.66.190.214:9300][indices:data/read/search[phase/query]]"}}]}

The Kibana Pods appear to bootstrap normally after the restart, so we can assume that `all shards failed` is a temporary ES condition that we can retry on.
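
To make the condition concrete, here is a minimal sketch of a check that recognises the error body quoted above. The `EsErrorBody` interface and the `isAllShardsFailed` helper are hypothetical names used for illustration only; they are not taken from the Kibana codebase, and the actual change in this PR may be structured differently.

```typescript
// Hypothetical shape of the Elasticsearch error body; only the two fields
// that the check needs are modelled here.
interface EsErrorBody {
  type?: string;
  reason?: string;
}

// Hypothetical helper (not Kibana's implementation): true when the error body
// matches the "all shards failed" condition seen in the log above.
function isAllShardsFailed(body: EsErrorBody | undefined): boolean {
  return (
    body?.type === 'search_phase_execution_exception' &&
    body?.reason === 'all shards failed'
  );
}

// A shortened copy of the error body from the log above.
const sampleBody: EsErrorBody = JSON.parse(
  '{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true}'
);

console.log(isAllShardsFailed(sampleBody)); // true
console.log(isAllShardsFailed({ type: 'illegal_argument_exception' })); // false
```

A check like this would typically be one branch of a broader "is this error retryable?" predicate, so that the migration step is retried instead of failing the whole bootstrap.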

@gsoldevila gsoldevila added the Team:Core label (Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc) Dec 16, 2025
@gsoldevila gsoldevila requested a review from a team as a code owner December 16, 2025 12:07
@gsoldevila gsoldevila added the release_note:skip (Skip the PR/issue when compiling release notes) and backport:all-open (Backport to all branches that could still receive a release) labels Dec 16, 2025
@elasticmachine

Pinging @elastic/kibana-core (Team:Core)


@jloleysens jloleysens left a comment


Approving to unblock progress


Did ES confirm that this is a good one to generally retry on? My only reservation is that we should only be retrying this during init/startup, but I don't think it's a massive risk to retry generally.
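
One way to limit the risk raised here is to bound the number of retries, so that a persistent failure still surfaces. The following is a rough sketch only: `retryTransient`, its parameters, and the back-off values are hypothetical and do not reflect how Kibana's migration code actually schedules retries.

```typescript
// Hypothetical bounded retry wrapper: re-runs `task` while `isRetryable`
// accepts the error, up to `maxAttempts` attempts, with a linear back-off.
async function retryTransient<T>(
  task: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
  delayMs = 10
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Give up on non-retryable errors or when the attempts are exhausted.
      if (!isRetryable(err) || attempt === maxAttempts) break;
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}

// Usage: a fake task that fails twice with a transient error, then succeeds.
let attempts = 0;
retryTransient(
  async () => {
    attempts++;
    if (attempts < 3) throw new Error('all shards failed');
    return 'migrated';
  },
  (err) => err instanceof Error && err.message === 'all shards failed'
).then((result) => console.log(result, attempts)); // prints "migrated 3"
```

Restricting retries to init/startup, as suggested above, would then be a matter of only wrapping startup tasks with such a helper.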

@gsoldevila gsoldevila merged commit d70da1f into elastic:main Dec 17, 2025
24 checks passed
@kibanamachine

Starting backport for target branches: 8.19, 9.1, 9.2

https://github.com/elastic/kibana/actions/runs/20305501725

@kibanamachine

💔 All backports failed

| Branch | Result |
|--------|--------|
| 8.19 | Backport failed because of merge conflicts. You might need to backport the following PRs to 8.19: Expose isRetryableEsClientError (#228315) |
| 9.1 | Backport failed because of merge conflicts. You might need to backport the following PRs to 9.1: Expose isRetryableEsClientError (#228315) |
| 9.2 | Backport failed because of merge conflicts. You might need to backport the following PRs to 9.2: Expose isRetryableEsClientError (#228315) |

Manual backport

To create the backport manually run:

node scripts/backport --pr 246533

Questions?

Please refer to the Backport tool documentation

mbondyra added a commit to mbondyra/kibana that referenced this pull request Dec 17, 2025
…donly

* commit 'bb1f55fa520b30ceb923af069ef403b24dcb1606': (52 commits)
  [CPS][Maps] Support CPS Picker in Maps  (elastic#246382)
  [APM] Migrate the Transaction Overview tests to Scout/Playwright/Component/API tests (elastic#245972)
  [Cases] Change nested field search to be case insensitive (elastic#246643)
  [ES|QL] PromQL parser initial implementation (elastic#246552)
  [Agent Builder] Adds keyboard shortcut and toggle behavior to AI Agent button (elastic#246659)
  Retry on "all shards failed" from ES (elastic#246533)
  [Streams] Test enable wired streams flow (elastic#246113)
  [Agent Builder] Fast-follow bugfixes for MCP Tool type  (elastic#246665)
  [Entity Store][API] Fix snake case on CRUD API List response (elastic#246003)
  [ResponseOps][Slack] Simplify channel configuration  (elastic#245423)
  Add Canonical Name Badge to Documentation (elastic#246647)
  [Streams] Add simulation filtering by conditions (elastic#245400)
  [o11y AI] Add `get_hosts` tool (elastic#246541)
  [agent builder] create_visualization: support heatmap and regionmap (elastic#246671)
  [AI Infra] Chat experience: Selection modal title change (elastic#246683)
  [Background search] Change polling behavior (elastic#244760)
  [ES|QL  ]  Common Lookup Join Fields Are Not Listed First (elastic#246582)
  Add missing `dynamic: false` (elastic#246685)
  [Metrics in Discover] Unskip metrics api test (elastic#246593)
  [ES|QL] Show next actions after simple field assignment in RERANK ON Clause (elastic#246676)
  ...
KodeRad pushed a commit to KodeRad/kibana that referenced this pull request Dec 17, 2025
@kibanamachine kibanamachine added the backport missing label (added to PRs automatically when they are determined to be missing a backport) Dec 18, 2025
@kibanamachine

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create backports automatically, add a backport:* label, or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 246533 locally.
cc: @gsoldevila

gsoldevila added a commit that referenced this pull request Dec 22, 2025
## Summary

Follow-up of #246533

We don't need to be that nitpicky about the reason. The
`search_phase_execution_exception` is likely a temporary situation and
retrying will be a more efficient strategy than Pod restart + retrying.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
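
The relaxation described in this follow-up can be sketched as a check that matches on the error type alone and ignores the reason. As before, `EsErrorBody` and `isRetryableSearchPhaseError` are hypothetical names for illustration, and the type string is the one shown in the error log of the original PR (`search_phase_execution_exception`); the actual Kibana change may differ.

```typescript
// Hypothetical shape of the Elasticsearch error body.
interface EsErrorBody {
  type?: string;
  reason?: string;
}

// Hypothetical relaxed check: any search phase execution failure is treated
// as retryable, regardless of its `reason`.
function isRetryableSearchPhaseError(body: EsErrorBody | undefined): boolean {
  return body?.type === 'search_phase_execution_exception';
}

console.log(
  isRetryableSearchPhaseError({
    type: 'search_phase_execution_exception',
    reason: 'all shards failed',
  })
); // true
console.log(
  isRetryableSearchPhaseError({
    type: 'search_phase_execution_exception',
    reason: 'some other reason',
  })
); // true
console.log(isRetryableSearchPhaseError({ type: 'mapper_parsing_exception' })); // false
```

The trade-off is that non-transient search failures would also be retried before the migration gives up, which the follow-up judges to be cheaper than a Pod restart followed by a retry.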
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Dec 22, 2025
(cherry picked from commit 9028f92)
gsoldevila added a commit to gsoldevila/kibana that referenced this pull request Dec 23, 2025
gsoldevila added a commit to gsoldevila/kibana that referenced this pull request Dec 23, 2025
@gsoldevila gsoldevila added the backport:skip label (This PR does not require backporting) and removed the backport missing and backport:all-open labels Dec 23, 2025
gsoldevila added a commit to gsoldevila/kibana that referenced this pull request Dec 23, 2025
gsoldevila added a commit to gsoldevila/kibana that referenced this pull request Dec 23, 2025
gsoldevila added a commit to gsoldevila/kibana that referenced this pull request Dec 23, 2025
gsoldevila added a commit to gsoldevila/kibana that referenced this pull request Dec 23, 2025
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Jan 6, 2026
dej611 pushed a commit to dej611/kibana that referenced this pull request Jan 8, 2026

Labels

backport:skip, release_note:skip, Team:Core, v9.3.0

4 participants