Retry on "all shards failed" from ES by gsoldevila · Pull Request #246533 · elastic/kibana

gsoldevila · 2025-12-16T12:07:32Z

Summary

We are observing Pod restarts on Serverless during the Kibana bootstrap (migration) process, with the following error:

```
Reason: Unable to complete saved object migrations for the [.kibana] index. Error: pickupUpdatedMappings task failed with the following error:
{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":".kibana_1","node":"iu4ANm6gTQO8eQyvnWGdkw","reason":{"type":"no_shard_available_action_exception","reason":"[es-es-search-6b8cf96d57-rvnz4][100.66.190.214:9300][indices:data/read/search[phase/query]]"}}]}
```

It seems that the Kibana Pods bootstrap normally after the restart, so we can assume that `all shards failed` is a temporary Elasticsearch condition on which we can retry.
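A minimal TypeScript sketch of how a client could recognize the logged condition, assuming the error body has exactly the shape shown in the log above. The types and the `isAllShardsFailed` and `shardFailureTypes` names are hypothetical; this is not Kibana's actual retry code (such as `isRetryableEsClientError`).

```typescript
// Assumed shape of an Elasticsearch error body, modeled on the logged error.
// These types are illustrative; they are not the @elastic/elasticsearch client types.
interface EsShardFailure {
  shard: number;
  index: string;
  reason: { type: string; reason?: string };
}

interface EsErrorBody {
  type: string;
  reason?: string;
  failed_shards?: EsShardFailure[];
}

// Hypothetical predicate: true when ES reports that the query phase failed on every shard.
function isAllShardsFailed(body: EsErrorBody): boolean {
  return body.type === 'search_phase_execution_exception' && body.reason === 'all shards failed';
}

// Hypothetical helper: collects the per-shard failure types, e.g. for logging.
function shardFailureTypes(body: EsErrorBody): string[] {
  return (body.failed_shards ?? []).map((failure) => failure.reason.type);
}

// The error body copied from the log above.
const logged: EsErrorBody = JSON.parse(
  '{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,' +
    '"failed_shards":[{"shard":0,"index":".kibana_1","node":"iu4ANm6gTQO8eQyvnWGdkw","reason":' +
    '{"type":"no_shard_available_action_exception","reason":"[es-es-search-6b8cf96d57-rvnz4][100.66.190.214:9300][indices:data/read/search[phase/query]]"}}]}',
);

console.log(isAllShardsFailed(logged)); // true
console.log(shardFailureTypes(logged)); // [ 'no_shard_available_action_exception' ]
```

In the logged case the single shard failure is `no_shard_available_action_exception`, which is consistent with the assumption that the shard was only temporarily unavailable.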

elasticmachine · 2025-12-16T12:07:36Z

Pinging @elastic/kibana-core (Team:Core)

jloleysens

Approving to unblock progress

Did ES confirm that this is a good one to generally retry on? My only reservation is that we should only be retrying this during init/startup, but I don't think it's a massive risk to retry generally.
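The reviewer's concern about scope can be illustrated with a small TypeScript sketch: a bounded retry with exponential backoff that each caller opts into, so it could be applied only to bootstrap/migration calls rather than to every ES request. `retryWithBackoff`, its defaults, and the message-based predicate are hypothetical; this is not how Kibana's migrator actually retries.

```typescript
// Hypothetical helper, not Kibana's migration retry logic: retries `op` while
// `isRetryable(err)` holds, up to `maxAttempts` attempts, doubling the delay each time.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
  initialDelayMs = 100,
): Promise<T> {
  let delayMs = initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Rethrow non-retryable errors, and give up once the attempt budget is spent.
      if (!isRetryable(err) || attempt >= maxAttempts) {
        throw err;
      }
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2;
    }
  }
}

// Illustrative predicate: matches errors whose message is "all shards failed".
const isAllShardsFailed = (err: unknown): boolean =>
  err instanceof Error && err.message === 'all shards failed';

// Usage: wrap only a startup-time call, simulating a search that fails twice.
let calls = 0;
const startupSearch = async (): Promise<string> => {
  calls++;
  if (calls < 3) {
    throw new Error('all shards failed');
  }
  return 'ok';
};

retryWithBackoff(startupSearch, isAllShardsFailed, 5, 10).then((result) => {
  console.log(result, calls); // ok 3
});
```

Keeping the retry opt-in per call site, instead of baking it into the shared client, is one way to limit it to init/startup as suggested.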

kibanamachine · 2025-12-17T14:05:13Z

Starting backport for target branches: 8.19, 9.1, 9.2

https://github.com/elastic/kibana/actions/runs/20305501725

kibanamachine · 2025-12-17T14:13:07Z

💔 All backports failed

Status	Branch	Result
❌	8.19	Backport failed because of merge conflicts. You might need to backport the following PRs to 8.19: Expose isRetryableEsClientError (#228315)
❌	9.1	Backport failed because of merge conflicts. You might need to backport the following PRs to 9.1: Expose isRetryableEsClientError (#228315)
❌	9.2	Backport failed because of merge conflicts. You might need to backport the following PRs to 9.2: Expose isRetryableEsClientError (#228315)

Manual backport

To create the backport manually run:

node scripts/backport --pr 246533

Questions?

Please refer to the Backport tool documentation


kibanamachine · 2025-12-18T14:49:09Z

Friendly reminder: Looks like this PR hasn’t been backported yet.
To create backports automatically, add a backport:* label, or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 246533 locally
cc: @gsoldevila


Summary

Follow-up of #246533. We don't need to be that nitpicky about the reason. The `search_phase_execution_error` is likely a temporary situation, and retrying will be a more efficient strategy than a Pod restart followed by a retry.

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 9028f92)
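The follow-up relaxes the check so that the reason is ignored. A hypothetical TypeScript sketch of the difference, assuming the follow-up's `search_phase_execution_error` refers to the `search_phase_execution_exception` type shown in the logged error; the function names are illustrative, not Kibana's actual code.

```typescript
// Assumed minimal error shape; illustrative only.
interface EsErrorLike {
  type?: string;
  reason?: string;
}

// Strict variant (hypothetical): retry only when the reason is exactly "all shards failed".
function isRetryableStrict(err: EsErrorLike): boolean {
  return err.type === 'search_phase_execution_exception' && err.reason === 'all shards failed';
}

// Relaxed variant (hypothetical): retry on any search phase execution failure, whatever the reason.
function isRetryableRelaxed(err: EsErrorLike): boolean {
  return err.type === 'search_phase_execution_exception';
}

// An illustrative error with a different reason string.
const other: EsErrorLike = { type: 'search_phase_execution_exception', reason: 'another reason' };
console.log(isRetryableStrict(other), isRetryableRelaxed(other)); // false true
```

The relaxed form trades precision for fewer Pod restarts: it may also retry search-phase failures that are not transient.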



Retry on "all shards failed" from ES

3089fd2

gsoldevila added the Team:Core label (Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc.) Dec 16, 2025

gsoldevila requested a review from a team as a code owner December 16, 2025 12:07

gsoldevila added the release_note:skip (Skip the PR/issue when compiling release notes) and backport:all-open (Backport to all branches that could still receive a release) labels Dec 16, 2025

jloleysens approved these changes Dec 16, 2025


gsoldevila merged commit d70da1f into elastic:main Dec 17, 2025
24 checks passed

kibanamachine added the v9.3.0 label Dec 17, 2025

gsoldevila mentioned this pull request Dec 18, 2025 in Ignore the reason and retry systematically #246830 (Merged)

kibanamachine added the backport missing label (Added to PRs automatically when they are determined to be missing a backport) Dec 18, 2025

gsoldevila added the backport:skip label (This PR does not require backporting) and removed the backport missing and backport:all-open labels Dec 23, 2025

gsoldevila merged 1 commit into elastic:main from gsoldevila:retry-on-all-shards-failed
