Retry on "all shards failed" from ES #246533
Pinging @elastic/kibana-core (Team:Core)
jloleysens
left a comment
Approving to unblock progress
Did ES confirm that this is a good one to generally retry on? My only reservation is that we should only be retrying this during init/startup, but I don't think it's a massive risk to retry generally.
Starting backport for target branches: 8.19, 9.1, 9.2
https://github.com/elastic/kibana/actions/runs/20305501725

💔 All backports failed

Manual backport: To create the backport manually run:

Questions? Please refer to the Backport tool documentation
…donly * commit 'bb1f55fa520b30ceb923af069ef403b24dcb1606': (52 commits) [CPS][Maps] Support CPS Picker in Maps (elastic#246382) [APM] Migrate the Transaction Overview tests to Scout/Playwright/Component/API tests (elastic#245972) [Cases] Change nested field search to be case insensitive (elastic#246643) [ES|QL] PromQL parser initial implementation (elastic#246552) [Agent Builder] Adds keyboard shortcut and toggle behavior to AI Agent button (elastic#246659) Retry on "all shards failed" from ES (elastic#246533) [Streams] Test enable wired streams flow (elastic#246113) [Agent Builder] Fast-follow bugfixes for MCP Tool type (elastic#246665) [Entity Store][API] Fix snake case on CRUD API List response (elastic#246003) [ResponseOps][Slack] Simplify channel configuration (elastic#245423) Add Canonical Name Badge to Documentation (elastic#246647) [Streams] Add simulation filtering by conditions (elastic#245400) [o11y AI] Add `get_hosts` tool (elastic#246541) [agent builder] create_visualization: support heatmap and regionmap (elastic#246671) [AI Infra] Chat experience: Selection modal title change (elastic#246683) [Background search] Change polling behavior (elastic#244760) [ES|QL ] Common Lookup Join Fields Are Not Listed First (elastic#246582) Add missing `dynamic: false` (elastic#246685) [Metrics in Discover] Unskip metrics api test (elastic#246593) [ES|QL] Show next actions after simple field assignment in RERANK ON Clause (elastic#246676) ...
## Summary
We are observing Pod restarts on Serverless during the Kibana bootstrap
(migration) process, with the following error:
```
Reason: Unable to complete saved object migrations for the [.kibana] index. Error: pickupUpdatedMappings task failed with the following error:
{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":".kibana_1","node":"iu4ANm6gTQO8eQyvnWGdkw","reason":{"type":"no_shard_available_action_exception","reason":"[es-es-search-6b8cf96d57-rvnz4][100.66.190.214:9300][indices:data/read/search[phase/query]]"}}]}
```
It seems that the Kibana Pods bootstrap normally after the restart, so
we can assume that `all shards failed` is a temporary ES condition
that we can retry on.
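
As an illustration of the idea in this summary, here is a minimal sketch of a predicate that recognizes the logged error as retryable. The `EsErrorBody` interface and the `isAllShardsFailed` name are hypothetical and only model the two fields visible in the log above; this is not the actual change made in the PR.

```typescript
// Hypothetical shape of the error body from the log above.
// Only the fields this sketch reads are declared.
interface EsErrorBody {
  type?: string;
  reason?: string;
}

// Returns true when the error matches the transient condition described
// in the summary: a search phase failure whose reason is "all shards failed".
// The function name is illustrative, not taken from the Kibana code base.
function isAllShardsFailed(error: EsErrorBody | undefined): boolean {
  return (
    error?.type === 'search_phase_execution_exception' &&
    error?.reason === 'all shards failed'
  );
}

// A trimmed copy of the payload from the log (failed_shards omitted).
const logged: EsErrorBody = JSON.parse(
  '{"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true}'
);

console.log(isAllShardsFailed(logged));
```

A migration step could consult such a predicate to decide whether to retry the failed task instead of failing the whole migration, which is what currently leads to the Pod restart.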
Friendly reminder: Looks like this PR hasn’t been backported yet.
## Summary

Follow-up of #246533.

We don't need to be that nitpicky about the reason. The `search_phase_execution_exception` is likely a temporary situation, and retrying will be a more efficient strategy than Pod restart + retrying.

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Follow-up of elastic#246533, backported (cherry picked from commit 9028f92)
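
The broader rule from the follow-up can be sketched as below: retry on any `search_phase_execution_exception`, whatever its reason, and retry in-process rather than relying on a Pod restart. Everything here is an assumption for illustration: `isRetryableSearchError`, `retryTask`, and the `maxAttempts` and `delayMs` parameters are hypothetical names, not the Kibana migration API.

```typescript
// Hypothetical shape of an Elasticsearch error body.
interface EsErrorBody {
  type?: string;
  reason?: string;
}

// Broadened check: only the error type matters, the reason is ignored.
function isRetryableSearchError(error: EsErrorBody | undefined): boolean {
  return error?.type === 'search_phase_execution_exception';
}

// Runs `task`, retrying while it fails with a retryable search error.
// Gives up and rethrows after `maxAttempts` attempts or on any other error.
async function retryTask<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  delayMs: number
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      const retryable = isRetryableSearchError(error as EsErrorBody);
      if (!retryable || attempt >= maxAttempts) throw error;
      // Wait before the next attempt so ES has time to recover.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Checking only the `type` keeps the rule simple, at the cost of also retrying search failures that may not be transient; capping the number of attempts bounds that cost.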