-
Notifications
You must be signed in to change notification settings - Fork 8.6k
Expose isRetryableEsClientError #228315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose isRetryableEsClientError #228315
Changes from all commits
b3e78b3
aa48382
6ebe2f2
0eaa367
bbfc753
dcbdfcf
a14f6c7
3da2d71
6165e72
565d8f3
8b2f8cb
6712b71
262caba
6bb974d
e4c9562
c6f210b
719ad16
5ce4e4e
3b26b43
bdfaa2c
c124d3d
d0829f4
91b2eb9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,8 +8,8 @@ | |
| */ | ||
|
|
||
| import { defer, map, retry, timer, firstValueFrom, throwError } from 'rxjs'; | ||
| import { isRetryableEsClientError } from '@kbn/core-elasticsearch-server-utils'; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. New behavior here is that we won't retry for |
||
| import type { ElasticsearchClient } from '@kbn/core-elasticsearch-server'; | ||
| import { isRetryableEsClientError } from './retryable_es_client_errors'; | ||
|
|
||
| const scriptAllowedTypesKey = 'script.allowed_types'; | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # @kbn/core-elasticsearch-server-utils | ||
|
|
||
| Utilities for working with Elasticsearch |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| module.exports = { | ||
| preset: '@kbn/test/jest_node', | ||
| rootDir: '../../../../..', | ||
| roots: ['<rootDir>/src/core/packages/elasticsearch/server-utils'], | ||
| }; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| { | ||
| "type": "shared-common", | ||
| "id": "@kbn/core-elasticsearch-server-utils", | ||
| "owner": "@elastic/kibana-core", | ||
| "group": "platform", | ||
| "visibility": "shared" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| { | ||
| "name": "@kbn/core-elasticsearch-server-utils", | ||
| "private": true, | ||
| "version": "1.0.0", | ||
| "license": "Elastic License 2.0 OR AGPL-3.0-only OR SSPL-1.0" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,9 +9,7 @@ | |
|
|
||
| import { errors as EsErrors } from '@elastic/elasticsearch'; | ||
|
|
||
| const retryResponseStatuses = [ | ||
| 401, // AuthorizationException | ||
| 403, // AuthenticationException | ||
| const DEFAULT_RETRY_STATUS_CODES = [ | ||
| 408, // RequestTimeout | ||
| 410, // Gone | ||
| 429, // TooManyRequests -> ES circuit breaker | ||
|
|
@@ -21,20 +19,32 @@ const retryResponseStatuses = [ | |
|
|
||
| /** | ||
| * Returns true if the given elasticsearch error should be retried | ||
| * by retry-based resiliency systems such as the SO migration, false otherwise. | ||
| * | ||
| * Retryable errors include: | ||
| * - NoLivingConnectionsError | ||
| * - ConnectionError | ||
| * - TimeoutError | ||
| * - ResponseError with status codes: | ||
| * - 408 RequestTimeout | ||
| * - 410 Gone | ||
| * - 429 TooManyRequests (ES circuit breaker) | ||
| * - 503 ServiceUnavailable | ||
| * - 504 GatewayTimeout | ||
| * - OR custom status codes if provided | ||
| * @param e The error to check | ||
| * @param customRetryStatusCodes Custom response status codes to consider as retryable | ||
| * @returns true if the error is retryable, false otherwise | ||
| */ | ||
| export const isRetryableEsClientError = (e: EsErrors.ElasticsearchClientError): boolean => { | ||
| export const isRetryableEsClientError = ( | ||
| e: EsErrors.ElasticsearchClientError, | ||
| customRetryStatusCodes?: number[] | ||
| ): boolean => { | ||
| if ( | ||
| e instanceof EsErrors.NoLivingConnectionsError || | ||
| e instanceof EsErrors.ConnectionError || | ||
| e instanceof EsErrors.TimeoutError || | ||
| (e instanceof EsErrors.ResponseError && | ||
| (retryResponseStatuses.includes(e?.statusCode!) || | ||
| // ES returns a 400 Bad Request when trying to close or delete an | ||
| // index while snapshots are in progress. This should have been a 503 | ||
| // so once https://github.com/elastic/elasticsearch/issues/65883 is | ||
| // fixed we can remove this. | ||
| e?.body?.error?.type === 'snapshot_in_progress_exception')) | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. snapshot_in_progress_exception no longer blocks other API calls |
||
| (customRetryStatusCodes ?? DEFAULT_RETRY_STATUS_CODES).includes(e?.statusCode!)) | ||
| ) { | ||
| return true; | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| { | ||
| "extends": "../../../../../tsconfig.base.json", | ||
| "compilerOptions": { | ||
| "outDir": "target/types", | ||
| "types": [ | ||
| "jest", | ||
| "node" | ||
| ] | ||
| }, | ||
| "include": [ | ||
| "**/*.ts" | ||
| ], | ||
| "kbn_references": [ | ||
| "@kbn/core-elasticsearch-client-server-mocks" | ||
| ], | ||
| "exclude": [ | ||
| "target/**/*", | ||
| ] | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,18 +9,30 @@ | |
|
|
||
| import * as Either from 'fp-ts/Either'; | ||
| import type { errors as EsErrors } from '@elastic/elasticsearch'; | ||
| import { isRetryableEsClientError } from '@kbn/core-elasticsearch-server-internal'; | ||
| import { isRetryableEsClientError } from '@kbn/core-elasticsearch-server-utils'; | ||
|
|
||
| export interface RetryableEsClientError { | ||
| type: 'retryable_es_client_error'; | ||
| message: string; | ||
| error?: Error; | ||
| } | ||
|
|
||
| // Migrations also retry on Auth exceptions as this is a common failure for newly created | ||
| // clusters that might have misconfigured credentials. | ||
| const retryResponseStatuses = [ | ||
| 401, // AuthorizationException | ||
| 403, // AuthenticationException | ||
|
Comment on lines
+23
to
+24
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The shared function doesn't retry on auth errors since usually it's not a transient error.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if we should populate the ones below by spreading the default const
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or in other words, is there a scenario where it does not make sense to retry on any of the values below? I'm asking cause if we add a new status code on our DEFAULT list in the future, we might forget to also add it in the "custom" lists.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another option would be to use an incremental approach: this way users can automatically benefit from updates in the DEFAULT_RETRY_STATUS_CODES list, and they can add their custom exceptions. |
||
| 408, // RequestTimeout | ||
| 410, // Gone | ||
| 429, // TooManyRequests -> ES circuit breaker | ||
| 503, // ServiceUnavailable | ||
| 504, // GatewayTimeout | ||
| ]; | ||
|
|
||
| export const catchRetryableEsClientErrors = ( | ||
| e: EsErrors.ElasticsearchClientError | ||
| ): Either.Either<RetryableEsClientError, never> => { | ||
| if (isRetryableEsClientError(e)) { | ||
| if (isRetryableEsClientError(e, retryResponseStatuses)) { | ||
| return Either.left({ | ||
| type: 'retryable_es_client_error' as const, | ||
| message: e?.message, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to introduce a new package to avoid circular dependencies:
server → client-server-mocks → client-server-internal → server