Fix ES health check poller retries #248496

Merged
jloleysens merged 5 commits into elastic:main from jloleysens:fix/es-health-poller
Jan 10, 2026

@jloleysens jloleysens commented Jan 9, 2026

Summary

Ensure that our retry logic re-issues the network request for each of the configured number of retries.

Description

After retries were introduced to pollEsNodesVersion, a failed request to the nodes info API did not actually retry the network request. Each "retry" was just a no-op delayed by healthCheckInterval. For example, if a request timed out, we would wait healthCheckInterval*healthCheckRetry ms and then emit the failure. The nodes info API would only be called again on the next healthCheckInterval.

This PR changes the behavior to actually call the API during retries. Assuming the API call fails continuously, this issues healthCheckRetry API requests, and the failure is emitted only if all of them fail.
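
To illustrate the difference, here is a minimal synchronous sketch. It is not the actual pollEsNodesVersion code (which is not shown on this page): `pollWithNoOpRetries`, `pollWithRealRetries`, and `makeFlakyRequest` are hypothetical names, the wait between attempts is omitted, and it assumes healthCheckRetry counts the total number of requests per polling cycle.

```typescript
// Hypothetical stand-in for the nodes info API call; not Kibana's real ES client.
type NodesInfoRequest<T> = () => T;

// Old behaviour as described: one request, then "retries" that never call the API.
function pollWithNoOpRetries<T>(request: NodesInfoRequest<T>, healthCheckRetry: number): T {
  let failure: unknown;
  try {
    return request(); // the only network request in this polling cycle
  } catch (error) {
    failure = error;
  }
  for (let retry = 0; retry < healthCheckRetry; retry++) {
    // The real poller would wait healthCheckInterval here without calling the API.
  }
  throw failure;
}

// New behaviour: every attempt issues a fresh request; fail only if all attempts fail.
function pollWithRealRetries<T>(request: NodesInfoRequest<T>, healthCheckRetry: number): T {
  let failure: unknown = new Error('healthCheckRetry must be at least 1');
  for (let attempt = 0; attempt < healthCheckRetry; attempt++) {
    try {
      return request();
    } catch (error) {
      failure = error;
    }
  }
  throw failure;
}

// Test double: fails the first `failures` calls, then succeeds, and counts calls.
function makeFlakyRequest(failures: number) {
  let calls = 0;
  const request = (): string => {
    calls++;
    if (calls <= failures) {
      throw new Error('request timed out');
    }
    return 'nodes info';
  };
  return { request, callCount: () => calls };
}

const flaky = makeFlakyRequest(2); // the first two requests time out
const result = pollWithRealRetries(flaky.request, 3);
console.log(result, flaky.callCount()); // nodes info 3
```

With the old behaviour the same flaky request would be called once and the failure emitted, even though a second or third request would have succeeded.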

@jloleysens jloleysens requested a review from a team as a code owner January 9, 2026 16:47
@jloleysens jloleysens added the Team:Core (Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc), release_note:skip (Skip the PR/issue when compiling release notes), and backport:all-open (Backport to all branches that could still receive a release) labels Jan 9, 2026
@elasticmachine

Pinging @elastic/kibana-core (Team:Core)

@jloleysens jloleysens Jan 9, 2026

Added expected info API call counts

@jloleysens jloleysens enabled auto-merge (squash) January 9, 2026 16:51
@elasticmachine

💚 Build Succeeded

Metrics [docs]

✅ unchanged

@jloleysens jloleysens changed the title from "Fix ES health check poller" to "Fix ES health check poller retries" Jan 10, 2026
  },
- complete: done,
+ complete: () => {
+   expect(internalClient.nodes.info).toHaveBeenCalledTimes(1);

Nit: just thinking maybe we can increase health retries here to something like 1000 to really make it clear that we only expect 1 network request for successful calls

expect(result.isCompatible).toBeDefined();
},
complete: () => {
done();

@jloleysens jloleysens Jan 10, 2026

Can we add an expect against internalClient.nodes.info here for the expected number of calls?

nodeInfosSuccessOnce(createNodes('5.1.1', '5.1.2', '5.1.3')); // emit
nodeInfosSuccessOnce(createNodes('5.1.1', '5.1.2', '5.1.3')); // ignore
nodeInfosSuccessOnce(createNodes('5.0.0', '5.1.0', '5.2.0')); // emit, different from previous version
nodeInfosSuccessOnce(createNodes('5.1.0', '5.1.0', '5.1.0')); // emit, no warning nodes, used to detect end of test

👍
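
The emit/ignore annotations in the test above can be sketched as follows. This is only an illustration of "emit when the result differs from the previous one": `emitOnChange` is a hypothetical helper, and it compares plain version lists, whereas the real poller presumably compares richer result objects.

```typescript
// Hypothetical helper: keep only poll results that differ from the last emitted one.
function emitOnChange(polls: string[][]): string[][] {
  const emitted: string[][] = [];
  let previousKey: string | undefined;
  for (const versions of polls) {
    const key = JSON.stringify(versions);
    if (key !== previousKey) {
      emitted.push(versions);
      previousKey = key;
    }
  }
  return emitted;
}

const emittedPolls = emitOnChange([
  ['5.1.1', '5.1.2', '5.1.3'], // emit
  ['5.1.1', '5.1.2', '5.1.3'], // ignore, same as previous
  ['5.0.0', '5.1.0', '5.2.0'], // emit, different from previous
  ['5.1.0', '5.1.0', '5.1.0'], // emit
]);
console.log(emittedPolls.length); // 3
```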

  .map((key) => nodesInfoResponse.nodes[key])
- .map((node) => Object.assign({}, node, { name: getHumanizedNodeName(node) }));
+ .map((node) => Object.assign({}, node, { name: getHumanizedNodeName(node) }))
+ .sort(sortNodes); // Sorting ensures stable ordering for comparison

Nice!
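
A small sketch of why sorting helps, under the assumption that two nodes info responses can list the same nodes under differently ordered keys. `NodeInfo`, `byName`, and `toComparableList` are hypothetical; the PR's actual `sortNodes` comparator is not shown on this page.

```typescript
interface NodeInfo {
  name: string;
  version: string;
}

// Hypothetical comparator standing in for sortNodes.
const byName = (a: NodeInfo, b: NodeInfo): number => a.name.localeCompare(b.name);

// Turn the keyed response into a list with a stable order before comparing.
function toComparableList(nodes: Record<string, NodeInfo>): NodeInfo[] {
  return Object.keys(nodes)
    .map((key) => nodes[key])
    .sort(byName);
}

const firstResponse: Record<string, NodeInfo> = {
  idA: { name: 'node-a', version: '5.1.0' },
  idB: { name: 'node-b', version: '5.1.0' },
};
// Same nodes, but the keys arrive in a different order.
const secondResponse: Record<string, NodeInfo> = {
  idB: { name: 'node-b', version: '5.1.0' },
  idA: { name: 'node-a', version: '5.1.0' },
};

const sameNodes =
  JSON.stringify(toComparableList(firstResponse)) ===
  JSON.stringify(toComparableList(secondResponse));
console.log(sameNodes); // true
```

Without the sort, the two lists would serialize differently and look like a change even though the cluster is the same.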

@jloleysens jloleysens merged commit 33fa57c into elastic:main Jan 10, 2026
13 checks passed
@kibanamachine

Starting backport for target branches: 8.19, 9.1, 9.2, 9.3

https://github.com/elastic/kibana/actions/runs/20878265645

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jan 10, 2026
## Summary

Ensure that our `retry` logic calls the network request again the
configured number of retries.

(cherry picked from commit 33fa57c)
@kibanamachine

💔 Some backports could not be created

| Branch | Result |
| --- | --- |
| 8.19 | Backport failed because of merge conflicts |
| 9.1 | Backport failed because of merge conflicts |
| 9.2 | Backport failed because of merge conflicts |
| 9.3 | |

Note: Successful backport PRs will be merged automatically after passing CI.

Manual backport

To create the backport manually run:

node scripts/backport --pr 248496

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Jan 10, 2026
# Backport

This will backport the following commits from `main` to `9.3`:
- [Fix ES health check poller
(#248496)](#248496)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Jean-Louis Leysens <jeanlouis.leysens@elastic.co>
devamanv pushed a commit to devamanv/kibana that referenced this pull request Jan 12, 2026
## Summary

Ensure that our `retry` logic calls the network request again the
configured number of retries.
mbondyra added a commit to mbondyra/kibana that referenced this pull request Jan 12, 2026
* commit 'c4304e27736c62f17af20d145770b2ae9d3fae30': (418 commits)
  skip failing suite (elastic#89079)
  [ES|QL] Update grammars (elastic#248600)
  skip failing test suite (elastic#248579)
  [ES|QL] Update function metadata (elastic#248601)
  skip failing test suite (elastic#248554)
  Fix flaky test runner serverless flag for Search solution (elastic#248559)
  [Security Solution][Attacks/Alerts][Attacks page][Table section] Remember last selected attack details tab (Summary or Alerts) (elastic#247519) (elastic#247988)
  Fix ES health check poller (elastic#248496)
  Fix collector schema ownership (elastic#241292)
  [api-docs] 2026-01-10 Daily api_docs build (elastic#248574)
  Update dependency cssstyle to v5.3.5 (main) (elastic#237637)
  Update dependency @octokit/rest to v22.0.1 (main) (elastic#243102)
  skip failing test suite (elastic#248504)
  skip failing test suite (elastic#247685)
  Remove broken ecommerce_dashboard journeys (elastic#248162)
  [Obs AI] Hide AI Insight component when there are no connectors (elastic#248542)
  skip failing suite (elastic#248433)
  [Security Solution][Attacks/Alerts][Attacks page][Table section] Hide tabs for generic attack groups (elastic#248444)
  [Agent Builder] [AI Infra] Adds product documentation tool and task evals (elastic#248370)
  [Controls Anywhere] Keep controls focused when creating + editing other panels (elastic#248021)
  ...
@jloleysens jloleysens added the backport:version (Backport to applied version labels) label and removed the backport:all-open (Backport to all branches that could still receive a release) label Jan 12, 2026
@jloleysens jloleysens deleted the fix/es-health-poller branch January 12, 2026 10:49
@kibanamachine

Starting backport for target branches: 8.19, 9.1, 9.2, 9.3

https://github.com/elastic/kibana/actions/runs/20916590759

@kibanamachine

💔 All backports failed

| Branch | Result |
| --- | --- |
| 8.19 | Backport failed because of merge conflicts |
| 9.1 | Backport failed because of merge conflicts |
| 9.2 | Backport failed because of merge conflicts |
| 9.3 | Cherrypick failed because the selected commit (33fa57c) is empty. It looks like the commit was already backported in #248577 |

Manual backport

To create the backport manually run:

node scripts/backport --pr 248496

Questions ?

Please refer to the Backport tool documentation

Labels

backport:version (Backport to applied version labels), release_note:skip (Skip the PR/issue when compiling release notes), Team:Core (Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc), v9.3.0, v9.4.0

4 participants