
[Fleet] Add retries w/ backoff to Fleet setup on Kibana boot #167246

Merged
joshdover merged 10 commits into elastic:main from joshdover:fleet-setup-retry
Sep 28, 2023

@joshdover joshdover commented Sep 26, 2023

Summary

Closes #165971

This adds retry logic w/ backoff to Fleet's setup process that runs on Kibana boot. For now, this is behind an xpack.fleet.internal.retrySetupOnBoot feature flag that is only enabled on Serverless projects.
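As a rough illustration of the idea (not the code in this PR, which uses the exponential-backoff library), retrying a setup task with capped exponential delays behind a boolean flag could look like the sketch below. Every name here (RetryOptions, backoffDelayMs, retryWithBackoff, runSetupOnBoot) and every option value is a hypothetical stand-in.

```typescript
// Hypothetical sketch: these names are illustrative, not Fleet's actual code.
interface RetryOptions {
  numOfAttempts: number; // maximum number of tries, including the first
  startingDelayMs: number; // wait after the first failed attempt
  timeMultiple: number; // growth factor applied after each further failure
  maxDelayMs: number; // upper bound on any single wait
}

// Delay to wait after the given failed attempt (1-based), capped at maxDelayMs.
function backoffDelayMs(failedAttempt: number, opts: RetryOptions): number {
  const raw = opts.startingDelayMs * Math.pow(opts.timeMultiple, failedAttempt - 1);
  return Math.min(raw, opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the task until it succeeds or the attempts are exhausted,
// then rethrows the last error seen.
async function retryWithBackoff<T>(task: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.numOfAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < opts.numOfAttempts) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}

// The flag argument stands in for the retrySetupOnBoot setting:
// when it is off, setup runs exactly once, as before.
async function runSetupOnBoot(
  setup: () => Promise<void>,
  retrySetupOnBoot: boolean,
  opts: RetryOptions
): Promise<void> {
  if (retrySetupOnBoot) {
    await retryWithBackoff(setup, opts);
  } else {
    await setup();
  }
}
```

With startingDelayMs of 1000, timeMultiple of 2 and maxDelayMs of 5000, the waits after successive failures would be 1000, 2000, 4000, 5000, 5000 ms and so on.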

In ESS, we depend on the Integrations Server's boot process to implicitly perform the retry logic, since it cannot boot successfully until Fleet setup has completed. This behavior does not exist on Serverless, so we need built-in logic to perform this. The objects that are set up on Serverless are:

  • Elasticsearch output
  • Fleet Server URL config
  • (in some Security projects) A preconfigured Agent policy for hosted CSPM.

This change does not block Kibana startup, so that Kibana can still be used while Fleet's setup process is retrying to completion. Fleet will report the current status of its setup on the Kibana-wide /api/status API, including the number of retry attempts and the last error encountered.
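The non-blocking behavior and the status reporting described above can be sketched roughly as follows. This is an assumption-laden illustration: SetupStatus, SetupTracker and startInBackground are hypothetical names, and the status shape is not the actual /api/status payload. The retry strategy is passed in as a function so the sketch stays independent of any particular backoff library.

```typescript
// Hypothetical status shape; the real /api/status payload may differ.
interface SetupStatus {
  state: 'in_progress' | 'complete' | 'failed';
  attempts: number;
  lastError?: string;
}

class SetupTracker {
  private status: SetupStatus = { state: 'in_progress', attempts: 0 };

  // Returns a copy so callers cannot mutate the tracked state.
  getStatus(): SetupStatus {
    return { ...this.status };
  }

  // Wraps the setup task so that every invocation records the
  // attempt count and, on failure, the last error message.
  wrap(task: () => Promise<void>): () => Promise<void> {
    return async () => {
      this.status.attempts += 1;
      try {
        await task();
        this.status = { state: 'complete', attempts: this.status.attempts };
      } catch (err) {
        this.status.lastError = err instanceof Error ? err.message : String(err);
        throw err; // let the retry strategy decide whether to try again
      }
    };
  }

  markFailed(): void {
    this.status.state = 'failed';
  }
}

// Starts setup without making the caller wait for it. Plugin startup would
// call this and NOT await the returned promise; it is returned only so that
// tests can observe completion.
function startInBackground(
  tracker: SetupTracker,
  task: () => Promise<void>,
  retry: (wrapped: () => Promise<void>) => Promise<void>
): Promise<void> {
  return retry(tracker.wrap(task)).catch(() => tracker.markFailed());
}
```

A status endpoint could then call tracker.getStatus() at any time and would see the attempt count and last error even while retries are still running.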

This change does not include any explicit tests and instead relies on the existing coverage from the Serverless end-to-end tests, the Fleet setup FTR tests, and the exponential-backoff library's unit tests.

ghost commented Sep 26, 2023

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@joshdover joshdover added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Team:Fleet Team label for Observability Data Collection Fleet team labels Sep 26, 2023
"email-addresses": "^5.0.0",
"execa": "^4.0.2",
"expiry-js": "0.1.7",
"exponential-backoff": "^3.1.1",

Kibana already has this dependency as a transitive dep, so I thought it made sense to just use it again here.

jlind23 commented Sep 27, 2023

Changed the keyword in the description to Closes as stated here

@joshdover joshdover marked this pull request as ready for review September 27, 2023 14:35
@joshdover joshdover requested review from a team as code owners September 27, 2023 14:35
@elasticmachine

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic juliaElastic left a comment

LGTM

@kibana-ci

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Defend Workflows Cypress Tests #4 / Artifact pages Trusted applications should update Endpoint Policy on Endpoint when adding Trusted application name should update Endpoint Policy on Endpoint when adding Trusted application name
  • [job] [logs] FTR Configs #67 / Observability Log Explorer DatasetSelection initialization and update when the "index" query param exists should fallback to the "All logs" selection and notify the user of an invalid encoded index
  • [job] [logs] FTR Configs #60 / serverless observability UI navigation navigate observability sidenav & breadcrumbs
  • [job] [logs] FTR Configs #15 / serverless security UI landing page has serverless side nav

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@joshdover joshdover merged commit a42d601 into elastic:main Sep 28, 2023
@joshdover joshdover deleted the fleet-setup-retry branch September 28, 2023 12:34
Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.11.0

Development

Successfully merging this pull request may close these issues.

[Fleet] Fix flakiness on Fleet setup on Serverless

7 participants