[Fleet] Use LockManagerService for fleet setup lock#219113
nchaulet merged 8 commits into elastic:main
Pinging @elastic/fleet (Team:Fleet)
@elasticmachine merge upstream
💚 Build Succeeded
cc @nchaulet
@nchaulet Great to hear you find this useful! WDYT about extracting the LockManager out to a separate package? That way we avoid fleet depending on the Assistant plugin.
Can we test on a cloud instance with multiple Kibana instances setting up at the same time?
I tested multiple local instances, tried to kill them during setup, and it seems to be working well. We also have an integration test that covers setup in an HA configuration here: https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/fleet/server/integration_tests/ha_setup.test.ts#L94
Depends on #219220, which should be merged before this PR.
    export async function _runSetupWithLock(setupFn: () => Promise<SetupStatus>) {
      return await pRetry(
        () => appContextService.getLockManagerService()!.withLock('fleet-setup', () => setupFn()),
        {
          onFailedAttempt: async (error) => {
            if (!(error instanceof LockAcquisitionError)) {
              throw error;
            }
          },
          maxRetryTime: 5 * 60 * 1000, // Retry for 5 minute to get the lock
        }
      );
    }
I think there is a bug in the retry logic here. From the pRetry documentation:
> If the `onFailedAttempt` function throws, all retries will be aborted and the original promise will reject with the thrown error.

Right now every error except LockAcquisitionError is rethrown from onFailedAttempt, meaning those errors abort without retries. Only LockAcquisitionErrors are retried.
Is that the intention?
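To make the behavior under discussion concrete, here is a minimal, self-contained sketch of the same retry filter. It is a hand-rolled loop, not p-retry itself: `retryOnLockContention`, its parameters, and the local `LockAcquisitionError` class are hypothetical stand-ins used only for illustration.

```typescript
// Hypothetical stand-in for the LockAcquisitionError class referenced in the diff.
class LockAcquisitionError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hand-rolled retry loop (an assumption, NOT p-retry): it retries only while
// the failure is a LockAcquisitionError and the time budget has not run out.
async function retryOnLockContention<T>(
  task: () => Promise<T>,
  maxRetryTimeMs: number,
  delayMs: number
): Promise<T> {
  const deadline = Date.now() + maxRetryTimeMs;
  for (;;) {
    try {
      return await task();
    } catch (error) {
      // Any other error aborts immediately, which mirrors throwing from onFailedAttempt.
      if (!(error instanceof LockAcquisitionError)) throw error;
      // Give up once another wait would exceed the time budget.
      if (Date.now() + delayMs > deadline) throw error;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With this shape, lock contention is waited out, while a failure inside the task itself surfaces on the first attempt and is left to whatever retry logic the caller already has.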
Yes, that's the intention. We already have some retry logic in the Fleet setup.
All right. Why would you want to retry setupFn, if it's already running? Then it would run multiple times - although sequentially, not in parallel.
> Why would you want to retry setupFn, if it's already running?

We want to be sure the setup ran and finished at least once. If it's running on a different Kibana instance and that instance is killed for any reason, this way we will retry. If the setup was already successful, that operation should be relatively quick.
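A rough sketch of that scenario, under stated assumptions: `InMemoryTtlLock` is a hypothetical in-memory lock whose entry simply expires after a TTL (it is not the real LockManagerService), and `runSetupOnceLockIsFree` is an illustrative helper name. It models an instance that takes the lock and is killed, while a second instance keeps retrying until the stale lock expires and then runs setup itself.

```typescript
// Hypothetical stand-in for the LockAcquisitionError used in the diff.
class LockAcquisitionError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical in-memory lock with a TTL. NOT the real LockManagerService;
// it only models "a lock whose holder died eventually expires".
class InMemoryTtlLock {
  private expiresAt = 0;
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Simulates an instance that took the lock and was killed before releasing it.
  acquireAndCrash(): void {
    this.expiresAt = Date.now() + this.ttlMs;
  }

  async withLock<T>(fn: () => Promise<T>): Promise<T> {
    if (Date.now() < this.expiresAt) {
      throw new LockAcquisitionError('lock is held by another instance');
    }
    this.expiresAt = Date.now() + this.ttlMs;
    try {
      return await fn();
    } finally {
      this.expiresAt = 0; // release the lock
    }
  }
}

// Retries lock acquisition until the time budget runs out, then runs setupFn.
async function runSetupOnceLockIsFree(
  lock: InMemoryTtlLock,
  setupFn: () => Promise<string>,
  maxRetryTimeMs: number,
  delayMs: number
): Promise<{ status: string; attempts: number }> {
  const deadline = Date.now() + maxRetryTimeMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      const status = await lock.withLock(setupFn);
      return { status, attempts };
    } catch (error) {
      if (!(error instanceof LockAcquisitionError)) throw error;
      if (Date.now() + delayMs > deadline) throw error;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

In this model the surviving instance cannot tell whether the previous holder finished, so it runs setup again once it gets the lock. That is only cheap if the setup is quick when everything is already in place, as the comment above assumes.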
## Summary

Backport #219113

We are hitting some retry issues during setup, and relying on the lock manager service, as we do in 9+ versions, seems to be more reliable.
## Summary

Resolve #216025

The `fleetSetupCompleted` method is used by other plugins to know when the Fleet setup has run and when they can use Fleet to install packages. Because of an issue with the way we implemented our lock, that method could return early if another instance of Kibana was performing, or had tried to perform, a setup.

This PR:
- Uses the `LockManagerService`, which has a better way to manage TTL and refresh itself.

@sorenlouv Do you see any issue if we start using the `LockManagerService` in Fleet? It works well for our usage, replacing a really incomplete lock we had.