[Fleet] Optimize package installation performance, phase 1#130906
[Fleet] Optimize package installation performance, phase 1#130906joshdover merged 7 commits intoelastic:mainfrom
Conversation
There was a problem hiding this comment.
AFAICT this isn't necessary. Creating an index template without the aliases field at all seems to have the same behavior in my manual testing and this is already what the main create index template logic below does.
@nchaulet It looks like you wrote this originally, do you know if there's anything I'm missing here?
There was a problem hiding this comment.
Here's the original issue: #90984
It appears it was from an upgrade of the endpoint package that I can no longer reproduce, even using the old versions mentioned in the issue 0.16.2 -> 0.17.0
There was a problem hiding this comment.
I believe it could be reproduced if you upgraded from 7.10.0 to main, but this is not strictly supported since Fleet was not GA until 7.14.0. I think this is a safe thing to remove.
35c863f to
ca44aeb
Compare
|
Pinging @elastic/fleet (Team:Fleet) |
|
great work, should we run the script that install all packages to verify nothing breaks? |
Good idea. I'm also going to move the APM instrumentation into a separate PR so if this needs to get reverted for any reason we still have the instrumentation in. This also makes it easier to create comparison traces before/after the optimizations. |
dc11c73 to
ea84f45
Compare
|
There's some work that will need to be done to refactor the ES asset reference passing to be able to remove these refreshes on the SO index. A lot of logic is depending on updating this object and then querying it back which needs to be refactored to simply keeping a running list in memory that gets updated as we go without querying it back to avoid the needs for refreshes. |
10c4fa6 to
151acb6
Compare
There was a problem hiding this comment.
We weren't previously keeping track of these references so these weren't getting deleted on uninstall. Fixed in this PR.
There was a problem hiding this comment.
This is the error message that will be returned by ES if the custom component template already exists. Unfortunately there's not a better machine readable field name.
There was a problem hiding this comment.
The order changed here, but it doesn't really matter.
There was a problem hiding this comment.
Extracted this from _install_package to make mocking simpler
There was a problem hiding this comment.
Note this retry logic will retry failures other than conflict errors. I think that's probably ok, but happy to fix if we think it's a bad idea.
There was a problem hiding this comment.
is there a chance 20 retry takes very long in case of trying to retry a persistent error?
151acb6 to
d22c8dd
Compare
|
I've been able to shave off another ~3-4 seconds with two additional optimizations but I'm going to move those to a separate PR so we can focus on getting this one in first. |
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @joshdover |
juliaElastic
left a comment
There was a problem hiding this comment.
LGTM, I haven't tested locally
Summary
Phase 1 of #110500
Depends on the following to be merged first:
In conjunction with elastic/elasticsearch#86017, this PR improves package installation time significantly. I tested this with the package install script that installs all packages, without any failures.
refresh: falsefor several of the Saved Object operations that we don't need to wait for@customtemplates to use thecreate=truequery parameter rather than a check and set (removes an API call)Another 1-2 seconds will be shaved off by #115032
The largest change in this PR is related to removing the
refreshon each of the Saved Object update calls for updating theinstalled_esarray on theepm-packagesSaved Object. Prior to this PR, each update of this array involved querying the current value and then writing an update back. In order to be able to remove therefresh, we have to avoid needing to query the current value and instead keep the current value in memory. This requires keeping track of the current array and passing it into each asset type's install function.This change also exposes a known issue with Saved Object update calls that could result in conflict errors even when not using optimistic concurrency control (see #126240). In order to work around this problem, I added simple retry logic which seems to work well.
I've deferred on a couple more optimizations to a separate PR that will shave off another few seconds:
refresh: falseon the bulk Saved Object import, blocked on Expose refresh option to SavedObjectsImporter #131339Results
To install the system package, version 1.13.0:
main- 20smain+ Batch execute template and pipeline cluster state operations elasticsearch#86017 - 13.4sWhen installing all packages, the 95p changes significantly:
main- 21sView APM traces
mainmain+ elastic/elasticsearch#86017This branch + elastic/elasticsearch#86017
View install script results
This branch + elastic/elasticsearch#86017