[Fleet] Finer-grained error information from install/upgrade API#95649
[Fleet] Finer-grained error information from install/upgrade API#95649skh merged 11 commits intoelastic:masterfrom
Conversation
166f76f to
716957a
Compare
|
@jen-huang after 716957a installing a broken package with a direct POST should correctly report errors and attempt to roll back, but the bulk install code path (which is also used by the setup code path) will still fail. Feel free to have a look, otherwise I'll just continue working on it. |
8bed5f9 to
3f217d8
Compare
|
Pinging @elastic/fleet (Feature:EPM) |
|
@jen-huang This is ready for review. If you don't have time to do all the manual tests, could you review the test scenarios in the initial description to check if I missed something important? @kevinlog This touches the bulk install functionality used by the security solution, who from your team could review this? |
|
@afgomez maybe you could have a look at this too, and maybe you find the test descriptions helpful for other EPM work. |
f1a8da4 to
f2243fc
Compare
f2243fc to
9e9e11e
Compare
jfsiii
left a comment
There was a problem hiding this comment.
I did not run this locally but I am comfortable merging it based on the description, tests, and my understanding of the code.
Let's get this in before FF and keep 👁️ and👂open for any issues 🚀
9e9e11e to
d3a4379
Compare
|
++ on getting this in. |
kevinlog
left a comment
There was a problem hiding this comment.
LGTM, this will be really helpful moving forward
a198cb6 to
9fa2e24
Compare
💚 Build SucceededMetrics [docs]Unknown metric groupsAPI count
API count missing comments
History
To update your PR or re-run it, just comment with: |
…stic#95649) * Intercept installation errors and add meta info. * Adjust mock. * Catch errors in all steps of install/upgrade. * Adjust handler for direct package upload. * Don't throw not-found errors on assets during rollback. * Correctly catch errors from _installPackage() * Propagate error from installResult in bulk install case. * Add tests for rollback. * Remove unused code. * Skipping test that doesn't test what it says. * Fix and reenable test.
) (#97400) * Intercept installation errors and add meta info. * Adjust mock. * Catch errors in all steps of install/upgrade. * Adjust handler for direct package upload. * Don't throw not-found errors on assets during rollback. * Correctly catch errors from _installPackage() * Propagate error from installResult in bulk install case. * Add tests for rollback. * Remove unused code. * Skipping test that doesn't test what it says. * Fix and reenable test.
…te-legacy-es-client * 'master' of github.com:elastic/kibana: (102 commits) [Exploratory view] integerate page views to exploratory view (elastic#97258) Fix typo in license_api_guard README name and import http server mocks from public interface (elastic#97334) Avoid mutating KQL query when validating it (elastic#97081) Add description as title on tag badge (elastic#97109) Remove legacy ES client usages in `home` and `xpack_legacy` (elastic#97359) [Fleet] Finer-grained error information from install/upgrade API (elastic#95649) Rule registry bundle size (elastic#97251) [Partial Results] Move other bucket into Search Source (elastic#96384) [Dashboard] Makes lens default editor for creating new panels (elastic#96181) skip flaky suite (elastic#97387) [Asset Management] Agent picker follow up (elastic#97357) skip flaky suite (elastic#97382) [Security Solutions] Fixes flake with cypress tests (elastic#97329) skip flaky suite (elastic#97355) Skip test to try and stabilize master minimize number of so fild asserted in tests. it creates flakines when implementation details change (elastic#97374) [Search Sessions] Client side search cache (elastic#92439) [SavedObjects] Add aggregations support (elastic#96292) [Reporting] Remove legacy elasticsearch client usage from the reporting plugin (elastic#97184) [kbnClient] fix basePath handling and export reponse type (elastic#97277) ... # Conflicts: # x-pack/plugins/watcher/server/lib/license_pre_routing_factory/license_pre_routing_factory.ts # x-pack/plugins/watcher/server/plugin.ts # x-pack/plugins/watcher/server/routes/api/indices/register_get_route.ts # x-pack/plugins/watcher/server/routes/api/license/register_refresh_route.ts # x-pack/plugins/watcher/server/routes/api/register_list_fields_route.ts # x-pack/plugins/watcher/server/routes/api/register_load_history_route.ts # x-pack/plugins/watcher/server/routes/api/settings/register_load_route.ts # x-pack/plugins/watcher/server/routes/api/watch/action/register_acknowledge_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_activate_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_deactivate_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_delete_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_execute_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_history_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_load_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_save_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_visualize_route.ts # x-pack/plugins/watcher/server/routes/api/watches/register_delete_route.ts # x-pack/plugins/watcher/server/routes/api/watches/register_list_route.ts # x-pack/plugins/watcher/server/shared_imports.ts # x-pack/plugins/watcher/server/types.ts
Summary
Partially implements #91864
The approach, trying to stay minimally invasive, is:
_installPackage()unchanged. If errors happen, they are thrown upwards.installPackageFromRegistry()andinstallPackageByUpload(), add atry / catchblock around the call to_installPackage(). If errors are caught, they are not thrown, but added to theinstallResultthey return, along with the information whatinstallTypethis operation was (install,upgradeetc.)InstallResultis changedInstallResultis returned frominstallPackage()installPackage()inspect the return value and throw any error they find, so that overall behavior isn't changed.This is all in preparation for callers being able to not re-throw the error they find, but report it back to the user in case they deem it non-fatal. This will happen in a second PR.
How to test this
In all tests, adjust
BASEPATHto match your locally running system, or remove it. For reference,BASEPATHin allcurlcommands below isrch. (You can set this withserver.basePath: "/rch"inkibana.dev.yml)After the changes in this PR, the behavior in error situations should be exactly like before. Unfortunately, not all error scenarios are covered by our tests run in CI and need to be tested manually. Specifically, testing the
setupAPI endpoints with broken required packages is not possible in CI, as we can have only one registry setup in CI (a combination of a docker container containing the registry and most packages, and the packages contained infleet_api_integration/api/fixtures), and we need a workingsetupfor most other integration tests.Tests with
systemCraft a
systempackage that triggers an error during installation, serve it from a locally running registry. Do do so, edit any dashboard insystem/0.10.9to includeinstead of
7.3.0.Use this to test the error during a plain installation, by calling
422 Unprocessable entityNote that the rollback refuses to uninstall
systembecause it is a required package.Now delete
systemwithforce: truelike this:and verify it is uninstalled with
The response should contain
"status":"not_installed".Use the same broken
systempackage as above.Install a previous, non-broken version of the package with
Then update the
systempackage with a call toObserve the errors in the return value from the API call, and in the log. Verify that the rollback to the older version worked with another call to
and searching for the value of
status, it should beinstalled. Observe that"install_version":"0.10.9"and"install_status":"installing", these are leftovers of the failed upgrade and subsequent rollback.Delete the
systempackage again withNow use the same broken
systempackage to test the setup and bulk install endpoints. Test the fresh install of the brokensystem-0.10.9as well as the update from the non brokensystem-0.10.7to the brokensystem-0.10.9. The relevantcurlcommands are:Tests with
endpointOne way to break the endpoint package is to make one of the ingest pipelines in the contained data streams invalid by changing
"ignore_failure": trueto"ignore_failures": truein theprocessor(or introducing any other syntax error here).Then open the UI and navigate directly to Security -> Overview. (Do NOT open the Fleet page first, as this would call the setup endpoints which would invalidate this test.)
Observe the call to the
/api/fleet/epm/packages/_bulkendpoint in the network requests. It responds with200 OK, and the response{"response":[{"name":"endpoint","statusCode":500,"error":"parse_exception"}]}. This behavior is unchanged from before. (Theparse_exceptioncomes from the invalid ingest pipeline.)In the log, observe these errors:
(Specifically, note that Fleet again refuses to uninstall the
endpointpackage during rollback as it is a required package.)Additional tests would be to first install a non-broken earlier version of
endpoint, then open the Security UI and verify that an attempt to update the package to the current, broken version was made and the rollback was successful. This is, however, already covered by testing the_bulkendpoint in isolation as described above.Other possible tests
/epm/packagesroute (optional, if you're only testing through the API withcurlcalls, as you can installsystemlike any other package before the UI triggers the setup)In addition to that, verify that normal functionality is not broken, i.e. try to break it in any way you can think of.