Skip to content

Handle bulkGet errors on package retrieval from ES storage#111114

Merged
joshdover merged 1 commit intoelastic:masterfrom
joshdover:improve-es-storage-error-handling
Sep 3, 2021
Merged

Handle bulkGet errors on package retrieval from ES storage#111114
joshdover merged 1 commit intoelastic:masterfrom
joshdover:improve-es-storage-error-handling

Conversation

@joshdover
Copy link
Copy Markdown
Contributor

@joshdover joshdover commented Sep 3, 2021

Summary

Adds error handling and logging for the case where not all package assets can be successfully retrieved from storage. Now in this case, we'll fallback to attempting to fetch the package from the registry when we can't successfully retrieve all the assets from ES for some reason.

This came up on the edge-oblt cluster on the latest 8.0.0 snapshot. In my analysis of that cluster, I'm not seeing any assets that are actually missing, so I currently suspect a different bug that may be produced by something in the SOC. In particular, this recent change seems relevant: #109967

This additional logging and fallback behavior should help us nail down the root cause.

@nchaulet any recommendations on where to add tests for this? The best place would be unit tests on this archive storage code, but it seems we don't have this yet. Outside of that option I don't see any good options since I'm not yet sure of the root cause here to be able to reproduce.

@joshdover joshdover added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v7.15.0 v7.16.0 labels Sep 3, 2021
@joshdover joshdover requested a review from a team as a code owner September 3, 2021 11:46
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/fleet (Team:Fleet)

Copy link
Copy Markdown
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM and I'm okay landing this before tests are sorted as we're working with a bit of an unknown bug 🚀

@kibanamachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@joshdover joshdover added the auto-backport Deprecated - use backport:version if exact versions are needed label Sep 3, 2021
@joshdover joshdover merged commit ed18699 into elastic:master Sep 3, 2021
@joshdover joshdover deleted the improve-es-storage-error-handling branch September 3, 2021 14:01
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 3, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 3, 2021
@kibanamachine
Copy link
Copy Markdown
Contributor

💚 Backport successful

Status Branch Result
7.15
7.x

The backport PRs will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Sep 3, 2021
…111148)

Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
kibanamachine added a commit that referenced this pull request Sep 3, 2021
…111147)

Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Deprecated - use backport:version if exact versions are needed release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v7.15.0 v7.16.0 v8.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants