Skip to content

[search source] return rawResponse on search failure#168389

Merged
nreese merged 30 commits intoelastic:mainfrom
nreese:kbn_search_error
Oct 23, 2023
Merged

[search source] return rawResponse on search failure#168389
nreese merged 30 commits intoelastic:mainfrom
nreese:kbn_search_error

Conversation

@nreese
Copy link
Contributor

@nreese nreese commented Oct 9, 2023

Closes #167099

Problem

/bsearch and /search APIs only return error key from elasticsearch error response. This is problematic because Inspector needs rawResponse to populate "Clusters and shards"

While working on this issue, I discovered another problem with how error responses are added to inspector requestResponder. The Error instance is added as json key. This is a little awkward since the response tab just stringifies the contents of json, thus stringifing the Error object instead of just the error body returned from API. This PR address this problem by setting json to either attributes or { message }.

Solution

PR updates /bsearch and /search APIs to return { attributes: { error: ErrorCause, rawResponse }} for failed responses. Solution avoided changing KbnServerError and reportServerError since these methods are used extensivly throughout Kibana (see #167544 (comment) for more details). Instead, KbnSearchError and reportSearchError are created to report search error messages.

Test

  1. install web logs sample data set
  2. open discover
  3. add filter
    {
      "error_query": {
        "indices": [
          {
            "error_type": "exception",
            "message": "local shard failure message 123",
            "name": "kibana_sample_data_logs",
            "shard_ids": [
              0
            ]
          }
        ]
      }
    }
    
  4. Open inspector. Verify "Clusters and shards" tab is visible and populated. Verify "Response" tab shows "error" and "rawResponse" keys.
    Screenshot 2023-10-09 at 9 29 16 AM
    Screenshot 2023-10-09 at 9 29 06 AM

@nreese
Copy link
Contributor Author

nreese commented Oct 10, 2023

@elasticmachine merge upstream

@nreese nreese marked this pull request as ready for review October 10, 2023 18:11
@nreese nreese requested review from a team as code owners October 10, 2023 18:12
@nreese nreese added release_note:skip Skip the PR/issue when compiling release notes Team:DataDiscovery Discover, search (data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. t// v8.12.0 labels Oct 10, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@nreese nreese added the Feature:Search Querying infrastructure in Kibana label Oct 10, 2023
@nreese
Copy link
Contributor Author

nreese commented Oct 11, 2023

@elasticmachine merge upstream

@nreese
Copy link
Contributor Author

nreese commented Oct 11, 2023

@elasticmachine merge upstream

@nreese
Copy link
Contributor Author

nreese commented Oct 11, 2023

@elasticmachine merge upstream

@MadameSheema MadameSheema requested a review from a team as a code owner October 12, 2023 13:39
Copy link
Contributor

@davismcphee davismcphee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM! Just left a couple of minor comments.

Otherwise my only concern is related to backward compatibility. Does changing the error response of the /search and /bsearch endpoints risk breaking backward compatibility in a Serverless upgrade scenario (i.e. backend and frontend versions off by 1)? This is less of a concern while Serverless is in private preview, but something to be mindful of regardless.

Also, unrelated to this PR, but I noticed the inspector uses X failured shard(s) in the title, which looks a little odd. Should it instead say X failed shard(s)?
failured

if (e instanceof KbnSearchError) return e;
return new KbnSearchError(
e.message ?? 'Unknown error',
e instanceof errors.ResponseError ? e.statusCode! : 500,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know e.statusCode isn't null here?

expect(error.body.attributes).toBe(indexNotFoundException.error);
expect(error.body.attributes).toEqual({
error: indexNotFoundException.error,
rawResponse: undefined,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a scenario we can test for here where rawResponse isn't undefined too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a test case to API integration tests that verifies rawResponse is returned. These tests are all just mocks, where as API integration tests test actual code paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That instead also works for me 👍

@nreese
Copy link
Contributor Author

nreese commented Oct 12, 2023

Also, unrelated to this PR, but I noticed the inspector uses X failured shard(s) in the title, which looks a little odd. Should it instead say X failed shard(s)?

I am not sure I see the problem. It says "1 failed shard" when there is a single shard failure and "3 failed shards" when there are multiple shard failures. This seems like proper english. Could you explain further?

https://github.com/elastic/kibana/blob/main/src/plugins/inspector/public/views/requests/components/details/clusters_view/clusters_table/shards_view/shard_failure_flyout.tsx#L35

{i18n.translate('inspector.requests.clusters.shards.flyoutTitle', {
              defaultMessage:
                '{failedShardCount} failured {failedShardCount, plural, one {shard} other {shards}}',
              values: { failedShardCount: failures.length },
            })}

@davismcphee
Copy link
Contributor

I am not sure I see the problem. It says "1 failed shard" when there is a single shard failure and "3 failed shards" when there are multiple shard failures. This seems like proper english. Could you explain further?

"1 failed shard" makes sense, but currently it's "1 failured shard". There's an additional "ur" in "failed" currently.

@nreese
Copy link
Contributor Author

nreese commented Oct 12, 2023

"1 failed shard" makes sense, but currently it's "1 failured shard". There's an additional "ur" in "failed" currently.

Thanks, I see that now. I can open a separate PR to resolve

Copy link
Contributor

@drewdaemon drewdaemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lens changes make sense to me. I left one question.

Search changes are owned by Discovery, so nothing needed from me there.

if (e.attributes?.error?.reason) {
return getNestedErrorClause(e.attributes.error);
}
if (e.attributes?.error?.caused_by) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand why this guard was inserted. It doesn't look like we gated the logic by the existence of caused_by before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attributes.error is typed as optional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the previous logic assumed that caused_by was defined (type cast) which it must always have been since getNestedErrorClause would have errored if passed undefined.

Given this, I think changing from casting to checking is okay because it should not change the amount of times this branch gets entered.

@nreese
Copy link
Contributor Author

nreese commented Oct 22, 2023

@elasticmachine merge upstream

Comment on lines +547 to +548
// Skipping to unblock: https://github.com/elastic/kibana/pull/168389
describe.skip('With anomalies data', () => {
Copy link
Contributor

@angorayc angorayc Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce this when running locally with Network throttling. Create an issue to follow it up: #169507

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please check if the error you had was the same as #168709, if so, can you please try unskipping the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed skip, it is not longer needed now that test has been fixed

if (e.attributes?.error?.reason) {
return getNestedErrorClause(e.attributes.error);
}
if (e.attributes?.error?.caused_by) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the previous logic assumed that caused_by was defined (type cast) which it must always have been since getNestedErrorClause would have errored if passed undefined.

Given this, I think changing from casting to checking is okay because it should not change the amount of times this branch gets entered.

Copy link
Contributor

@PhilippeOberti PhilippeOberti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the Threat Hunting Investigations team!

@kibana-ci
Copy link

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
data 2547 2539 -8

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
lens 1.4MB 1.4MB +145.0B
securitySolution 13.0MB 13.0MB +136.0B
timelines 30.0KB 30.2KB +136.0B
total +417.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
data 407.4KB 407.8KB +361.0B
Unknown metric groups

API count

id before after diff
data 3202 3194 -8

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@nreese nreese merged commit adf3b8b into elastic:main Oct 23, 2023
@kibanamachine kibanamachine added the backport:skip This PR does not require backporting label Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting Feature:Search Querying infrastructure in Kibana release_note:skip Skip the PR/issue when compiling release notes Team:DataDiscovery Discover, search (data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. t// v8.12.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[search source] return raw response when search fails

9 participants