
[performance] continuous polling#256564

Merged
drewdaemon merged 42 commits into elastic:main from drewdaemon:229903/continuous-polling
Apr 10, 2026

Conversation

@drewdaemon
Contributor

@drewdaemon drewdaemon commented Mar 6, 2026

Summary

Close #229903
Close #186145

This PR implements two key (async) search performance optimizations related to polling.

  1. When the browser is using a protocol that supports multiplexing, Kibana-side sleeps are eliminated and long-polling is used, ensuring results are delivered as soon as possible.
  2. One of the Elasticsearch requests that used to happen after polling has been removed.

Reviewer notes

`retrieveResults` has been renamed to `returnIntermediateResults` to match the Elasticsearch parameter name, but should be functionally identical.

Settings behavior

wait_for_completion_timeout

  flowchart TD
      Start[Start] --> Phase{Phase?}

      Phase -->|Initial Submit| SubmitConfig[Use search.asyncSearch.waitForCompletion]

      Phase -->|Polling GET| ClientConfig{search.asyncSearch.pollLength set?}

      ClientConfig -->|Yes - Number| UseClientConfig[Use config value]
      ClientConfig -->|No| Multiplex{Is the browser using HTTP/2 or<br/>HTTP/3?}

      Multiplex -->|Yes| Use30s[Use 30000ms]
      Multiplex -->|No| Undefined[Omitted - functionally zero]
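
The decision above can be sketched roughly as follows. This is illustrative only: the names `AsyncSearchConfig` and `getWaitForCompletionTimeout` are assumptions, not the actual Kibana source.

```typescript
// Illustrative sketch of the wait_for_completion_timeout decision; the
// real Kibana implementation may differ in names and signatures.
interface AsyncSearchConfig {
  waitForCompletion?: number; // ms, used on the initial submit
  pollLength?: number; // ms, optional explicit override for polling GETs
}

function getWaitForCompletionTimeout(
  phase: 'submit' | 'poll',
  config: AsyncSearchConfig,
  supportsMultiplexing: boolean
): number | undefined {
  if (phase === 'submit') return config.waitForCompletion;
  // An explicit search.asyncSearch.pollLength setting always wins
  if (config.pollLength !== undefined) return config.pollLength;
  // Long-poll for up to 30s over HTTP/2 or HTTP/3; otherwise omit the
  // parameter, which is functionally a zero timeout
  return supportsMultiplexing ? 30000 : undefined;
}
```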

pollInterval

  flowchart TD
      Start[Start] --> ConfigSet{search.asyncSearch.pollInterval set?}

      ConfigSet -->|Yes| UseConfig[Use config value]
      ConfigSet -->|No| CheckMultiplex{HTTP/2 or<br/>HTTP/3?}

      CheckMultiplex -->|Yes| UseZero[Use 0ms]
      CheckMultiplex -->|No| CheckStatic{Static value<br/>provided?}

      CheckStatic -->|Yes| UseStatic[Use that value]
      CheckStatic -->|No| ElapsedTime{Elapsed time?}

      ElapsedTime -->|< 1.5s| Use300[300ms]
      ElapsedTime -->|< 5s| Use1000[1000ms]
      ElapsedTime -->|< 20s| Use2500[2500ms]
      ElapsedTime -->|>= 20s| Use5000[5000ms]
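
The same decision in sketch form. The parameter list here is an assumption for illustration; the actual `getPollInterval` signature in the source may differ.

```typescript
// Illustrative sketch of the pollInterval decision; parameter names and
// ordering are assumptions, not the actual Kibana signature.
function getPollInterval(
  elapsedMs: number,
  configValue?: number, // search.asyncSearch.pollInterval, if set
  supportsMultiplexing = false, // HTTP/2 or HTTP/3 detected
  staticValue?: number // caller-provided static interval
): number {
  if (configValue !== undefined) return configValue; // explicit setting always wins
  if (supportsMultiplexing) return 0; // long-polling: no client-side sleep
  if (staticValue !== undefined) return staticValue;
  // Back off as the search runs longer
  if (elapsedMs < 1500) return 300;
  if (elapsedMs < 5000) return 1000;
  if (elapsedMs < 20000) return 2500;
  return 5000;
}
```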

Checklist

  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines

Identify risks

The main risk is that an on-prem deployment with HTTP/2 enabled for browser communication may see timeout errors after upgrading if it

  • has set elasticsearch.idleSocketTimeout or server.socketTimeout to a value less than 30 seconds, or
  • uses a proxy with a timeout configured at less than 30 seconds.

In either case, the user can fix the behavior by raising the interfering timeouts (preferred) or by tuning down the poll length via kibana.yml.

Release note

Sped up fetching Elasticsearch data for setups using HTTP/2 as the browser's communication protocol.

: {
...(await getIgnoreThrottled(uiSettingsClient)),
...defaultParams,
...getCommonDefaultAsyncGetParams(searchConfig, options, {
Contributor Author

Using the Get method appears to have been a mistake/oversight.

@davismcphee
Contributor

I think this one deserves two sets of eyes on our end. Requesting reviews from both @AlexGPlay and @lukasolson.

Contributor

@stratoula stratoula left a comment


ES|QL changes LGTM!

(what a lovely change, but I agree with Davis it would need some testing. Let me know if you want me to test too, I only did a code review)

Contributor

@iblancof iblancof left a comment


Obs-exploration code changes LGTM.
The only change is the renaming of retrieveResults to returnIntermediateResults.

 * setting this to `true` will request the search results, regardless of whether or not the search is complete.
 */
- retrieveResults?: boolean;
+ returnIntermediateResults?: boolean;
Contributor

does this mean that we can get incremental results while the search is in progress?

Contributor Author

nit: I know the Elasticsearch naming could be taken either way, but we are really talking about "partial results," not "incremental results."

But to your question, there should be no practical change in the behavior from the old retrieveResults param. retrieveResults has always meant, "retrieve all results that are available whether or not the search is complete."

The change is that instead of switching endpoints from the status to the GET endpoint when returnIntermediateResults is set, we control the behavior with a param to the GET endpoint.

@kertal

This comment was marked as outdated.

@drewdaemon
Contributor Author

@kertal nice to see an early validation with at least some modest gains. The gains are going to be very dependent on the original search duration.

sawtooth

@kertal

This comment was marked as duplicate.

@kertal
Member

kertal commented Apr 8, 2026

> @kertal nice to see an early validation with at least some modest gains. The gains are going to be very dependent on the original search duration.

Yes, I agree the real gain needs a distribution of various search durations, which can't be simulated with a static scenario like the given one.

Update: It can't be simulated, unless you ask AI to generate a few dashboards with increasing stall times: for a 21s stalled search the gain seems to be 4.8s 🎉, for a 22s one it's 0.8s (pending numbers, needs to run multiple times).

Update 2: This is how it looks when loading 23 dashboards in a row with increasing query time (by increasing the stalling value on the ES filter from 1s to 23s); the first part of the video is sped up for better dramatization. It's not very exciting to watch the same dashboard loading for several minutes 😆. However, the result at the end is exciting, showing that you get more than one long-running dashboard for free. So with polling there's a speed gain of half a minute in this case 🥳

kibana-data-polling.mp4

Here are the numbers, in the format {ES query time}:{dashboard render time gain}:
1s:0.074s, 2s:0.474s, 3s:0.510s, 4s:0.645s, 5s:0.688s, 6s:1.982s, 7s:1.185s, 8s:0.178s, 9s:1.300s, 10s:0.711s, 11s:2.202s, 12s:1.261s, 13s:0.332s, 14s:1.897s, 15s:1.877s, 16s:2.224s, 17s:1.240s, 18s:0.253s, 19s:1.909s, 20s:0.685s, 21s:4.864s, 22s:4.077s, 23s:2.918s

Comment on lines +73 to +75
return isRunningResponse(response)
? timer(getPollInterval(elapsedTime)).pipe(switchMap(() => search()))
: EMPTY;
Contributor

Curious as to why we use this vs. takeWhile?

Contributor Author

When I made the changes in this PR, the previous setup was making an extra request at the end, after the results were already available. But I'm not an rxjs guru, so I'm open to suggestions.
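
For reference, an async/await sketch of the behavior being aimed for here (not the actual rxjs implementation quoted above): the request is only re-issued while the response still reports running, so no trailing request fires after the results arrive.

```typescript
// Sketch only: the real code uses rxjs (timer/switchMap as quoted above),
// but the invariant is the same: re-poll only while still running.
interface PollResponse {
  isRunning: boolean;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilComplete<T extends PollResponse>(
  search: () => Promise<T>,
  getPollInterval: (elapsedMs: number) => number
): Promise<T> {
  const start = Date.now();
  let response = await search();
  while (response.isRunning) {
    await sleep(getPollInterval(Date.now() - start));
    // Only re-issued while still running, so the final response is the
    // last request made: no extra request after completion
    response = await search();
  }
  return response;
}
```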

Comment on lines +177 to +182
entries.forEach((entry) => {
if (entry.name.includes('/internal/search/')) {
this.protocolSupportsMultiplexing = ['h2', 'h3'].includes(entry.nextHopProtocol);
this.performanceObserver?.disconnect(); // We only need to detect this once, so we can disconnect the observer after the first match
}
});
Contributor

Nit: Is there any reason we need to continue looping through the array after we've found an appropriate entry?

Suggested change (replace the forEach with a find):

    const entry = entries.find(({ name }) => name.includes('/internal/search/'));
    if (entry) {
      this.protocolSupportsMultiplexing = ['h2', 'h3'].includes(entry.nextHopProtocol);
      this.performanceObserver?.disconnect(); // We only need to detect this once, so we can disconnect the observer after the first match
    }


// Preserve and project first request params into responses.
let firstRequestParams: SanitizedConnectionRequestParams;

const pollInterval = this.deps.searchConfig.asyncSearch.pollInterval
Contributor

So this always uses the pollInterval from configuration (if it's configured), even if the protocol supports multiplexing?

Contributor Author

yeah, the user's explicit settings are always respected

@drewdaemon drewdaemon removed the request for review from a team April 9, 2026 16:11
Contributor

@lukasolson lukasolson left a comment


Hmm, when running in http2 mode, background search seems to behave a little funky:

Screen.Recording.2026-04-09.at.10.33.15.AM.mov

Here's the flow:

  1. Start a long query
  2. The first response comes back with the search ID
  3. Click the "send to background" button
  4. A new request goes out with that search ID to attach it to the background search
  5. We wait for that response before showing the notification (which ends up being when the search completes)

The request from step 4 ends up using the same wait_for_completion_timeout as the main request, so it doesn't complete until the search is complete. We probably need to send a shorter wait_for_completion_timeout for that request specifically (as far as I'm aware, we don't care about the results in that response, just that it gets attached to the background search saved object).
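
One possible shape of that fix, sketched with hypothetical names (not the actual change): the attach request overrides the long-poll timeout with a short one, since it only needs to confirm the attachment.

```typescript
// Hypothetical sketch: when attaching an in-progress search to a
// background search, don't reuse the long wait_for_completion_timeout of
// the main polling request. All names here are illustrative.
interface AsyncGetParams {
  wait_for_completion_timeout?: string;
  keep_alive?: string;
}

function getAttachToBackgroundParams(defaults: AsyncGetParams): AsyncGetParams {
  return {
    ...defaults,
    // A short timeout: we only need confirmation that the search was
    // attached to the background search saved object, not its results
    wait_for_completion_timeout: '100ms',
  };
}
```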

@drewdaemon drewdaemon requested a review from lukasolson April 9, 2026 18:19
@drewdaemon
Contributor Author

drewdaemon commented Apr 9, 2026

@lukasolson great catch. Thought I'd tested background searches but it must have slipped through the cracks. See what you think about 3a36579 and fa87041

Contributor

@lukasolson lukasolson left a comment


Love this change! LGTM. I did notice another odd behavior while testing but it's unrelated to these changes: #262384

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| data | 2622 | 2619 | -3 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| esql | 821.2KB | 821.2KB | +10.0B |
| streamsApp | 1.9MB | 1.9MB | +10.0B |
| total | | | +20.0B |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats exports` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| data | 32 | 31 | -1 |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| data | 441.9KB | 442.8KB | +888.0B |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| data | 3245 | 3243 | -2 |

History

cc @drewdaemon

Contributor

@pmuellr pmuellr left a comment


ResponseOps changes LGTM

@drewdaemon drewdaemon merged commit 9b6b455 into elastic:main Apr 10, 2026
17 checks passed

Labels

  • backport:skip (This PR does not require backporting)
  • Feature:Search (Querying infrastructure in Kibana)
  • release_note:enhancement
  • v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • [data.search] Use continuous-polling to monitor async search progress
  • [data.search] allow polling status to be streamed