[performance] continuous polling #256564
Conversation
```ts
: {
  ...(await getIgnoreThrottled(uiSettingsClient)),
  ...defaultParams,
  ...getCommonDefaultAsyncGetParams(searchConfig, options, {
```
Using the Get method appears to have been a mistake/oversight.
I think this one deserves two sets of eyes on our end. Requesting reviews from both @AlexGPlay and @lukasolson.
iblancof left a comment
Obs-exploration code changes LGTM.
My only note is the renaming of `retrieveResults` to `returnIntermediateResults`.
```diff
  * setting this to `true` will request the search results, regardless of whether or not the search is complete.
  */
-retrieveResults?: boolean;
+returnIntermediateResults?: boolean;
```
does this mean that we can get incremental results while the search is in progress?
nit: I know the Elasticsearch naming could be taken either way, but we are really talking about "partial results," not "incremental results."
But to your question, there should be no practical change in the behavior from the old retrieveResults param. retrieveResults has always meant, "retrieve all results that are available whether or not the search is complete."
The change is that instead of switching endpoints from the status to the GET endpoint when returnIntermediateResults is set, we control the behavior with a param to the GET endpoint.
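To make the endpoint change concrete, here is a minimal sketch (the paths follow Elasticsearch's async search API; the helper names and shapes are illustrative, not the actual Kibana code):

```typescript
// Before: the client switched endpoints depending on whether partial
// results were wanted (illustrative stand-in, not the actual Kibana code).
function legacyEndpoint(retrieveResults: boolean, id: string): string {
  return retrieveResults
    ? `/_async_search/${id}` // GET endpoint: returns whatever results exist
    : `/_async_search/status/${id}`; // status endpoint: progress only
}

// After: always call the GET endpoint and control the behavior via a param.
function currentRequest(returnIntermediateResults: boolean, id: string) {
  return {
    path: `/_async_search/${id}`,
    params: { returnIntermediateResults },
  };
}
```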
This comment was marked as outdated.
@kertal nice to see an early validation with at least some modest gains. The gains are going to be very dependent on the original search duration.
This comment was marked as duplicate.
Yes, I do agree that measuring the real gain needs a distribution of various search durations, which can't be simulated with a static scenario like the given one.

Update: It can't be simulated unless you ask AI to generate a few dashboards with increasing stall times. For a 21s stalled search the gain seems to be 4.8s 🎉, for a 22s one it's 0.8s (pending numbers, needs to run multiple times).

Update 2: This is how it looks when loading 23 dashboards in a row with increasing query time (by increasing the stalling value on the ES filter from 1 to 23s). The first part of the video is sped up for better dramatization. It's not very exciting to watch the same dashboard loading for several minutes 😆. However, the result at the end is exciting, showing that you get more than one long-running dashboard for free. So with polling there's a speed gain of half a minute in this case 🥳

kibana-data-polling.mp4

Here are the numbers, in format {ES query time}:{faster dashboard render time}
```ts
return isRunningResponse(response)
  ? timer(getPollInterval(elapsedTime)).pipe(switchMap(() => search()))
  : EMPTY;
```
Curious as to why we use this vs. takeWhile?
When I made the changes in this PR, the way it was set up before made an extra request at the end, after the results were already available. But I am not an rxjs guru, so I'm open to suggestions.
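For readers not steeped in rxjs, the intent of that pipeline can be sketched with plain promises (a simplified stand-in; the `isRunning` flag, the fixed delay, and the helper names are assumptions). The key property is that no further request is scheduled once a response reports the search as complete:

```typescript
// Promise-based sketch of "poll while running, stop cleanly when done".
// Stand-in for the rxjs timer/switchMap/EMPTY pipeline; not Kibana code.
type SearchResponse = { isRunning: boolean };

async function pollUntilComplete(
  search: () => Promise<SearchResponse>,
  delayMs: number
): Promise<SearchResponse> {
  let response = await search();
  while (response.isRunning) {
    // Only wait and re-poll while the previous response is still running,
    // so no extra request is issued after the results are available.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    response = await search();
  }
  return response;
}
```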
```ts
entries.forEach((entry) => {
  if (entry.name.includes('/internal/search/')) {
    this.protocolSupportsMultiplexing = ['h2', 'h3'].includes(entry.nextHopProtocol);
    this.performanceObserver?.disconnect(); // We only need to detect this once, so we can disconnect the observer after the first match
  }
});
```
Nit: Is there any reason we need to continue looping through the array after we've found an appropriate entry?
Suggested change:

```diff
-entries.forEach((entry) => {
-  if (entry.name.includes('/internal/search/')) {
-    this.protocolSupportsMultiplexing = ['h2', 'h3'].includes(entry.nextHopProtocol);
-    this.performanceObserver?.disconnect(); // We only need to detect this once, so we can disconnect the observer after the first match
-  }
-});
+const entry = entries.find(({ name }) => name.includes('/internal/search/'));
+if (entry) {
+  this.protocolSupportsMultiplexing = ['h2', 'h3'].includes(entry.nextHopProtocol);
+  this.performanceObserver?.disconnect(); // We only need to detect this once, so we can disconnect the observer after the first match
+}
```
```ts
// Preserve and project first request params into responses.
let firstRequestParams: SanitizedConnectionRequestParams;
```

```ts
const pollInterval = this.deps.searchConfig.asyncSearch.pollInterval
```
So this always uses the pollInterval from configuration (if it's configured), even if the protocol supports multiplexing?
yeah, the user's explicit settings are always respected
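Putting that together with the flowcharts in the PR summary, the precedence can be sketched as a pure function (the function name and structure are illustrative, not the actual Kibana code; the tier values come from the PR description):

```typescript
// Illustrative precedence for the poll interval; not the actual Kibana code.
function resolvePollInterval(
  configured: number | undefined, // explicit search.asyncSearch.pollInterval
  supportsMultiplexing: boolean, // detected via nextHopProtocol ('h2' / 'h3')
  elapsedMs: number
): number {
  if (configured !== undefined) return configured; // explicit setting always wins
  if (supportsMultiplexing) return 0; // long-poll immediately over HTTP/2 or HTTP/3
  // HTTP/1.1 fallback: back off as the search keeps running.
  if (elapsedMs < 1500) return 300;
  if (elapsedMs < 5000) return 1000;
  if (elapsedMs < 20000) return 2500;
  return 5000;
}
```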
lukasolson left a comment
Hmm, when running in http2 mode, background search seems to behave a little funky:
Screen.Recording.2026-04-09.at.10.33.15.AM.mov
Here's the flow:
1. Start a long query
2. The first response comes back with the search ID
3. Click the "send to background" button
4. A new request goes out with that search ID to attach it to the background search
5. We wait for that response before showing the notification (which ends up being when the search completes)
The request from step 4 ends up using the same wait_for_completion_timeout as the main request, so it doesn't complete until the search request is complete. We probably need to send a shorter wait_for_completion_timeout for that request specifically (as far as I'm aware, we don't care about the results in that response, just to make sure it gets attached to the background search saved object).
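A hedged sketch of the suggested fix (the parameter names follow Elasticsearch's async search API; the helper, the 100ms value, and the keep_alive value are illustrative assumptions): the attach request gets its own short wait_for_completion_timeout so it returns as soon as the search is registered, while polling requests keep the long timeout.

```typescript
// Illustrative builder; only wait_for_completion_timeout differs between
// a normal polling GET and the fire-and-forget "attach to background" GET.
interface AsyncGetParams {
  wait_for_completion_timeout: string;
  keep_alive: string;
}

function getAsyncGetParams(purpose: 'poll' | 'attach'): AsyncGetParams {
  return {
    // The attach request only needs to touch the search so it gets tied to
    // the background search saved object; don't hold the connection open
    // for the full search duration.
    wait_for_completion_timeout: purpose === 'attach' ? '100ms' : '30s',
    keep_alive: '60s',
  };
}
```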
@lukasolson great catch. Thought I'd tested background searches but it must have slipped through the cracks. See what you think about 3a36579 and fa87041
lukasolson left a comment
Love this change! LGTM. I did notice another odd behavior while testing but it's unrelated to these changes: #262384
💚 Build Succeeded
cc @drewdaemon
pmuellr left a comment
ResponseOps changes LGTM

Summary
Close #229903
Close #186145
This PR implements two key (async) search performance optimizations related to polling.
Reviewer notes
`retrieveResults` has been renamed to `returnIntermediateResults` to match the Elasticsearch parameter name, but should be functionally identical.

Settings behavior
wait_for_completion_timeout

```mermaid
flowchart TD
    Start[Start] --> Phase{Phase?}
    Phase -->|Initial Submit| SubmitConfig[Use search.asyncSearch.waitForCompletion]
    Phase -->|Polling GET| ClientConfig{search.asyncSearch.pollLength set?}
    ClientConfig -->|Yes - Number| UseClientConfig[Use config value]
    ClientConfig -->|No| Multiplex{Is the browser using HTTP/2 or<br/>HTTP/3?}
    Multiplex -->|Yes| Use30s[Use 30000ms]
    Multiplex -->|No| Undefined[Omitted - functionally zero]
```

pollInterval

```mermaid
flowchart TD
    Start[Start] --> ConfigSet{search.asyncSearch.pollInterval set?}
    ConfigSet -->|Yes| UseConfig[Use config value]
    ConfigSet -->|No| CheckMultiplex{HTTP/2 or<br/>HTTP/3?}
    CheckMultiplex -->|Yes| UseZero[Use 0ms]
    CheckMultiplex -->|No| CheckStatic{Static value<br/>provided?}
    CheckStatic -->|Yes| UseStatic[Use that value]
    CheckStatic -->|No| ElapsedTime{Elapsed time?}
    ElapsedTime -->|< 1.5s| Use300[300ms]
    ElapsedTime -->|< 5s| Use1000[1000ms]
    ElapsedTime -->|< 20s| Use2500[2500ms]
    ElapsedTime -->|>= 20s| Use5000[5000ms]
```

Checklist
The `release_note:*` label is applied per the guidelines

Identify risks
The main risk is if an on-prem deployment has for some reason configured aggressive network timeouts and also has HTTP/2 enabled for browser communication, they may see timeout errors after upgrading.
For example, `elasticsearch.idleSocketTimeout` or `server.socketTimeout` set to a value less than 30 seconds.

In all these cases, the user can fix the behavior by raising the interfering timeouts (preferred) or by tuning down the poll length via kibana.yml.

Release note
Sped up fetching Elasticsearch data for setups using HTTP/2 as the browser's communication protocol.