[performance] process-wide cache for advanced settings lookup#262618
[performance] process-wide cache for advanced settings lookup#262618stratoula merged 43 commits intoelastic:mainfrom
Conversation
…daemon/kibana into advanced-settings-per-setting-cache
…daemon/kibana into advanced-settings-per-setting-cache
Flaky Test Runner Stats🟠 Some tests failed. - kibana-flaky-test-suite-runner#11609[❌] src/platform/test/functional/apps/context/config.ts: 2/5 tests passed. |
e2c4760 to
48b9319
Compare
Flaky Test Runner Stats🎉 All tests passed! - kibana-flaky-test-suite-runner#11610[✅] src/platform/test/functional/apps/context/config.ts: 10/10 tests passed. |
There was a problem hiding this comment.
🏆 For this PR
While then doing visual performance testing locally with default settings, there's not much difference between main and this PR, when simulating a slower network connection between ES and Kibana, the win in this case is clearly visible. What I did was applying the following in my local system to slow down the connection:
# Enable pf
sudo pfctl -E
# Configure the dummynet pipe with 100ms delay
sudo dnctl pipe 1 config delay 100ms
# Apply both inbound and outbound rules in one shot
sudo pfctl -f - <<'EOF'
dummynet out proto tcp from any to 127.0.0.1 port 9200 pipe 1
dummynet in proto tcp from any port 9200 to any pipe 1
EOF
Loading our e-commerce dashboard in this case looked like this
Median improvement: 0.706s, Average 1.004s when running it 10x. Left side with cache, right side without
More details
| return result; | ||
| } | ||
|
|
||
| private async computeUserProvided<T = unknown>(bypassCache = false): Promise<UserProvided<T>> { |
There was a problem hiding this comment.
looks like bypassCache is not used?
| { handleWriteErrors }: { validateKeys?: boolean; handleWriteErrors?: boolean } = {} | ||
| ) { | ||
| this.cache.del(); | ||
| // check if a read is currently in progress, wait for it to complete before proceeding |
There was a problem hiding this comment.
Seems like an outdated comment.
|
|
||
| this.onWriteHook(changes); | ||
|
|
||
| // Register write operation as in-flight so concurrent reads will wait |
There was a problem hiding this comment.
Seems like an outdated comment
| timer: NodeJS.Timeout; | ||
| } | ||
|
|
||
| export const NAMESPACED_CACHE_TTL = 60_000; |
There was a problem hiding this comment.
I think we discussed lowering this to reduce eventual consistency time in a multinode setup?
There was a problem hiding this comment.
Yes, totally spaced 👍
There was a problem hiding this comment.
What value do you think is reasonable given the cache-bust-on-page-load addition?
There was a problem hiding this comment.
5 seconds to be on the safe side? I think this is what we had in the per-request cache?
There was a problem hiding this comment.
I think 5 seconds made sense for a request cache because few requests take longer than 5 seconds.
On a busy cluster, the cache will be getting refreshed via page load often so we're really talking about the absolute outer-bound on eventual consistency. And looking at it from a dashboard perspective, users will often be changing a filter, etc at greater than 5 second intervals so I think they'd often be waiting on a refresh if the TTL is so low.
That together with the fact that changing these settings is an edge case makes me lean more aggressive. E.g. 10-15 seconds.
But, even with 5 seconds this is a significant improvement over the current situation. I defer to you, of course!
There was a problem hiding this comment.
60 is def very aggressive. I would be open for 10s though, I think it will work nice. But @Dosant you prefer 5s it is ok by me. Let me know and I will make the change (Drew is out and he told me to take care of this PR)
There was a problem hiding this comment.
thank you both! let's do 10! 👍
There was a problem hiding this comment.
(Agreed that 60 is too aggressive... I put it there before I considered multi-node scenarios)
…daemon/kibana into advanced-settings-per-setting-cache
Dosant
left a comment
There was a problem hiding this comment.
LGTM! thanks for getting this over the line
| export class NamespacedCache<T = unknown> { | ||
| private readonly entries = new Map<string, NamespacedCacheEntry<T>>(); | ||
| private readonly inflightReads = new Map<string, Promise<T>>(); | ||
| private readonly inflightWrites = new Map<string, Promise<void>>(); |
There was a problem hiding this comment.
I think inflightWrites are not used and can be cleaned up.
| timer: NodeJS.Timeout; | ||
| } | ||
|
|
||
| export const NAMESPACED_CACHE_TTL = 60_000; |
There was a problem hiding this comment.
thank you both! let's do 10! 👍
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Public APIs missing comments
History
|
walterra
left a comment
There was a problem hiding this comment.
Code review only for datavis codeownership, LGTM.
…sationChanges23 * commit '9a7b717c662d1c904052bc59f0e5a81daab87c7f': (145 commits) Upgrade EUI to v114.2.0 (elastic#264550) [Entity Analytics] Add missing OpenAPI descriptions and examples to p… (elastic#264778) [Entity Resolution] Clarify CSV upload result for already-linked entities (elastic#264689) [AI Infra] Fix failing GenAI Settings Scout tests (elastic#260496) [Agent Builder] [Bug Bash] OAuth connector settings mention fields that are not there (elastic#264756) [performance] process-wide cache for advanced settings lookup (elastic#262618) [CI] Update limits.yml for securitySolution (elastic#264946) [SLO] Fix APM embeddable ids (elastic#264750) [EDR Workflows] Unify artifacts empty state buttons (elastic#264389) [Alert Triage workflow] Adds security.buildAlertEntityGraph and security.renderAlertNarrative… (elastic#259159) [SigEvents] Add KI feature identification endpoints and refactor task to use shared service (elastic#263528) [Scout] Migrate Data Views API tests from FTR - Part5 (elastic#264088) [Cases] Apply shared extended_fields path util server side (elastic#264706) [Lens as code] Fix metric trendline (elastic#264777) [api-docs] 2026-04-22 Daily api_docs build (elastic#264882) [Scout] Update test config manifests (elastic#264575) [workflows_management] Lazy-load Zod connector schemas to cut idle memory (elastic#264283) [ES|QL] Fix ES|QL columns reset race during active fetch (elastic#263947) [Content List] Column layout props, sticky actions, and title click handlers (elastic#264203) [Lens as code] Validate `id` in route for new vis types (elastic#264480) ...
…c#262618) ## Summary This is a performance enhancement to the UI Settings service. It introduces a space-aware, process-wide cache layer for the settings saved object. It applies to `.get('setting-name')` lookups on all UI settings clients. ### Justification When a dashboard loads, search-related settings need to be consulted before each search request. That can mean 20+ saved object lookups (see elastic#84923) which are each happening in different requests, so the [per-request cache](elastic#84513) doesn't cover it. ### Caching approach `NamespacedCache` is the class that handles caching. It stores both resolved settings objects and in-flight requests to deduplicate read operations (`getUserDefined`). The server-side cache is freshened in three scenarios - A setting is changed (`setMany`) - A browser requests `index.html` - A settings lookup occurs after the TTL has expired The browser UI settings client is guaranteed to get fresh settings page loads (this preserves the existing UX for changing settings). In a multi-node scenario, the node performing a setting change gets fast consistency. Other nodes get eventual consistency. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios --------- Co-authored-by: Stratou <efstratia.kalafateli@elastic.co>
) This PR: - stabilizes the suite by reducing advanced-setting flips. - groups disabled and enabled scenarios so the suite only transitions into the enabled state once. - waits for the enabled setting to propagate before running the enabled-path assertions (following [ref](#262618 (comment))).
Summary
This is a performance enhancement to the UI Settings service. It introduces a space-aware, process-wide cache layer for the settings saved object. It applies to
.get('setting-name')lookups on all UI settings clients.Justification
When a dashboard loads, search-related settings need to be consulted before each search request. That can mean 20+ saved object lookups (see #84923) which are each happening in different requests, so the per-request cache doesn't cover it.
Caching approach
NamespacedCacheis the class that handles caching. It stores both resolved settings objects and in-flight requests to deduplicate read operations (getUserDefined).The server-side cache is freshened in three scenarios
setMany)index.htmlThe browser UI settings client is guaranteed to get fresh settings page loads (this preserves the existing UX for changing settings).
In a multi-node scenario, the node performing a setting change gets fast consistency. Other nodes get eventual consistency.
Checklist