
🌊 Streams: Prevent concurrent access #222961

Merged
flash1293 merged 9 commits into elastic:main from flash1293:flash1293/use-lock-manager
Jun 16, 2025

Conversation

@flash1293 (Contributor) commented Jun 6, 2025

This PR guards changes to the streams state that go through `State.attemptChanges` via the newly introduced lock manager.

If two requests happen at the same time, one of them now fails with a 409.

Concerns

  • Lock expiry is 30s for now - is this too little? It should be good enough for now; we may need to reconsider once we introduce the bulk API
  • This only guards changes that go through the `State` class - some things, like queries and dashboards, do not, so they can still be subject to race conditions. We could sprinkle more locks over the code base, but I would rather solve this by moving them into `State` as well; that seems like the cleaner approach, even though it is a bit more effort
  • Biggest question - on this PR the concurrent request fails directly with a 409. Is this OK, or should it wait and retry a couple of times? I'm in favor of starting like this and seeing whether this is actually a problem.
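The scheme above can be sketched with a minimal in-memory lock manager (hypothetical names; the actual Kibana lock manager persists locks in Elasticsearch so they are honored across nodes):

```typescript
class LockAcquisitionError extends Error {}

// Minimal in-memory stand-in for the lock manager. The real implementation
// stores the lock in Elasticsearch so that it holds across Kibana nodes.
class InMemoryLockManager {
  private held = new Set<string>();

  async withLock<T>(lockId: string, fn: () => Promise<T>): Promise<T> {
    if (this.held.has(lockId)) {
      // Another holder exists: fail fast instead of queueing.
      throw new LockAcquisitionError(`Lock "${lockId}" is already held`);
    }
    this.held.add(lockId);
    try {
      return await fn();
    } finally {
      this.held.delete(lockId);
    }
  }
}

// Two overlapping state changes: the first wins, the second is rejected,
// which the route layer would translate into an HTTP 409.
async function demo(): Promise<string[]> {
  const lockManager = new InMemoryLockManager();
  const slowChange = lockManager.withLock('streams_api', async () => {
    await new Promise((resolve) => setTimeout(resolve, 50));
    return 'applied';
  });
  const concurrentChange = lockManager.withLock('streams_api', async () => 'applied');
  const results = await Promise.allSettled([slowChange, concurrentChange]);
  return results.map((r) => r.status);
}

demo().then((statuses) => console.log(statuses)); // [ 'fulfilled', 'rejected' ]
```

The fail-fast behavior shown here is the one the PR ships; whether the loser should instead wait and retry is the open question discussed below.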

@flash1293 flash1293 requested a review from a team as a code owner June 6, 2025 10:45
@flash1293 flash1293 added release_note:skip Skip the PR/issue when compiling release notes Team:obs-onboarding Observability Onboarding Team backport:version Backport to applied version labels Feature:Streams This is the label for the Streams Project v9.1.0 v8.19.0 labels Jun 6, 2025
@elasticmachine (Contributor)

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

await disableStreams(apiClient);
});

it('should not allow multiple requests manipulating streams state at once', async () => {
fork should be slow enough that we won't get flaky results on this? Perhaps, to be safe, we could trigger an even longer request, like an upsert that creates 2-3 children from the root

}
const lmService = dependencies.lockManager;
return lmService
.withLock('streams_api', async () => {
@dgieselaar (Contributor) commented Jun 9, 2025
wdyt of something like `streams/apply_changes`?

@klacabane (Contributor)

> Lock expiry is 30s for now - is this too little? Should be good enough for now, maybe we need to reconsider once we introduce the bulk api

It would be great to revisit this once we get enough telemetry from the endpoints. Perhaps the execution time of the API tests would be a good starting point for an initial value?

> Biggest question - on this PR the concurrent request fails directly with a 409. Is this OK or should it wait and retry a couple times? I'm in favor of starting like this and seeing if this is actually a problem.

Failing sounds safer to me as an initial approach; at least users get the information that there may be conflicting changes running in parallel, which gives them a chance to review the latest state before submitting again. We can add telemetry for the concurrency event, but I'm not sure how to measure whether it is more of an annoyance or a help
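If a wait-and-retry behavior were ever preferred, a client-side sketch might look like this (a hypothetical helper with assumed parameters; the PR itself deliberately fails fast):

```typescript
// Retry a request that can fail with a conflict, backing off exponentially.
async function withRetries<T>(
  request: () => Promise<T>,
  isConflict: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await request();
    } catch (error) {
      if (!isConflict(error) || attempt >= maxAttempts) {
        throw error;
      }
      // Exponential backoff between attempts: baseDelayMs, 2x, 4x, ...
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1))
      );
    }
  }
}

// Demo: a request that conflicts twice, then succeeds on the third attempt.
let calls = 0;
const flakyRequest = async (): Promise<string> => {
  calls += 1;
  if (calls < 3) {
    throw Object.assign(new Error('conflict'), { status: 409 });
  }
  return 'applied';
};

withRetries(flakyRequest, (e) => (e as { status?: number })?.status === 409, 3, 10)
  .then((result) => console.log(result, 'after', calls, 'attempts')); // applied after 3 attempts
```

The trade-off the comment above points at: retrying hides the conflict from the user, while failing surfaces it and lets them re-review the latest state first.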

@dgieselaar (Contributor)

Fwiw, the TTL of the lock is extended for as long as the node can still reach ES. It does not mean the task has to complete within the TTL.

}
})
.catch((error) => {
if (error instanceof LockAcquisitionError) {
nit: there is an `isLockAcquisitionError` helper
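For context, a helper like the one this nit refers to can be sketched as a type guard (hypothetical shape, not the actual Kibana export), which the route handler can use to map the failure to a 409:

```typescript
class LockAcquisitionError extends Error {
  constructor(message?: string) {
    super(message);
    this.name = 'LockAcquisitionError';
  }
}

// Type-guard helper: matching on the error name also works when the error
// crosses a module boundary and a plain `instanceof` check would fail.
function isLockAcquisitionError(error: unknown): error is LockAcquisitionError {
  return error instanceof Error && error.name === 'LockAcquisitionError';
}

// In the route handler, a failed acquisition becomes an HTTP 409 Conflict.
function toStatusCode(error: unknown): number {
  return isLockAcquisitionError(error) ? 409 : 500;
}

console.log(toStatusCode(new LockAcquisitionError('lock held'))); // 409
console.log(toStatusCode(new Error('boom'))); // 500
```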

@klacabane (Contributor)

thanks @dgieselaar

> TTL-Based Lease: Each lock has a short, fixed lifespan (default 30s) and will automatically expire if not renewed. While the callback is executing, the lock manager automatically extends the TTL to keep the lock active. This safeguards against deadlocks: if a Kibana node crashes after obtaining a lock, the lock is automatically released after 30 seconds.
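The lease mechanics quoted above can be sketched as a renewal loop (a simplification with assumed names; the real lock manager extends the TTL in Elasticsearch while the callback runs):

```typescript
// A lock lease that expires `ttlMs` after the last renewal. A competing
// acquirer would check isExpired() before taking over the lock.
class Lease {
  private expiresAt: number;

  constructor(private readonly ttlMs: number) {
    this.expiresAt = Date.now() + ttlMs;
  }

  renew(): void {
    this.expiresAt = Date.now() + this.ttlMs;
  }

  isExpired(): boolean {
    return Date.now() > this.expiresAt;
  }
}

// Renew at half the TTL so a healthy node never lets the lock lapse; if the
// node crashes, renewal stops and the lock expires on its own after `ttlMs`.
async function withTtlLock<T>(ttlMs: number, fn: () => Promise<T>): Promise<T> {
  const lease = new Lease(ttlMs);
  const renewal = setInterval(() => lease.renew(), ttlMs / 2);
  try {
    return await fn();
  } finally {
    clearInterval(renewal);
  }
}

// The callback may safely run longer than a single TTL window.
withTtlLock(100, async () => {
  await new Promise((resolve) => setTimeout(resolve, 250));
  return 'applied';
}).then((result) => console.log(result)); // applied
```

This is why a 30s TTL does not cap the duration of a state change: the lease is continuously extended while the node is alive, and only a crashed holder lets it expire.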

@elasticmachine (Contributor)

💚 Build Succeeded

Metrics: ✅ unchanged

@flash1293 flash1293 enabled auto-merge (squash) June 16, 2025 13:39
@flash1293 flash1293 merged commit a8b2ac6 into elastic:main Jun 16, 2025
10 checks passed
@kibanamachine (Contributor)

Starting backport for target branches: 8.19

https://github.com/elastic/kibana/actions/runs/15685623916

kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 16, 2025
(cherry picked from commit a8b2ac6)
@kibanamachine (Contributor)

💚 All backports created successfully

Status Branch Result
8.19

Note: Successful backport PRs will be merged automatically after passing CI.

Questions?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Jun 17, 2025
# Backport

This will backport the following commits from `main` to `8.19`:
- 🌊 Streams: Prevent concurrent access (#222961)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Joe Reuter <johannes.reuter@elastic.co>
Co-authored-by: Kevin Lacabane <kevin.lacabane@elastic.co>


5 participants