[Entity Store] Add support to Cross Cluster Search #254779
romulets merged 12 commits into elastic:main
Conversation
Pull request overview
This PR adds Cross Cluster Search (CCS) support to the Entity Store, enabling extraction of entity data from remote indices in addition to local ones. Remote indices are queried in parallel with local extraction, and partial entities from CCS are written to the updates data stream for merging in the next extraction run. Additionally, it fixes the extraction window persistence by storing lastSearchTimestamp as lastExecutionTimestamp.
Changes:
- Split index patterns into local and remote (CCS), running extraction paths in parallel
- CCS extraction writes partial entities to updates stream with timestamps in the extraction window
- Fixed `lastExecutionTimestamp` to use `lastSearchTimestamp` (the `toDateISO` of the completed window) instead of "now"
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| factories.ts | Instantiates CcsLogsExtractionClient and passes it to LogsExtractionClient via dependencies object |
| upsert_bulk.ts | Updates upsertEntitiesBulk call to use options object instead of boolean parameter |
| request_context_factory.ts | Creates CcsLogsExtractionClient and injects it into LogsExtractionClient |
| logs_extraction_client.ts | Adds CCS parallel extraction logic, splits index patterns, fixes lastExecutionTimestamp to use lastSearchTimestamp |
| logs_extraction_client.test.ts | Updates tests for constructor changes and adds CCS extraction verification |
| logs_extraction_query_builder.ts | Adds buildCcsLogsExtractionEsqlQuery for CCS-only extraction without LOOKUP JOIN |
| logs_extraction_query_builder.test.ts | Adds tests for CCS query builder |
| logs_extraction_query_builder.test.ts.snap | Snapshot tests for CCS ESQL queries |
| crud_client/utils.ts | Adds timestampGenerator parameter for custom timestamp generation during upsert |
| crud_client/utils.test.ts | Tests for flat document handling with force=true |
| crud_client/index.ts | Changes upsertEntitiesBulk to accept options object with timestampGenerator |
| ccs_logs_extraction_client.ts | New client for CCS extraction that writes partial entities to updates stream |
| ccs_logs_extraction_client.test.ts | Comprehensive tests for CCS extraction client |
| asset_manager.ts | Updates getIndexPatterns call to getLocalIndexPatterns |
```ts
  abortController: opts?.abortController,
});

const [mainResult, ccsResult] = await Promise.all([mainPromise, ccsPromise]);
```
One concern that comes to mind: we're currently using the same timeframe window for both `remoteExtraction` and `localExtraction`. Consider the following scenario:

- The local loop finishes with a last log timestamp of X + 1.
- The remote loop finishes with a last log timestamp of X + 2.

Now we have two possible approaches for the next iteration:

- **Option 1:** use X + 1 as the starting point for the next run. → This would likely result in duplicate document collection on the remote cluster.
- **Option 2:** use X + 2 as the starting point for the next run. → This would likely result in missing logs on the local cluster.

What am I missing here?
We explored this problem in the tech beat yesterday. It's a known issue: currently the CCS extraction is best effort, with the main extraction always being the leading window setting.
We decided to explore better resilience for this at a later point, but this already allows us to test with a CCS environment.
💔 Build Failed
cc @romulets
kubasobon left a comment
LGTM 🚀 Great way to iterate fast
## Summary
Adds support for extracting entity data from **remote (CCS) indices** in
addition to local indices. Remote indices are queried in parallel with
local extraction; partial entities from CCS are written to the
**updates** data stream so the **next run** merges them into the latest
index. Also fixes persistence of the extraction window by storing
`lastSearchTimestamp` as `lastExecutionTimestamp`.
**Note: this only works when all clusters are on >9.4.0.**
## How the CCS solution works
Main extraction uses ESQL with a **LOOKUP JOIN** against the latest
index, which **does not support cross-cluster indices**. So we split
index patterns into **local** and **remote**, run two paths in parallel,
and let the existing “updates → next run” flow merge remote data.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Single extraction run │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Index patterns (data view) │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Split by CCS │──────┬──────────────────────────────────────────────┐ │
│ └──────────────┘ │ │ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌───────────────────┐ │
│ │ Local patterns │ │ Remote patterns │ │
│ │ (e.g. logs-*) │ │ (e.g. remote:logs)│ │
│ └────────┬─────────┘ └────────┬──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Main extraction │ │ CCS extraction │ │
│ │ ESQL + LOOKUP │ │ ESQL (no LOOKUP) │ │
│ │ → latest index │ │ → updates stream │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ Next extraction run │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Main extraction reads: [ updates stream, local patterns, ... ] │
│ → Picks up partial entities written by CCS in the previous run │
│ → LOOKUP JOIN + merge → latest index │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
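As a rough illustration of the split above: remote (CCS) patterns use the `cluster:index` syntax, so a colon is enough to tell them apart from local patterns. This is a minimal sketch; `splitIndexPatterns` is a hypothetical name, not necessarily the helper used in this PR:

```ts
// Hypothetical sketch: CCS (remote) patterns use the "cluster:index"
// syntax, e.g. "remote:logs-*", while local patterns have no colon.
const splitIndexPatterns = (patterns: string[]) => ({
  local: patterns.filter((p) => !p.includes(':')),
  remote: patterns.filter((p) => p.includes(':')),
});

const { local, remote } = splitIndexPatterns(['logs-*', 'remote:logs-*']);
// local  -> ['logs-*']
// remote -> ['remote:logs-*']
```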
1. **Step 1 (current run)**
- CCS extraction reads only **remote** index patterns.
- Runs ESQL aggregation (same logic as main, but no LOOKUP).
  - Writes results as partial entities to the **updates** data stream
    (with `@timestamp` in the extraction window, e.g. `toDateISO` + a small
    random offset to reduce collisions; see the sketch after this list).
2. **Step 2 (next run)**
- Main extraction uses index patterns that **include the updates data
stream** (and local indices).
- Documents written by CCS in step 1 are now in that stream.
- Main extraction runs ESQL + LOOKUP JOIN against the latest index and
merges everything (local + updates, including those CCS partials) into
latest.
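A minimal sketch of the timestamp placement from step 1, assuming a generator shape like the `timestampGenerator` option this PR adds to `upsertEntitiesBulk`; the exact signature here is a guess:

```ts
// Hypothetical sketch: stamp each CCS partial entity near the window's
// toDateISO, plus a small random offset so concurrent writes are less
// likely to collide on the exact same @timestamp.
const makeTimestampGenerator = (toDateISO: string) => (): string => {
  const jitterMs = Math.floor(Math.random() * 1000); // up to ~1s of jitter
  return new Date(Date.parse(toDateISO) + jitterMs).toISOString();
};

const timestampGenerator = makeTimestampGenerator('2025-01-01T00:00:00.000Z');
timestampGenerator(); // e.g. '2025-01-01T00:00:00.412Z'
```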
No change is required to the main extraction logic; it already reads
from updates. We only add a parallel path that **feeds** updates from
remote indices so the next run sees them.
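Tying it together, the parallel run looks roughly like this. Only the `Promise.all` line mirrors the actual diff shown in the review conversation above; the surrounding names and shapes are assumptions:

```ts
interface ExtractionClient {
  extract(opts: {
    indexPatterns: string[];
    fromDateISO: string;
    toDateISO: string;
    abortController?: AbortController;
  }): Promise<unknown>;
}

// Both paths share the same extraction window and abort signal; CCS is
// best effort, with the main extraction owning the window state.
async function runBothPaths(
  main: ExtractionClient, // ESQL + LOOKUP JOIN -> latest index
  ccs: ExtractionClient, // ESQL (no LOOKUP)   -> updates stream
  local: string[], // e.g. ['logs-*']
  remote: string[], // e.g. ['remote:logs-*']
  fromDateISO: string,
  toDateISO: string,
  abortController?: AbortController
) {
  const mainPromise = main.extract({ indexPatterns: local, fromDateISO, toDateISO, abortController });
  const ccsPromise = ccs.extract({ indexPatterns: remote, fromDateISO, toDateISO, abortController });
  // Mirrors the diff in the review conversation above:
  const [mainResult, ccsResult] = await Promise.all([mainPromise, ccsPromise]);
  return { mainResult, ccsResult };
}
```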
---
## Fix: `lastExecutionTimestamp` from `lastSearchTimestamp`
The **next** extraction window is computed as:
- `fromDate` = `lastExecutionTimestamp` (minus delay), or lookback if
not set
- `toDate` = now − delay
So **`lastExecutionTimestamp` must be the end of the window we just
finished searching** (i.e. the `toDateISO` of that run). If we stored
something else (e.g. “now” at update time), the next run would use the
wrong `fromDate` and we could **skip** a segment of data or create
overlapping windows.
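In code form, the window derivation is roughly the following; a minimal sketch with assumed names, mirroring the bullet points above:

```ts
// fromDate = lastExecutionTimestamp - delay, or now - lookback when unset;
// toDate   = now - delay. The persisted lastExecutionTimestamp must
// therefore be the previous run's toDateISO for the windows to tile
// without gaps or overlaps.
const computeNextWindow = (
  lastExecutionTimestamp: string | undefined,
  delayMs: number,
  lookbackMs: number
) => {
  const now = Date.now();
  const from = lastExecutionTimestamp
    ? Date.parse(lastExecutionTimestamp) - delayMs
    : now - lookbackMs;
  const to = now - delayMs;
  return {
    fromDateISO: new Date(from).toISOString(),
    toDateISO: new Date(to).toISOString(),
  };
};
```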
**Change:** After a successful extraction we now set:
```ts
lastExecutionTimestamp: lastSearchTimestamp || moment().utc().toISOString()
```
where `lastSearchTimestamp` is the **toDateISO** of the window we just
completed. The next run therefore starts from the correct point and no
segment is dropped.
## Testing manually
1. Start an ECH deployment on 9.4-SNAPSHOT
2. Go to Stack Management > API Keys and generate a new `Cross-Cluster`
API key.
   - Save the provided credentials
3. Start Kibana and Elasticsearch locally
4. Add the stored credentials to the local deployment by running this
command in your CLI: `.es/9.4.0/bin/elasticsearch-keystore add
cluster.remote.${REMOTE_CLUSTER_NAME}.credentials`. This command will
prompt you to add the credential.
5. Reload security settings from Kibana Dev Tools: `POST
/_nodes/reload_secure_settings`
6. Go to the cloud console of your deployment; under Security, at the
bottom of the page, copy the proxy address
![Proxy address location in the cloud console](https://github.com/user-attachments/assets/19ce4142-c184-466a-a1e1-a91ecdbec18f)
7. Register a new cluster with the proxy address
```
PUT _cluster/settings
{
"persistent": {
"cluster.remote.${REMOTE_CLUSTER_NAME}.mode": "proxy",
"cluster.remote.${REMOTE_CLUSTER_NAME}.proxy_address": "${PROXY_ADDRESS}"
}
}
```
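Before moving on, you can optionally confirm the remote cluster is reachable. This verification step is an addition to the instructions above; `_remote/info` is the standard Elasticsearch API for inspecting remote cluster connections:

```
GET _remote/info
```

The response should list `${REMOTE_CLUSTER_NAME}` with `"connected": true`.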
8. Add data to the remote cluster and observe it being ingested in your
environment!
**Verified**: I created 2 Elastic Cloud projects and connected one into another using the "Testing manually" section in this PR's description and https://www.elastic.co/docs/deploy-manage/remote-clusters/ec-remote-cluster-same-ess. I installed Entity Store, adding the …

Proof:
- Image: Doctored entity with …
- Image: Request to …