
[9.4] [Entity Store] Implement logs pagination in CCS (#266307)#266464

Merged
romulets merged 2 commits into elastic:9.4 from romulets:backport/9.4/pr-266307
Apr 29, 2026

Conversation

@romulets
Member

Backport

This will backport the following commits from main to 9.4:

Questions?

Please refer to the Backport tool documentation

## Summary

This PR introduces two major improvements to CCS (cross-cluster search) logs extraction:
log-slice pagination (mirroring what local extraction already had) and independent
timestamp management, so that CCS no longer relies on the caller to supply its time window.

A third fix resolves a subtle boundary bug in the time-window filter that caused
log documents to be silently dropped when all remaining logs shared the same
millisecond timestamp.
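
To illustrate the boundary bug, here is a minimal, hypothetical sketch (the names `afterTimestampOnly` and `afterCompoundCursor` are illustrative, not the actual implementation): filtering strictly on the timestamp skips every document that shares the boundary's millisecond, while a compound `(timestamp, _id)` bound still makes progress through ties.

```typescript
interface LogDoc {
  timestamp: number; // epoch millis
  id: string;
}

// Strict timestamp cursor: loses docs when several share one millisecond.
const afterTimestampOnly = (doc: LogDoc, cursorTs: number): boolean =>
  doc.timestamp > cursorTs;

// Compound cursor: timestamp first, _id as tiebreaker, so same-millisecond
// documents after the cursor are still selected.
const afterCompoundCursor = (doc: LogDoc, cursor: LogDoc): boolean =>
  doc.timestamp > cursor.timestamp ||
  (doc.timestamp === cursor.timestamp && doc.id > cursor.id);

const docs: LogDoc[] = [
  { timestamp: 1000, id: 'a' },
  { timestamp: 1000, id: 'b' },
  { timestamp: 1000, id: 'c' },
];
const cursor = docs[0];

// Timestamp-only filtering finds nothing past the cursor...
console.log(docs.filter((d) => afterTimestampOnly(d, cursor.timestamp)).length); // 0
// ...while the compound cursor still advances through 'b' and 'c'.
console.log(docs.filter((d) => afterCompoundCursor(d, cursor)).length); // 2
```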

---

## 1 — Log-slice pagination for CCS extraction

CCS extraction previously used a single-pass entity loop with no raw-log
capping.
It now uses the same two-level pagination that local extraction uses:

**Outer loop — log slices**
A boundary probe (`buildLogPaginationCursorProbeEsql`) runs before each
entity batch.
It sorts raw logs ascending by `(@timestamp, _id)`, takes the first
`maxLogsPerPage`
documents, and returns the last one as the inclusive slice end
(`sliceEnd`) plus a
`total_logs` count. When `total_logs ≤ maxLogsPerPage` the window is
exhausted and no
further probe is needed.

**Inner loop — entity pages**
Within each slice, entities are paginated by `(_firstSeenLog,
entity.id)` up to `docsLimit`
per query. The slice boundary (`sliceEnd`) is applied as a compound
inclusive upper bound on
every entity page.

**State persistence**
After each entity page, `checkpointTimestamp` and `paginationRecoveryId`
are written so a
mid-slice crash can be resumed on the next run without re-processing
already-ingested entities.
After a slice completes, `checkpointTimestamp` advances to the slice end
and `paginationRecoveryId`
is cleared.
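
The two-level loop can be sketched as follows. This is a simplified outline under stated assumptions, not the real client: `probeSliceEnd`, `fetchEntityPage`, and `persistCheckpoint` are hypothetical stand-ins for the boundary probe, the entity-page query, and the saved-object write described above.

```typescript
interface SliceProbe {
  sliceEnd: { timestamp: string; id: string }; // inclusive slice upper bound
  totalLogs: number;
}

interface EntityPage {
  entities: Array<{ id: string; firstSeenLog: string }>;
}

async function extractWindow(
  fromISO: string,
  toISO: string,
  maxLogsPerPage: number,
  docsLimit: number,
  probeSliceEnd: (from: string, to: string) => Promise<SliceProbe>,
  fetchEntityPage: (
    sliceEnd: SliceProbe['sliceEnd'],
    cursor: string | undefined,
    limit: number
  ) => Promise<EntityPage>,
  persistCheckpoint: (checkpointTimestamp: string, recoveryId: string | null) => Promise<void>
): Promise<void> {
  let from = fromISO;
  // Outer loop: one raw-log slice at a time.
  for (;;) {
    const probe = await probeSliceEnd(from, toISO);
    let cursor: string | undefined;
    // Inner loop: entity pages within the slice, bounded by sliceEnd.
    for (;;) {
      const page = await fetchEntityPage(probe.sliceEnd, cursor, docsLimit);
      if (page.entities.length === 0) break;
      const last = page.entities[page.entities.length - 1];
      cursor = last.id;
      // Mid-slice checkpoint: a crash here is resumable on the next run.
      await persistCheckpoint(last.firstSeenLog, last.id);
      if (page.entities.length < docsLimit) break;
    }
    // Slice done: advance the checkpoint and clear the recovery cursor.
    await persistCheckpoint(probe.sliceEnd.timestamp, null);
    if (probe.totalLogs <= maxLogsPerPage) break; // window exhausted
    from = probe.sliceEnd.timestamp;
  }
}
```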

---

## 2 — Independent timestamp management for CCS

CCS extraction no longer receives `fromDateISO`/`toDateISO` from the
caller.
It now computes and owns its own time window using a new
`CcsLogExtractionState` saved object.

**`CcsExtractToUpdatesParams` changes**

| Removed | Added |
|---|---|
| `fromDateISO` | `lookbackPeriod` — how far back to look on a fresh start (e.g. `'3h'`) |
| `toDateISO` | `delay` — trailing-edge delay applied to `now` for `toDateISO` (e.g. `'1m'`) |
| | `windowOverride?` — explicit `{ fromDateISO, toDateISO }` for API-triggered runs |

**`CcsLogExtractionState` saved object (new)**

| Field | Purpose |
|---|---|
| `checkpointTimestamp` | `_firstSeenLog` of the last processed entity; used as `fromDateISO` on the next run |
| `paginationRecoveryId` | Entity ID cursor for mid-slice crash recovery |

**Window resolution (`resolveExtractionWindow`)**

```
windowOverride set       →  use it directly; skip all state reads/writes (isOverride = true)
paginationRecoveryId set →  effectiveFrom = checkpointTimestamp, recoveryId = paginationRecoveryId
checkpointTimestamp set  →  effectiveFrom = checkpointTimestamp  (normal continuation)
otherwise                →  effectiveFrom = now − lookbackPeriod  (fresh start)
toDateISO                =  now − delay  (always, unless override)
```

API-triggered runs (`windowOverride` set) pass `skipStateUpdates = true`
to both loops so
they never corrupt the scheduled-run checkpoint.
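
The resolution rules above can be sketched roughly like this (a simplified model, not the actual `resolveExtractionWindow` signature; the duration parameters are assumed to be pre-parsed into milliseconds):

```typescript
interface CcsLogExtractionState {
  checkpointTimestamp?: string;
  paginationRecoveryId?: string;
}

interface ResolvedWindow {
  fromDateISO: string;
  toDateISO: string;
  recoveryId?: string;
  isOverride: boolean;
}

function resolveExtractionWindow(opts: {
  now: Date;
  lookbackPeriodMs: number; // fresh-start lookback, e.g. 3h
  delayMs: number; // trailing-edge delay applied to now
  state?: CcsLogExtractionState;
  windowOverride?: { fromDateISO: string; toDateISO: string };
}): ResolvedWindow {
  const { now, lookbackPeriodMs, delayMs, state, windowOverride } = opts;
  // Explicit override (API-triggered run): bypass saved state entirely.
  if (windowOverride) {
    return { ...windowOverride, isOverride: true };
  }
  const toDateISO = new Date(now.getTime() - delayMs).toISOString();
  if (state?.checkpointTimestamp) {
    // Normal continuation; recoveryId, when present, signals a mid-slice crash.
    return {
      fromDateISO: state.checkpointTimestamp,
      toDateISO,
      recoveryId: state.paginationRecoveryId,
      isOverride: false,
    };
  }
  // Fresh start: look back a fixed period from now.
  return {
    fromDateISO: new Date(now.getTime() - lookbackPeriodMs).toISOString(),
    toDateISO,
    isOverride: false,
  };
}
```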

---

## Callers updated

- **`LogsExtractionClient`**: removes `fromDateISO`/`toDateISO` from the
CCS call; passes
  `lookbackPeriod` and `delay` from config.
- **`force_ccs_extract_to_updates` route**: keeps
`fromDateISO`/`toDateISO` in the request
  body (explicit intent) and forwards them as `windowOverride`.

---

## Testing manually

1. Start an ECH deployment on 9.4-SNAPSHOT
2. Go to Stack Management > API Keys and generate a new `Cross-Cluster` API key.
   - Save the provided credentials
3. Start Kibana and Elasticsearch locally
4. Add the stored credentials to the local deployment by running this command in your CLI: `.es/9.4.0/bin/elasticsearch-keystore add cluster.remote.${REMOTE_CLUSTER_NAME}.credentials`. The command will prompt you to paste the credential.
5. Reload the security settings from Kibana Dev Tools: `POST /_nodes/reload_secure_settings`
6. Go to the cloud console of your deployment; under Security, at the bottom of the page, copy the proxy address
<img width="1143" height="194" alt="image"
src="https://github.com/user-attachments/assets/19ce4142-c184-466a-a1e1-a91ecdbec18f"
/>
7. Register a new cluster with the proxy address

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.${REMOTE_CLUSTER_NAME}.mode": "proxy",
    "cluster.remote.${REMOTE_CLUSTER_NAME}.proxy_address": "${PROXY_ADDRESS}"
  }
}
```

8. Add data to the remote cluster and observe it being ingested in your environment!

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 90f1efb)

# Conflicts:
#	x-pack/solutions/security/plugins/entity_store/server/domain/asset_manager/asset_manager_client.test.ts
@romulets romulets requested a review from kibanamachine as a code owner April 29, 2026 16:19
@romulets romulets added the backport This PR is a backport of another PR label Apr 29, 2026
@romulets romulets enabled auto-merge (squash) April 29, 2026 16:19
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

@romulets romulets merged commit b9264aa into elastic:9.4 Apr 29, 2026
21 checks passed
@romulets romulets linked an issue Apr 30, 2026 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

[Entity Store] CCS timeouting too easily
