[Entity Store] Implement logs pagination in CCS #266307
romulets merged 11 commits into elastic:main
Conversation
Force-pushed from bc70cb8 to 66895d4
/flaky scoutConfig:x-pack/solutions/security/plugins/entity_store/test/scout/api/playwright.config.ts:30

Flaky Test Runner: ✅ Build triggered - kibana-flaky-test-suite-runner#11983

Flaky Test Runner Stats: 🟠 Some tests failed - kibana-flaky-test-suite-runner#11983
❌ x-pack/solutions/security/plugins/entity_store/test/scout/api/playwright.config.ts: 29/30 tests passed.
florent-leborgne left a comment:
minimal docs change - LGTM
💛 Build succeeded, but was flaky
Failed CI Steps: Metrics [docs]

cc @romulets
Starting backport for target branches: 9.4
https://github.com/elastic/kibana/actions/runs/25119149957

💔 All backports failed. To create the backport manually, run the Backport tool (see the Backport tool documentation).

💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI.
Backport: This will backport the following commits from `main` to `9.4`: [Entity Store] Implement logs pagination in CCS (#266307)
Summary
This PR introduces two major improvements to CCS (cross-cluster search) logs extraction:
log-slice pagination (mirroring what local extraction already had) and independent timestamp
management so CCS no longer relies on the caller to supply its time window.
A third fix resolves a subtle boundary bug in the time-window filter that caused log documents
to be silently dropped when all remaining logs share the same millisecond timestamp.
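The boundary bug can be illustrated with a minimal sketch (hypothetical names; the real filter is built into the ES|QL queries): a strict `@timestamp > checkpoint` filter drops every remaining log when they all share the checkpoint's millisecond, while a compound `(timestamp, id)` cursor keeps them.

```typescript
type Doc = { ts: number; id: string };

// Strict timestamp cursor: drops docs that share the checkpoint millisecond.
const afterTs = (docs: Doc[], ts: number): Doc[] => docs.filter((d) => d.ts > ts);

// Compound (timestamp, id) cursor: same-millisecond docs with a later id survive.
const afterCursor = (docs: Doc[], ts: number, id: string): Doc[] =>
  docs.filter((d) => d.ts > ts || (d.ts === ts && d.id > id));
```

With three docs all at `ts = 100` and a checkpoint of `(100, "a")`, the strict filter returns nothing while the compound cursor still returns the two unprocessed docs.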
1 — Log-slice pagination for CCS extraction
CCS extraction previously used a single-pass entity loop with no raw-log capping.
It now uses the same two-level pagination that local extraction uses:
Outer loop — log slices
A boundary probe (`buildLogPaginationCursorProbeEsql`) runs before each entity batch.
It sorts raw logs ascending by `(@timestamp, _id)`, takes the first `maxLogsPerPage` documents, and returns the last one as the inclusive slice end (`sliceEnd`) plus a `total_logs` count. When `total_logs ≤ maxLogsPerPage`, the window is exhausted and no further probe is needed.
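The probe's behavior can be sketched over an in-memory log array (a sketch with hypothetical names, not the actual ES|QL implementation):

```typescript
type RawLog = { timestamp: string; id: string };

// In-memory analogue of the boundary probe: sort ascending by
// (@timestamp, _id), take the first maxLogsPerPage docs, and return the
// last one as the inclusive slice end plus a total count.
function probeSliceEnd(logs: RawLog[], maxLogsPerPage: number) {
  const sorted = [...logs].sort(
    (a, b) => a.timestamp.localeCompare(b.timestamp) || a.id.localeCompare(b.id)
  );
  const page = sorted.slice(0, maxLogsPerPage);
  return {
    sliceEnd: page[page.length - 1], // inclusive upper bound for this slice
    totalLogs: sorted.length,        // total_logs in the window
    exhausted: sorted.length <= maxLogsPerPage, // no further probe needed
  };
}
```

When `exhausted` is true, the current slice covers the whole remaining window and the outer loop can stop probing.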
Inner loop — entity pages
Within each slice, entities are paginated by `(_firstSeenLog, entity.id)` up to `docsLimit` per query. The slice boundary (`sliceEnd`) is applied as a compound inclusive upper bound on every entity page.
State persistence
After each entity page, `checkpointTimestamp` and `paginationRecoveryId` are written so a mid-slice crash can be resumed on the next run without re-processing already-ingested entities.
After a slice completes, `checkpointTimestamp` advances to the slice end and `paginationRecoveryId` is cleared.
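Put together, the slice loop and its checkpointing can be sketched as an in-memory simulation (hypothetical shapes and names; the real client issues ES|QL queries, persists a saved object, and pages entities by a compound cursor rather than one pass per slice):

```typescript
type Log = { ts: number; id: string; entity: string };

// Simulated two-level pagination: outer loop over log slices of at most
// maxLogsPerPage docs, inner loop over the entities seen in each slice,
// with checkpoint state updated as described above.
function simulateExtraction(logs: Log[], maxLogsPerPage: number) {
  const state = { checkpointTimestamp: -1, paginationRecoveryId: null as string | null };
  const processed: string[] = []; // entities processed, in order
  let remaining = [...logs].sort((a, b) => a.ts - b.ts || a.id.localeCompare(b.id));
  while (remaining.length > 0) {
    // Outer loop: the probe takes the first maxLogsPerPage logs; the last
    // one is the inclusive slice end.
    const slice = remaining.slice(0, maxLogsPerPage);
    const sliceEnd = slice[slice.length - 1];
    // Inner loop: process entities first seen within the slice.
    const entities = Array.from(new Set(slice.map((l) => l.entity))).sort();
    for (const e of entities) {
      processed.push(e);
      // Mid-slice checkpoint: resumable after a crash without re-ingesting.
      state.paginationRecoveryId = e;
    }
    // Slice complete: advance the checkpoint, clear the recovery cursor.
    state.checkpointTimestamp = sliceEnd.ts;
    state.paginationRecoveryId = null;
    remaining = remaining.slice(maxLogsPerPage);
  }
  return { processed, state };
}
```

The same entity can appear in multiple slices; deduplication happens downstream at ingestion, not in the pagination loop.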
2 — Independent timestamp management for CCS
CCS extraction no longer receives `fromDateISO`/`toDateISO` from the caller. It now computes and owns its own time window using a new `CcsLogExtractionState` saved object.

`CcsExtractToUpdatesParams` changes

| Removed | Added |
|---|---|
| `fromDateISO` | `lookbackPeriod` — how far back to look on a fresh start (e.g. `'3h'`) |
| `toDateISO` | `delay` — trailing-edge delay applied to `now` for `toDateISO` (e.g. `'1m'`) |
| | `windowOverride?` — explicit `{ fromDateISO, toDateISO }` for API-triggered runs |

`CcsLogExtractionState` saved object (new)

| Field | Purpose |
|---|---|
| `checkpointTimestamp` | `_firstSeenLog` of the last processed entity; used as `fromDateISO` on the next run |
| `paginationRecoveryId` | Entity ID cursor for mid-slice crash recovery |

Window resolution (`resolveExtractionWindow`)

```
windowOverride set → use it directly; skip all state reads/writes (isOverride = true)
paginationRecoveryId set → effectiveFrom = checkpointTimestamp, recoveryId = paginationRecoveryId
checkpointTimestamp set → effectiveFrom = checkpointTimestamp (normal continuation)
otherwise → effectiveFrom = now − lookbackPeriod (fresh start)
toDateISO = now − delay (always, unless override)
```

API-triggered runs (`windowOverride` set) pass `skipStateUpdates = true` to both loops so they never corrupt the scheduled-run checkpoint.
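The precedence order above can be sketched as follows (hypothetical types and names; a sketch of the resolution logic, not the actual saved-object client; `lookbackMs`/`delayMs` stand in for the parsed `lookbackPeriod`/`delay` durations):

```typescript
interface State { checkpointTimestamp?: string; paginationRecoveryId?: string }
interface Window { fromDateISO: string; toDateISO: string; recoveryId?: string; isOverride: boolean }

// Sketch of the window-resolution precedence described above.
function resolveWindow(
  now: Date,
  lookbackMs: number,
  delayMs: number,
  state: State,
  windowOverride?: { fromDateISO: string; toDateISO: string }
): Window {
  if (windowOverride) {
    // Explicit window: skip all state reads/writes.
    return { ...windowOverride, isOverride: true };
  }
  const toDateISO = new Date(now.getTime() - delayMs).toISOString();
  if (state.paginationRecoveryId && state.checkpointTimestamp) {
    // Mid-slice crash recovery: resume from the checkpoint with the entity cursor.
    return {
      fromDateISO: state.checkpointTimestamp,
      toDateISO,
      recoveryId: state.paginationRecoveryId,
      isOverride: false,
    };
  }
  if (state.checkpointTimestamp) {
    // Normal continuation from the last checkpoint.
    return { fromDateISO: state.checkpointTimestamp, toDateISO, isOverride: false };
  }
  // Fresh start: look back lookbackPeriod from now.
  return {
    fromDateISO: new Date(now.getTime() - lookbackMs).toISOString(),
    toDateISO,
    isOverride: false,
  };
}
```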
Callers updated
- `LogsExtractionClient`: removes `fromDateISO`/`toDateISO` from the CCS call; passes `lookbackPeriod` and `delay` from config.
- `force_ccs_extract_to_updates` route: keeps `fromDateISO`/`toDateISO` in the request body (explicit intent) and forwards them as `windowOverride`.

Testing manually:
1. Start an ECH deployment on 9.4-SNAPSHOT.
2. Go to Stack Management > API Keys and generate a new `Cross-Cluster` API key. Save the provided credentials.
3. Start Kibana and Elasticsearch locally.
4. Add the stored credentials to the local deployment by running `.es/9.4.0/bin/elasticsearch-keystore add cluster.remote.${REMOTE_CLUSTER_NAME}.credentials` in your CLI. This command will prompt you for the credential.
5. Reload security settings from Kibana Dev Tools: `POST /_nodes/reload_secure_settings`
6. In the cloud console of your deployment, under Security, copy the proxy address at the bottom of the page.
7. Register a new cluster with the proxy address:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.${REMOTE_CLUSTER_NAME}.mode": "proxy",
    "cluster.remote.${REMOTE_CLUSTER_NAME}.proxy_address": "${PROXY_ADDRESS}"
  }
}
```

8. Add data to the remote cluster and observe it being ingested in your environment!