[New Filebeat Input] Akamai#48846
Conversation
Enforce a single EdgeGrid auth path and align Akamai polling behavior around offset expiry, invalid timestamp retries, and page-level cursor checkpointing with stronger runtime metrics/logging.
… tests Reorganize tests by concern and add a table-driven mock-server harness that validates pagination, recovery, error handling, and partial publish behavior end-to-end.
Isolate workspace ignore updates from Akamai input functional changes to keep review history clean.
🤖 GitHub commentsJust comment with:
|
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
Add full input documentation following existing x-pack filebeat conventions (config options, recovery behavior, metrics table, tracer rotation, common options). Link the new page from the input type index. Mark the input as beta in code, lower default recovery_interval to 1h, and remove the superseded doc.go. Co-authored-by: Cursor <cursoragent@cursor.com>
Vale Linting ResultsSummary: 1 warning, 9 suggestions found
|
| File | Line | Rule | Message |
|---|---|---|---|
| docs/reference/filebeat/filebeat-input-akamai.md | 26 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'using' instead of 'via'. |
💡 Suggestions (9)
| File | Line | Rule | Message |
|---|---|---|---|
| docs/reference/filebeat/filebeat-input-akamai.md | 116 | Elastic.Semicolons | Use semicolons judiciously. |
| docs/reference/filebeat/filebeat-input-akamai.md | 135 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
| docs/reference/filebeat/filebeat-input-akamai.md | 145 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
| docs/reference/filebeat/filebeat-input-akamai.md | 150 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
| docs/reference/filebeat/filebeat-input-akamai.md | 299 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
| docs/reference/filebeat/filebeat-input-akamai.md | 304 | Elastic.WordChoice | Consider using 'efficient' instead of 'easy', unless the term is in the UI. |
| docs/reference/filebeat/filebeat-input-akamai.md | 363 | Elastic.Semicolons | Use semicolons judiciously. |
| docs/reference/filebeat/filebeat-input-akamai.md | 368 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
| docs/reference/filebeat/filebeat-input-akamai.md | 370 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
The Vale linter checks documentation changes against the Elastic Docs style guide.
To use Vale locally or report issues, refer to Elastic style guide for Vale.
🔍 Preview links for changed docs |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
x-pack/filebeat/input/akamai/input.go (2)
254-257:⚠️ Potential issue | 🔴 CriticalZero-event pages never mark chain as drained.
When
eventCount == 0, the loop breaks without settingp.cursor.CaughtUp = true. The next poll interval will replay the samechain_from/chain_towindow indefinitely since the chain never advances.Proposed fix
if eventCount == 0 { p.log.Debug("no events received, poll cycle complete") + p.cursor.CaughtUp = true break }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/filebeat/input/akamai/input.go` around lines 254 - 257, When eventCount == 0 inside the poll loop, set the cursor's caught-up marker before breaking so the chain is marked drained; specifically, in the block checking eventCount (the branch in which you currently call p.log.Debug("no events received, poll cycle complete") and break), add p.cursor.CaughtUp = true (or call the appropriate method on p.cursor that marks it caught up) so that p.cursor.CaughtUp is true prior to break and the chain_from/chain_to window will advance on the next poll. Ensure you reference p.cursor.CaughtUp (or the cursor's mark-as-drained method) in the same scope as the eventCount check.
572-586:⚠️ Potential issue | 🔴 CriticalPersisted cursor omits
CaughtUpflag.
fullCursoris built without theCaughtUpfield. After restart, the loaded cursor always hasCaughtUp=false, causing a drained chain to be treated as in-progress. This forces unnecessary chain replays on every restart.Proposed fix
if totalPublished > 0 { fullCursor := cursor{ ChainFrom: p.cursor.ChainFrom, ChainTo: p.cursor.ChainTo, + CaughtUp: eventCount < p.cfg.EventLimit, LastOffset: pageCtx.Offset, OffsetObtainedAt: time.Now(), }Note:
eventCountneeds to be passed to or computed in this scope, or usep.cursor.CaughtUpwhich is set at line 275.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/filebeat/input/akamai/input.go` around lines 572 - 586, The persisted cursor (`fullCursor`) is missing the CaughtUp field so restarts load CaughtUp=false and replay drained chains; update the construction of fullCursor inside the totalPublished > 0 block to include the CaughtUp value (either pass/compute eventCount here and set CaughtUp = eventCount == 0, or copy p.cursor.CaughtUp) before calling p.cursorStore.Save; make this change where fullCursor is created and ensure the value persists through the p.acks.Add callback so Save stores the correct CaughtUp state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@x-pack/filebeat/input/akamai/config.go`:
- Around line 231-297: After unpacking the YAML into a config in
InputManager.Create(), call config.Validate() and return any validation error
before proceeding; specifically modify InputManager.Create() to invoke the
config.Validate() method (the Validate function on type config) immediately
after unmarshalling/decoding and before using the config or registering the
input so invalid configs cannot reach Run(); ensure the Create() function
propagates the returned error to the caller instead of continuing on success.
---
Duplicate comments:
In `@x-pack/filebeat/input/akamai/input.go`:
- Around line 254-257: When eventCount == 0 inside the poll loop, set the
cursor's caught-up marker before breaking so the chain is marked drained;
specifically, in the block checking eventCount (the branch in which you
currently call p.log.Debug("no events received, poll cycle complete") and
break), add p.cursor.CaughtUp = true (or call the appropriate method on p.cursor
that marks it caught up) so that p.cursor.CaughtUp is true prior to break and
the chain_from/chain_to window will advance on the next poll. Ensure you
reference p.cursor.CaughtUp (or the cursor's mark-as-drained method) in the same
scope as the eventCount check.
- Around line 572-586: The persisted cursor (`fullCursor`) is missing the
CaughtUp field so restarts load CaughtUp=false and replay drained chains; update
the construction of fullCursor inside the totalPublished > 0 block to include
the CaughtUp value (either pass/compute eventCount here and set CaughtUp =
eventCount == 0, or copy p.cursor.CaughtUp) before calling p.cursorStore.Save;
make this change where fullCursor is created and ensure the value persists
through the p.acks.Add callback so Save stores the correct CaughtUp state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0efe69d-2b68-4dd4-8b2a-c016ee1222df
📒 Files selected for processing (4)
docs/reference/filebeat/filebeat-input-akamai.mdx-pack/filebeat/input/akamai/config.gox-pack/filebeat/input/akamai/config_test.gox-pack/filebeat/input/akamai/input.go
🚧 Files skipped from review as they are similar to previous changes (1)
- x-pack/filebeat/input/akamai/config_test.go
|
@andrewkroh, @efd6, @belimawr All comments have been resolved and AI review feedback also addressed. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@x-pack/filebeat/input/akamai/input.go`:
- Around line 254-257: When eventCount == 0 the code sets p.cursor.CaughtUp
in-memory but never persists it because totalPublished == 0 prevents the normal
ACK path; update the branch that sets p.cursor.CaughtUp to also call the
cursor-persistence routine so the caught-up state is written out (i.e., after
p.cursor.CaughtUp = true invoke your existing cursor save/persist method—e.g.,
p.saveCursor/p.persistCursor/p.storeCursor—or the function that writes p.cursor
to disk/state). Apply the same change to the other code path that updates the
ChainTo/CaughtUp state so the drained/caught-up flag is persisted even when
totalPublished is zero.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc512d0a-0da0-47b0-8df7-b9af091698ec
📒 Files selected for processing (1)
x-pack/filebeat/input/akamai/input.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@x-pack/filebeat/input/akamai/input.go`:
- Line 78: The deferred cs.Close() can run while asynchronous ACK callbacks
(spawned via go result.ackCallback() in x-pack/filebeat/input/akamai/acks.go)
are still executing, risking failed cursor writes; modify the ACK handling so
shutdown waits for all callbacks before closing the cursorStore: either run
callbacks synchronously instead of launching goroutines in the ACK handler or
add a synchronization barrier (e.g., a WaitGroup stored on the ACK manager) that
increments when scheduling result.ackCallback() and is awaited during shutdown
before calling cs.Close(), ensuring all callbacks complete before
cursorStore.Close().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2bd8c219-348b-4d42-8a70-1d7e4fc49fd6
📒 Files selected for processing (2)
x-pack/filebeat/input/akamai/input.gox-pack/filebeat/input/akamai/input_test.go
belimawr
left a comment
There was a problem hiding this comment.
I looked at a few files, mostly input.go, store.go and acks.go, focusing on how the input interacts with the pipeline and the store.
Overall LGTM.
I was looking at the test coverage and akamaiInput.Run is not tested at all. That concerns me because this is the part of the code that brings it all together: the pipeline, the store, the client (reading from Akamai), etc.
For other input V2 we usually mock the pipeline and the store for testing and also have some integration tests that run the whole Filebeat.
To test the inputs we have some mocks for the pipeline, store and a bit of extra code to start stop the input, you can look at https://github.com/elastic/beats/blob/main/filebeat/input/journald/environment_test.go as an example.
Another option would be an integration test that runs Filebeat and uses a mock for the Akamai API (I'm not sure how easy it would be to mock it). There are plenty of examples at https://github.com/elastic/beats/tree/main/filebeat/tests/integration, one simple one is
beats/filebeat/tests/integration/filestream_test.go
Lines 83 to 129 in e23ae25
@belimawr Added 'AkamaiRun' tests with store and other required pipeline mocks. |
|
@efd6, @andrewkroh, @belimawr, @rdner, @theletterf - We are pausing this PR atm and looking into making this into a native otel receiver. |
|
Instead of adding an Akamai SIEM input into Filebeat, this is becoming an OpenTelemetry receiver that will be integrated into EDOT. |
Type of change
Proposed commit message
WHAT
x-pack/filebeat/input/akamaiwith EdgeGrid HMAC-SHA256 signing built into the client.v2.Inputdirectly (like netflow/awss3) instead ofinputcursor.Input. This gives direct access tobeat.PipelineConnectorfor batch publishing and custom ACK handling. No dependency on the inputcursor framework — cursor persistence, pipeline client creation, and ACK tracking are all managed within the input.PublishAll: Workers accumulate events into configurable batches (batch_size, default 1000) and publish viaclient.PublishAll(), reducing pipeline lock acquisitions from one-per-event to one-per-batch. Withevent_limit=60000andbatch_size=1000, that's 60 lock acquisitions per page instead of 60,000.chain_from,chain_to,last_offset,offset_obtained_at) to the statestore. This ensures zero partial cursor writes and exactly one store write per page.stream_buffer_size, defaultworkers * 4) to worker goroutines — no full-page buffering. A one-line delay pattern cleanly separates the offset context from events. Raw JSON is passed through as-is; unmarshalling is left to ingest pipelines.chain_from/chain_to) and drains it via offset pagination. Chains overlap by 10s to prevent boundary gaps. Offsets are proactively dropped when they exceed a configurable TTL, avoiding unnecessary 416 round-trips.416/ stale offset → replay the current chain window400 invalid timestamp→ refresh HMAC and retry (configurable attempts), then fall back to chain replayfromtoo old → clamp to 12h lookback and continue400s → non-recoverable for the cyclemax_recovery_attemptscap (default 3) prevents infinite loopsgolang.org/x/time/rate, comprehensive metrics (counters, histograms, worker utilization), and structured error logging with full cursor context.WHY
The CEL-based Akamai integration was hitting reliability issues in production — data latency and gaps caused by the SIEM v1 API's short-lived offset tokens (~2min TTL) and tight HMAC timestamp windows. A dedicated input gives us direct control over pagination, recovery, and cursor management.
We chose
v2.Inputoverinputcursor.Inputfor three reasons:PublishAllaccess — inputcursor'sPublisheronly exposes per-eventPublish(event, cursor), which grabs the pipeline lock on every call. Withv2.Inputwe get thebeat.Clientdirectly and can batch events throughPublishAll, taking the lock once per batch instead of once per event.updateOp, which means cursor writes happen on every ACK batch. With our approach, the cursor is written exactly once per page, atomically, only after all events are confirmed delivered by the output. No partial state ever hits the store.Benchmark Results
All benchmarks ran against the same Akamai SIEM proxy (
proteus-akamai-94a79776.sit.estc.dev) withevent_limit=60000, file output, and manual Ctrl+C termination. Each page returns exactly 60,000 events.Full Comparison (7 runs, sorted by sustained throughput)
What we observed
The v2.Input + PublishAll architecture is a clear improvement over inputcursor. Every v2 run outperformed or matched the inputcursor baseline. The best v2 configuration hit ~11,000 evt/s vs ~7,200 for the baseline — roughly a 49% gain. This comes from two things: batch publishing reduces pipeline lock acquisitions dramatically (60 per page vs 60,000), and the decoupled cursor model eliminates per-event
updateOpoverhead entirely.Cursor persistence is much cleaner. The inputcursor approach wrote cursor state ~60 times per page (once per ACK batch). The v2 approach writes exactly once per page, atomically, after all events are confirmed delivered. No partial cursor state is ever persisted, which simplifies recovery semantics significantly.
At
event_limit=60000, single-worker configurations performed best. The 1-worker runs consistently topped the throughput chart and had the most stable page durations (93-95% of pages under 5s, zero pages over 10s). Multi-worker runs showed lower throughput and more variability. This makes sense at 60k — the per-event work is minimal (string copy + beat.Event construction), so the overhead of goroutine scheduling and concurrent queue access outweighs any parallelism benefit.That said, worker scaling matters at higher event limits. The API supports
event_limitup to 600,000. At 600k events per page, a single worker would need to stream and publish 10x more events sequentially, and the output could become the bottleneck — especially with Elasticsearch or Kafka where eachPublishAllblocks on network I/O. We expect multi-worker configurations to become valuable at higher event limits and with latency-sensitive outputs. This needs further testing.batch_sizebetween 1000-2000 is the sweet spot for these runs. Going beyond 2000 didn't help — the 5k batch run was actually slower due to memory overhead and the time spent accumulating events before each publish. Below 1000 (per-event publish) is where the old inputcursor approach lived, and that's clearly suboptimal.Memory scales with workers, not with events. RSS ranged from 103 MB (1 worker) to 292 MB (20 workers, 5k batch). The actual working set (
memory_alloc) was comparable across all runs (~15-21 MB). The RSS delta is mostly Go runtime address space reservation for goroutine stacks and batch buffers.Queue sizing is secondary. We saw no meaningful throughput difference between 10k and 16k queue capacities. The queue and output flush settings are fine-tuning knobs, not performance drivers.
Checklist
stresstest.shscript to run them under stress conditions and race detector to verify their stability../changelog/fragmentsusing the changelog tool.Disruptive User Impact
akamai) and does not change behavior for existing inputs.Author's Checklist
How to test this PR locally
go test ./x-pack/filebeat/input/akamai -count=1go test ./x-pack/filebeat/input/akamai -run TestInput -count=1go test ./x-pack/filebeat/input/akamai -run TestProcessPageBatchPublish -count=1go test ./x-pack/filebeat/input/akamai -run TestConfigValidation -count=1go test ./x-pack/filebeat/input/akamai -run TestInputMetrics -count=1Related issues
Use cases
Screenshots
Logs