
[receiver/elasticapmintake] Group events to avoid duplicate resource and scope spans #1214

Merged
lahsivjar merged 11 commits into elastic:main from lahsivjar:optimize-resource-spans
May 26, 2026


@lahsivjar

Contributor

Summary

Group trace and log events that share a resource attribute set into a single
ResourceSpans / ResourceLogs per processBatch call. The grouping key is
an xxhash fingerprint of the event fields that affect the resource map.
Resource-attribute writes and the fingerprint hash are both implemented as
visitors over a single walker (mappers.WalkResourceAttributes), so adding a
new resource field is a one-place edit that both paths pick up
automatically. Metric events are not collapsed, because collapsing them
would risk duplicate metric names within a ScopeMetrics; a follow-up will
handle them.
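
The single-walker design can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `event` struct, the `visitor` interface, `walkResourceAttributes`, and the attribute keys are hypothetical stand-ins for the real intake event model and mappers.WalkResourceAttributes. A plain `map[string]any` stands in for pcommon.Map, and the standard library's `hash/fnv` stands in for xxhash so that the sketch has no external dependencies.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

// event is a hypothetical, heavily reduced stand-in for an intake event.
type event struct {
	ServiceName string
	AgentName   string
	HostName    string
}

// visitor receives every resource-relevant field exactly once.
type visitor interface {
	PutStr(key, value string)
}

// walkResourceAttributes is the single place that lists resource fields.
// Both the writer and the hasher below are driven from this one list.
func walkResourceAttributes(e event, v visitor) {
	v.PutStr("service.name", e.ServiceName)
	v.PutStr("agent.name", e.AgentName)
	v.PutStr("host.name", e.HostName)
}

// mapWriter stores attributes (stand-in for writing into a pcommon.Map).
type mapWriter struct{ attrs map[string]any }

func (w mapWriter) PutStr(key, value string) { w.attrs[key] = value }

// hasher folds the same fields into a 64-bit fingerprint.
type hasher struct{ h hash.Hash64 }

func (h hasher) PutStr(key, value string) {
	// A zero byte after each part keeps ("ab", "c") distinct from ("a", "bc").
	h.h.Write([]byte(key))
	h.h.Write([]byte{0})
	h.h.Write([]byte(value))
	h.h.Write([]byte{0})
}

// fingerprint hashes every field the walker visits.
func fingerprint(e event) uint64 {
	h := hasher{h: fnv.New64a()}
	walkResourceAttributes(e, h)
	return h.h.Sum64()
}

// resourceAttrs writes every field the walker visits.
func resourceAttrs(e event) map[string]any {
	w := mapWriter{attrs: map[string]any{}}
	walkResourceAttributes(e, w)
	return w.attrs
}

func main() {
	a := event{ServiceName: "checkout", AgentName: "go", HostName: "h1"}
	b := event{ServiceName: "checkout", AgentName: "go", HostName: "h2"}
	fmt.Println(fingerprint(a) == fingerprint(a)) // same fields, same fingerprint
	fmt.Println(fingerprint(a) == fingerprint(b)) // host differs
	fmt.Println(len(resourceAttrs(a)))            // one attribute per visited field
}
```

Adding a fourth field to `walkResourceAttributes` changes both the written attributes and the fingerprint at once, which is the property the summary relies on.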

Motivation

Profiling the intake hot path showed every event allocating its own
ResourceSpans/ResourceLogs plus a fresh resource attribute map, even
when consecutive events came from the same agent metadata.
pcommon.Map.PutStr boxing on identical-resource fan-out dominated
per-event allocations.

The walker pattern was added to make the change safe to maintain: keeping a
fingerprint and a resource-attribute writer manually in sync is exactly the
kind of two-list-drift bug that produces silent data loss. Events with
different values for an unhashed field get merged, and the second event's
value is dropped via Map.PutStr's update-on-existing semantics. One walker
makes the field set the single source of truth.
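
The drift failure mode can be reproduced in a few lines. The sketch below is hypothetical: `event`, `driftedFingerprint`, `writeResource`, and `group` are illustrative names, `hash/fnv` stands in for xxhash, and a plain map stands in for pcommon.Map. In this simplified version the resource is written once when a group is created, so the first event's value wins; the point is only that one of the two host values disappears without any error.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type event struct {
	ServiceName string
	HostName    string
}

// driftedFingerprint hashes only ServiceName: HostName was added to the
// writer below but forgotten here, which is the two-list drift.
func driftedFingerprint(e event) uint64 {
	h := fnv.New64a()
	h.Write([]byte(e.ServiceName))
	return h.Sum64()
}

// writeResource is the hand-maintained writer that knows about both fields.
func writeResource(e event) map[string]string {
	return map[string]string{
		"service.name": e.ServiceName,
		"host.name":    e.HostName,
	}
}

// group collapses events by fingerprint; the resource is written only
// for the first event that creates a group.
func group(events []event) map[uint64]map[string]string {
	groups := map[uint64]map[string]string{}
	for _, e := range events {
		fp := driftedFingerprint(e)
		if _, ok := groups[fp]; !ok {
			groups[fp] = writeResource(e)
		}
	}
	return groups
}

func main() {
	groups := group([]event{
		{ServiceName: "checkout", HostName: "h1"},
		{ServiceName: "checkout", HostName: "h2"},
	})
	for _, attrs := range groups {
		fmt.Println(len(groups), attrs["host.name"]) // prints "1 h1"; h2 is silently lost
	}
}
```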

Benchmark

benchstat of BenchmarkProcessBatch + BenchmarkHandleStream* (the
new direct-path bench suite this PR adds) on origin/main vs this branch.
10 runs each, -benchtime=2s, Apple M4 Pro, Go 1.25.

Allocations per op (geomean −12.20%)

                                        │  main         │  branch                     │
ProcessBatch/global_labels_no_shadow              913.0      793.0   -13.14% (p=0.000)
ProcessBatch/global_labels_with_shadow            975.0      890.0    -8.72% (p=0.000)
HandleStream/transactions                        1.437k     1.402k    -2.44% (p=0.000)
HandleStream/spans                               2.130k     1.926k    -9.58% (p=0.000)
HandleStream/transactions_spans                  1.570k     1.381k   -12.04% (p=0.000)
HandleStream/errors                              1.470k     1.364k    -7.21% (p=0.000)
HandleStream/logs                                1.359k     1.202k   -11.55% (p=0.000)
HandleStream/metricsets                           604.0      609.0    +0.83% (p=0.000)
HandleStream/histograms                           197.0      201.0    +2.03% (p=0.000)
HandleStream/metric_global_label_shadow           323.0      328.0    +1.55% (p=0.000)
HandleStreamGlobalLabels/no_shadow                731.0      610.0   -16.55% (p=0.000)
HandleStreamGlobalLabels/with_shadow              792.0      707.0   -10.73% (p=0.000)
HandleStreamSize/transactions/10                  873.0      671.0   -23.14% (p=0.000)
HandleStreamSize/transactions/100                8.227k     6.207k   -24.55% (p=0.000)
HandleStreamSize/transactions/1000               81.77k     61.57k   -24.71% (p=0.000)
HandleStreamMixed/mixed/50                       3.308k     2.648k   -19.95% (p=0.000)
HandleStreamMixed/mixed/500                      40.72k     32.46k   -20.27% (p=0.000)
geomean                                          1.761k     1.546k   -12.20%

Bytes per op (geomean −3.24%)

                                        │  main         │  branch                     │
ProcessBatch/global_labels_no_shadow             1.098Mi    1.092Mi   -0.57% (p=0.000)
ProcessBatch/global_labels_with_shadow           1.102Mi    1.097Mi   -0.46% (p=0.000)
HandleStream/transactions                        1.102Mi    1.096Mi   -0.53% (p=0.000)
HandleStream/spans                               1.173Mi    1.148Mi   -2.15% (p=0.000)
HandleStream/transactions_spans                  1.115Mi    1.099Mi   -1.46% (p=0.000)
HandleStream/errors                              1.105Mi    1.092Mi   -1.15% (p=0.000)
HandleStream/logs                                1.105Mi    1.086Mi   -1.68% (p=0.000)
HandleStream/metricsets                          1.046Mi    1.047Mi   +0.01% (p=0.000)
HandleStream/histograms                          1.016Mi    1.016Mi   +0.01% (p=0.000)
HandleStream/metric_global_label_shadow          1.027Mi    1.027Mi   +0.03% (p=0.000)
HandleStreamGlobalLabels/no_shadow               1.072Mi    1.064Mi   -0.75% (p=0.000)
HandleStreamGlobalLabels/with_shadow             1.075Mi    1.069Mi   -0.55% (p=0.000)
HandleStreamSize/transactions/10                 1.085Mi    1.069Mi   -1.40% (p=0.000)
HandleStreamSize/transactions/100                1.805Mi    1.653Mi   -8.42% (p=0.000)
HandleStreamSize/transactions/1000               9.005Mi    7.488Mi  -16.85% (p=0.000)
HandleStreamMixed/mixed/50                       1.306Mi    1.256Mi   -3.85% (p=0.000)
HandleStreamMixed/mixed/500                      4.775Mi    4.145Mi  -13.19% (p=0.000)
geomean                                          1.397Mi    1.352Mi   -3.24%

Time per op (geomean −0.92%)

                                        │  main        │  branch                      │
ProcessBatch/global_labels_no_shadow             197.7µ     203.3µ        ~  (p=0.280)
ProcessBatch/global_labels_with_shadow           210.2µ     217.6µ    +3.52% (p=0.000)
HandleStream/transactions                        137.5µ     142.6µ    +3.65% (p=0.000)
HandleStream/spans                               149.0µ     153.7µ    +3.13% (p=0.000)
HandleStream/transactions_spans                  140.5µ     143.1µ    +1.80% (p=0.000)
HandleStream/errors                              134.0µ     135.1µ    +0.83% (p=0.007)
HandleStream/logs                                129.5µ     127.4µ    -1.62% (p=0.002)
HandleStream/metricsets                          89.18µ     93.48µ    +4.82% (p=0.000)
HandleStream/histograms                          57.22µ     56.74µ        ~  (p=0.063)
HandleStream/metric_global_label_shadow          66.83µ     68.37µ    +2.30% (p=0.000)
HandleStreamGlobalLabels/no_shadow               89.93µ     90.25µ        ~  (p=0.363)
HandleStreamGlobalLabels/with_shadow             93.80µ     96.93µ    +3.34% (p=0.002)
HandleStreamSize/transactions/10                 97.25µ     95.97µ    -1.32% (p=0.003)
HandleStreamSize/transactions/100                384.6µ     344.0µ   -10.55% (p=0.000)
HandleStreamSize/transactions/1000               3.190m     2.917m    -8.54% (p=0.000)
HandleStreamMixed/mixed/50                       190.0µ     183.3µ    -3.54% (p=0.000)
HandleStreamMixed/mixed/500                      1.645m     1.425m   -13.36% (p=0.000)
geomean                                          180.2µ     178.6µ    -0.92%

Throughput (B/s) for HandleStream* benches (geomean +1.47%)

                                        │  main         │  branch                     │
HandleStreamSize/transactions/100              58.13Mi/s   64.98Mi/s  +11.79% (p=0.000)
HandleStreamSize/transactions/1000             69.17Mi/s   75.63Mi/s   +9.33% (p=0.000)
HandleStreamMixed/mixed/50                     46.41Mi/s   48.11Mi/s   +3.67% (p=0.000)
HandleStreamMixed/mixed/500                    64.71Mi/s   74.69Mi/s  +15.42% (p=0.000)
geomean                                        34.16Mi/s   34.67Mi/s   +1.47%

Notes:

  • The largest wins concentrate on size-sweep and mixed workloads (the path
    the collapse optimises). On 1000 transactions: −24.7% allocs, −16.9%
    B/op, −8.5% ns/op, +9.3% throughput.
  • Metrics-only benches (metricsets, histograms,
    metric_global_label_shadow) sit within roughly ±2% on allocs (+0.83% to
    +2.03%), as expected, because the metric path is intentionally not
    collapsed.
  • The small ns/op regressions on per-event-type benches (e.g.
    HandleStream/transactions +3.65%) come from the walker's visitor
    interface dispatch and the labels-sort pass; they're real but small, and
    the alloc reduction more than compensates on workloads where multiple
    events actually share a resource (i.e. anything beyond a 5-event
    fixture).
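
The notes mention a labels-sort pass without describing it. One plausible reason for such a pass, assuming labels are held in a Go map, is that map iteration order is randomized, so a streaming hash needs a fixed key order to be stable. The sketch below illustrates that idea only; `labelsFingerprint` and the `labels.` key prefix are hypothetical, and `hash/fnv` stands in for xxhash.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// labelsFingerprint hashes labels in sorted key order so that Go's
// randomized map iteration cannot change the result.
func labelsFingerprint(labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte("labels." + k))
		h.Write([]byte{0})
		h.Write([]byte(labels[k]))
		h.Write([]byte{0})
	}
	return h.Sum64()
}

func main() {
	labels := map[string]string{"env": "prod", "region": "eu", "team": "apm"}
	first := labelsFingerprint(labels)
	stable := true
	for i := 0; i < 100; i++ {
		if labelsFingerprint(labels) != first {
			stable = false
		}
	}
	fmt.Println(stable) // true: sorting removes the dependence on map order
}
```

The sort costs O(n log n) per event in the number of labels, which is consistent with a small ns/op cost on fixtures where little grouping happens.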

@lahsivjar lahsivjar requested review from a team as code owners May 8, 2026 14:53
@coderabbitai

coderabbitai Bot commented May 8, 2026


Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 3a49b525-0f75-4452-b8a9-cb74e813232b

📥 Commits

Reviewing files that changed from the base of the PR and between edd2856 and 8ac2ab6.

📒 Files selected for processing (9)
  • receiver/elasticapmintakereceiver/go.mod
  • receiver/elasticapmintakereceiver/internal/mappers/resource_walker.go
  • receiver/elasticapmintakereceiver/resource_grouping.go
  • receiver/elasticapmintakereceiver/resource_grouping_test.go
  • receiver/elasticapmintakereceiver/testdata/spans_expected.yaml
  • receiver/elasticapmintakereceiver/testdata/spans_representative_count_expected.yaml
  • receiver/elasticapmintakereceiver/testdata/transactions_expected.yaml
  • receiver/elasticapmintakereceiver/testdata/transactions_spans_expected.yaml
  • receiver/elasticapmintakereceiver/testdata/unknown-span-type_expected.yaml
💤 Files with no reviewable changes (1)
  • receiver/elasticapmintakereceiver/testdata/spans_representative_count_expected.yaml
✅ Files skipped from review due to trivial changes (1)
  • receiver/elasticapmintakereceiver/go.mod
🚧 Files skipped from review as they are similar to previous changes (2)
  • receiver/elasticapmintakereceiver/resource_grouping.go
  • receiver/elasticapmintakereceiver/testdata/transactions_expected.yaml

📝 Walkthrough


🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"



@lahsivjar

Contributor Author

[For reviewers] I have intentionally left out metrics from grouping optimizations as it is not straightforward.

@lahsivjar lahsivjar requested review from axw and vigneshshanmugam May 8, 2026 21:27

Copilot AI left a comment

Contributor

Pull request overview

This PR reduces per-event allocations in the Elastic APM intake receiver by grouping trace and log events that share the same resource attributes into a single ResourceSpans / ResourceLogs per processBatch call, using an xxhash-based resource fingerprint derived from a shared resource-attribute walker.

Changes:

  • Added per-batch resource grouping (signalGroups) and a stable resource fingerprint, reusing ScopeSpans / ScopeLogs for identical resources.
  • Introduced mappers.WalkResourceAttributes as the single source of truth for both resource attribute writes and fingerprinting.
  • Added/updated benchmarks and updated golden testdata YAML outputs to reflect the new grouping behavior.
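
The per-batch grouping described in the first bullet amounts to a get-or-create cache keyed by the resource fingerprint. The following is a minimal sketch of that pattern, not the PR's signalGroups implementation: `batchGroups`, `scopeSpans`, and `scopeFor` are hypothetical names, and `scopeSpans` is a plain struct standing in for the ptrace scope container.

```go
package main

import "fmt"

// scopeSpans is a hypothetical stand-in for a ScopeSpans container.
type scopeSpans struct{ spans []string }

// batchGroups caches one scopeSpans per resource fingerprint for the
// lifetime of a single batch.
type batchGroups struct {
	byFingerprint map[uint64]*scopeSpans
	order         []uint64 // creation order, for deterministic output
}

func newBatchGroups() *batchGroups {
	return &batchGroups{byFingerprint: map[uint64]*scopeSpans{}}
}

// scopeFor returns the cached scope for fp, creating it on first use.
func (g *batchGroups) scopeFor(fp uint64) *scopeSpans {
	if s, ok := g.byFingerprint[fp]; ok {
		return s
	}
	s := &scopeSpans{}
	g.byFingerprint[fp] = s
	g.order = append(g.order, fp)
	return s
}

func main() {
	g := newBatchGroups()
	// Fingerprints are supplied directly here; in the receiver they would
	// come from hashing the event's resource fields.
	in := []struct {
		fp   uint64
		name string
	}{{1, "tx-a"}, {2, "tx-b"}, {1, "span-a"}}
	for _, e := range in {
		s := g.scopeFor(e.fp)
		s.spans = append(s.spans, e.name)
	}
	for _, fp := range g.order {
		fmt.Println(fp, g.byFingerprint[fp].spans)
	}
	// Prints:
	// 1 [tx-a span-a]
	// 2 [tx-b]
}
```

Creating a fresh cache per batch keeps groups from leaking across requests, at the cost of not sharing resources between batches.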

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| receiver/elasticapmintakereceiver/receiver.go | Uses resource fingerprint + per-batch caches to group trace/log events by resource; routes span/log conversion via cached scopes; switches resource attribute mapping to the walker visitor. |
| receiver/elasticapmintakereceiver/resource_grouping.go | Implements signalGroups caches and the xxhash(v2) resource fingerprint visitor. |
| receiver/elasticapmintakereceiver/resource_grouping_test.go | Adds a unit test ensuring numeric label float values don’t incorrectly merge resources. |
| receiver/elasticapmintakereceiver/internal/mappers/resource_walker.go | Adds WalkResourceAttributes walker + ResourceAttrVisitor to drive both hashing and pcommon writes from one field list. |
| receiver/elasticapmintakereceiver/internal/mappers/intakeV2ToSemConv.go | Removes the old resource-attribute translation function in favor of the new walker-based approach. |
| receiver/elasticapmintakereceiver/internal/mappers/intakeV2ToElasticSpecificFields.go | Removes elastic-specific resource attribute mapping + label mapping (now handled by the walker). |
| receiver/elasticapmintakereceiver/internal/mappers/intakeV2ToDerivedFields.go | Removes derived resource attributes for agent name/version (now handled by the walker). |
| receiver/elasticapmintakereceiver/receiver_bench_test.go | Adds direct-path HandleStream* benchmark suite and synthetic payload generators. |
| receiver/elasticapmintakereceiver/go.mod | Promotes github.com/cespare/xxhash/v2 to a direct dependency. |
| receiver/elasticapmintakereceiver/testdata/unknown-span-type_expected.yaml | Updates expected output to reflect grouped resource spans and reordered resource attributes. |
| receiver/elasticapmintakereceiver/testdata/transactions_spans_expected.yaml | Updates expected output to reflect grouped resource spans and reordered resource attributes. |
| receiver/elasticapmintakereceiver/testdata/transactions_expected.yaml | Updates expected output to reflect grouped resource spans and reordered resource attributes. |
| receiver/elasticapmintakereceiver/testdata/spans_representative_count_expected.yaml | Updates expected output to reflect resource grouping (removes duplicated resources). |
| receiver/elasticapmintakereceiver/testdata/spans_expected.yaml | Updates expected output to reflect grouped resource spans and reordered resource attributes. |
| receiver/elasticapmintakereceiver/testdata/span-links_expected.yaml | Updates expected output to reflect resource grouping (removes duplicated resources). |
| receiver/elasticapmintakereceiver/testdata/logs_expected.yaml | Updates expected output to reflect grouped resource logs and reordered resource attributes/log records. |
| receiver/elasticapmintakereceiver/testdata/invalid_ids_expected.yaml | Updates expected output to reflect resource grouping (removes duplicated resources). |
| receiver/elasticapmintakereceiver/testdata/hostdata_expected.yaml | Updates expected output to reflect resource grouping/reordering of resource attributes. |
| receiver/elasticapmintakereceiver/testdata/errors_expected.yaml | Updates expected output to reflect grouped resource logs and reordered resource attributes/log records. |
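
The unit test listed for resource_grouping_test.go guards against numeric label floats merging resources. The exact failure mode it covers is not stated here; one way such a merge could arise is hashing a lossy form of the float. The sketch below contrasts a hypothetical lossy variant (`hashTruncated`) with hashing the full IEEE 754 bit pattern (`hashDouble`). Both function names are illustrative, and `hash/fnv` stands in for xxhash.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// hashDouble fingerprints a key plus the full IEEE 754 bit pattern of v.
func hashDouble(key string, v float64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0})
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], math.Float64bits(v))
	h.Write(buf[:])
	return h.Sum64()
}

// hashTruncated is a hypothetical buggy variant: it drops the fraction,
// so 1.2 and 1.7 both hash as 1.
func hashTruncated(key string, v float64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0})
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(int64(v)))
	h.Write(buf[:])
	return h.Sum64()
}

func main() {
	const key = "numeric_labels.ratio"
	fmt.Println(hashTruncated(key, 1.2) == hashTruncated(key, 1.7)) // true: the resources would merge
	fmt.Println(hashDouble(key, 1.2) == hashDouble(key, 1.7))       // false: the bit patterns differ
}
```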


Comment on lines +90 to +92
// Including the key in the hash makes write order irrelevant for fields
// the visitor sees as Put*(key, value) — re-ordering the walker visits
// would produce the same fingerprint for the same set of attributes.
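
Whether visiting order affects a fingerprint depends on how the per-field contributions are combined, and the quoted comment alone does not show which scheme the PR uses. The sketch below is an illustration under assumptions, with hypothetical names and `hash/fnv` in place of xxhash. It contrasts a single running hash, which is generally order-sensitive even when keys are included, with XOR-combining independent per-pair hashes, which is order-insensitive.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type pair struct{ key, value string }

// pairHash hashes one key/value pair on its own.
func pairHash(p pair) uint64 {
	h := fnv.New64a()
	h.Write([]byte(p.key))
	h.Write([]byte{0})
	h.Write([]byte(p.value))
	return h.Sum64()
}

// streamed feeds every pair into one running hash; the result generally
// depends on the order in which pairs are visited.
func streamed(pairs []pair) uint64 {
	h := fnv.New64a()
	for _, p := range pairs {
		h.Write([]byte(p.key))
		h.Write([]byte{0})
		h.Write([]byte(p.value))
		h.Write([]byte{0})
	}
	return h.Sum64()
}

// combined XORs independent per-pair hashes; XOR is commutative, so the
// visiting order does not matter.
func combined(pairs []pair) uint64 {
	var out uint64
	for _, p := range pairs {
		out ^= pairHash(p)
	}
	return out
}

func main() {
	ab := []pair{{"service.name", "checkout"}, {"host.name", "h1"}}
	ba := []pair{{"host.name", "h1"}, {"service.name", "checkout"}}
	fmt.Println(combined(ab) == combined(ba)) // true: XOR is commutative
	fmt.Println(streamed(ab) == streamed(ba))
}
```

XOR-combining has its own weakness: a pair that appears twice cancels itself out, so it only suits inputs where each key is visited once.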
Comment on lines +306 to +326
		fmt.Sprintf("tx%014x", base+uint64(i)),
		fmt.Sprintf("tx%014xtx%014x", base+uint64(i), base+uint64(i)),
		1_000_000+(base+uint64(i))*1_000,
	)
}
for i := range 8 {
	fmt.Fprintf(&buf,
		`{"span": {"id": %q, "trace_id": %q, "transaction_id": %q, "parent_id": %q, "name": "SELECT *", "type": "db.postgresql.query", "start": 1, "duration": 2, "timestamp": %d}}`+"\n",
		fmt.Sprintf("sp%014x", base+uint64(i)),
		fmt.Sprintf("tx%014xtx%014x", base+uint64(i), base+uint64(i)),
		fmt.Sprintf("tx%014x", base+uint64(i)),
		fmt.Sprintf("tx%014x", base+uint64(i)),
		1_000_000+(base+uint64(i))*1_000+1,
	)
}
fmt.Fprintf(&buf,
	`{"error": {"id": %q, "trace_id": %q, "transaction_id": %q, "parent_id": %q, "timestamp": %d, "log": {"message": "boom"}}}`+"\n",
	fmt.Sprintf("er%014x", base),
	fmt.Sprintf("tx%014xtx%014x", base, base),
	fmt.Sprintf("tx%014x", base),
	fmt.Sprintf("tx%014x", base),

@carsonip carsonip left a comment

Member

LGTM, thanks. The approach is sound. There is a risk of hash collision, as discussed during the private sync, but it is low.
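
For a rough sense of scale, the collision risk can be estimated with the birthday bound. This is a back-of-the-envelope sketch, not an analysis from the PR: it assumes the 64-bit fingerprint behaves like a uniform random function and that the fingerprint alone decides grouping, and `collisionProbability` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// distinct resources share a fingerprint, assuming a uniform 64-bit hash.
// It uses the birthday bound p ≈ n(n-1)/2 / 2^64, valid while p is small.
func collisionProbability(n float64) float64 {
	return n * (n - 1) / 2 / math.Pow(2, 64)
}

func main() {
	for _, n := range []float64{10, 1000, 1e6} {
		fmt.Printf("n=%g p≈%.3g\n", n, collisionProbability(n))
	}
}
```

Under these assumptions, 1,000 distinct resources in one batch give a probability of about 2.7e-14. Because grouping is scoped to a single processBatch call, n is the number of distinct resources in that batch rather than across the receiver's lifetime.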

if k == "" || nv == nil {
	continue
}
v.PutDouble("numeric_labels."+k, nv.Value)

Member

q: not related to this PR but is it true that numeric labels always only use .Value, not .Values? Asking because there is .Values handling in apm-data for numeric labels.

If this turns out to be a bug, I'm happy to defer it to a different PR to keep this PR clean.

Contributor Author

Intake v2 will always produce only .Value; input/elasticapm/internal/modeldecoder always decodes into .Value. I think .Values is used only for OTel via APM, which this receiver doesn't need to deal with.

@lahsivjar lahsivjar merged commit 6937934 into elastic:main May 26, 2026
19 checks passed
@lahsivjar lahsivjar deleted the optimize-resource-spans branch May 26, 2026 17:10