Skip to content

fix: reuse serialization buffer in PayloadSenderV2 to eliminate LOH fragmentation #2770

Merged
stevejgordon merged 6 commits into elastic:main from nimanikoo:improve-memorystream on Jun 22, 2026

@nimanikoo

Contributor

We've been seeing steady process memory growth in long-running deployments (noticeable after about two weeks of uptime).
The root cause: ProcessQueueItems was allocating a new MemoryStream(1024) for every flush batch.
A MemoryStream doubles its internal byte array as it grows, and for batches with many spans
the final buffer crosses the 85,000-byte Large Object Heap (LOH) threshold. The LOH is not compacted
during ordinary GC cycles, so each flush leaves the heap more fragmented, and over millions of batches
the process RSS just keeps climbing.
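To make the threshold arithmetic concrete, here is a small model of the growth described above, sketched in Go for illustration (the agent itself is C#). It assumes pure doubling from a 1,024-byte start and ignores any other growth rules the real MemoryStream may apply; the 85,000-byte cutoff is the documented LOH threshold, while the payload sizes are illustrative, not measured.

```go
package main

import "fmt"

// lohThreshold is the .NET Large Object Heap cutoff: arrays of at least
// this many bytes are allocated on the LOH.
const lohThreshold = 85000

// finalCapacity models a buffer that starts at `initial` bytes and doubles
// until it can hold `payload` bytes. Simplifying assumption: growth is pure
// doubling. It also returns every capacity allocated along the way.
func finalCapacity(initial, payload int) (int, []int) {
	capacity := initial
	allocs := []int{capacity}
	for capacity < payload {
		capacity *= 2
		allocs = append(allocs, capacity)
	}
	return capacity, allocs
}

func main() {
	for _, payload := range []int{60000, 100000} {
		c, allocs := finalCapacity(1024, payload)
		fmt.Printf("payload=%d final=%d onLOH=%v allocations=%v\n",
			payload, c, c >= lohThreshold, allocs)
	}
}
```

Under this model a 60,000-byte payload stops at a 65,536-byte array, below the threshold, while a 100,000-byte payload walks through eight arrays and ends in a 131,072-byte one on the LOH. With a fresh stream per batch, that whole sequence repeats on every flush.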

The fix is simple: allocate one MemoryStream as a field, and reset it with SetLength(0)
before each batch instead of creating a new one. SetLength(0) resets the logical length and
write cursor without releasing the underlying buffer, so the same allocation is reused for every batch.
This is safe because ProcessQueueItems is always called from a single dedicated background
thread (ElasticApmPayloadSenderV2), so there's no concurrency concern.

One subtlety caught during testing: StreamContent disposes its underlying stream when it is
itself disposed at the end of the using block. The fix wraps the reusable buffer in a lightweight,
non-owning MemoryStream view before handing it to StreamContent, so the shared buffer is never
closed between batches.

Benchmark results (Apple M4, .NET 8, BenchmarkDotNet):

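| Batch size | Allocated/op (old) | Allocated/op (new) | Gen1 GCs per 1,000 ops (old) | Gen1 GCs per 1,000 ops (new) |
|---|---|---|---|---|
| 5 spans | 28.4 KB | 21.2 KB | 0.11 | 0.05 |
| 20 spans | 76.0 KB | 60.8 KB | 0.52 | 0.15 |
| 50 spans | 171.2 KB | 140.0 KB | 1.71 | 0.00 |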
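The allocation columns can be turned into absolute and relative savings with a few lines of arithmetic, sketched here in Go using only the numbers from the table above (the row type and reduction function are hypothetical helpers).

```go
package main

import "fmt"

// row holds one line of the benchmark table (allocations in KB per operation).
type row struct {
	spans int
	oldKB float64
	newKB float64
}

// reduction returns the allocation saving in KB and as a percentage of the old value.
func reduction(r row) (savedKB, pct float64) {
	savedKB = r.oldKB - r.newKB
	return savedKB, 100 * savedKB / r.oldKB
}

func main() {
	rows := []row{{5, 28.4, 21.2}, {20, 76.0, 60.8}, {50, 171.2, 140.0}}
	for _, r := range rows {
		saved, pct := reduction(r)
		fmt.Printf("%2d spans: saved %.1f KB (%.1f%%)\n", r.spans, saved, pct)
	}
}
```

The absolute saving roughly doubles with each step in batch size (about 7.2, 15.2 and 31.2 KB per operation), while the relative saving narrows from about 25% to about 18%.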

In the 50-span case, Gen1 collections drop from 1.71 to 0.00 per 1,000 operations, removing the
promotion pressure that came with them. In a high-throughput service flushing thousands of batches
per minute, this is the difference between stable and ever-growing resident memory.

Testing: added SequentialBatches_SerializationBufferIsIsolated to guard that buffer reuse
never bleeds content from one batch into the next.

@cla-checker-service

cla-checker-service Bot commented Jun 18, 2026


💚 CLA has been signed

@github-actions


🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@github-actions


👋 @nimanikoo Thanks a lot for your contribution!

It may take some time before we review a PR, so even if you don’t see activity for some time, it does not mean that we have forgotten about it.

Every once in a while we go through a process of prioritization, after which we focus on the tasks planned for the upcoming milestone. The prioritization status is typically reflected through the PR labels. It could be pending triage, a candidate for a future milestone, or have a target milestone set to it.

@stevejgordon stevejgordon left a comment

Contributor


Thanks @nimanikoo for identifying and fixing this. The reasoning is sound and the fix looks appropriate. Code-wise, I'm happy to take this as-is. My slight concern is that we keep the buffer after it grows, which may mean one large payload leaves a larger retained allocation than we generally need. But that concern is likely small, and I'll review whether we want to make any follow-up changes to resize it.

A couple of minor nits on the styling of the XML doc comments, and this is good to merge once CI checks pass.

EDIT: Looks like you may need to run dotnet format over this to fix some of the whitespace etc.

Comment thread test/Elastic.Apm.Tests/BackendCommTests/PayloadSenderTests.cs Outdated
Comment thread benchmarks/Elastic.Apm.Benchmarks/PayloadSenderSerializationBenchmarks.cs Outdated
@nimanikoo

Contributor Author

@stevejgordon
Thanks for the thorough review and the valuable feedback!

I appreciate you taking the time to look into both the implementation and the potential memory implications. I'll address the XML doc styling comments and run dotnet format to fix the remaining formatting issues.

I'm looking forward to contributing more meaningful improvements to the project after this PR. This was my first contribution and more of a starting point, and I'm excited to keep getting involved and helping wherever I can.

Thanks again

@nimanikoo nimanikoo requested a review from stevejgordon June 20, 2026 18:02

@stevejgordon stevejgordon left a comment

Contributor


Thanks again @nimanikoo. Looks good, but could you please exclude the LICENSE file from the commit. It gets regenerated and often shows as a change we don't really need.

Restore src/instrumentations/Elastic.Apm.MongoDb/LICENSE to its upstream
state — it is auto-generated and should not be part of this PR.
@nimanikoo nimanikoo requested a review from stevejgordon June 22, 2026 12:54
@stevejgordon

Contributor

run docs-build

@stevejgordon stevejgordon merged commit 39bf95d into elastic:main Jun 22, 2026
19 checks passed

2 participants