[RFC] Batch processor migration#15273

Open
jmacd wants to merge 19 commits into open-telemetry:main from jmacd:jmacd/batch_rfc_2026

Contributor

@jmacd jmacd commented May 7, 2026

Description

Documents a migration strategy for the batch processor in three phases.

Deprecates and removes batchprocessor. Creates a new queuebatchprocessor with exporterhelper-derived processor capabilities and config matching the exporterhelper.

Uses two feature flags to migrate to block_on_overflow and batch::enabled both being true by default. The flags would switch to Beta in the same release in which batchprocessor is finally removed.

Part of #8222, #13766.

@jmacd jmacd requested review from a team, bogdandrutu, codeboten, dmitryax and mx-psi as code owners May 7, 2026 18:37

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.27%. Comparing base (478a9fd) to head (7b8749e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15273   +/-   ##
=======================================
  Coverage   91.27%   91.27%           
=======================================
  Files         709      709           
  Lines       46222    46222           
=======================================
  Hits        42190    42190           
  Misses       2817     2817           
  Partials     1215     1215           

@songy23 songy23 self-requested a review May 7, 2026 18:57
Contributor

@axw axw left a comment

Looks great overall, I have a couple of clarifying questions.

@mx-psi mx-psi added the rfc:approvals-needed This RFC needs approvals from collector-approvers label May 8, 2026
Contributor

@codeboten codeboten left a comment

Thanks for the proposal, jmacd. Overall, I think this is the right direction to go in.

Member

@bogdandrutu bogdandrutu left a comment

Overall plan looks good to me.

Comment thread docs/rfcs/batching-migration.md Outdated
Comment on lines +137 to +144
### Double-batching problem

One concern preventing the migration is that we may unintentionally
apply multiple batching processes in a single pipeline. This is a
coordination problem. If a pipeline with a default- or user-configured
batch processor has settings that are out of line with the new
exporterhelper defaults, users will pay the cost of batching and then
re-batching.
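
One way to surface the coordination problem described above is a startup check that flags pipelines where a batch processor feeds an exporter that also batches. The following is a minimal sketch under stated assumptions: the `Pipeline` type and the `doubleBatchWarnings` function are hypothetical and are not part of the collector's API. Only the `batch` and `batch/<name>` component ID convention is taken from how collector components are named.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Pipeline is a hypothetical, simplified view of one collector pipeline.
type Pipeline struct {
	Name       string
	Processors []string        // processor component IDs, in order
	Exporters  map[string]bool // exporter ID -> whether exporter-side batching is enabled
}

// doubleBatchWarnings returns one warning per exporter that would re-batch
// data already batched by a "batch" or "batch/<name>" processor.
func doubleBatchWarnings(p Pipeline) []string {
	hasBatch := false
	for _, id := range p.Processors {
		if id == "batch" || strings.HasPrefix(id, "batch/") {
			hasBatch = true
			break
		}
	}
	if !hasBatch {
		return nil
	}
	var warnings []string
	for id, batching := range p.Exporters {
		if batching {
			warnings = append(warnings, fmt.Sprintf(
				"pipeline %q: batch processor feeds exporter %q, which also batches",
				p.Name, id))
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(warnings)
	return warnings
}

func main() {
	p := Pipeline{
		Name:       "traces",
		Processors: []string{"memory_limiter", "batch"},
		Exporters:  map[string]bool{"otlp": true, "debug": false},
	}
	for _, w := range doubleBatchWarnings(p) {
		fmt.Println(w)
	}
}
```

A check like this only reports the overlap; it does not decide which of the two batching stages should win.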
Member

The other problem is that in a pipeline "batch -> new_exporter_with_batching_and_wait_for_result", because the batch processor is single-threaded, you essentially limit throughput to 1 request per flush interval of the exporter batch, which is a significant hit to performance and throughput.
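
As a back-of-the-envelope illustration of this ceiling, the sketch below computes the upper bound on requests per second when each upstream caller blocks until the exporter's batcher flushes. The function name `maxRequestsPerSecond` and the 200ms flush interval are assumptions for illustration, not collector defaults, and the model ignores flushes triggered by batch size.

```go
package main

import (
	"fmt"
	"time"
)

// maxRequestsPerSecond returns an upper bound on requests per second when
// `workers` concurrent callers each wait one full flush interval for a result.
// This is a simplified model: it assumes every request waits the whole interval.
func maxRequestsPerSecond(flushInterval time.Duration, workers int) float64 {
	return float64(workers) / flushInterval.Seconds()
}

func main() {
	flush := 200 * time.Millisecond // hypothetical flush interval

	// A single-threaded upstream corresponds to workers = 1.
	fmt.Printf("1 caller:  %.1f req/s\n", maxRequestsPerSecond(flush, 1))
	fmt.Printf("4 callers: %.1f req/s\n", maxRequestsPerSecond(flush, 4))
}
```

Under this model the bound grows linearly with the number of concurrent callers, which is why a single-threaded upstream stage is the limiting case.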

Contributor Author

The plan that I am outlining would make the batch processor into a No-op at the same instant the default batching is enabled. I mean for this instantaneous switch to address your concern.

Contributor

@jade-guiton-dd jade-guiton-dd left a comment

I feel like the wait_for_result changes have little to do with the batch processor and should be done independently, especially since they have big implications for behavior, unlike double batching, which is mostly just a performance hit.

Comment thread docs/rfcs/batching-migration.md Outdated
Comment on lines +163 to +166
At the same time, an audit is required to clean up the many various
initial conditions across exporters. We will define a Go package
`batchmigration` with an enum to capture all the common initial
settings,
Contributor

@jade-guiton-dd jade-guiton-dd May 13, 2026

I'm not sure I understand what exactly "cleaning up initial conditions" means or what the motivation behind it is. Could you elaborate? I'm especially not sure I understand the point of the batchmigration package.

I think it's fine, or even expected, for different exporters to have different default batching settings. If what we want is just to enable batching by default to reduce the config changes needed to remove the batch processor, and we think the worst case scenario is just excessive batching, I would suggest a simpler migration:

  • Changing the output of NewDefaultQueueConfig to enable batching by default under an Alpha feature gate
  • Auditing exporters in Contrib with code owners to determine if they have reasons to deviate from standard practice (queue + batching enabled). If yes, they can keep omitting the queue_sender field, or they can override the default config to disable batching by default; if not, we make sure they have queue_sender in their config.
  • Once we are confident that components that don't want the new upstream default have updated to override it, advance the feature gate to beta.
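
The gate-controlled default in the first step can be sketched roughly as follows. This is a simplified stand-in, not the collector's featuregate or exporterhelper API: the `Stage`, `Gate`, and `QueueConfig` types and the lowercase `newDefaultQueueConfig` function are hypothetical and exist only to show how the default would flip when the gate advances from Alpha to Beta.

```go
package main

import "fmt"

// Stage is a hypothetical stand-in for feature-gate maturity levels.
type Stage int

const (
	StageAlpha Stage = iota // disabled unless explicitly enabled
	StageBeta               // enabled unless explicitly disabled
)

// Gate is a hypothetical stand-in for a collector feature gate.
type Gate struct {
	stage    Stage
	override *bool // explicit user choice, if any
}

// Enabled reports the explicit override if present, else the stage default.
func (g Gate) Enabled() bool {
	if g.override != nil {
		return *g.override
	}
	return g.stage == StageBeta
}

// QueueConfig is a simplified, hypothetical config shape.
type QueueConfig struct {
	QueueEnabled bool
	BatchEnabled bool
}

// newDefaultQueueConfig turns batching on by default only when the gate is enabled.
func newDefaultQueueConfig(g Gate) QueueConfig {
	return QueueConfig{QueueEnabled: true, BatchEnabled: g.Enabled()}
}

func main() {
	alpha := Gate{stage: StageAlpha}
	beta := Gate{stage: StageBeta}
	fmt.Println("alpha default batching:", newDefaultQueueConfig(alpha).BatchEnabled)
	fmt.Println("beta default batching: ", newDefaultQueueConfig(beta).BatchEnabled)
}
```

An exporter that wants to deviate would override `BatchEnabled` after calling the default constructor, which matches the opt-out path described in the second step.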

Member

Agreed with this. I think it is okay for exporters to have different configurations. For example, pull-based exporters may not care much about batching or queueing.

Contributor Author

Any exporter that does not want queue/batch behavior can simply not install the QueueBatch sender in their configuration or directly configure hard-coded queue/batch settings in their constructor. Do you agree that special cases can simply bypass the feature they do not want?

Cleaning up initial conditions is what Bogdan referred to in https://github.com/open-telemetry/opentelemetry-collector/pull/15273/changes/BASE..6beccb809ee7a80430a54165092672e50be36020#r3213954079: the users who disabled batching simply because it wasn't ready.

Member

> Do you agree that special cases can simply bypass the feature they do not want?

Yes, that makes sense.

> Cleaning up initial conditions is what Bogdan referred to in https://github.com/open-telemetry/opentelemetry-collector/pull/15273/changes/BASE..6beccb809ee7a80430a54165092672e50be36020#r3213954079: the users who disabled batching simply because it wasn't ready.

Alright, I think it would be easier to understand as "unify the settings for exporters that may have disabled this because of stability concerns"?

Contributor Author

I considered whether we should have multiple defaults, potentially a shared default for exporters that want to be (a) synchronous, (b) single concurrency, etc.

Member

mx-psi commented Jun 2, 2026

@jmacd would you mind marking as resolved the conversations for which you have addressed feedback?

jmacd added 2 commits June 2, 2026 15:03
- Add user-visible behavior table across phases (mx-psi)
- Add concrete release versions to phase table (florianl)
- Promote feature gates Alpha -> Beta (Phase 3) -> Stable+removed (Phase 4) (mx-psi)
- Gate Phase 2 exit on early-adopter feedback; Helm chart switch is the signal (mx-psi)
- Move Helm chart switch into Phase 2 (mx-psi)
- Add mdatagen-enforced default queue_sender check with metadata.yaml opt-out (axw)
- Document wait_for_result + storage extension as invalid combination (axw)
- Add double-batching startup warning work item (dashpole, jade-guiton-dd)
- Enumerate concrete exporter use cases for non-default batching (dashpole)
- Tighten 'vendor-specific components' wording (mx-psi)
- Add Risks section enumerating migration risks (codeboten)
- Fix bug: timeline intro said wait_for_result, should be block_on_overflow

Assisted-by: Claude Opus 4.7
Contributor Author

jmacd commented Jun 3, 2026

I have addressed reviewer feedback. The proposal is now much simpler: no migration helper library is used, and there is no de-activation of the batchprocessor, just its removal after a six-month deprecation process. I've proposed to do the work of Phase 1 in the next 1.5 months in order to land documentation and deprecation warnings in the 0.158.0 release (August), to remove the batch processor 6 releases later (October), and to remove the feature flag controlling exporterhelper defaults 6 releases after that (Feb 2027).

In Phase 3 we drop batchprocessor from the standard distribution
manifest but leave the Go module in the core repo so custom builds
(e.g., via ocb) can keep using it. The double-batching startup
warning continues to protect those custom builds through Phase 3.
The module itself is removed in Phase 4 alongside gate stabilization.

Assisted-by: Claude Opus 4.7
Contributor

@jade-guiton-dd jade-guiton-dd left a comment

Besides the configoptional helper which I also think isn't quite necessary (especially as an exported function), I think the timeline and migration steps look good. However, I have concerns with setting block_on_overflow: true as the default.

jmacd and others added 3 commits June 3, 2026 09:16
Per review discussion in open-telemetry#15273, reverse the proposed change to make
exporterhelper block_on_overflow true by default. Only the
batch::enabled default flips in exporterhelper.

The new queuebatchprocessor adopts wait_for_result: true and
block_on_overflow: true as its defaults to preserve batchprocessor's
blocking backpressure semantics for users migrating away from it.

Drops the exporterQueueBlockOnOverflow feature gate; only one gate
(exporterQueueBatchEnabled) remains.

Assisted-by: Claude Opus 4.7

codspeed-hq Bot commented Jun 3, 2026

Merging this PR will improve performance by 58.61%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 2 improved benchmarks
✅ 5 untouched benchmarks

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| BenchmarkBatchMetricProcessor2k | 140.4 ms | 82.9 ms | +69.39% |
| BenchmarkMultiBatchMetricProcessor2k | 139.4 ms | 93.9 ms | +48.51% |


Comparing jmacd:jmacd/batch_rfc_2026 (7b8749e) with main (478a9fd)
