[RFC] Batch processor migration#15273
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #15273 +/- ##
=======================================
Coverage 91.27% 91.27%
=======================================
Files 709 709
Lines 46222 46222
=======================================
Hits 42190 42190
Misses 2817 2817
Partials 1215 1215 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
axw
left a comment
There was a problem hiding this comment.
Looks great overall, I have a couple of clarifying questions.
codeboten
left a comment
There was a problem hiding this comment.
Thanks for the proposal jmacd, overall i think this is the right direction to go in
bogdandrutu
left a comment
There was a problem hiding this comment.
Overall plan looks good to me.
| ### Double-batching problem | ||
|
|
||
| One concern preventing the migration is that we may unintentionally | ||
| apply multiple batching processes in a single pipeline. This is a | ||
| coordination problem. If a pipeline with a default- or user-configured | ||
| batch processor has settings that are out of line with the new | ||
| exporterhelper defaults, users will pay the cost of batching and then | ||
| re-batching. |
There was a problem hiding this comment.
the other problem is this, in a pipeline "batch -> new_exporter_with_batching_and_wait_for_result" because batch is single thread, you essentially limit the throughput to 1 request per flush interval of the exporter batch, so a significant hit in performance and throughput.
There was a problem hiding this comment.
The plan that I am outlining would make the batch processor into a No-op at the same instant the default batching is enabled. I mean for this instantaneous switch to address your concern.
| At the same time, an audit is required to clean up the many various | ||
| initial conditions across exporters. We will define an Go package | ||
| `batchmigration` with an enum to capture all the common initial | ||
| settings, |
There was a problem hiding this comment.
I'm not sure I understand what exactly "cleaning up initial conditions" means, or what the motivation behind it is, could you elaborate? I'm especially not sure I understand the point of the batchmigration package.
I think it's fine, or even expected, for different exporters to have different default batching settings. If what we want is just to enable batching by default to reduce the config changes needed to remove the batch processor, and we think the worst case scenario is just excessive batching, I would suggest a simpler migration:
- Changing the output of
NewDefaultQueueConfigto enable batching by default under an Alpha feature gate - Auditing exporters in Contrib with code owners to determine if they have reasons to deviate from standard practice (queue + batching enabled). If yes, they can keep omitting the
queue_senderfield, or they can override the default config to disable batching by default; if not, we make sure they havequeue_senderin their config. - Once we are confident that components that don't want the new upstream default have updated to override it, advance the feature gate to beta.
There was a problem hiding this comment.
Agreed with this, I think it is okay that there are different configurations per exporters. For example, pull-based exporters may don't care much about batching or queueing
There was a problem hiding this comment.
Any exporter that does not want queue/batch behavior can simply not install the QueueBatch sender in their configuration or directly configure hard-coded queue/batch settings in their constructor. Do you agree that special cases can simply bypass the feature they do not want?
Cleaning up initial conditions are what Bogdan referred to in https://github.com/open-telemetry/opentelemetry-collector/pull/15273/changes/BASE..6beccb809ee7a80430a54165092672e50be36020#r3213954079, the users who disabled batching simply because it wasn't ready.
There was a problem hiding this comment.
Do you agree that special cases can simply bypass the feature they do not want?
Yes, that makes sense
Cleaning up initial conditions are what Bogdan referred to in https://github.com/open-telemetry/opentelemetry-collector/pull/15273/changes/BASE..6beccb809ee7a80430a54165092672e50be36020#r3213954079, the users who disabled batching simply because it wasn't ready.
Alright, I think it would be easier to understand as "unify the settings for exporters that may have disabled this because of stability concerns"?
There was a problem hiding this comment.
I considered whether we should have multiple defaults, potentially a shared default for exporters that want to be (a) synchronous, (b) single concurrency, etc.
| At the same time, an audit is required to clean up the many various | ||
| initial conditions across exporters. We will define an Go package | ||
| `batchmigration` with an enum to capture all the common initial | ||
| settings, |
There was a problem hiding this comment.
Agreed with this, I think it is okay that there are different configurations per exporters. For example, pull-based exporters may don't care much about batching or queueing
…tor into jmacd/batch_rfc_2026
…tor into jmacd/batch_rfc_2026
|
@jmacd would you mind marking as resolved conversations for which you have addressed feedback? |
…tor into jmacd/batch_rfc_2026
- Add user-visible behavior table across phases (mx-psi) - Add concrete release versions to phase table (florianl) - Promote feature gates Alpha -> Beta (Phase 3) -> Stable+removed (Phase 4) (mx-psi) - Gate Phase 2 exit on early-adopter feedback; Helm chart switch is the signal (mx-psi) - Move Helm chart switch into Phase 2 (mx-psi) - Add mdatagen-enforced default queue_sender check with metadata.yaml opt-out (axw) - Document wait_for_result + storage extension as invalid combination (axw) - Add double-batching startup warning work item (dashpole, jade-guiton-dd) - Enumerate concrete exporter use cases for non-default batching (dashpole) - Tighten 'vendor-specific components' wording (mx-psi) - Add Risks section enumerating migration risks (codeboten) - Fix bug: timeline intro said wait_for_result, should be block_on_overflow Assisted-by: Claude Opus 4.7
|
I have addressed reviewer feedback. The proposal is now much simpler: no migration helper library is used, and no de-activation of the batchprocessor, just its removal after a six-month deprecation process. I've proposed to do the work of Phase 1 in the next 1.5 months in order to land documentation and deprecation warnings in the 0.158.0 release (August) and remove the batch processor 6 releases later (October), to remove the feature flag controlling exporterhelper defaults 6 releases later (Feb 2027). |
In Phase 3 we drop batchprocessor from the standard distribution manifest but leave the Go module in the core repo so custom builds (e.g., via ocb) can keep using it. The double-batching startup warning continues to protect those custom builds through Phase 3. The module itself is removed in Phase 4 alongside gate stabilization. Assisted-by: Claude Opus 4.7
…tor into jmacd/batch_rfc_2026
Per review discussion in open-telemetry#15273, reverse the proposed change to make exporterhelper block_on_overflow true by default. Only the batch::enabled default flips in exporterhelper. The new queuebatchprocessor adopts wait_for_result: true and block_on_overflow: true as its defaults to preserve batchprocessor's blocking backpressure semantics for users migrating away from it. Drops the exporterQueueBlockOnOverflow feature gate; only one gate (exporterQueueBatchEnabled) remains. Assisted-by: Claude Opus 4.7
Merging this PR will improve performance by 58.61%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | BenchmarkBatchMetricProcessor2k |
140.4 ms | 82.9 ms | +69.39% |
| ⚡ | BenchmarkMultiBatchMetricProcessor2k |
139.4 ms | 93.9 ms | +48.51% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing jmacd:jmacd/batch_rfc_2026 (7b8749e) with main (478a9fd)
Description
Documents a migration strategy for the batch processor in three phases.
Deprecates and removes
batchprocessor. Creates a newqueuebatchprocessorwith exporterhelper-derived processor capabilities and config matching the exporterhelper.Uses two feature flags to migrate to
block_on_overflowandbatch::enabledboth true by default. These would switch to Beta in the same release where batchprocessor is finally removed.Part of #8222, #13766.