[exporterhelper]: Add RequestMiddleware extension interface#14318

Open
raghu999 wants to merge 35 commits into open-telemetry:main from raghu999:controller-interface

Conversation

@raghu999

@raghu999 raghu999 commented Dec 22, 2025

Description

This PR introduces the RequestMiddleware interface and integrates it into the exporterhelper. This is a generalization of the previously proposed ConcurrencyController, allowing for broader control over request execution.

Changes:

  • Defines the RequestMiddleware interface in the exporter/exporterhelper/xexporterhelper package.
  • Updates the exporterhelper's sending queue to accept a list of request_middlewares.
  • Delegates request execution logic to these middlewares, allowing extensions (such as a dynamic concurrency controller) to intercept and manage export requests.

Link to tracking issue

Relates to #14080 (Note: This PR is a prerequisite required for fixing #14080)

Testing

  • Added unit tests for the new RequestMiddleware interface and its integration with the queue sender.
  • Verified that existing exporterhelper tests pass to ensure no regression in current behavior.

Documentation

  • Added GoDoc comments for the new interface and methods.

@raghu999 raghu999 requested review from a team, bogdandrutu and dmitryax as code owners December 22, 2025 01:30
@raghu999 raghu999 mentioned this pull request Dec 22, 2025
@raghu999
Author

raghu999 commented Jan 5, 2026

@axw @dmitryax @bogdandrutu gentle ping on this PR (#14318). I wanted to check if you could take a look when you have a moment.

This introduces the ConcurrencyController interface + minimal exporterhelper plumbing to allow dynamic concurrency control (intended for the ARC extension path), with unit tests included. The change should be non-invasive unless a controller is explicitly configured.

Would appreciate a review on the API shape/placement and the integration points. Happy to iterate quickly on any feedback or adjust/split if needed. Thanks!

Contributor

@axw axw left a comment

Thanks @raghu999!

My main feedback so far is:

  • The interface is very narrow, and seems a bit too coupled to ARC. I'd like to see if we can change that into a more general exporter request/sender middleware interface without compromising ARC.
  • The minConsumersWithController bit feels off. I'm not convinced we should change the default due to some other setting - seems like that would be surprising. A couple of thoughts:
    • Can we for now just have the extension log a warning if the default is set low?
    • Would it make sense for the controller to be able to add consumers? Maybe as a separate extension point in asyncQueue, but referencing the same extension?

Comment thread exporter/exporterhelper/README.md
Comment thread exporter/exporterhelper/internal/queue_sender.go Outdated
Comment thread exporter/exporterhelper/internal/queue_sender.go Outdated
Comment thread extension/extensioncapabilities/interfaces.go Outdated
@raghu999
Author

raghu999 commented Jan 7, 2026

@axw Thanks for the review — I’ve updated the PR based on your feedback:

“Why 200?” / defaults & warnings
I agree that auto-changing sending_queue.num_consumers is surprising. I removed the code that forced it to 200. Now, if concurrency_controller is configured but num_consumers is still at the default (10), exporterhelper logs a warning that the worker pool may cap the middleware’s behavior, while preserving the user’s config.
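
A minimal sketch of the warn-instead-of-override behavior described above. The names `consumerWarning` and `defaultNumConsumers` are hypothetical and chosen for illustration; only the default of 10 and the decision to log a warning while leaving the user's config untouched come from this comment.

```go
package main

import "fmt"

// defaultNumConsumers mirrors the default of 10 mentioned above.
// The constant name is an assumption made for this sketch.
const defaultNumConsumers = 10

// consumerWarning returns a warning message when a middleware is configured
// but num_consumers is still at its default. It never changes the setting.
func consumerWarning(numConsumers int, middlewareConfigured bool) (string, bool) {
	if !middlewareConfigured || numConsumers != defaultNumConsumers {
		return "", false
	}
	msg := fmt.Sprintf(
		"request middleware is configured but sending_queue.num_consumers is at the default (%d); "+
			"the worker pool may cap the middleware's behavior",
		defaultNumConsumers)
	return msg, true
}

func main() {
	if msg, ok := consumerWarning(10, true); ok {
		fmt.Println("WARN:", msg)
	}
}
```

The function only reports; whether to log, and at what level, stays with the caller.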

No-op middleware (avoid nil checks)
I added a NoopRequestMiddleware default so the hot path doesn’t need nil checks. I also guard against the factory returning nil by keeping the no-op middleware in that case.

General middleware interface
I refactored the ARC-coupled interface into a generic RequestMiddleware / RequestMiddlewareFactory. I explored the WrapSender(... internal/request,sender ...) style, but extensioncapabilities can’t depend on exporterhelper/internal/... types due to Go internal visibility rules and it would also introduce an import cycle. Using Handle(ctx, next func(ctx) error) keeps the interface general and decoupled while letting extensions encapsulate timing/permits/error logic.
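
To make the `Handle(ctx, next)` shape and the no-op default concrete, here is a rough sketch. The interface below is reconstructed from the description in this comment and may not match the PR's exact method set; `noopMiddleware`, `orNoop`, `exportVia`, and `errExport` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// RequestMiddleware sketches the Handle-style interface described above.
// The actual interface in the PR may differ.
type RequestMiddleware interface {
	Handle(ctx context.Context, next func(context.Context) error) error
}

// noopMiddleware calls next directly, so the hot path needs no nil check.
type noopMiddleware struct{}

func (noopMiddleware) Handle(ctx context.Context, next func(context.Context) error) error {
	return next(ctx)
}

// orNoop guards against a nil middleware by falling back to the no-op.
func orNoop(m RequestMiddleware) RequestMiddleware {
	if m == nil {
		return noopMiddleware{}
	}
	return m
}

// errExport is a stand-in error for a failed export.
var errExport = errors.New("export failed")

// exportVia runs a fake export through m and returns the export's error.
func exportVia(m RequestMiddleware, fail bool) error {
	return orNoop(m).Handle(context.Background(), func(context.Context) error {
		if fail {
			return errExport
		}
		return nil
	})
}

func main() {
	fmt.Println(exportVia(nil, false))
	fmt.Println(exportVia(noopMiddleware{}, true))
}
```

Because the middleware sees only a context and a closure, it can time the call, hold a permit around it, or inspect the returned error without importing any exporterhelper-internal types.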

@raghu999 raghu999 force-pushed the controller-interface branch from 77cbada to 3613de8 Compare January 7, 2026 07:50
@raghu999 raghu999 force-pushed the controller-interface branch from 3613de8 to f2d39d9 Compare January 7, 2026 07:51
Contributor

@axw axw left a comment

The interface doesn't have to live in extensioncapabilities. For example, there's https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/extensionmiddleware for HTTP and gRPC middleware. I wouldn't recommend adding it in there, just using it as an example. Perhaps we could introduce a new package under https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/xextension?

Comment thread exporter/exporterhelper/internal/queuebatch/config.go Outdated
Comment thread exporter/exporterhelper/README.md
@raghu999
Author

raghu999 commented Jan 7, 2026

Thanks for the review @axw! I've updated the PR to incorporate all suggested changes:

Configuration (config.go):

  1. Renamed Field: Changed RequestMiddlewareID to RequestMiddlewares.
  2. Updated Type: Changed the type to a list ([]component.ID) to be consistent with confighttp and allow multiple middlewares.
  3. Updated Tag: Switched the YAML tag to mapstructure:"request_middlewares".
  4. Documentation: Updated the code comments to reflect the list type and removed the concurrency controller documentation.
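
For illustration, the renamed field might look like the following. This is a sketch only: `ID` is a local stand-in for `component.ID` so the example has no external dependencies, the real config struct has more fields, and the extension names `arc` and `ratelimit` are made up.

```go
package main

import (
	"fmt"
	"reflect"
)

// ID is a local stand-in for component.ID (an assumption for this sketch).
type ID string

// Config shows only the field discussed above.
type Config struct {
	// RequestMiddlewares lists middleware extensions by component ID.
	RequestMiddlewares []ID `mapstructure:"request_middlewares"`
}

// configKey reports the mapstructure key attached to RequestMiddlewares.
func configKey() string {
	f, _ := reflect.TypeOf(Config{}).FieldByName("RequestMiddlewares")
	return f.Tag.Get("mapstructure")
}

func main() {
	cfg := Config{RequestMiddlewares: []ID{"arc", "ratelimit"}}
	fmt.Println(configKey(), cfg.RequestMiddlewares)
}
```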

Interface Location:

  • Refactoring: As suggested, I removed the RequestMiddleware and RequestMiddlewareFactory interfaces from extensioncapabilities.

  • New Location: I've moved them to a new package go.opentelemetry.io/collector/extension/xextension/extensionmiddleware. This keeps the experimental middleware capabilities separate from the stable core extension interfaces.

Ready for a re-review!

@raghu999 raghu999 requested a review from axw January 7, 2026 23:26
@raghu999 raghu999 force-pushed the controller-interface branch from ce09eef to 7466c9e Compare January 7, 2026 23:31
Comment thread extension/xextension/extensionmiddleware/interfaces.go Outdated
Comment thread exporter/exporterhelper/internal/queue_sender.go Outdated
@raghu999
Author

raghu999 commented Jan 8, 2026

I think all the code can be simplified if:

  • NewQueueSender just stores "next" in a new field, and references that field rather than the parameter in exportFunc
  • queueSender.Start overrides the next field with the wrapped sender

Thanks, @axw.

I agree moving the interface to exporterhelper/internal and aliasing it in xexporterhelper is the right move here. It cleanly resolves the circular dependency issues caused by xextension needing to reference exporterhelper types.

I've applied that change and also refactored queue_sender.go to use the lazy-binding pattern you suggested (storing next as a field and wrapping it in Start). This allowed me to remove the RequestMiddlewareFactory entirely, which significantly simplifies the plumbing.
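
The lazy-binding pattern can be sketched as follows. The types here are simplified stand-ins, not the real exporterhelper internals: `sendFunc`, `wrapper`, and the lowercase `queueSender` methods are hypothetical, and the real `Start` would resolve middlewares from the host's extensions rather than take one as an argument.

```go
package main

import (
	"context"
	"fmt"
)

// sendFunc is a simplified stand-in for the internal sender type.
type sendFunc func(ctx context.Context, req string) error

// wrapper is a hypothetical middleware shape that wraps a sender.
type wrapper func(next sendFunc) sendFunc

type queueSender struct {
	next sendFunc // stored at construction, replaced in start
}

func newQueueSender(next sendFunc) *queueSender {
	return &queueSender{next: next}
}

// export reads the field at call time, so it sees whatever start installed.
func (qs *queueSender) export(ctx context.Context, req string) error {
	return qs.next(ctx, req)
}

// start overrides the stored sender with the wrapped one.
func (qs *queueSender) start(w wrapper) {
	qs.next = w(qs.next)
}

// demoLazyBinding records the call order for one export after start.
func demoLazyBinding() []string {
	var trace []string
	qs := newQueueSender(func(_ context.Context, req string) error {
		trace = append(trace, "send:"+req)
		return nil
	})
	qs.start(func(next sendFunc) sendFunc {
		return func(ctx context.Context, req string) error {
			trace = append(trace, "middleware")
			return next(ctx, req)
		}
	})
	_ = qs.export(context.Background(), "r1")
	return trace
}

func main() {
	fmt.Println(demoLazyBinding())
}
```

Since the export path dereferences the field rather than capturing the constructor parameter, no factory is needed to thread the middleware through construction.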

The tests passed locally. Ready for another look.

@raghu999 raghu999 requested a review from axw January 8, 2026 05:01
Comment thread exporter/exporterhelper/internal/queue_sender.go
Comment thread exporter/exporterhelper/internal/queue_sender_test.go Outdated
Comment thread exporter/exporterhelper/internal/queue_sender_test.go Outdated
Comment thread exporter/exporterhelper/internal/queue_sender.go Outdated
@axw axw changed the title from "feat: Add ConcurrencyController interface for ARC in exporterhelper" to "[exporterhelper]: Add RequestMiddleware extension interface" Feb 9, 2026
@axw
Contributor

axw commented Feb 9, 2026

@raghu999 it's not really a matter of opinion, https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/docs/new-components.md states that new components are to be implemented in an external repo first, and donated to contrib. (This is a semi-recent change to the docs, in the last few months.)

As for this PR, we're just waiting on a maintainer to have a look and approve. If you would like to accelerate that, it may be beneficial to attend a Collector SIG meeting -- see https://github.com/open-telemetry/opentelemetry-collector?tab=readme-ov-file#community

Member

@bogdandrutu bogdandrutu left a comment

I am failing to understand this (sorry, not as smart). Why not have the server push back with a "retry-after"? Based on my limited knowledge, it is almost impossible to solve this on the client side: you need the server to push back on requests and the client to respect that, independent of how many consumers you have (e.g. consumers can also be increased by having more sources of data, and you cannot control across sources).

Also, we need to understand and separate the consumption from the queue and the sending part (if needed). I would like to better understand why the retry-after mechanism does not work, since that is the recommended HTTP (and gRPC) way of dealing with this.

@bogdandrutu bogdandrutu removed the ready-to-merge (Code review completed; ready to merge by maintainers) label Feb 10, 2026
@raghu999
Author

@bogdandrutu, thank you for the feedback. You are correct that retry-after is the standard for reactive backpressure; however, this PR introduces the RequestMiddleware interface to support proactive client-side management such as Adaptive Request Concurrency (ARC).

While retry-after triggers once a server is already saturated, ARC monitors latency trends to adjust concurrency before the server drops requests or experiences resource exhaustion. By utilizing the WrapSender pattern, we decouple this logic from the core exporterhelper, keeping it lean while allowing advanced extensions to hook into the request lifecycle to maintain optimal throughput.

@raghu999
Author

@bogdandrutu @dmitryax @axw we'd love to get your eyes on the ARC implementation strategies we've proposed here. We want to ensure this aligns perfectly with the Collector’s long-term roadmap.

We are happy to pivot based on your feedback, whether that's refining the current PR or moving this to a dedicated contrib component. My company is committed to maintaining the ARC extension or ARC exporter helper as a core part of our observability stack. Please let us know how you'd like us to proceed!

@raghu999
Author

raghu999 commented Mar 6, 2026

@bogdandrutu Any feedback on this?

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Mar 21, 2026
@raghu999
Author

@bogdandrutu @axw @dmathieu @dmitryax Gentle ping so we don't lose this to the stale bot!

I'm hoping we can reach a consensus on the RequestMiddleware interface. As mentioned above, this client-side hook is critical for proactively managing concurrency before we hit the reactive retry-after state.

I've been waiting for a while to unblock my next phase of implementation. If there are still lingering architectural concerns about the interface itself, I'd be happy to jump on the next Collector SIG meeting to hash them out. Otherwise, I'd love to get this merged!

@raghu999 raghu999 requested a review from bogdandrutu March 23, 2026 20:49
@axw
Contributor

axw commented Mar 24, 2026

I've been waiting for a while to unblock my next phase of implementation. If there are still lingering architectural concerns about the interface itself, I'd be happy to jump on the next Collector SIG meeting to hash them out.

I think that would be a good idea. I don't personally have any concerns (I did approve after all!). Raising it at a SIG meeting sounds like a good next step, to get more feedback/reviews.

@github-actions github-actions Bot removed the Stale label Mar 24, 2026
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Apr 11, 2026
@meridianmindx

This is an interesting architectural pattern — introducing a RequestMiddleware interface to generalize control over request execution in the exporter helper. The approach cleanly separates cross-cutting concerns like dynamic concurrency control.

A few questions for consideration:

  • Are there plans for built-in middleware implementations (e.g., circuit breaker, adaptive batch size) that would ship with the collector?
  • How does middleware ordering work? Is there a guarantee about execution order when multiple middlewares are registered?
  • Does the interface allow mutating both the request and the context, or just wrapping execution?

This pattern could also be useful for other components beyond exporters (receivers, processors). Thanks for the clean abstraction!

@github-actions github-actions Bot removed the Stale label Apr 12, 2026
@raghu999
Author

Thanks everyone for the continued feedback, reviews, and patience! I want to provide a quick update to address recent questions and clarify the architectural direction of the extension.

@axw Apologies for the delay in following up! Thanks again for the review and approval. While we coordinate the best time to sync up at an upcoming SIG meeting, I wanted to lay out the technical details here so we can get a head start asynchronously.

@bogdandrutu Gentle ping to keep this on your radar! To address the lingering architectural concerns regarding the algorithm, I want to provide some context on the design.

This approach is heavily influenced by the Adaptive Request Concurrency (ARC) mechanisms used by Vector, as well as Netflix's "Performance Under Load" architecture. The primary goal is to completely eliminate the need for static rate limits that require constant manual tuning, and instead automatically find the optimal maximum throughput.

How the internal mechanisms achieve this:

  • AIMD & EWMA Control Law: The Controller relies on an Additive Increase / Multiplicative Decrease (AIMD) algorithm. To evaluate downstream health, it tracks the Round Trip Time (RTT) of requests and calculates a healthy latency baseline using an Exponentially Weighted Moving Average (EWMA).
  • Dynamic Backpressure: The extension automatically throttles the concurrency limit when it detects explicit backpressure (e.g., retryable errors) or when the recent RTT exceeds the calculated threshold, indicating a latency spike.
  • Optimized Permit Gating: To ensure concurrency management doesn't introduce unwanted overhead, active parallel requests are gated by a custom TokenPool. We prioritized lean, optimized code for high-throughput performance by implementing a fast path in the pool's Acquire method to completely bypass slow-path allocations.
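
As a rough illustration of the AIMD and EWMA control law described in the bullets above, consider the toy controller below. It is not the extension's implementation: every constant (smoothing factor, spike threshold, decrease factor, limits) is an assumption picked for readability, and the `controller` type and its fields are hypothetical.

```go
package main

import "fmt"

// controller is a toy AIMD limiter driven by an EWMA latency baseline.
// All constants are illustrative assumptions, not values from the PR.
type controller struct {
	limit     float64 // current concurrency limit
	minLimit  float64
	maxLimit  float64
	ewmaRTT   float64 // smoothed RTT baseline in milliseconds; 0 means unset
	alpha     float64 // EWMA smoothing factor
	threshold float64 // RTT above ewmaRTT*threshold counts as a latency spike
	decrease  float64 // multiplicative decrease factor
}

func newController() *controller {
	return &controller{limit: 1, minLimit: 1, maxLimit: 200, alpha: 0.2, threshold: 1.5, decrease: 0.5}
}

// observe records one completed request and adjusts the limit.
func (c *controller) observe(rttMillis float64, retryable bool) {
	if c.ewmaRTT == 0 {
		c.ewmaRTT = rttMillis // the first sample seeds the baseline
	}
	spike := rttMillis > c.ewmaRTT*c.threshold
	if retryable || spike {
		// Multiplicative decrease on explicit backpressure or a latency spike.
		c.limit *= c.decrease
		if c.limit < c.minLimit {
			c.limit = c.minLimit
		}
	} else {
		// Additive increase while the downstream looks healthy.
		c.limit++
		if c.limit > c.maxLimit {
			c.limit = c.maxLimit
		}
	}
	// Fold the sample into the baseline after the decision is made.
	c.ewmaRTT = c.alpha*rttMillis + (1-c.alpha)*c.ewmaRTT
}

func main() {
	c := newController()
	for i := 0; i < 3; i++ {
		c.observe(100, false) // healthy responses raise the limit by one each
	}
	c.observe(400, false) // latency spike halves the limit
	c.observe(100, true)  // retryable error halves it again
	fmt.Println("limit:", c.limit)
}
```

Updating the baseline after the comparison keeps a single slow response from raising the threshold it is judged against.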

RequestMiddleware Abstraction

@meridianmindx Thanks for the thoughtful review and excellent questions regarding the abstraction!

Here is a breakdown of how the middleware is designed to function:

  1. Built-in implementations: While ARC is the immediate driver for this interface, standardizing circuit breaking, dynamic batch sizing, and advanced rate limiting are absolutely the logical next steps once this foundation is merged.
  2. Ordering: Middlewares will be executed sequentially in the exact order they are configured and registered in the slice, forming a standard chain.
  3. Mutation: The interface is designed to wrap request execution. This allows you to pass down a modified context.Context (useful for timeouts or tracing) and directly control the flow of the request.
  4. Beyond exporters: Spot on! The adaptive concurrency mechanism is inherently designed so it can also act as an HTTP/gRPC server-side interceptor to protect the Collector's ingress (receivers), not just the exporters.
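
The ordering and context-mutation points above can be sketched as a simple chain. This assumes that "executed in configured order" means the first configured middleware is the outermost wrapper, which is an interpretation rather than something confirmed in the PR; `handler`, `middleware`, `chain`, and `ctxKey` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

type handler func(ctx context.Context) error

// middleware wraps execution and may hand a modified context to next.
type middleware func(ctx context.Context, next handler) error

// chain composes middlewares so that mws[0] runs first (outermost).
func chain(mws []middleware, final handler) handler {
	h := final
	for i := len(mws) - 1; i >= 0; i-- {
		mw, next := mws[i], h
		h = func(ctx context.Context) error { return mw(ctx, next) }
	}
	return h
}

// ctxKey is a private key type for the value added by the tagging middleware.
type ctxKey struct{}

// demoChain returns the order in which the chain's pieces ran.
func demoChain() []string {
	var trace []string
	named := func(name string) middleware {
		return func(ctx context.Context, next handler) error {
			trace = append(trace, name+":before")
			err := next(ctx)
			trace = append(trace, name+":after")
			return err
		}
	}
	// tagger passes a derived context down the chain.
	tagger := func(ctx context.Context, next handler) error {
		return next(context.WithValue(ctx, ctxKey{}, "tagged"))
	}
	h := chain([]middleware{named("a"), tagger, named("b")}, func(ctx context.Context) error {
		trace = append(trace, fmt.Sprintf("send:%v", ctx.Value(ctxKey{})))
		return nil
	})
	_ = h(context.Background())
	return trace
}

func main() {
	fmt.Println(demoChain())
}
```

Each middleware sees the context produced by the ones before it, and unwinding happens in reverse order, as in typical HTTP middleware stacks.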

Let me know if this helps clarify the design approach for the interface and algorithm. I'm happy to iterate further right here on the PR if there are specific adjustments you'd like to see!

@raghu999
Author

@bogdandrutu It is a completely valid question and a very common architectural debate when dealing with distributed systems.

You are absolutely right that Retry-After is the standard and recommended mechanism for server-side backpressure. However, relying exclusively on Retry-After has a fundamental limitation for high-throughput observability pipelines: it is purely reactive.

By the time a backend (like Elasticsearch or an OTLP gateway) is issuing 429s or 503s with Retry-After headers, it is already in a state of distress. The server is actively burning CPU cycles to accept the connection, read the headers, determine it is overwhelmed, and format a rejection. In addition, network bandwidth is wasted transmitting payloads that are immediately dropped.

Here is why a client-side Adaptive Request Concurrency (ARC) mechanism is not only possible, but critical to solving this:

1. RTT as a Universal Shared Signal (Addressing the Cross-Client Issue)
You correctly pointed out that one Collector instance doesn't know about the traffic generated by other instances. That is actually the exact reason ARC works so well! The extension uses Round Trip Time (RTT) as its primary health indicator.
If 50 different Collector instances suddenly burst traffic to the same backend, the backend's queues will fill and its latency will naturally increase. All 50 independent ARC controllers will detect this RTT degradation simultaneously and independently back off before the server is forced to issue a 429. It leverages the exact same principles as TCP congestion control.

2. Proactive (ARC) vs. Reactive (Retry-After)
Retry-After relies on hitting a wall. It creates a "sawtooth" pattern of traffic: burst -> overwhelm the server -> get 429s -> stop -> burst again when the timer expires (which often causes a thundering herd problem).
ARC, via its AIMD control law and EWMA latency tracking, finds the "sweet spot" of maximum throughput just below the server's breaking point and dynamically hovers there.

3. Separating the Queue from the Sender
The RequestMiddleware abstraction actually facilitates exactly what you are asking for: separating queue consumption from network sending.
Currently, if we set num_consumers: 10, we artificially cap our throughput even if the downstream is completely idle. With ARC, we can set num_consumers: 200 (allowing the queue to drain rapidly and utilize CPU efficiently) but let the middleware dynamically limit the actual in-flight HTTP/gRPC requests to what the network can currently sustain.
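
A sketch of that decoupling: many queue workers, but only a bounded number of sends in flight. The limiter below uses a fixed-size buffered channel as a semaphore, which is a simplification; it is not the TokenPool from the extension, and ARC would adjust the limit dynamically. The names `limiter` and `maxInFlight` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// limiter gates in-flight calls with a buffered channel used as a semaphore.
type limiter struct{ permits chan struct{} }

func newLimiter(n int) *limiter {
	return &limiter{permits: make(chan struct{}, n)}
}

// Handle acquires a permit, runs next, and releases the permit afterwards.
func (l *limiter) Handle(ctx context.Context, next func(context.Context) error) error {
	select {
	case l.permits <- struct{}{}:
	case <-ctx.Done():
		return ctx.Err()
	}
	defer func() { <-l.permits }()
	return next(ctx)
}

// maxInFlight pushes `workers` goroutines through a limiter of size `limit`
// and returns the peak number of concurrent sends that was observed.
func maxInFlight(workers, limit int) int64 {
	l := newLimiter(limit)
	var cur, peak int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.Handle(context.Background(), func(context.Context) error {
				n := atomic.AddInt64(&cur, 1)
				// Record a new peak if this send raised the in-flight count.
				for {
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				atomic.AddInt64(&cur, -1)
				return nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	peak := maxInFlight(200, 8)
	fmt.Println("peak within limit:", peak <= 8)
}
```

With 200 workers and a limit of 8, the workers drain the queue freely while the middleware alone decides how many requests reach the network at once.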

Think of Retry-After as the airbag, and ARC as the anti-lock brakes. We absolutely still want the server to send Retry-After when necessary (and ARC will immediately cut concurrency when it sees retryable errors!), but ARC's job is to prevent us from crashing into that wall in the first place.

Let me know if this helps clarify the philosophy behind the client-side approach!

@raghu999
Author

@bogdandrutu Gentle ping on this. We've been waiting on your feedback regarding the requested changes for the last few months. Could you please review when you have a moment so we can figure out the best path forward? Appreciate your time!

@raghu999
Author

raghu999 commented May 4, 2026

@axw @dmitryax @dmathieu @meridianmindx
Hey team, I’d like to get your advice on the best protocol for moving forward here. Since the PR has been blocked by the requested changes for a few months without a follow-up response, I want to make sure I'm following the right community process.

How would the maintainers like me to proceed?

  • Should I continue to hold off and wait for @bogdandrutu to have the bandwidth to review the responses?

  • Would it be cleaner to close this and submit a new PR to reset the review state and get fresh eyes on it?

  • Is there a different architectural implementation for ARC that the community would prefer I explore instead of the RequestMiddleware approach?

I'm eager to unblock this phase of the pipeline and would really appreciate your guidance on how to navigate this so we can keep these contributions moving.

5 participants