[exporterhelper]: Add RequestMiddleware extension interface #14318
raghu999 wants to merge 35 commits into
…pentelemetry-collector into controller-interface
…havior unchanged when ARC is disabled
|
@axw @dmitryax @bogdandrutu gentle ping on this PR (#14318). I wanted to check if you could take a look when you have a moment. This introduces the new extension interface. I would appreciate a review of the API shape/placement and the integration points. Happy to iterate quickly on any feedback or adjust/split if needed. Thanks! |
axw
left a comment
Thanks @raghu999!
My main feedback so far is:
- The interface is very narrow, and seems a bit too coupled to ARC. I'd like to see if we can change that into a more general exporter request/sender middleware interface without compromising ARC.
- The minConsumersWithController bit feels off. I'm not convinced we should change the default due to some other setting - seems like that would be surprising. A couple of thoughts:
- Can we for now just have the extension log a warning if the default is set low?
- Would it make sense for the controller to be able to add consumers? Maybe as a separate extension point in asyncQueue, but referencing the same extension?
@axw Thanks for the review. I've updated the PR based on your feedback:
- "Why 200?" / defaults & warnings
- No-op middleware (avoid nil checks)
- General middleware interface |
77cbada to 3613de8
3613de8 to f2d39d9
…rt.Equal to assert.GreaterOrEqual
axw
left a comment
> I refactored the ARC-coupled interface into a generic RequestMiddleware / RequestMiddlewareFactory. I explored the WrapSender(... internal/request,sender ...) style, but extensioncapabilities can't depend on exporterhelper/internal/... types due to Go internal visibility rules and it would also introduce an import cycle. Using Handle(ctx, next func(ctx) error) keeps the interface general and decoupled while letting extensions encapsulate timing/permits/error logic.
The interface doesn't have to live in extensioncapabilities. For example, there's https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/extensionmiddleware for HTTP and gRPC middleware. I wouldn't recommend adding it in there, just using it as an example. Perhaps we could introduce a new package under https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/xextension?
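For readers following along, the Handle(ctx, next) shape quoted above can be sketched roughly as follows. This is an illustration only: the type names, the no-op implementation, and the timing middleware are assumptions made for this example and are not taken from the PR's code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// RequestMiddleware is a hypothetical stand-in for the interface discussed in
// this thread; the definition in the PR may differ.
type RequestMiddleware interface {
	// Handle runs next, optionally doing work before and after it.
	Handle(ctx context.Context, next func(context.Context) error) error
}

// noopMiddleware calls next directly, so callers never need a nil check.
type noopMiddleware struct{}

func (noopMiddleware) Handle(ctx context.Context, next func(context.Context) error) error {
	return next(ctx)
}

// timingMiddleware reports how long each request took and whether it failed.
type timingMiddleware struct {
	observe func(d time.Duration, err error)
}

func (m timingMiddleware) Handle(ctx context.Context, next func(context.Context) error) error {
	start := time.Now()
	err := next(ctx)
	m.observe(time.Since(start), err)
	return err
}

// runDemo sends one successful and one failing request through the timing
// middleware and returns the number of observations and the last error seen.
func runDemo() (int, error) {
	ctx := context.Background()
	// The no-op middleware must pass the result of next through unchanged.
	if err := (noopMiddleware{}).Handle(ctx, func(context.Context) error { return nil }); err != nil {
		return 0, err
	}
	var count int
	var last error
	var mw RequestMiddleware = timingMiddleware{observe: func(_ time.Duration, err error) {
		count++
		last = err
	}}
	_ = mw.Handle(ctx, func(context.Context) error { return nil })
	_ = mw.Handle(ctx, func(context.Context) error { return errors.New("backend unavailable") })
	return count, last
}

func main() {
	n, last := runDemo()
	fmt.Println(n, last)
}
```

Because the middleware only sees a context and a closure, it needs no knowledge of the exporter's internal request or sender types, which is the decoupling argued for in the quoted comment.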
Thanks for the review @axw! I've updated the PR to incorporate all suggested changes: Configuration (config.go):
Interface Location:
Ready for a re-review! |
ce09eef to 7466c9e
Thanks, @axw. I agree moving the interface to exporterhelper/internal and aliasing it in xexporterhelper is the right move here. It cleanly resolves the circular dependency issues caused by xextension needing to reference exporterhelper types. I've applied that change and also refactored queue_sender.go to use the lazy-binding pattern you suggested (storing next as a field and wrapping it in Start). This allowed me to remove the RequestMiddlewareFactory entirely, which significantly simplifies the plumbing. The tests passed locally. Ready for another look. |
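The lazy-binding idea described above (keep next as a field, wrap it once in Start) might look something like the sketch below. The queueSender, host, and tagging types here are simplified, hypothetical stand-ins and do not reproduce the actual queue_sender.go.

```go
package main

import (
	"context"
	"fmt"
)

// sendFunc is a simplified stand-in for the downstream sender.
type sendFunc func(ctx context.Context, req string) error

// middleware mirrors the Handle(ctx, next) shape discussed in this thread.
type middleware interface {
	Handle(ctx context.Context, next func(context.Context) error) error
}

// host is a hypothetical registry of started extensions, keyed by ID.
type host map[string]middleware

type queueSender struct {
	middlewareID string   // empty means no middleware is configured
	next         sendFunc // bound at construction, rewrapped in Start
}

func newQueueSender(id string, next sendFunc) *queueSender {
	return &queueSender{middlewareID: id, next: next}
}

// Start resolves the middleware only now, when extensions are available,
// and wraps next once. No factory is needed at construction time.
func (qs *queueSender) Start(h host) error {
	if qs.middlewareID == "" {
		return nil
	}
	mw, ok := h[qs.middlewareID]
	if !ok {
		return fmt.Errorf("request middleware %q not found", qs.middlewareID)
	}
	inner := qs.next
	qs.next = func(ctx context.Context, req string) error {
		return mw.Handle(ctx, func(ctx context.Context) error { return inner(ctx, req) })
	}
	return nil
}

func (qs *queueSender) Send(ctx context.Context, req string) error {
	return qs.next(ctx, req)
}

// tagging records when it runs relative to the wrapped sender.
type tagging struct{ log *[]string }

func (t tagging) Handle(ctx context.Context, next func(context.Context) error) error {
	*t.log = append(*t.log, "before")
	err := next(ctx)
	*t.log = append(*t.log, "after")
	return err
}

// runDemo starts a sender with one middleware and returns the call order.
func runDemo() ([]string, error) {
	var log []string
	qs := newQueueSender("arc", func(_ context.Context, req string) error {
		log = append(log, "send:"+req)
		return nil
	})
	if err := qs.Start(host{"arc": tagging{log: &log}}); err != nil {
		return nil, err
	}
	err := qs.Send(context.Background(), "batch-1")
	return log, err
}

func main() {
	fmt.Println(runDemo())
}
```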
|
@raghu999 it's not really a matter of opinion: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/docs/new-components.md states that new components are to be implemented in an external repo first, and then donated to contrib. (This is a semi-recent change to the docs, in the last few months.) As for this PR, we're just waiting on a maintainer to have a look and approve. If you would like to accelerate that, it may be beneficial to attend a Collector SIG meeting -- see https://github.com/open-telemetry/opentelemetry-collector?tab=readme-ov-file#community |
bogdandrutu
left a comment
I am failing to understand this (sorry, not as smart). But why not have the server push back with a "retry-after"? Based on my limited knowledge, it is almost impossible for you to solve this on the client side: you need the server to push back on the requests, and the client to respect that independently of how many consumers you have (e.g. consumers can also be increased by having more sources of data, and you cannot control that across sources).
Also, we need to understand and separate the consumption from the queue and the sending part (if needed). I would like to better understand why the retry-after mechanism does not work, since that is the recommended HTTP (and gRPC) way of dealing with this.
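For context on the retry-after mechanism mentioned here, a client can read the standard HTTP Retry-After header roughly as sketched below. The helper names (retryAfter, delaySeconds) are made up for this illustration and do not describe how the Collector's exporters handle the header.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryAfter returns how long the server asked the client to wait, if at all.
// The header carries either a number of seconds or an HTTP date.
func retryAfter(h http.Header, now time.Time) (time.Duration, bool) {
	v := h.Get("Retry-After")
	if v == "" {
		return 0, false
	}
	// Form 1: delay in whole seconds.
	if secs, err := strconv.Atoi(v); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second, true
	}
	// Form 2: an absolute HTTP date; a date in the past means "retry now".
	if t, err := http.ParseTime(v); err == nil {
		if d := t.Sub(now); d > 0 {
			return d, true
		}
		return 0, true
	}
	return 0, false
}

// delaySeconds evaluates a header value against a fixed clock so the result
// is reproducible in examples and tests.
func delaySeconds(value string) (int, bool) {
	h := http.Header{}
	if value != "" {
		h.Set("Retry-After", value)
	}
	now := time.Date(2025, time.January, 1, 0, 0, 0, 0, time.UTC)
	d, ok := retryAfter(h, now)
	return int(d / time.Second), ok
}

func main() {
	fmt.Println(delaySeconds("120"))
	fmt.Println(delaySeconds("Wed, 01 Jan 2025 00:00:30 GMT"))
}
```

The server only emits this signal once it has decided to reject a request, which is the reactive behaviour the rest of the thread contrasts with a client-side controller.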
|
@bogdandrutu, thank you for the feedback. You are correct that retry-after is the standard for reactive backpressure; however, this PR introduces the RequestMiddleware interface to support proactive client-side management like Adaptive Request Concurrency (ARC). While retry-after triggers once a server is already saturated, ARC monitors latency trends to adjust concurrency before the server drops requests or experiences resource exhaustion. By utilizing the WrapSender pattern, we decouple this logic from the core exporterhelper, keeping it lean while allowing advanced extensions to hook into the request lifecycle to maintain optimal throughput. |
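To make "monitors latency trends to adjust concurrency" concrete, here is a deliberately simplified additive-increase/multiplicative-decrease sketch. It is not the algorithm proposed in this PR or the one used by Vector; the thresholds, field names, and smoothing factor are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// adaptiveLimit is a toy AIMD controller: it raises the concurrency limit by
// one while round-trip times stay near a moving baseline, and cuts it when a
// request fails or latency rises well above that baseline.
type adaptiveLimit struct {
	limit, min, max int
	baseline        time.Duration // exponentially weighted average of RTTs
	threshold       float64       // RTT above baseline*threshold counts as pressure
	alpha           float64       // weight of the newest sample in the baseline
	decrease        float64       // multiplicative factor applied under pressure
}

// observe feeds one completed request into the controller and returns the
// concurrency limit to use from now on.
func (a *adaptiveLimit) observe(rtt time.Duration, failed bool) int {
	if a.baseline == 0 {
		a.baseline = rtt // the first sample only seeds the baseline
		return a.limit
	}
	pressured := failed || float64(rtt) > float64(a.baseline)*a.threshold
	if pressured {
		a.limit = int(float64(a.limit) * a.decrease)
		if a.limit < a.min {
			a.limit = a.min
		}
	} else if a.limit < a.max {
		a.limit++
	}
	a.baseline = time.Duration(a.alpha*float64(rtt) + (1-a.alpha)*float64(a.baseline))
	return a.limit
}

// runDemo replays a short, fixed sequence of observations and returns the
// limit after each one.
func runDemo() []int {
	a := &adaptiveLimit{limit: 4, min: 1, max: 8, threshold: 1.5, alpha: 0.2, decrease: 0.5}
	ms := time.Millisecond
	return []int{
		a.observe(100*ms, false), // seeds the baseline
		a.observe(100*ms, false), // healthy: limit grows
		a.observe(100*ms, false), // healthy: limit grows
		a.observe(400*ms, false), // latency spike: limit is cut
		a.observe(100*ms, true),  // failure: limit is cut again
	}
}

func main() {
	fmt.Println(runDemo())
}
```

The point of the sketch is that the limit starts shrinking on the latency spike, before any request has actually been rejected.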
|
@bogdandrutu @dmitryax @axw we'd love to get your eyes on the ARC implementation strategies we've proposed here. We want to ensure this aligns with the Collector's long-term roadmap. We are happy to pivot based on your feedback, whether that's refining the current PR or moving this to a dedicated contrib component. My company is committed to maintaining the ARC extension or ARC exporter helper as a core part of our observability stack. Please let us know how you'd like us to proceed! |
|
@bogdandrutu Any feedback on this? |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
@bogdandrutu @axw @dmathieu @dmitryax Gentle ping so we don't lose this to the stale bot! I'm hoping we can reach a consensus on the RequestMiddleware interface. As mentioned above, this client-side hook is critical for proactively managing concurrency before we hit the reactive retry-after state. I've been waiting for a while to unblock my next phase of implementation. If there are still lingering architectural concerns about the interface itself, I'd be happy to jump on the next Collector SIG meeting to hash them out. Otherwise, I'd love to get this merged! |
I think that would be a good idea. I don't personally have any concerns (I did approve after all!). Raising it at a SIG meeting sounds like a good next step, to get more feedback/reviews. |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
This is an interesting architectural pattern. A few questions for consideration:
This pattern could also be useful for other components beyond exporters (receivers, processors). Thanks for the clean abstraction! |
|
Thanks everyone for the continued feedback, reviews, and patience! I want to provide a quick update to address recent questions and clarify the architectural direction of the extension.
@axw Apologies for the delay in following up! Thanks again for the review and approval. While we coordinate the best time to sync up at an upcoming SIG meeting, I wanted to lay out the technical details here so we can get a head start asynchronously.
@bogdandrutu Gentle ping to keep this on your radar! To address the lingering architectural concerns regarding the algorithm, I want to provide some context on the design. This approach is heavily influenced by the Adaptive Request Concurrency (ARC) mechanisms used by Vector, as well as Netflix's "Performance Under Load" architecture. The primary goal is to completely eliminate the need for static rate limits that require constant manual tuning, and instead automatically find the optimal maximum throughput.
How the internal mechanisms achieve this:
RequestMiddleware Abstraction
@meridianmindx Thanks for the thoughtful review and excellent questions regarding the abstraction! Here is a breakdown of how the middleware is designed to function:
Let me know if this helps clarify the design approach for the interface and algorithm. I'm happy to iterate further right here on the PR if there are specific adjustments you'd like to see! |
|
@bogdandrutu It is a completely valid question and a very common architectural debate when dealing with distributed systems. You are absolutely right that By the time a backend (like Elasticsearch or an OTLP gateway) is issuing 429s or 503s with Here is why a client-side Adaptive Request Concurrency (ARC) mechanism is not only possible, but critical to solving this:
1. RTT as a Universal Shared Signal (Addressing the Cross-Client Issue)
2. Proactive (ARC) vs. Reactive (Retry-After)
3. Separating the Queue from the Sender
Think of
Let me know if this helps clarify the philosophy behind the client-side approach! |
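One possible reading of the third point, sketched under assumptions: a permit gate placed in front of the sender caps in-flight requests regardless of how many queue consumers are pulling work. The permitGate type and the fixed permit count are hypothetical; an adaptive controller would resize the limit rather than keep it constant.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// permitGate caps the number of in-flight sends, independently of how many
// queue consumers call Handle concurrently.
type permitGate struct{ permits chan struct{} }

func newPermitGate(n int) *permitGate {
	return &permitGate{permits: make(chan struct{}, n)}
}

// Handle waits for a permit (or for ctx to be cancelled), runs next, and
// releases the permit afterwards.
func (g *permitGate) Handle(ctx context.Context, next func(context.Context) error) error {
	select {
	case g.permits <- struct{}{}:
	case <-ctx.Done():
		return ctx.Err()
	}
	defer func() { <-g.permits }()
	return next(ctx)
}

// runDemo starts the given number of consumer goroutines and reports the
// highest number of sends that were in flight at the same time.
func runDemo(consumers, permits int) int64 {
	gate := newPermitGate(permits)
	var inFlight, peak atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < consumers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = gate.Handle(context.Background(), func(context.Context) error {
				cur := inFlight.Add(1)
				// Record the peak with a compare-and-swap loop.
				for {
					p := peak.Load()
					if cur <= p || peak.CompareAndSwap(p, cur) {
						break
					}
				}
				time.Sleep(time.Millisecond) // simulate a network round trip
				inFlight.Add(-1)
				return nil
			})
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println("peak in-flight sends:", runDemo(10, 3))
}
```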
|
@bogdandrutu Gentle ping on this. We've been waiting on your feedback regarding the requested changes for the last few months. Could you please review when you have a moment so we can figure out the best path forward? Appreciate your time! |
|
@axw @dmitryax @dmathieu @meridianmindx How would the maintainers like me to proceed?
I'm eager to unblock this phase of the pipeline and would really appreciate your guidance on how to navigate this so we can keep these contributions moving. |
Description
This PR introduces the RequestMiddleware interface and integrates it into the exporterhelper. This is a generalization of the previously proposed ConcurrencyController, allowing for broader control over request execution.
Changes:
- RequestMiddleware interface in the exporter/exporterhelper/xexporterhelper package.
- exporterhelper's sending queue to accept a list of request_middlewares.
Link to tracking issue
Relates to #14080 (Note: This PR is a prerequisite required for fixing #14080)
Testing
- RequestMiddleware interface and its integration with the queue sender.
- exporterhelper tests pass to ensure no regression in current behavior.
Documentation