Memory Limiter does not obey documented processor behaviour when used in multiple pipelines #11969
The behavior is consistent with the documentation: each pipeline gets its own instance, however each instance is looking at total system memory, not pipeline memory. Are there specific changes to the documentation that you think would make this behavior more clear/explicit? If you're looking for a component to limit the memory usage of a single pipeline, I would expect that would be a completely new component, or at minimum a completely different explicit mode of the `memory_limiter` processor (with comprehensive new documentation, I hope!).
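To restate that with a simplified sketch (illustrative only, not the actual collector code; the struct and the limit values are made up): every per-pipeline instance runs its check against total process memory, so all instances cross the limit at the same moment, regardless of which pipeline caused the growth.

```go
package main

import (
	"errors"
	"runtime"
)

// errDataRefused mirrors the error string seen in the agent logs.
var errDataRefused = errors.New("data refused due to high memory usage")

// memoryLimiter is a simplified stand-in for one per-pipeline instance.
type memoryLimiter struct {
	limitBytes uint64 // e.g. limit_mib converted to bytes
}

// checkMemLimit looks at total process allocations, not the memory used
// by any single pipeline, so every instance trips at the same time.
func (ml *memoryLimiter) checkMemLimit() error {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	if ms.Alloc > ml.limitBytes {
		return errDataRefused
	}
	return nil
}

func main() {
	// Two "independent" instances, one per pipeline, still share the
	// same global signal: total process memory.
	logs := &memoryLimiter{limitBytes: 512 * 1024 * 1024}
	metrics := &memoryLimiter{limitBytes: 512 * 1024 * 1024}
	_ = logs.checkMemLimit()
	_ = metrics.checkMemLimit()
}
```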
Hi @dehaansa, thanks for replying!

Thanks for clarifying, this is good to know! So I guess there is no actual bug in the functionality. However, I think the documentation can be clarified a bit. Specifically, when my team and I read the portion of the documentation that explains how, when the same processor is referenced in multiple pipelines, each pipeline gets its own independent copy of that processor, our takeaway was that processors are always "scoped" to the specific pipeline where they're referenced. So then when we found out that there was a …

To be honest, given that it's not scoped to a particular pipeline, I feel like …
Describe the bug

Recently my team ran into an issue where the `memory_limiter` processor did not behave as expected when it was referenced in multiple pipelines. We believe this is because it does not actually follow the documented behaviour for processors in this situation. Specifically, as per this documentation, when the same processor is referenced in multiple pipelines, each pipeline gets its own independent copy of that processor -- the processor is not "shared" across pipelines.

Based on this, when we referenced the same `memory_limiter` processor in multiple pipelines (e.g. A and B), we expected:

1. The `memory_limiter` processor in pipeline A would only examine the memory used by pipeline A, and the same for B.
2. If pipeline A's memory usage breached the limit defined in its `memory_limiter` processor, only pipeline A would be halted -- pipeline B would not be impacted.

However, what we actually saw was that, as soon as pipeline A's memory usage breached the limit defined in the `memory_limiter` processor, both pipelines A and B were halted. This suggests that, contrary to what the documentation says should be the case, the `memory_limiter` processor is "shared" across pipelines -- i.e. the `memory_limiter` processor examines and limits the total memory usage of all the pipelines, not just its own individual pipeline.

Also, this issue comment implies that the `memory_limiter` does not obey the documented processor behaviour.

Steps to reproduce
Configure an agent with one `memory_limiter` processor that is referenced in two different pipelines: a logs pipeline and a metrics pipeline. Generate a large volume of logs so that only the memory used by the logs pipeline increases, until it breaches the limit defined in the `memory_limiter` processor.
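A config along these lines illustrates the setup (a sketch only -- the receiver/exporter names, endpoint, and limit values are placeholders, not our exact config):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: example-backend:4317

processors:
  # One memory_limiter definition, referenced by both pipelines below.
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp]
```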
What did you expect to see?
Since only the logs pipeline's memory usage went over the limit defined in the `memory_limiter` processor, only the logs pipeline should get halted. The metrics pipeline should be unaffected and should continue sending metrics.

What did you see instead?
Both the logs pipeline and metrics pipeline got halted, even though only the logs pipeline's memory usage was too high. We know the metrics pipeline got halted by the `memory_limiter` processor because, in the agent logs, the metrics pipeline generated an error message that said "data refused due to high memory usage", which is also present in the code for the `memory_limiter` processor.

What version did you use?
OTEL agent v0.102.1
Environment
Kubernetes
Suggested Solution

1. Fix the `memory_limiter` processor so it obeys the documented processor behaviour. However, I don't know how difficult this would be. I'm also not sure if the wider community would want its current behaviour to change.
2. Maybe the current behaviour of the `memory_limiter` processor is actually desired. If that is the case, then the real "bug" here is just that its behaviour in this situation is not clearly documented. Regardless of whether or not we implement solution 1, I think we should at least document this anomalous behaviour of the `memory_limiter` processor so that it avoids such confusion in the future. Specifically:
2a. In the general processor documentation, clearly note that the `memory_limiter` processor is an exception and behaves differently.
2b. The `memory_limiter` processor documentation currently only mentions its behaviour when referenced in a single pipeline. Instead, clearly document (with examples) how it behaves when referenced in multiple pipelines. In particular, it should call out that it does not behave as per the current processor documentation.

Workaround?
Assuming the `memory_limiter` processor's current behaviour is not changed any time soon, is there any suggested workaround to get our desired behaviour (where, if the logs pipeline's memory usage goes too high, only the logs pipeline is halted)?

E.g. would defining two different `memory_limiter` processors (e.g. `memory_limiter/logs` and `memory_limiter/metrics`), such that each pipeline gets a different `memory_limiter` processor, solve this issue? Or would it not make a difference since, regardless of what they're named, they will both be examining the total memory usage of all the pipelines, not just their own pipeline, which means they will always be impacted by the other pipeline?
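For concreteness, the two-instance config I have in mind would look something like this (a sketch with placeholder names and values):

```yaml
processors:
  # One named memory_limiter instance per pipeline.
  memory_limiter/logs:
    check_interval: 1s
    limit_mib: 512
  memory_limiter/metrics:
    check_interval: 1s
    limit_mib: 512

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [memory_limiter/logs]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter/metrics]
      exporters: [otlp]
```

(From the discussion above, my understanding is that this would not help: both instances would still be checking total process memory, not per-pipeline memory.)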