Describe the bug
We collect ~400-500k LogRecords/s with the otel-collector and are sending them to our Loki system via the lokiexporter today. With that exporter, we include between 15 and 20 stream labels depending on the kind of log record it is. We tried to switch to native OTLP ingestion of the LogRecords themselves - along with the ingestion of the resource attributes as structured metadata, but it absolutely blew up our memory footprint by over 10x across both distributor and ingester pods:
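Roughly, the switch on the collector side looks like the sketch below - trimmed down, with a placeholder Loki endpoint and a placeholder label list rather than our real 15-20 stream labels:

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

exporters:
  # Old path: lokiexporter pushing to Loki's push API, with explicit stream labels
  loki:
    endpoint: http://loki-gateway/loki/api/v1/push
  # New path: native OTLP ingestion via Loki's /otlp endpoint
  otlphttp/loki:
    endpoint: http://loki-gateway/otlp

processors:
  # With the lokiexporter, the loki.resource.labels hint controls which resource
  # attributes become stream labels (placeholder list here, not our full set)
  resource/loki-hints:
    attributes:
      - action: insert
        key: loki.resource.labels
        value: k8s.namespace.name, k8s.pod.name, k8s.container.name

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [resource/loki-hints]
      exporters: [loki]  # swapped to [otlphttp/loki] for the OTLP experiment
```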
For example, here's the Distributor CPU and Memory Usage
(the experiment started at 10am and ended at ~9PM)
Here is the Ingester CPU and Memory Usage
To Reproduce
Hard to say exactly - but here is the #1 log line we have in our system - by like 50x - so this is really the majority of our data and cardinality:
Here's what a standard istio-proxy log line looks like with the standard lokiexporter and us explicitly picking out stream labels:

We turn that into:

When we run the same logs through the otlphttpexporter though, and let Loki pick out the resource attributes in the distributor, we see:
Expected behavior
I certainly understand that Loki is doing more work now to process the data, and I expect memory to go up - but I did not expect the aggregate memory usage of the distributors to go from ~6-8Gi to ~80-150Gi:
The ingesters are worse - we went from ~350-500Gi to 8+TB of memory usage, and they still couldn't keep up:
Environment:
Infrastructure: Kubernetes on AWS EKS using BottleRocket
Deployment tool: ArgoCD / Helm
We would love to take advantage of the new system, but it seems there's some critical tuning to do - any suggestions?
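For reference, my understanding is that the main knob on the Loki side is the per-tenant otlp_config under limits_config, which controls which resource attributes get promoted to index labels instead of landing in structured metadata. A rough sketch of what I assume that would look like (attribute names are illustrative only, not our full set):

```yaml
limits_config:
  otlp_config:
    resource_attributes:
      # Keep Loki's built-in default label set (service.name, k8s.namespace.name, ...)
      ignore_defaults: false
      attributes_config:
        # Promote a small, bounded set of resource attributes to index labels
        - action: index_label
          attributes:
            - k8s.namespace.name
            - k8s.container.name
        # Drop attributes we never query on
        - action: drop
          attributes:
            - process.pid
    log_attributes:
      # Remaining log attributes stay as structured metadata (the default), e.g.:
      - action: structured_metadata
        attributes:
          - http.request.id
```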