feat(loki.process): Add debug metrics for CRI stage to track truncation of lines and partial line flushing #5399

Merged
ptodev merged 3 commits into main from ptodev/cri-metric on Feb 25, 2026
@ptodev ptodev commented Jan 29, 2026

Pull Request Details

For partial lines there is currently a log line, but a metric could be an easier way to tell whether something is going wrong. It also means we could alert on it.

For truncation there are currently neither logs nor metrics, so this will be the first time we can track it.

Both are low-cardinality metrics, and I don't expect them to have much impact.
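
As a rough illustration of the truncation metric described above, here is a minimal, standard-library-only sketch. The `counter` type is a hypothetical stand-in for a Prometheus counter, and `truncate` and its parameters are names invented for this example; none of this is the actual Alloy code. Treating a non-positive `maxSize` as "no limit" is also an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a hypothetical stand-in for a Prometheus counter.
type counter struct{ n atomic.Int64 }

func (c *counter) Inc()         { c.n.Add(1) }
func (c *counter) Value() int64 { return c.n.Load() }

// truncate cuts line to at most maxSize bytes and counts each cut.
// A nil counter is tolerated, similar to the nil check in the PR diff.
// Assumption of this sketch: maxSize <= 0 means "no limit".
func truncate(line string, maxSize int, truncated *counter) string {
	if maxSize <= 0 || len(line) <= maxSize {
		return line
	}
	if truncated != nil {
		truncated.Inc()
	}
	return line[:maxSize]
}

func main() {
	truncated := &counter{}
	for _, l := range []string{"short", "a much longer log line"} {
		fmt.Printf("%q\n", truncate(l, 8, truncated))
	}
	fmt.Println("lines truncated:", truncated.Value())
}
```

Because the counter carries no per-stream labels, it stays a single series no matter how many log streams pass through the stage.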

PR Checklist

  • Documentation added
  • Tests updated
  • Config converters updated

@ptodev ptodev requested review from a team and clayton-cornell as code owners January 29, 2026 19:51
if c.cfg.MaxPartialLineSizeTruncate && len(e.Line) > int(c.cfg.MaxPartialLineSize) {
	e.Line = e.Line[:c.cfg.MaxPartialLineSize]
	if c.linesTruncatedMetric != nil {
		c.linesTruncatedMetric.Inc()

I considered adding the log labels as metric labels, to make it easier to identify log streams with long lines. I suspect most of the time it'll be a particular stream. But for now I don't want to make changes that could lead to too many metrics.

github-actions Bot commented Jan 29, 2026

💻 Deploy preview available (feat(loki.process): Add two metrics for CRI stage to track truncation of lines and partial line flushing):

Comment thread internal/component/loki/process/stages/cri.go
Comment thread internal/component/loki/process/stages/cri.go Outdated
@kalleep kalleep left a comment

I don't know how useful these metrics would be.

Could you explain how these could be used to actually find issues?

@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Feb 12, 2026
@clayton-cornell

No suggestions for docs. Looks OK as-is.

ptodev commented Feb 12, 2026

> Could you explain how these could be used to actually find issues?

It's hard to tell what values to set for `max_partial_line_size` and `max_partial_lines` without some feedback from Alloy. For truncated lines there are neither logs nor metrics. For partial lines there is a low-cardinality log line, which should really be a metric instead so that users can better track when the problem happened. Maybe there could also be an info-level alert for such things.
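
To make the partial-line side of this concrete, the following is a simplified, hypothetical model of buffering CRI partial lines (flag `P`) until the final chunk (flag `F`) arrives. It assumes partial content is buffered per stream and that everything buffered is flushed early once the number of buffered streams would exceed a limit playing the role of `max_partial_lines`. The type, method, and field names are invented for illustration, and the real stage may differ in detail.

```go
package main

import "fmt"

// partialBuffer is a hypothetical, simplified model of CRI partial-line buffering.
type partialBuffer struct {
	maxStreams int               // plays the role of max_partial_lines
	pending    map[string]string // stream key -> accumulated partial content
	flushes    int               // stand-in for a "partial lines flushed" counter
}

func newPartialBuffer(maxStreams int) *partialBuffer {
	return &partialBuffer{maxStreams: maxStreams, pending: make(map[string]string)}
}

// add processes one chunk and returns any lines that are ready to be sent.
// flag "P" marks a partial chunk; any other flag is treated as the final chunk.
func (b *partialBuffer) add(stream, flag, content string) []string {
	if flag != "P" {
		// Final chunk: join it with whatever was buffered for this stream.
		line := b.pending[stream] + content
		delete(b.pending, stream)
		return []string{line}
	}
	var out []string
	if _, buffered := b.pending[stream]; !buffered && len(b.pending) >= b.maxStreams {
		// Limit reached: flush every buffered partial line early and count it.
		for _, partial := range b.pending {
			out = append(out, partial)
		}
		b.pending = make(map[string]string)
		b.flushes++
	}
	b.pending[stream] += content
	return out
}

func main() {
	b := newPartialBuffer(1)
	fmt.Println(b.add("pod-a", "P", "first half, "))
	fmt.Println(b.add("pod-b", "P", "other stream, "))
	fmt.Println(b.add("pod-b", "F", "done"))
	fmt.Println("early flushes:", b.flushes)
}
```

In this model, a steadily growing `flushes` value would suggest that the limit is too low for the number of streams emitting partial lines at the same time, which is the kind of feedback the comment above asks for.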

@ptodev ptodev requested a review from kalleep February 13, 2026 13:11
ptodev commented Feb 18, 2026

Hi @kalleep, have you had a chance to think about this? I'm open to other ways of debugging this, but a metric seems like the easy, low-cost way of doing it.

kalleep commented Feb 18, 2026

It looks good to me, but we should still use the helper function I mentioned here: #5399 (comment)

@ptodev ptodev changed the title feat(loki.process): Add two metrics for CRI stage to track truncation of lines and partial line flushing feat(loki.process): Add debug metrics for CRI stage to track truncation of lines and partial line flushing Feb 25, 2026
@ptodev ptodev merged commit 5bf4dcf into main Feb 25, 2026
47 of 49 checks passed
@ptodev ptodev deleted the ptodev/cri-metric branch February 25, 2026 09:58
@github-actions

💻 Deploy preview deleted (feat(loki.process): Add debug metrics for CRI stage to track truncation of lines and partial line flushing).

jharvey10 pushed a commit that referenced this pull request Feb 25, 2026
jharvey10 pushed a commit that referenced this pull request Feb 26, 2026
blewis12 pushed a commit that referenced this pull request Mar 9, 2026
🤖 I have created a release *beep* *boop*
---


## [1.14.0](v1.13.0...v1.14.0)
(2026-03-06)


### ⚠ BREAKING CHANGES

* **loki.secretfilter:** Some config options are removed entirely:
    - `partial_mask` (replaced with `redact_percent`)
    - `allowlist` (now controlled with custom gitleaks config)
    - `enable_entropy` 
    - `include_generic` (now controlled with custom gitleaks config)
    - `types` (now controlled with custom gitleaks config)
* **otelcol.receiver.prometheus:** `otelcol.receiver.prometheus` no
longer sets start times of OTLP metrics. Grafana Cloud and Mimir do not
currently use OTLP metric start times. If you do want your metrics to
have them, you can use `otelcol.processor.metric_start_time` with
`strategy` set to `true_reset_point` to get the same behaviour.

### Features 🌟

* Add automatic reconnection to database_observability components
([#5444](#5444))
([553f967](553f967))
* Add limited type checking for validate command
([#5076](#5076))
([045fb76](045fb76))
* **database_observability.mysql:** Collect client info for query
samples ([#5552](#5552))
([257a699](257a699))
* **database_observability.postgres:** Add exclude databases/users for
`logs` collector ([#5569](#5569))
([5dddd9b](5dddd9b))
* **database_observability.postgres:** Add logs collector
([#5445](#5445))
([46d79d4](46d79d4))
* **database_observability.postgres:** Allow excluding queries ran by
specific users ([#5544](#5544))
([2d0ca15](2d0ca15))
* Deprecate prometheus.write.queue
([#5509](#5509))
([ee0f227](ee0f227))
* Introduce SeriesRefMappingStore
([#5522](#5522))
([33ee297](33ee297))
* **local.file_match, loki.source.file:** Match multiple files using
doublestar `{...}` expressions
([#5470](#5470))
([284e48f](284e48f))
* **loki.process:** Add debug metrics for CRI stage to track truncation
of lines and partial line flushing
([#5399](#5399))
([a1728f6](a1728f6))
* **mixin:** Add OTel Engine Overview dashboard
([#5573](#5573))
([df52116](df52116))
* **mixin:** Add zipped dashboards as a release artifact
([#5603](#5603))
([4f7fe85](4f7fe85))
* **otel:** Add receivers used in the otel k8s helm chart presets
([#5466](#5466))
([100f6ea](100f6ea))
* **otelcol.receiver.prometheus:** Remove requirement to run Alloy with
`--stability.level=experimental` in order to translate Prometheus native
histograms into OTLP exponential histograms.
([#5308](#5308))
([237e985](237e985))
* **otelcol:** Expose missing tail_sampling drop and bytes_limiting
([6021154](6021154))
* **prometheus.exporter.postgres:** Update to version `0.19.0` and
expose new collectors settings
([#4640](#4640))
([aa01e45](aa01e45))
* **prometheus.exporter.postgres:** Update to version 0.19.1
([#5659](#5659))
([9f4e88f](9f4e88f))
* Update github exporter with github app authentication
([#5377](#5377))
([ca741a6](ca741a6))
* Update grafana cadvisor fork to v0.54.1
([#5447](#5447))
([2a3aba0](2a3aba0))
* Upgrade prometheus to version 0.309.1
([#5479](#5479))
([633944b](633944b))


### Bug Fixes 🐛

* Add /FORCEREGISTRY flag to windows installer
([#5517](#5517))
([6b22d4e](6b22d4e))
* Add missing otelcol alias to make OTel Engine work with OTel Collector
helm chart ([#5473](#5473))
([90478cd](90478cd))
* **controller:** Prevent duplicate loaders from being created
([#5446](#5446))
([31d5eea](31d5eea))
* **database_observability.mysql:** Skip wait events with `NULL`
timer_wait ([#5478](#5478))
([48750e5](48750e5))
* **database_observability.postgres:** Correctly handle table name
casing when parsing postgres queries
([#5440](#5440))
([7cca2b9](7cca2b9))
* **deps:** Update module github.com/go-git/go-git/v5 to v5.16.5
[SECURITY] ([#5485](#5485))
([71a1b8b](71a1b8b))
* Ensure Valid/Clear States in Alloy Engine Extension
([#5551](#5551))
([99ad024](99ad024))
* Expose missing `otelcol.processor.tail_sampling` options
([#5606](#5606))
([6021154](6021154))
* **loki.process:** Registration of stage.metric when used inside
stage.match ([#5460](#5460))
([81caf72](81caf72))
* **loki.source.docker:** Parse timestamp correctly when log line only
contains newline ([#5489](#5489))
([162011d](162011d))
* **loki.source.file:** Close file if we cannot find encoding
([#5528](#5528))
([56bcb26](56bcb26))
* **mixin:** Support OTel exporter batching
([#5618](#5618))
([f2b7cb8](f2b7cb8))
* **prometheus.echo:** Return zero for SeriesRef
([#5622](#5622))
([31a8680](31a8680))
* **prometheus.exporter.cloudwatch:** Respect debug flag
([#5469](#5469))
([44ade00](44ade00))
* **prometheus.receive_http:** Bump prometheus patch for bugfix
([#5505](#5505))
([b7a1d05](b7a1d05))
* **prometheus.remote_write:** Fix sent_batch_duration_seconds measuring
before the request was sent [backport]
([#5698](#5698))
([150aecb](150aecb))
* Use read-write mutex locks to prevent concurrent tagsCache map reads
and writes ([#5534](#5534))
([8efed2e](8efed2e))


### Performance

* **loki.secretfilter:** Change secretfilter implementation to use
Gitleaks ([#5503](#5503))
([08e265c](08e265c))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: grafana-alloybot[bot] <167359181+grafana-alloybot[bot]@users.noreply.github.com>
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Mar 12, 2026

Labels

frozen-due-to-age type/docs Docs Squad label across all Grafana Labs repos


3 participants