
feat(loki.write): Add loki pipeline latency metric#5702

Merged
kalleep merged 35 commits into main from kalleep/loki-pipeline-latency on Mar 10, 2026

Conversation

@kalleep (Contributor) commented Mar 3, 2026

Pull Request Details

Alternative to #5656.

In this PR I added loki_write_entry_propagation_latency. This is a histogram measuring the time from when an entry is created at a source until it's written over the wire.

Every source now uses loki.NewEntry or loki.NewEntryWithCreated, and multiline propagates created from the first entry.

When entries are either sent or dropped by loki.write, we record the time each one took. I also added a new record version for entries in the WAL so that the created time is encoded.

Issue(s) fixed by this Pull Request

Notes to the Reviewer

PR Checklist

  • Documentation added
  • Tests updated
  • Config converters updated

@github-actions (Bot) commented Mar 3, 2026

TruffleHog Scan Results

Summary: Found 4 potential secrets (0 verified, 4 unverified)

  • Possible secret (CloudflareApiToken) at internal/component/loki/source/cloudflare/tailer.go:3a1c1***4868
  • Possible secret (PrivateKey) at internal/component/loki/source/syslog/internal/syslogtarget/syslogtarget_test.go:55----***---
  • Possible secret (PrivateKey) at internal/component/loki/source/syslog/internal/syslogtarget/syslogtarget_test.go:74----***---
  • Possible secret (PrivateKey) at internal/component/loki/source/syslog/internal/syslogtarget/syslogtarget_test.go:162----***---

Review: Check if unverified secrets are false positives.

@github-actions (Bot) commented Mar 3, 2026

💻 Deploy preview available (feat(loki.write): loki pipeline latency metric):

@kalleep kalleep changed the title feat(loki.write): loki pipeline latency metric feat(loki.write): Add loki pipeline latency metric Mar 3, 2026
@kalleep kalleep marked this pull request as ready for review March 3, 2026 15:27
@kalleep kalleep requested a review from clayton-cornell as a code owner March 3, 2026 15:27
@kalleep (Author) commented Mar 3, 2026

> Do we have a sense of the overhead this adds for the WAL / in general?

Yes, it would add 8 additional bytes in the WAL for every entry.

> If we are trending towards a batch-based Loki pipeline solution (I haven't seen much in the way of updates on this to know what direction we are heading), is this entry-level metric still going to make sense?

Good point, I will have to think a bit about this one, but I think it would be acceptable to have one created timestamp per batch. If that is the case, we could store one Created timestamp per RefEntries instead. It would also work currently because in Alloy RefEntries only contains one entry at the moment. WDYT?

The WAL overhead would then be lower with batches, because we would store 8 bytes for the entire batch instead of 8 bytes per entry in the batch.

And yes, we are likely going to batches; we are working on the design doc, just going a bit slowly.

@kgeckhart (Contributor) commented

> Yes, it would add 8 additional bytes in the WAL for every entry.

I guess I was also thinking about general pipeline overhead, but that's more nebulous / hard to gauge. It's an extra field on a struct, so probably not important.

> Good point, I will have to think a bit about this one, but I think it would be acceptable to have one created timestamp per batch. If that is the case, we could store one Created timestamp per RefEntries instead. It would also work currently because in Alloy RefEntries only contains one entry at the moment. WDYT?

I think that makes a lot of sense!

> And yes, we are likely going to batches; we are working on the design doc, just going a bit slowly.

FTR, my comment wasn't suggesting it's moving too slowly, just qualifying my question in case it was out of date 😅

@kalleep (Author) commented Mar 4, 2026

We now write one created timestamp to the WAL for each RefEntries. This works currently because we only write one entry per RefEntries, and when we change to batching, a RefEntries will contain all entries from a batch that share labels, so the created timestamp should be the same for all of them.

@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Mar 6, 2026
@clayton-cornell (Contributor) commented

No input on docs content - one-line change

Comment thread docs/sources/reference/components/loki/loki.write.md Outdated
Comment thread internal/component/common/loki/client/batch.go
Co-authored-by: Piotr <17101802+thampiotr@users.noreply.github.com>
Copilot AI left a comment


Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.

Comment thread internal/component/common/loki/client/batch.go
Comment thread internal/component/common/loki/entry.go Outdated
Comment thread internal/component/common/loki/wal/encoding.go Outdated
Comment thread internal/component/common/loki/wal/encoding.go
@kalleep kalleep force-pushed the kalleep/loki-pipeline-latency branch from 15da5d9 to fdc07a0 Compare March 9, 2026 16:14
@kalleep kalleep requested a review from Copilot March 9, 2026 16:14
Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.

out <- collapsed
entry := s.startLineEntry
entry.Line = s.buffer.String()

Copilot AI commented Mar 9, 2026

This change turns the previous deep-copy into a shallow copy of s.startLineEntry. That means the emitted entry can still share underlying references (e.g., extracted map / labels map / structured metadata slice) with s.startLineEntry, which can lead to unintended mutations across pipeline stages and potential data races if downstream stages modify those structures. Consider restoring the previous behavior by cloning the mutable fields (at least extracted, labels, and structured metadata) when flushing, while still preserving the created timestamp propagation.

Suggested change

	// Clone mutable fields to avoid sharing underlying references with s.startLineEntry.
	if entry.Extracted != nil {
		clonedExtracted := make(map[string]interface{}, len(entry.Extracted))
		for k, v := range entry.Extracted {
			clonedExtracted[k] = v
		}
		entry.Extracted = clonedExtracted
	}
	if entry.Labels != nil {
		clonedLabels := make(model.LabelSet, len(entry.Labels))
		for k, v := range entry.Labels {
			clonedLabels[k] = v
		}
		entry.Labels = clonedLabels
	}
	if entry.StructuredMetadata != nil {
		// Use append with a zero-length, zero-capacity slice to force allocation of a new backing array.
		entry.StructuredMetadata = append(entry.StructuredMetadata[:0:0], entry.StructuredMetadata...)
	}

@kalleep (Author) replied

This is safe; the state is local to a specific stream. When we detect the start of a multi-line entry we set startLineEntry to that entry and continue.

When the line is finished, we update this entry's line with everything we have written into the buffer and reset the state. In reset we clear the buffer, set currentLines to 0, and set startLineEntry to its zero value, so there are no remaining references to extracted, structured metadata, and labels.

The copies we made before were unnecessary and just caused additional allocations.

Comment thread internal/component/common/loki/wal/encoding.go Outdated
Comment thread internal/component/common/loki/wal/encoding.go Outdated
kalleep and others added 2 commits March 10, 2026 08:59
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@kalleep kalleep merged commit cc744a1 into main Mar 10, 2026
48 of 49 checks passed
@kalleep kalleep deleted the kalleep/loki-pipeline-latency branch March 10, 2026 11:31
kalleep added a commit that referenced this pull request Mar 11, 2026
…flushed (#5746)

### Issue(s) fixed by this Pull Request
In #5702 some changes to
`stage.multiline` were made to support the created timestamp, along with some
optimizations (no longer doing a deep clone). But we also made sure to
always reset to a zero entry.

This is an issue when a multiline entry is flushed before completion,
either by max lines or max wait: all lines that are not considered a
start line will then be added to an empty entry and later flushed.

To fix this, we revert to the old behavior where we clone the stored
entry and keep it around for the remaining lines.
### Notes to the Reviewer

<!-- Add any relevant notes for the reviewers and testers of this PR.
-->

### PR Checklist

<!-- Remove items that do not apply. For completed items, change [ ] to
[x]. -->

- [ ] Documentation added
- [x] Tests updated
- [ ] Config converters updated

---------

Co-authored-by: Piotr <17101802+thampiotr@users.noreply.github.com>
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Mar 25, 2026

Labels

frozen-due-to-age type/docs Docs Squad label across all Grafana Labs repos


5 participants