fix: Ensure successive WAL replays don't overwrite each other #14848

Merged
merged 2 commits into main on Nov 12, 2024

Conversation

@benclive benclive (Contributor) commented Nov 8, 2024

What this PR does / why we need it:

Fixes an edge case where log lines could be lost on WAL replay.
We previously reset the stream entry counter to 0 after a successful WAL replay. This is a problem if the ingester restarts again before the WAL content is flushed: any new WAL entries - anything received after the previous replay - are discarded as duplicates instead of being ingested as new log lines.

For example, in the scenario pictured below, each startup creates a new segment file in the WAL. After replay we'd ingest 1A, 1B, 1C, and 2D, but 2A, 2B, and 2C would be classed as duplicates of 1A, 1B, and 1C because they carry the same entry counts.
[image: diagram of WAL segments and entries written across successive restarts]
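To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical simplified types, not Loki's actual WAL code) of how a per-stream entry counter that dedupes replayed entries can misclassify genuinely new lines once it has been reset:

```go
// Hypothetical, simplified sketch of the dedupe counter (not Loki's actual code).
package main

import "fmt"

type stream struct {
	entryCt int64 // highest entry counter already recovered for this stream
}

// replay drops an entry unless its counter is beyond what we already recovered.
func (s *stream) replay(counter int64, line string) {
	if counter <= s.entryCt {
		fmt.Printf("dropped %q as duplicate (counter %d <= %d)\n", line, counter, s.entryCt)
		return
	}
	s.entryCt = counter
	fmt.Printf("ingested %q\n", line)
}

func main() {
	s := &stream{}

	// Restart #1: the checkpoint restores 1A, 1B, 1C with counters 1..3.
	for i, line := range []string{"1A", "1B", "1C"} {
		s.replay(int64(i+1), line)
	}

	// Bug: after this "successful" replay the counter was reset to 0, so the
	// new lines 2A..2D written to the next WAL segment got counters 1..4 again.
	// On restart #2 the checkpoint (still reporting counter 3) loads first,
	// so the stream's counter is back at 3 when the new segment replays:
	for i, line := range []string{"2A", "2B", "2C", "2D"} {
		s.replay(int64(i+1), line)
	}
	// Only 2D (counter 4) is ingested; 2A, 2B, 2C are silently dropped.
}
```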

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@benclive benclive requested a review from a team as a code owner November 8, 2024 15:24
@owen-d owen-d (Member) left a comment

I'm reading this and am not sure I understand what this solves (it might fix a bug but I don't see it yet).

The stream.entryCount is used to dedupe entries between WAL segments (the traditional WAL, which writes entries as they arrive) and checkpoints (a dump of the in-memory streams in a more memory-efficient form):

```go
	// This allows us to discard WAL entries during replays which were
	// already recovered via checkpoints. Historically out of order
	// errors were used to detect this, but this counter has been
	// introduced to facilitate removing the ordering constraint.
	entryCt int64
```

The checkpoint is technically another WAL, but whereas the regular WAL appends logs as they arrive at the ingester, checkpoints are built by copying all in-memory streams to disk at an interval. The reason we do this is to keep replay time relatively fixed and to bound disk usage:

  • Option (A): replay via regular WALs only. Replay time is a function of the number of segments written since the last restart. This could overload the disk after long uptimes and make replays take forever as we replay e.g. hours/days/weeks of old pushes.
  • Option (B): regularly checkpoint the in-memory state, removing out-of-date WALs in the process. Replay is now a function of memory (checkpoint size) plus any new WALs written since the last checkpoint.

We use option (B). The entryCount above is how we drop logs that are present in both a checkpoint and a regular WAL: due to timing we can't know whether a given log will be in both, although we do know the converse (checkpoint_10 ensures that all logs from wal_0 -> wal_9 are either in the checkpoint file or have been flushed to storage). The entryCount stores the number of lines added since the last successful WAL replay.
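As a rough illustration of that replay order (simplified types, not Loki's actual structures): the checkpoint restores each stream along with its entry counter, and WAL segments replayed afterwards skip any entry at or below that counter.

```go
// Self-contained sketch of checkpoint-first replay with counter-based dedupe.
package main

import "fmt"

type record struct {
	counter int64
	line    string
}

type stream struct {
	entryCt int64
	lines   []string
}

func main() {
	// Checkpoint: the stream already holds 1A..1C, counter 3.
	s := &stream{entryCt: 3, lines: []string{"1A", "1B", "1C"}}

	// wal_1 (covered by the checkpoint) and wal_2 (newer) are replayed in order.
	segments := [][]record{
		{{1, "1A"}, {2, "1B"}, {3, "1C"}}, // wal_1: duplicates of the checkpoint
		{{4, "1D"}, {5, "1E"}},            // wal_2: genuinely new entries
	}
	for _, seg := range segments {
		for _, rec := range seg {
			if rec.counter <= s.entryCt {
				continue // already recovered via the checkpoint
			}
			s.lines = append(s.lines, rec.line)
			s.entryCt = rec.counter
		}
	}
	fmt.Println(s.lines) // [1A 1B 1C 1D 1E]
}
```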

Ah, I think I see the bug you mean. It can occur when:

  1. ingester restarts; replay (checkpoint_1, wal_1) succeeds. Counters are reset. New entries are written to wal_2
  2. ingester restarts again, before a checkpoint occurs (which would propagate the now-low entryCount to the most recent checkpoint's disk repr)
  3. during replay, checkpoint_1 is loaded first before replaying WALs. All entries from wal_2 are dropped because they have lower counters than what's reported by the checkpoint.

Wow, great catch. Follow-up question: does this mirror what we see, e.g. do we notice dropped logs only during periods of quick restarts? If not, there could be another issue.

Re: fixing this: I think never resetting the entry count works fine. Let's try this.
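In spirit, the fix amounts to keeping the per-stream counter monotonic across replays instead of resetting it to zero, so lines pushed after a replay always carry counters above the checkpoint's. A hedged sketch with simplified types (not the actual PR diff):

```go
// Sketch: with a monotonic counter, entries written after a replay survive
// the next replay instead of colliding with the checkpoint's counters.
package main

import "fmt"

type stream struct {
	entryCt int64
	lines   []string
}

// push appends a new line and returns the counter written alongside it in the WAL.
func (s *stream) push(line string) int64 {
	s.entryCt++
	s.lines = append(s.lines, line)
	return s.entryCt
}

// replay skips anything already covered by the checkpoint.
func (s *stream) replay(counter int64, line string) {
	if counter <= s.entryCt {
		return
	}
	s.lines = append(s.lines, line)
	s.entryCt = counter
}

func main() {
	// Restart #1: the checkpoint restores counter 3 (1A..1C) and is NOT reset.
	s := &stream{entryCt: 3, lines: []string{"1A", "1B", "1C"}}

	// New pushes after the replay get counters 4, 5, 6 in the next segment.
	type rec struct {
		counter int64
		line    string
	}
	var wal2 []rec
	for _, line := range []string{"2A", "2B", "2C"} {
		wal2 = append(wal2, rec{s.push(line), line})
	}

	// Restart #2: checkpoint_1 (counter 3) loads first, then wal_2 replays.
	s2 := &stream{entryCt: 3, lines: []string{"1A", "1B", "1C"}}
	for _, r := range wal2 {
		s2.replay(r.counter, r.line)
	}
	fmt.Println(s2.lines) // [1A 1B 1C 2A 2B 2C] - nothing is lost
}
```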

@benclive benclive merged commit ec95ed1 into main Nov 12, 2024
57 checks passed
@benclive benclive deleted the fix-wal-replay-counters branch November 12, 2024 14:05