fix: Ensure successive WAL replays don't overwrite each other #14848

Merged
merged 2 commits into main on Nov 12, 2024

Conversation

@benclive benclive (Contributor) commented Nov 8, 2024

What this PR does / why we need it:

Fixes an edge case where log lines could be lost on WAL replay.
We previously reset the stream entry counter to 0 after a successful WAL replay. This is a problem if the ingester restarts again before the WAL content is flushed: any new WAL entries - anything received after the previous replay - are discarded as duplicates instead of being ingested as new log lines.

For example, in the scenario pictured below, each startup creates a new segment file in the WAL. After replay we'd ingest 1A, 1B, 1C, and 2D, but 2A, 2B, and 2C would be classed as duplicates of 1A, 1B, and 1C because they carry the same entry counts.
[image: diagram of WAL segments and entries written across successive restarts]
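To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical simplified types, not Loki's actual WAL code) of how a per-stream entry counter that dedupes replayed entries can misclassify genuinely new lines once it has been reset:

```go
// Hypothetical, simplified sketch of the dedupe counter (not Loki's actual code).
package main

import "fmt"

type stream struct {
	entryCt int64 // highest entry counter already recovered for this stream
}

// replay drops an entry unless its counter is beyond what we already recovered.
func (s *stream) replay(counter int64, line string) {
	if counter <= s.entryCt {
		fmt.Printf("dropped %q as duplicate (counter %d <= %d)\n", line, counter, s.entryCt)
		return
	}
	s.entryCt = counter
	fmt.Printf("ingested %q\n", line)
}

func main() {
	s := &stream{}

	// Restart #1: the checkpoint restores 1A, 1B, 1C with counters 1..3.
	for i, line := range []string{"1A", "1B", "1C"} {
		s.replay(int64(i+1), line)
	}

	// Bug: after this "successful" replay the counter was reset to 0, so the
	// new lines 2A..2D written to the next WAL segment got counters 1..4 again.
	// On restart #2 the checkpoint (still reporting counter 3) loads first,
	// so the stream's counter is back at 3 when the new segment replays:
	for i, line := range []string{"2A", "2B", "2C", "2D"} {
		s.replay(int64(i+1), line)
	}
	// Only 2D (counter 4) is ingested; 2A, 2B, 2C are silently dropped.
}
```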

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@benclive benclive requested a review from a team as a code owner November 8, 2024 15:24
@owen-d owen-d (Member) left a comment

I'm reading this and am not sure I understand what this solves (it might fix a bug but I don't see it yet).

The stream.entryCount is used to dedupe entries between WAL segments (the traditional WAL, which writes entries as they arrive) and checkpoints (a dump of the in-memory streams in a more memory-efficient form):

```go
	// This allows us to discard WAL entries during replays which were
	// already recovered via checkpoints. Historically out of order
	// errors were used to detect this, but this counter has been
	// introduced to facilitate removing the ordering constraint.
	entryCt int64
```

The checkpoint is technically another WAL, but whereas the regular WAL appends logs as they arrive at the ingester, checkpoints are built by copying all in-memory streams to disk at an interval. The reason we do this is to keep replay time relatively fixed and to bound disk usage:

  • Option (A): replay via regular WALs only. Replay time is a function of the number of segments written since the last restart. This could overload the disk after long uptimes and make replays take forever as we replay e.g. hours/days/weeks of old pushes.
  • Option (B): regularly checkpoint the in-memory state, removing out-of-date WALs in the process. Replay is now a function of memory (checkpoint size) plus any new WALs written since the last checkpoint.

We use option (B). The entryCount above is how we drop logs that are present in both a checkpoint and a regular WAL: due to timing we can't know whether a given log will be in both, although we do know the converse (checkpoint_10 ensures that all logs from wal_0 -> wal_9 are either in the checkpoint file or have been flushed to storage). The entryCount stores the number of lines added since the last successful WAL replay.
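As a rough illustration of that replay order (simplified types, not Loki's actual structures): the checkpoint restores each stream along with its entry counter, and WAL segments replayed afterwards skip any entry at or below that counter.

```go
// Self-contained sketch of checkpoint-first replay with counter-based dedupe.
package main

import "fmt"

type record struct {
	counter int64
	line    string
}

type stream struct {
	entryCt int64
	lines   []string
}

func main() {
	// Checkpoint: the stream already holds 1A..1C, counter 3.
	s := &stream{entryCt: 3, lines: []string{"1A", "1B", "1C"}}

	// wal_1 (covered by the checkpoint) and wal_2 (newer) are replayed in order.
	segments := [][]record{
		{{1, "1A"}, {2, "1B"}, {3, "1C"}}, // wal_1: duplicates of the checkpoint
		{{4, "1D"}, {5, "1E"}},            // wal_2: genuinely new entries
	}
	for _, seg := range segments {
		for _, rec := range seg {
			if rec.counter <= s.entryCt {
				continue // already recovered via the checkpoint
			}
			s.lines = append(s.lines, rec.line)
			s.entryCt = rec.counter
		}
	}
	fmt.Println(s.lines) // [1A 1B 1C 1D 1E]
}
```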

Ah, I think I see the bug you mean. It can occur when:

  1. ingester restarts; replay (checkpoint_1, wal_1) succeeds. Counters are reset. New entries are written to wal_2
  2. ingester restarts again, before a checkpoint occurs (which would propagate the now-low entryCount to the most recent checkpoint's disk repr)
  3. during replay, checkpoint_1 is loaded first before replaying WALs. All entries from wal_2 are dropped because they have lower counters than what's reported by the checkpoint.

Wow, great catch. Follow-up question: does this mirror what we see, e.g. do we notice dropped logs only during periods of quick restarts? If not, there could be another issue.

Re: fixing this: I think never resetting the entry count works fine. Let's try this.
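In spirit, the fix amounts to keeping the per-stream counter monotonic across replays instead of resetting it to zero, so lines pushed after a replay always carry counters above the checkpoint's. A hedged sketch with simplified types (not the actual PR diff):

```go
// Sketch: with a monotonic counter, entries written after a replay survive
// the next replay instead of colliding with the checkpoint's counters.
package main

import "fmt"

type stream struct {
	entryCt int64
	lines   []string
}

// push appends a new line and returns the counter written alongside it in the WAL.
func (s *stream) push(line string) int64 {
	s.entryCt++
	s.lines = append(s.lines, line)
	return s.entryCt
}

// replay skips anything already covered by the checkpoint.
func (s *stream) replay(counter int64, line string) {
	if counter <= s.entryCt {
		return
	}
	s.lines = append(s.lines, line)
	s.entryCt = counter
}

func main() {
	// Restart #1: the checkpoint restores counter 3 (1A..1C) and is NOT reset.
	s := &stream{entryCt: 3, lines: []string{"1A", "1B", "1C"}}

	// New pushes after the replay get counters 4, 5, 6 in the next segment.
	type rec struct {
		counter int64
		line    string
	}
	var wal2 []rec
	for _, line := range []string{"2A", "2B", "2C"} {
		wal2 = append(wal2, rec{s.push(line), line})
	}

	// Restart #2: checkpoint_1 (counter 3) loads first, then wal_2 replays.
	s2 := &stream{entryCt: 3, lines: []string{"1A", "1B", "1C"}}
	for _, r := range wal2 {
		s2.replay(r.counter, r.line)
	}
	fmt.Println(s2.lines) // [1A 1B 1C 2A 2B 2C] - nothing is lost
}
```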

@benclive benclive merged commit ec95ed1 into main Nov 12, 2024
57 checks passed
@benclive benclive deleted the fix-wal-replay-counters branch November 12, 2024 14:05