
[IMPROVED] NRG: Peer activity tracking#7402

Merged: neilalexander merged 1 commit into `main` from `maurice/nrg-active-tracking` on Oct 8, 2025

@MauriceVanVeen
Member

The peer activity timestamp tracking was somewhat inconsistent.

The leader should always be able to reach and report on the activity of all peers: because it sends the heartbeats, it knows best when each peer was last seen. Followers only talk to the leader, so they can only track the leader's activity.

However, during leader changes or cluster resizes, old activity timestamps would remain and would no longer be updated. This is not an issue per se, but since these timestamps are exposed as "last seen" and "active" in various APIs, they could look very misleading. A last seen timestamp could show "17 hours ago", which reads as if a node hasn't been active for 17 hours. In reality, that server happened to be leader 17 hours ago, there has been a new leader since, and the timestamp has simply not been updated after that. This PR ensures we track these timestamps more consistently and clears the old timestamps on leader changes.
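
To illustrate the idea, here is a minimal sketch of per-peer activity tracking that clears old timestamps on a leader change. It is not the nats-server code: `peerTracker`, `observe`, and `setLeader` are hypothetical names, and the sketch assumes a zero `time.Time` stands for "unknown".

```go
package main

import (
	"fmt"
	"time"
)

// peerTracker is a hypothetical, simplified stand-in for the per-peer
// activity state of a Raft node. It is not the nats-server implementation.
type peerTracker struct {
	self     string
	leader   string               // current leader, "" if unknown
	lastSeen map[string]time.Time // zero time means "unknown"
}

func newPeerTracker(self string, peers ...string) *peerTracker {
	t := &peerTracker{self: self, lastSeen: make(map[string]time.Time)}
	for _, p := range peers {
		t.lastSeen[p] = time.Time{}
	}
	return t
}

// observe records activity from a peer. A leader hears from every peer;
// a follower only records activity from the current leader.
func (t *peerTracker) observe(peer string) {
	if t.leader == t.self || peer == t.leader {
		t.lastSeen[peer] = time.Now()
	}
}

// setLeader switches to a new leader and clears all old timestamps, so a
// value recorded under a previous leader is never reported as "last seen".
func (t *peerTracker) setLeader(leader string) {
	if leader == t.leader {
		return
	}
	t.leader = leader
	for p := range t.lastSeen {
		t.lastSeen[p] = time.Time{}
	}
}

func main() {
	t := newPeerTracker("S1", "S2", "S3")
	t.setLeader("S1") // S1 leads and tracks every peer
	t.observe("S2")
	t.setLeader("S3") // leadership moves; S1's view of S2 is now stale
	fmt.Println("S2 unknown after leader change:", t.lastSeen["S2"].IsZero())
}
```

With this shape, a "17 hours ago" value can only appear if the peer really was last heard from 17 hours ago under the current leader.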

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner October 7, 2025 14:02
Member

@derekcollison derekcollison left a comment

LGTM

@neilalexander neilalexander merged commit 93ac646 into main Oct 8, 2025
90 of 92 checks passed
@neilalexander neilalexander deleted the maurice/nrg-active-tracking branch October 8, 2025 08:56
neilalexander added a commit that referenced this pull request Oct 10, 2025
Includes the following:

- #7400
- #7399
- #7401
- #7402
- #7404
- #7405
- #7409
- #7413

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Oct 28, 2025
Includes the following:

- #7380
- #7384
- #7385
- #7388
- #7395
- #7400
- #7399
- #7401
- #7402
- #7423
- #7424
- #7411
- #7428
- #7429
- #7431
- #7435
- #7433
- #7443
- #7455
- #7465
- #7466
- #7460
- #7484
- #7479

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 6, 2025
Follow-up to #7402. When shutting down a server in lame duck mode (LDM)
or having the leader step down, all peer timestamps would be cleared. As
a result, quorum was reported as lost for all Raft nodes that the server
was leader for, a "NO quorum, stalled" message was printed, and an
advisory was sent.

This PR fixes that by ensuring the leader remembers the timestamps after
stepping down. Once the new leader comes online, the other followers'
timestamps can still be cleared.
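
The step-down behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual fix: `node`, `stepDown`, `followLeader`, `hasQuorum`, and the 10 second `activeWindow` are all hypothetical, and the quorum check is a simple majority count over recently seen peers.

```go
package main

import (
	"fmt"
	"time"
)

// activeWindow is an assumed freshness window, chosen for this sketch only.
const activeWindow = 10 * time.Second

// node is a hypothetical, simplified Raft node; not the nats-server type.
type node struct {
	leader   string
	lastSeen map[string]time.Time // zero time means "unknown"
}

func newNode(peers ...string) *node {
	n := &node{lastSeen: make(map[string]time.Time)}
	for _, p := range peers {
		n.lastSeen[p] = time.Time{}
	}
	return n
}

// seen records activity from a peer at the current time.
func (n *node) seen(peer string) { n.lastSeen[peer] = time.Now() }

// stepDown gives up leadership but deliberately keeps the timestamps, so a
// quorum check right after stepping down does not see every peer as unknown.
func (n *node) stepDown() { n.leader = "" }

// followLeader records the new leader and only then clears the timestamps
// of the other followers, which this node no longer hears from.
func (n *node) followLeader(id string) {
	n.leader = id
	for p := range n.lastSeen {
		if p != id {
			n.lastSeen[p] = time.Time{}
		}
	}
}

// recentPeers counts peers seen within activeWindow.
func (n *node) recentPeers() int {
	count := 0
	for _, ts := range n.lastSeen {
		if !ts.IsZero() && time.Since(ts) <= activeWindow {
			count++
		}
	}
	return count
}

// hasQuorum counts this node plus recently seen peers against a majority.
func (n *node) hasQuorum() bool {
	clusterSize := len(n.lastSeen) + 1
	return n.recentPeers()+1 >= clusterSize/2+1
}

func main() {
	n := newNode("S2", "S3")
	n.seen("S2")
	n.seen("S3")
	n.stepDown()
	fmt.Println("quorum after step-down:", n.hasQuorum())
	n.followLeader("S2")
	fmt.Println("recent peers as follower:", n.recentPeers())
}
```

The point of the ordering is that clearing is tied to learning about the new leader rather than to the step-down itself.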

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
neilalexander added a commit that referenced this pull request Mar 2, 2026
Follow-up to #7402. This also resets the last replicated index, since
it is only known on the leader: only the leader is in contact with all
servers and knows how many messages they have persisted in their logs.

The `Lag` and `Current` values were mostly unusable/stale/incorrect when
read on a follower node. These values are returned in requests like
stream and consumer info, as well as JSZ. So, this PR makes the
`n.Peers()` state more usable and consistent:
- The `Lag/Current` fields now accurately describe whether the current
server is current (or lags) when compared with the peer in the list.
- Only the leader will report `Lag`. It reports all followers that are
part of quorum as `Current: true, Lag: 0`. Other non-current followers
not part of quorum will show a `Lag` equal to the number of entries that
are committed but not yet persisted on that peer. (This was already the
case.)
- A follower will always report other followers as not current with no
lag. It doesn't have any contact with other peers, so this data is not
useful either way. (Previously this contained stale/unused data, making
it incorrect and, at worst, deceiving.)
- A follower will report in `Current` whether it has seen the leader
recently.
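
The reporting rules in the list above can be sketched as a single function. This is a hedged illustration, not the `n.Peers()` implementation: `peerState`, `peerInfo`, and `report` are hypothetical, and the sketch assumes the leader treats a peer as current once its replicated index has reached the commit index.

```go
package main

import (
	"fmt"
	"sort"
)

// peerState is what this sketch assumes a node knows about one peer.
type peerState struct {
	seenRecently bool   // heard from the peer within some activity window
	replicated   uint64 // last index persisted on the peer (known to the leader only)
}

// peerInfo mirrors the reported fields named in the text.
type peerInfo struct {
	ID      string
	Current bool
	Lag     uint64
}

// report builds the per-peer view as seen from the node named self.
func report(self, leader string, commit uint64, peers map[string]peerState) []peerInfo {
	ids := make([]string, 0, len(peers))
	for id := range peers {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic output order

	out := make([]peerInfo, 0, len(ids))
	for _, id := range ids {
		ps := peers[id]
		info := peerInfo{ID: id}
		switch {
		case self == leader:
			// Leader view: current if caught up, otherwise report the lag.
			if ps.replicated >= commit {
				info.Current = true
			} else {
				info.Lag = commit - ps.replicated
			}
		case id == leader:
			// Follower view of the leader: current if seen recently, never a lag.
			info.Current = ps.seenRecently
		default:
			// Follower view of another follower: not current, no lag.
		}
		out = append(out, info)
	}
	return out
}

func main() {
	peers := map[string]peerState{
		"S2": {seenRecently: true, replicated: 100},
		"S3": {seenRecently: true, replicated: 90},
	}
	for _, p := range report("S1", "S1", 100, peers) {
		fmt.Printf("%+v\n", p)
	}
}
```

Called with `self` equal to `leader`, the function gives the leader's view; with any other `self`, it gives a follower's view, where only the leader's entry can ever be current.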

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>