Fix heartbeat self-conflict causes unnecessary lease release after transient timeout #3418

Merged: Aaronontheweb merged 4 commits into akkadotnet:dev from Arkatufus:fix-lease-self-conflict on Mar 19, 2026
@Arkatufus
Contributor

Summary

  • When a heartbeat PUT succeeds on the API server but times out on the client, the retry uses a stale version and gets a CAS conflict. If the conflict response shows the same owner, the lease was never actually lost — but the actor was unconditionally releasing it.
  • The Left<> handler in the Granted state now checks whether the conflict owner matches the current node. Same owner: update the version and continue heartbeating. Different owner: release the lease (unchanged behavior).
  • Affects both Kubernetes and Azure lease actor implementations.
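
The failure mode described above can be reproduced without any real API server. The following is a minimal sketch, assuming a hypothetical in-memory store (`FakeLeaseStore`) and resource type (`LeaseResource`) that are not part of the PR; it only illustrates how a retry with a stale version ends up conflicting with the node's own earlier write.

```csharp
using System;

// Hypothetical stand-in for the lease resource held by the API server.
public sealed record LeaseResource(string Owner, long Version);

// Hypothetical in-memory store with compare-and-swap (CAS) semantics.
public sealed class FakeLeaseStore
{
    public LeaseResource Current { get; private set; }

    public FakeLeaseStore(string owner) => Current = new LeaseResource(owner, 1);

    // Succeeds only if the caller's version matches the stored version.
    // On conflict, returns false and hands back the current server-side resource.
    public bool TryHeartbeat(string owner, long expectedVersion, out LeaseResource result)
    {
        if (expectedVersion != Current.Version)
        {
            result = Current;
            return false;
        }

        Current = new LeaseResource(owner, Current.Version + 1);
        result = Current;
        return true;
    }
}

public static class SelfConflictDemo
{
    // Returns true when the retry conflicts with the caller's own earlier write.
    public static bool RetryHitsSelfConflict()
    {
        var store = new FakeLeaseStore("node-a");
        long clientVersion = store.Current.Version;

        // The server applies the first heartbeat, but the client "times out":
        // it discards the response, so clientVersion stays stale.
        store.TryHeartbeat("node-a", clientVersion, out _);

        // The retry reuses the stale version and gets a CAS conflict.
        bool ok = store.TryHeartbeat("node-a", clientVersion, out var conflict);

        // The conflict response still names us as the owner.
        return !ok && string.Equals(conflict.Owner, "node-a", StringComparison.Ordinal);
    }
}
```

In this sketch the conflict response carries the owner and the newer version, which is the information the fixed handler needs to keep the lease and resume heartbeating.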

Changes

LeaseActor.cs (both Kubernetes and Azure): Added an owner equality check before the existing unconditional release in the heartbeat conflict handler. Uses string.Equals with StringComparison.Ordinal for strict matching.
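
As a rough illustration of the strict matching mentioned here, the check can be thought of as a single ordinal comparison. The helper name `OwnerMatch.IsSameOwner` is hypothetical and not taken from the PR; only `string.Equals` with `StringComparison.Ordinal` comes from the description.

```csharp
using System;

public static class OwnerMatch
{
    // Hypothetical helper: ordinal comparison is case-sensitive and
    // culture-independent, so only an exact owner name counts as "us".
    public static bool IsSameOwner(string conflictOwner, string ownerName) =>
        string.Equals(conflictOwner, ownerName, StringComparison.Ordinal);
}
```

With this comparison, `IsSameOwner("node-a", "node-a")` is true, while `IsSameOwner("Node-A", "node-a")` is false. Erring on the strict side means an ambiguous match falls through to the existing release path rather than keeping a lease that might belong to someone else.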

LeaseActorSpec.cs (both Kubernetes and Azure): Added 3 reproduction tests per implementation:

  • HeartbeatSelfConflictAfterTimeoutShouldStayGranted — timeout → retry → self-conflict → stays granted
  • HeartbeatSelfConflictShouldNotCallLeaseLostCallback — lease-lost callback must not fire on self-conflict
  • HeartbeatConflictWithDifferentOwnerAfterTimeoutShouldRelease — guards that genuine conflicts still release
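
The behavior these three tests pin down can be sketched with a small stand-alone class. This is not the actual `LeaseActor` or its test code: `LeaseHolderSketch` and its members are hypothetical, and it assumes that releasing the lease also invokes the lease-lost callback.

```csharp
using System;

// Hypothetical, framework-free model of the Granted-state conflict handling.
public sealed class LeaseHolderSketch
{
    private readonly string _ownerName;
    private readonly Action _onLeaseLost;

    public bool Granted { get; private set; } = true;
    public long Version { get; private set; }

    public LeaseHolderSketch(string ownerName, long version, Action onLeaseLost)
    {
        _ownerName = ownerName;
        Version = version;
        _onLeaseLost = onLeaseLost;
    }

    // Called when a heartbeat returns a CAS conflict.
    public void OnHeartbeatConflict(string conflictOwner, long serverVersion)
    {
        if (string.Equals(conflictOwner, _ownerName, StringComparison.Ordinal))
        {
            // Self-conflict: adopt the server's version and stay granted.
            Version = serverVersion;
            return;
        }

        // Genuine conflict: another node owns the lease, so give it up.
        // Assumption: release is what triggers the lease-lost callback.
        Granted = false;
        _onLeaseLost();
    }
}
```

A self-conflict leaves `Granted` true and never calls the callback, which corresponds to the first two tests; a conflict naming a different owner flips `Granted` to false and calls the callback once, which corresponds to the third.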

Test plan

  • All 6 new tests pass (3 Kubernetes + 3 Azure)
  • All existing lease actor tests pass on both platforms
  • Azure integration tests pass with Docker/Azurite

Member

@Aaronontheweb left a comment

Minor nitpick

// Self-conflict: our previous PUT succeeded on the server but timed out
// on the client, so we retried with a stale version. The lease is still ours —
// update the version and continue heartbeating.
_log.Warning(
Member

In this branch, resource.Value.Owner and _ownerName are always the same, so we could clean this log message up and simplify it.

@Aaronontheweb self-requested a review on March 18, 2026, 20:24
Member

@Aaronontheweb left a comment

LGTM

@Aaronontheweb merged commit 158c514 into akkadotnet:dev on Mar 19, 2026
3 checks passed