Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: memberlist delete operation #644

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

NickAnge
Copy link
Contributor

@NickAnge NickAnge commented Feb 5, 2025

What this PR does:

After the rollout of delete operation in memberlist we observed 2 things:

  • Error failed to decode value: snappy: corrupt input. (logs)
  • Delete was happening but the actual entry was not deleted completed. So the entry is deleted it and then comes back.

In this PR we tackle these two issues by:

  • returning an actual change when the only thing that gets updated is the Delete status, so we do not have to handlePossibleNilEncode which was returning an []byte{} that was causing the Decode issue.
  • returning no change when the incoming entry has Delete true. Thist was an edge case in which the current HA tracker had deleted the entry and another one was coming with delete True. This can happen if we have regular push/pull interval.

Verified it locally.

Tests

  • Introduces checkMemberlistEntry, that will constantly checks the memberlist entries up to specific amount of time before considers the test failure. It utilizes test.poll
  • extend TestDelete to check if there are any encoding errors during the Delete flow. Covers failed to decode value: snappy: corrupt input

Which issue(s) this PR fixes:

Fixes #

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@NickAnge NickAnge changed the title Nickange/debugging memberlist delete fix: memberlist delete operation Feb 10, 2025
@NickAnge NickAnge force-pushed the nickange/debugging-memberlist-delete branch from 3352299 to a3ad9d2 Compare February 10, 2025 14:58
@NickAnge NickAnge force-pushed the nickange/debugging-memberlist-delete branch from a3ad9d2 to 7e6d3bb Compare February 10, 2025 15:02
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change makes sense, but it definitely needs unit tests. If you can even cover cases which haven't been covered so far, that'd be great

@@ -1695,19 +1702,3 @@ func updateTimeMillis(ts time.Time) int64 {
}
return ts.UnixMilli()
}

func handlePossibleNilEncode(codec codec.Codec, change Mergeable) ([]byte, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided that the handle* funcions are not needed since we handle the change=nil at the mergeValueForKey level. That was the main reason these were needed at the first place. I validated that the []byte{}was the one causing the Error failed to decode value: snappy: corrupt input at the first place. Please let me know what you think

@NickAnge
Copy link
Contributor Author

NickAnge commented Feb 11, 2025

Update after the test in mimir-dev-11

  • The entry is deleted as expected ✅

Bad news

  • We still see failed to decode value: snappy: corrupt input (logs) after the delete operation , which I am planning to investigate. Its not causing any impact apart from some "bad" messages that are not ingested

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants