KAFKA-16106: revert classic state transitions if deletion fails #16511

jeffkbkim · 2024-07-02T19:06:15Z

An expire-group-metadata operation generates tombstone records, updates the groups' state, decrements the group size counters, and then performs a write to the log. If the write fails, for instance because of a __consumer_offsets partition reassignment, the groups' state is reverted to an earlier snapshot but the classic group size counters are not. This introduces an inconsistency between the metrics and the actual number of groups. The same applies to every unsuccessful write operation that alters the classic group state.
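To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Kafka code: `GroupSizeCounterSketch`, `expireGroup`, and the `revertCounter` flag are hypothetical names. The groups map stands in for snapshotted state, and the plain `int` stands in for the classic group size metric, which the snapshot does not cover.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the bug described above; not the real GroupMetadataManager.
class GroupSizeCounterSketch {
    // Groups state that is restored from a snapshot when the write fails.
    private Map<String, String> groups = new HashMap<>();
    // Plain counter NOT covered by the snapshot (stands in for the size metric).
    private int classicGroupCount = 0;

    void addGroup(String groupId) {
        groups.put(groupId, "EMPTY");
        classicGroupCount++;
    }

    // Mutates state, decrements the counter, then "writes" to the log.
    // revertCounter=false reproduces the bug; true applies a compensating revert.
    boolean expireGroup(String groupId, boolean writeSucceeds, boolean revertCounter) {
        Map<String, String> snapshot = new HashMap<>(groups);
        int counterBefore = classicGroupCount;

        if (groups.remove(groupId) != null) {
            classicGroupCount--;
        }

        if (!writeSucceeds) {
            groups = snapshot;                      // state goes back to the snapshot...
            if (revertCounter) {
                classicGroupCount = counterBefore;  // ...the counter only if done explicitly
            }
            return false;
        }
        return true;
    }

    int groupsSize() { return groups.size(); }
    int classicGroupCount() { return classicGroupCount; }

    public static void main(String[] args) {
        GroupSizeCounterSketch buggy = new GroupSizeCounterSketch();
        buggy.addGroup("group-id");
        buggy.expireGroup("group-id", false, false);
        System.out.println(buggy.groupsSize() + " vs " + buggy.classicGroupCount()); // 1 vs 0

        GroupSizeCounterSketch fixed = new GroupSizeCounterSketch();
        fixed.addGroup("group-id");
        fixed.expireGroup("group-id", false, true);
        System.out.println(fixed.groupsSize() + " vs " + fixed.classicGroupCount()); // 1 vs 1
    }
}
```

After a failed write, the buggy variant reports one group in state but zero in the counter. This is the drift between metrics and actual size that the PR addresses.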

However, some operations that alter the classic group state do not produce records. This means that we cannot rely on timeline data structures as we do for consumer group state.
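Because those transitions produce no records, one general way to roll them back is an explicit undo log: record a reverse action for every in-memory mutation, then run the actions in LIFO order if the write fails. The sketch below illustrates that technique only. It is not necessarily how this PR implements the revert, and `UndoLogSketch`, `Group`, and `demo` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not Kafka code: an explicit undo log for in-memory
// transitions that are not backed by timeline data structures.
class UndoLogSketch {
    private final Deque<Runnable> undoActions = new ArrayDeque<>();

    // Applies a mutation and remembers how to reverse it.
    void apply(Runnable mutation, Runnable undo) {
        mutation.run();
        undoActions.push(undo);
    }

    // Write failed: run the undo actions in reverse (LIFO) order.
    void revertAll() {
        while (!undoActions.isEmpty()) {
            undoActions.pop().run();
        }
    }

    // Write succeeded: keep the changes and drop the undo actions.
    void commit() {
        undoActions.clear();
    }

    // Hypothetical stand-in for a classic group.
    static class Group {
        String state = "EMPTY";
    }

    // Expires one group, then commits or reverts depending on the write result.
    static String demo(boolean writeSucceeded) {
        Group group = new Group();
        int[] classicGroupCount = {1}; // mutable holder for the size metric
        UndoLogSketch log = new UndoLogSketch();

        String previousState = group.state;
        log.apply(() -> group.state = "DEAD", () -> group.state = previousState);
        log.apply(() -> classicGroupCount[0]--, () -> classicGroupCount[0]++);

        if (writeSucceeded) {
            log.commit();
        } else {
            log.revertAll();
        }
        return group.state + " " + classicGroupCount[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // EMPTY 1
        System.out.println(demo(true));  // DEAD 0
    }
}
```

The undo actions cover both the state transition and the counter. A failed write therefore leaves them consistent with each other, even though no timeline structure was involved.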

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

dongnuo123 · 2024-07-09T19:11:38Z

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorShard.java

@@ -575,18 +581,32 @@ public CoordinatorResult<OffsetDeleteResponseData, CoordinatorRecord> deleteOffs
    public CoordinatorResult<Void, CoordinatorRecord> cleanupGroupMetadata() {
        long startMs = time.milliseconds();
        List<CoordinatorRecord> records = new ArrayList<>();
+        AtomicInteger deletedClassicGroupCount = new AtomicInteger(0);


Just to confirm, is the AtomicInteger used to match the following forEach loop?

If you mean using it so that the counter can be updated inside the lambda expression, yes.
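The constraint behind this reply is that Java requires local variables captured by a lambda to be effectively final, so a plain `int` counter cannot be incremented inside `forEach`. A minimal sketch of the pattern follows, assuming a simplified, hypothetical `maybeDeleteGroup(String)` predicate. The method in the diff also takes a records list.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LambdaCounterSketch {
    // Hypothetical stand-in for maybeDeleteGroup; returns true if the group is deleted.
    static boolean maybeDeleteGroup(String groupId) {
        return groupId.startsWith("expired-");
    }

    static int countDeleted(List<String> groupIds) {
        // `int deleted = 0; ... deleted++;` inside the lambda would not compile:
        // captured locals must be effectively final, so use a mutable holder.
        AtomicInteger deleted = new AtomicInteger(0);
        groupIds.forEach(groupId -> {
            if (maybeDeleteGroup(groupId)) {
                deleted.incrementAndGet();
            }
        });
        return deleted.get();
    }

    public static void main(String[] args) {
        System.out.println(countDeleted(List.of("expired-a", "live-b", "expired-c"))); // 2
    }
}
```

The loop runs on a single thread, so `AtomicInteger` serves here only as a mutable holder, not as a concurrency tool. An `int[1]` array or an enhanced `for` loop would work equally well.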

dongnuo123 · 2024-07-09T19:18:13Z

...-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupCoordinatorShardTest.java

+
+        when(groupMetadataManager.groupIds()).thenReturn(mkSet("group-id", "other-group-id"));
+        when(offsetMetadataManager.cleanupExpiredOffsets(eq("group-id"), eq(new ArrayList<>()))).thenReturn(true);
+        when(groupMetadataManager.maybeDeleteGroup(eq("group-id"), eq(new ArrayList<>()))).thenReturn(true);


Should we add something to the record list and assert that it's non-null later?

The test above, testCleanupGroupMetadata (https://github.com/apache/kafka/pull/16511/files/b73af9c786d4ad29259d0bfeb7c16db3324eff4b#diff-3a0b9cad0253e0f6d4665efd0d6f7efd5bd5dd96d3ba31005cab06fa728aad8fR990), tests that the records we add are reflected. Would this be sufficient?

dajac added the KIP-848 label Jul 4, 2024

jeffkbkim added 2 commits July 8, 2024 16:16

revert classic state transitions if deletion fails

5e62002

add tests

b73af9c

jeffkbkim force-pushed the KAFKA-16106-classic-group branch from 866ce8b to b73af9c July 8, 2024 20:17

jeffkbkim marked this pull request as ready for review July 8, 2024 20:17

dongnuo123 reviewed Jul 9, 2024


fix build

45cac55
