
KAFKA-18225: ClientQuotaCallback#updateClusterMetadata is unsupported by kraft #18196

Merged: 56 commits into apache:trunk, Feb 10, 2025

Conversation

@m1a2st (Contributor) commented Dec 15, 2024

Jira: https://issues.apache.org/jira/browse/KAFKA-18225

ClientQuotaCallback#updateClusterMetadata is not implemented in KRaft mode. This PR implements it for the 4.0 release so that CustomQuotaCallbackTest passes in KRaft mode.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions github-actions bot added the triage PRs from the community label Dec 15, 2024
@m1a2st m1a2st changed the title KAFKA-18225: ClientQuotaCallback#updateClusterMetadata is unsupported by kraft [WIP]KAFKA-18225: ClientQuotaCallback#updateClusterMetadata is unsupported by kraft Dec 15, 2024
@github-actions github-actions bot added core Kafka Broker kraft labels Dec 15, 2024
@@ -565,3 +531,97 @@ class KRaftMetadataCache(
}
}

object KRaftMetadataCache {

def toCluster(clusterId: String, image: MetadataImage): Cluster = {
@TaiJuWu (Contributor) Dec 15, 2024

This result differs from KRaftMetadataCache#toCluster (the old version filters partitions based on the listener).
If we decide to change this behavior, we should document it.

@m1a2st m1a2st changed the title [WIP]KAFKA-18225: ClientQuotaCallback#updateClusterMetadata is unsupported by kraft KAFKA-18225: ClientQuotaCallback#updateClusterMetadata is unsupported by kraft Dec 17, 2024
@github-actions github-actions bot removed the triage PRs from the community label Dec 19, 2024
@chia7712 (Member) commented Jan 8, 2025

@m1a2st please fix the conflicts

m1a2st added 2 commits January 8, 2025 19:39
# Conflicts:
#	core/src/test/scala/integration/kafka/api/CustomQuotaCallbackTest.scala
@m1a2st (Contributor Author) commented Jan 8, 2025

Thanks for the reminder, @chia7712. I resolved the conflict.

@chia7712 (Member) left a comment

@m1a2st thanks for this fix!

@@ -48,6 +51,11 @@ class DynamicClientQuotaPublisher(
): Unit = {
val deltaName = s"MetadataDelta up to ${newImage.highestOffsetAndEpoch().offset}"
try {
val clientQuotaCallback = conf.getConfiguredInstance(QuotaConfig.CLIENT_QUOTA_CALLBACK_CLASS_CONFIG, classOf[ClientQuotaCallback])
Member review comment:

We should use the callback in quotaManagers rather than creating a new one. They are different instances, so this approach cannot update the callback actually used by quotaManagers.
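To make the instance-identity issue concrete, here is a minimal, self-contained sketch (not Kafka code; the class and method names are stand-ins): a config-driven factory that, like getConfiguredInstance, builds a fresh object on every call, so invoking the callback on the fresh copy never reaches the instance held elsewhere.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DistinctInstances {
    // Stand-in for a configured ClientQuotaCallback.
    static class Callback {
        final AtomicInteger updates = new AtomicInteger();
        void updateClusterMetadata() { updates.incrementAndGet(); }
    }

    // Mimics conf.getConfiguredInstance: returns a new instance each time.
    static Callback getConfiguredInstance() {
        return new Callback();
    }

    public static void main(String[] args) {
        Callback held = getConfiguredInstance();   // instance held by the quota managers
        Callback fresh = getConfiguredInstance();  // instance created in the publisher

        fresh.updateClusterMetadata();             // only the fresh copy is updated

        System.out.println(held == fresh);         // false: two distinct objects
        System.out.println(held.updates.get());    // 0: the held instance saw nothing
    }
}
```

This is why the publisher must reach into quotaManagers for the already-configured callback instead of configuring a second instance.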

@@ -235,6 +235,13 @@ public Optional<Node> node(String listenerName) {
}
return Optional.of(new Node(id, endpoint.host(), endpoint.port(), rack.orElse(null), fenced));
}

public List<Node> nodes() {
List<Node> nodes = new ArrayList<>();
Member review comment:

We can leverage an existing method, for example:

return listeners.keySet().stream().flatMap(l -> node(l).stream()).toList();
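As a self-contained illustration of that suggestion (all names here are stand-ins, not Kafka's actual types): Optional.stream() lets flatMap silently drop listeners with no resolvable node while flattening the rest into a list.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class NodesSketch {
    record Node(String host, int port) {}

    // Stand-in for the registration's listener -> endpoint mapping.
    static final Map<String, Node> LISTENERS = Map.of(
        "PLAINTEXT", new Node("broker1", 9092),
        "SSL", new Node("broker1", 9093));

    // Stand-in for node(listenerName): empty when the listener is unknown.
    static Optional<Node> node(String listenerName) {
        return Optional.ofNullable(LISTENERS.get(listenerName));
    }

    // The reviewer's suggested shape: reuse node(l) instead of
    // rebuilding each Node by hand in a loop.
    static List<Node> nodes() {
        return LISTENERS.keySet().stream().flatMap(l -> node(l).stream()).toList();
    }

    public static void main(String[] args) {
        System.out.println(nodes().size()); // prints 2
    }
}
```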

@chia7712 (Member) left a comment

@m1a2st thanks for this patch. I tested your patch locally; please also fix the following:

  1. deleteTopic needs to use authorized admin
  2. You must call removeQuotaOverrides after creating group1_user2. Otherwise, the removeQuota method of the custom callback will not be invoked. This is an interesting discrepancy: in ZooKeeper mode, removeQuota is called when altering SASL/SCRAM credentials, but in KRaft mode this behavior is absent. I'm uncertain whether this constitutes a breaking change. It appears to be unusual behavior, as it is triggered by the addition of users. Instead of implementing this peculiar behavior, I suggest updating the documentation of the callback to reflect the actual implementation.
    cc @dajac and @cmccabe

@@ -179,13 +212,12 @@ class CustomQuotaCallbackTest extends IntegrationTestHarness with SaslSetup {
}

private def createTopic(topic: String, numPartitions: Int, leader: Int): Unit = {
// TODO createTopic
TestUtils.createTopicWithAdmin(createAdminClient(), topic, brokers, controllerServers, numPartitions)
Member review comment:

You have to honor the leader argument so the partition leader is hosted by the correct broker; otherwise, this test will become flaky in the future.

@chia7712 (Member) left a comment


public static class CustomQuotaCallback implements ClientQuotaCallback {

public static AtomicInteger dynamicTopicCounter = new AtomicInteger(0);
Member review comment:

Could you please track a separate count per node id? For example:

        public static final Map<String, AtomicInteger> COUNTERS = new ConcurrentHashMap<>();
        private String nodeId;

        @Override
        public void configure(Map<String, ?> configs) {
            nodeId = (String) configs.get("node.id");
        }

        @Override
        public boolean updateClusterMetadata(Cluster cluster) {
            COUNTERS.computeIfAbsent(nodeId, k -> new AtomicInteger()).incrementAndGet();
            return true;
        }

With this counter, we can verify that all controllers invoke the callback.
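A minimal runnable sketch of that per-node counter pattern (the class name and node ids here are illustrative): computeIfAbsent lazily creates one AtomicInteger per node id, so concurrent callbacks from different brokers and controllers each bump their own counter safely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerNodeCounter {
    public static final Map<String, AtomicInteger> COUNTERS = new ConcurrentHashMap<>();

    // What updateClusterMetadata would do in the test callback:
    // create the counter for this node on first use, then increment it.
    static int bump(String nodeId) {
        return COUNTERS.computeIfAbsent(nodeId, k -> new AtomicInteger()).incrementAndGet();
    }

    public static void main(String[] args) {
        bump("0");
        bump("0");
        bump("1");
        System.out.println(COUNTERS.get("0").get()); // 2
        System.out.println(COUNTERS.get("1").get()); // 1
    }
}
```

The test can then assert that COUNTERS contains an entry for every controller id after a metadata change.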

@chia7712 (Member) left a comment

@m1a2st thanks for this patch. I left some comments to fix the tests. PTAL.

);

CustomQuotaCallback.COUNTERS.clear();
admin.alterClientQuotas(clientQuotaAlterations, new AlterClientQuotasOptions().validateOnly(true));
Member review comment:

You have to make actual changes to trigger the metadata update, so please remove validateOnly(true).

);
// Reset the counters, and we expect the callback to be triggered again in all controllers
CustomQuotaCallback.COUNTERS.clear();
admin.alterClientQuotas(clientQuotaAlterations, new AlterClientQuotasOptions());
Member review comment:

I don't think this call can trigger the callback, as it doesn't cause either a topic or a cluster change.

# Conflicts:
#	core/src/main/scala/kafka/server/MetadataCache.scala
@mimaison (Member) left a comment

Thanks for the PR. I left a bunch of comments. It looks like the new test testCustomQuotaCallbackWithControllerServer() is consistently failing. Also this needs to be rebased to resolve the conflict.

delta: MetadataDelta,
newImage: MetadataImage,
): Unit = {
val deltaName = s"MetadataDelta up to ${newImage.highestOffsetAndEpoch().offset}"
Member review comment:

Can we only compute this val in the catch block?

s"publishing dynamic topic or cluster changes from $deltaName", t)
}
}
}
Member review comment:

Nit: missing newline

* Topics that are being deleted will not be included in `cluster`.
*
* @param cluster Cluster metadata including partitions and their leaders if known
* @param cluster Cluster metadata including topic and cluster
Member review comment:

I'm not sure I understand this change. I prefer the previous message.

@@ -89,11 +89,11 @@ public interface ClientQuotaCallback extends Configurable {
boolean quotaResetRequired(ClientQuotaType quotaType);

/**
* Metadata update callback that is invoked whenever UpdateMetadata request is received from
* the controller. This is useful if quota computation takes partitions into account.
* Metadata update callback that is invoked whenever the topic and cluster delta changed.
Member review comment:

I'm not sure we want to mention cluster/topic deltas here; this is a public API and part of our javadoc. I think we can say something like: This callback is invoked whenever the cluster metadata changes. This includes brokers added or removed, topics created or deleted, and partition leadership changes.

@@ -144,4 +149,95 @@ object MetadataCache {
): KRaftMetadataCache = {
new KRaftMetadataCache(brokerId, kraftVersionSupplier)
}

def toCluster(clusterId: String, image: MetadataImage): Cluster = {
val brokerToNodes = new util.HashMap[Integer, java.util.List[Node]]
Member review comment:

We can drop the java. prefix in java.util.List

val brokerToNodes = new util.HashMap[Integer, java.util.List[Node]]
image.cluster().brokers()
.values().stream()
.filter(broker => !broker.fenced())
Member review comment:

-> .filter(!_.fenced()) but maybe you wrote it this way with the future conversion to Java in mind. In that case we can keep the code.

Contributor Author reply:

Yes, I want to keep it this way because we will need to convert it to Java in the future.

.filter(broker => !broker.fenced())
.forEach { broker => brokerToNodes.put(broker.id(), broker.nodes()) }

def getNodes(id: Int): java.util.List[Node] = brokerToNodes.get(id)
Member review comment:

Again drop the java. prefix

@@ -87,11 +90,25 @@ class CustomQuotaCallbackTest extends IntegrationTestHarness with SaslSetup {

override def configureSecurityBeforeServersStart(testInfo: TestInfo): Unit = {
super.configureSecurityBeforeServersStart(testInfo)
}
Member review comment:

Do we still need to keep configureSecurityBeforeServersStart() if all it does is call the super implementation?

@ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames)
@MethodSource(Array("getTestQuorumAndGroupProtocolParametersClassicGroupProtocolOnly_ZK_implicit"))
@MethodSource(Array("getTestQuorumAndGroupProtocolParametersClassicGroupProtocolOnly"))
Member review comment:

From my limited testing it looks like this test works with the new consumer too. Is there a reason you used getTestQuorumAndGroupProtocolParametersClassicGroupProtocolOnly instead of getTestQuorumAndGroupProtocolParametersAll?

Contributor Author reply:

The main reason is that this test targets the server, not the consumer. Therefore, I didn't test different consumers separately, but I will address this.

createScramCredentials(createAdminClient(), JaasTestUtils.KAFKA_SCRAM_ADMIN, JaasTestUtils.KAFKA_SCRAM_ADMIN_PASSWORD)
}

override def addFormatterSettings(formatter: Formatter): Unit = {
formatter.setScramArguments(
List(s"SCRAM-SHA-256=[name=${JaasTestUtils.KAFKA_SCRAM_ADMIN},password=${JaasTestUtils.KAFKA_SCRAM_ADMIN_PASSWORD}]").asJava)
Member review comment:

We could use util.List.of() instead of asJava here


@Override
public Double quotaLimit(ClientQuotaType quotaType, Map<String, String> metricTags) {
return 0.0;
Member review comment:

You have to return Double.MAX_VALUE; otherwise, the operations can't get through, as they are rejected due to throttling.

@chia7712 (Member) commented Feb 5, 2025

It looks like the new test testCustomQuotaCallbackWithControllerServer() is consistently failing.

@mimaison that is caused by #18196 (comment)

@m1a2st (Contributor Author) commented Feb 6, 2025

Thanks @chia7712 and @mimaison for the review. I addressed the comments, PTAL.

@chia7712 (Member) left a comment

LGTM, will merge it tomorrow.

In combined mode, the callback will be triggered twice; not sure whether that is a potential issue.

@dajac (Member) commented Feb 10, 2025

@chia7712 Thanks for the review. Can we merge it?

@chia7712 chia7712 merged commit 70adf74 into apache:trunk Feb 10, 2025
9 checks passed
chia7712 pushed a commit that referenced this pull request Feb 10, 2025
…by kraft (#18196)

This commit ensures that the ClientQuotaCallback#updateClusterMetadata method is executed in KRaft mode. This method is triggered whenever a topic or cluster metadata change occurs. However, in KRaft mode, the current implementation of the updateClusterMetadata API is inefficient due to the requirement of creating a full Cluster object. To address this, a follow-up issue (KAFKA-18239) has been created to explore more efficient mechanisms for providing cluster information to the ClientQuotaCallback without incurring the overhead of a full Cluster object creation.

Reviewers: Mickael Maison <[email protected]>, TaiJuWu <[email protected]>, Chia-Ping Tsai <[email protected]>
ijuma added a commit to ijuma/kafka that referenced this pull request Feb 11, 2025
…ssume-baseline-3.3-for-server

* apache-github/trunk:
  KAFKA-18366 Remove KafkaConfig.interBrokerProtocolVersion (apache#18820)
  KAFKA-18658 add import control for examples module (apache#18812)
  MINOR: Java version and TLS documentation improvements (apache#18822)
  KAFKA-18743 Remove leader.imbalance.per.broker.percentage as it is not supported by Kraft (apache#18821)
  KAFKA-18225 ClientQuotaCallback#updateClusterMetadata is unsupported by kraft (apache#18196)
  KAFKA-18441: Remove flaky tag on KafkaAdminClientTest#testAdminClientApisAuthenticationFailure (apache#18847)
  MINOR: fix KStream#to incorrect javadoc (apache#18838)
  KAFKA-18763: changed the assertion statement for acknowledgements to include only successful acks (apache#18846)
  MINOR: Accept specifying consumer group assignors by their short names (apache#18832)
pdruley pushed a commit to pdruley/kafka that referenced this pull request Feb 12, 2025
manoj-mathivanan pushed a commit to manoj-mathivanan/kafka that referenced this pull request Feb 19, 2025
6 participants