KAFKA-19858: Set default min.insync.replicas=2 for __remote_log_metadata topic to prevent data loss (KIP-1235) #20811
base: trunk
Conversation
…ata topic to prevent data loss
public static final String REMOTE_LOG_METADATA_TOPIC_REPLICATION_FACTOR_PROP = "remote.log.metadata.topic.replication.factor";
public static final String REMOTE_LOG_METADATA_TOPIC_PARTITIONS_PROP = "remote.log.metadata.topic.num.partitions";
public static final String REMOTE_LOG_METADATA_TOPIC_RETENTION_MS_PROP = "remote.log.metadata.topic.retention.ms";
public static final String REMOTE_LOG_METADATA_TOPIC_MIN_ISR_PROP = "remote.log.metadata.topic.min.isr";
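As a rough sketch of how the new config key could feed into the topic-level settings applied when the metadata topic is created (plain Java with no Kafka dependencies; the helper and default values are illustrative, matching the PR's proposal of min.isr=2 alongside the default replication factor of 3):

```java
import java.util.HashMap;
import java.util.Map;

public class RemoteLogMetadataTopicSettings {
    // Key from the diff above; the default of 2 is the KIP-1235 proposal.
    static final String MIN_ISR_PROP = "remote.log.metadata.topic.min.isr";
    static final int DEFAULT_MIN_ISR = 2;

    // Hypothetical helper: builds the topic-level configs that would be
    // passed at topic creation time, pinning min.insync.replicas so writes
    // are no longer acknowledged by a single replica.
    static Map<String, String> topicConfigs(int minIsr) {
        Map<String, String> configs = new HashMap<>();
        configs.put("min.insync.replicas", String.valueOf(minIsr));
        return configs;
    }

    public static void main(String[] args) {
        System.out.println(topicConfigs(DEFAULT_MIN_ISR));
    }
}
```

The point of a topic-level override is that it is independent of whatever broker-level min.insync.replicas happens to be set to.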
@jiafu1115 that is a good idea, but I think this new config still needs a KIP. Would you mind opening a KIP for it?
@chia7712 Sure, thanks for the reminder.
Also, maybe if there's going to be a KIP in this area, we should take the opportunity to make __remote_log_metadata into an internal topic.
@AndrewJSchofield OK. I will consider this and try to combine the two things into one KIP. Thanks.
@AndrewJSchofield Hi. I checked the code around the "internal" topic question and summarised the results as follows:
It seems that __remote_log_metadata isn't a purely internal topic, since it can be configured as a normal topic in a remote Kafka cluster, even though its name can't be changed. It also writes data through a standard producer rather than appending locally:
org.apache.kafka.server.log.remote.storage.RemoteLogManager#configAndWrapRlmmPlugin
private Plugin<RemoteLogMetadataManager> configAndWrapRlmmPlugin(RemoteLogMetadataManager rlmm) {
final Map<String, Object> rlmmProps = new HashMap<>();
endpoint.ifPresent(e -> {
rlmmProps.put(REMOTE_LOG_METADATA_COMMON_CLIENT_PREFIX + "bootstrap.servers", e.host() + ":" + e.port());
rlmmProps.put(REMOTE_LOG_METADATA_COMMON_CLIENT_PREFIX + "security.protocol", e.securityProtocol().name);
});
// update the remoteLogMetadataProps here to override endpoint config if any
rlmmProps.putAll(rlmConfig.remoteLogMetadataManagerProps());
// ...
}
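The put-then-putAll ordering above is what lets user-supplied remote.log.metadata.* properties override the endpoint-derived defaults (so the metadata topic can live in a different cluster). A minimal illustration of that precedence, with the prefix and values assumed for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class RlmmPropsPrecedence {
    static final String PREFIX = "remote.log.metadata.common.client.";

    // Mirrors the ordering in configAndWrapRlmmPlugin: endpoint-derived
    // defaults are inserted first, then user-supplied props overwrite
    // matching keys via putAll, so user configuration wins.
    static Map<String, Object> merge(Map<String, Object> endpointDefaults,
                                     Map<String, Object> userProps) {
        Map<String, Object> props = new HashMap<>(endpointDefaults);
        props.putAll(userProps);
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put(PREFIX + "bootstrap.servers", "local-broker:9092");
        Map<String, Object> user = new HashMap<>();
        user.put(PREFIX + "bootstrap.servers", "remote-cluster:9092");
        // The user-supplied remote cluster address overrides the local endpoint.
        System.out.println(merge(defaults, user).get(PREFIX + "bootstrap.servers"));
    }
}
```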
Besides, the current producer clientId used for __remote_log_metadata would conflict with the following check, causing writes to be rejected:
//KafkaApis.scala
val internalTopicsAllowed = request.header.clientId == "__admin_client"
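For illustration, the gating behaviour quoted above can be reduced to a small predicate (sketched in Java rather than Scala; the method and parameter names are mine, not KafkaApis'):

```java
public class InternalTopicProduceGate {
    // Approximation of the KafkaApis check above: produce requests to
    // internal topics are only allowed for the admin client id.
    static boolean produceAllowed(boolean isInternalTopic, String clientId) {
        if (!isInternalTopic) {
            return true;
        }
        return "__admin_client".equals(clientId);
    }

    public static void main(String[] args) {
        // If __remote_log_metadata were flagged internal, the RLMM's
        // ordinary producer client id would be rejected:
        System.out.println(produceAllowed(true, "some-rlmm-producer"));
        System.out.println(produceAllowed(true, "__admin_client"));
    }
}
```

This is why simply marking the topic internal would break the existing producer-based write path.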
So maybe we shouldn’t make it an “internal” topic in this KIP, since we’re not sure that matches the current design; leaving it out also keeps this KIP simple and focused. WDYT?
also cc @chia7712 @kamalcph for more thoughts.
Thanks!
@jiafu1115 OK, thanks for investigating the internal topic question. I agree that it's best not to make it an internal topic, because of the pluggable nature of this topic's use. Keeping your KIP simple is fine with me.
@AndrewJSchofield
My pleasure! I also learned a lot from the investigation. I’ve already opened a PR (#20822) for it, and it doesn’t need a KIP. We can continue the discussion there later. After all, the topic really feels like an internal one; making it internal could have some nice benefits.
For this PR, if you think the approach looks good, I can prepare a tiny KIP now for more discussion. Thanks!
Yes, I think a separate tiny KIP for the internal topic question seems like a good way to get the community to decide whether it's a good thing. Your PR for it looks about right. Of course, we'd need the KIP to pass before taking the PR.
@AndrewJSchofield OK. For this min.insync.replicas change, it is time for me to create a KIP, right? Thanks.
I think you need a KIP for the min.insync.replicas change. If you want to pursue the internal topic question, you could open a second KIP. There's no real reason to tangle the two questions together. Up to you though.
If min-isr is set to 2, then when two nodes go down in the cluster, all the upload/delete operations won't make progress for the unavailable metadata partitions. It is good to leave it at 1 by default; the user can change the setting dynamically based on the replication factor of the topic.
Thanks for your comments. I think it comes down to one broker down vs. two brokers down. Some extra points on why we should consider changing the default:
- Metadata loss is more critical than temporary unavailability
- Edge case vs. common risk
- Industry standard / best practice
- Secure by default
Thanks for the discussion!
The __remote_log_metadata topic currently lacks a configurable
min.insync.replicas setting, relying on the broker-level default
(typically 1) even though the topic's default replication factor is 3.
This creates a data loss risk in production environments, as writes may
be acknowledged by only a single replica.
What's more, this is inconsistent with other metadata topics such as
__transaction_state, which explicitly sets min.insync.replicas=2 via the
transaction.state.log.min.isr broker configuration. Both topics store
critical metadata and should have equivalent durability guarantees.
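Side by side, the existing transaction-log settings and the configuration proposed in this PR would look like this (the transaction.state.log.* names are existing Kafka broker configs; the remote.log.metadata.topic.min.isr value of 2 is the proposal, and 3 is the existing default replication factor):

```properties
# Existing: the transaction log already enforces a write quorum
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

# Proposed in this PR (KIP-1235): the same guarantee for remote log metadata
remote.log.metadata.topic.replication.factor=3
remote.log.metadata.topic.min.isr=2
```

With min.isr=2 and replication factor 3, a write survives the loss of any single replica before it can be lost from the in-sync set.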
Note: