Conversation

@jiafu1115 (Contributor) commented Nov 3, 2025

The __remote_log_metadata internal topic currently lacks a configurable
min.insync.replicas setting and falls back to the broker-level default
(typically 1), even though the topic's default replication factor is 3.
This creates a data-loss risk in production environments, as writes may
be acknowledged by only a single replica.

Moreover, this is inconsistent with other critical-metadata topics such as
__transaction_state, which explicitly sets min.insync.replicas=2 via the
transaction.state.log.min.isr broker configuration. Both topics store
critical metadata and should provide equivalent durability guarantees.

Note:

public static final String REMOTE_LOG_METADATA_TOPIC_REPLICATION_FACTOR_PROP = "rlmm.config.remote.log.metadata.topic.replication.factor";
public static final short DEFAULT_REMOTE_LOG_METADATA_TOPIC_REPLICATION_FACTOR = 3;

@github-actions github-actions bot added triage PRs from the community storage Pull requests that target the storage module tiered-storage Related to the Tiered Storage feature small Small PRs labels Nov 3, 2025
public static final String REMOTE_LOG_METADATA_TOPIC_REPLICATION_FACTOR_PROP = "remote.log.metadata.topic.replication.factor";
public static final String REMOTE_LOG_METADATA_TOPIC_PARTITIONS_PROP = "remote.log.metadata.topic.num.partitions";
public static final String REMOTE_LOG_METADATA_TOPIC_RETENTION_MS_PROP = "remote.log.metadata.topic.retention.ms";
public static final String REMOTE_LOG_METADATA_TOPIC_MIN_ISR_PROP = "remote.log.metadata.topic.min.isr";
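Presumably the new setting would be supplied the same way as the existing replication-factor property, i.e. via the rlmm.config. prefix shown in the constants above. A sketch of the broker configuration (the prefixed min.isr property name and the value 2 are my assumption based on this PR's proposed constants):

```properties
# Existing: replication factor for the __remote_log_metadata topic
rlmm.config.remote.log.metadata.topic.replication.factor=3

# Proposed in this PR: matching durability floor (name from the constants above)
rlmm.config.remote.log.metadata.topic.min.isr=2
```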
@chia7712 (Member) commented:
@jiafu1115 That is a good idea, but I think this new config still needs a KIP. Would you mind opening a KIP for it?

@jiafu1115 (Contributor, PR author) replied:

@chia7712 Sure, thanks for the reminder.

@AndrewJSchofield (Member) commented:

Also, maybe if there's going to be a KIP in this area, we should take the opportunity to make __remote_log_metadata into an internal topic.

@jiafu1115 (Contributor, PR author) replied:

@AndrewJSchofield OK, I will consider this and try to combine the two changes into one KIP. Thanks.

@jiafu1115 (Contributor, PR author) commented Nov 4, 2025:

@AndrewJSchofield Hi, I checked the code around "internal" topic handling and summarized the results as follows:

[image: summary of internal-topic checks]

It seems that __remote_log_metadata isn't a purely internal topic: it can be configured as a normal topic in a remote Kafka cluster (even though its name can't be changed), and it writes data through a standard producer rather than appending to the local log directly.
org.apache.kafka.server.log.remote.storage.RemoteLogManager#configAndWrapRlmmPlugin
private Plugin<RemoteLogMetadataManager> configAndWrapRlmmPlugin(RemoteLogMetadataManager rlmm) {
    final Map<String, Object> rlmmProps = new HashMap<>();
    endpoint.ifPresent(e -> {
        rlmmProps.put(REMOTE_LOG_METADATA_COMMON_CLIENT_PREFIX + "bootstrap.servers", e.host() + ":" + e.port());
        rlmmProps.put(REMOTE_LOG_METADATA_COMMON_CLIENT_PREFIX + "security.protocol", e.securityProtocol().name);
    });
    // update the remoteLogMetadataProps here to override endpoint config if any
    rlmmProps.putAll(rlmConfig.remoteLogMetadataManagerProps());
    // ...
}
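As a side note on the snippet above: because rlmmProps.putAll(...) runs after the endpoint-derived puts, user-supplied RLMM properties win over the defaults. A minimal stand-alone sketch of that merge order (my own illustration, not Kafka code; the broker addresses are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class RlmmPropsOverride {
    // Illustrative sketch (not Kafka's actual class): shows how Map.putAll
    // lets user-supplied RLMM properties override the endpoint-derived
    // defaults, mirroring the order of puts in configAndWrapRlmmPlugin.
    static Map<String, Object> buildProps(Map<String, Object> endpointDefaults,
                                          Map<String, Object> userProps) {
        Map<String, Object> props = new HashMap<>(endpointDefaults); // defaults first
        props.putAll(userProps);                                     // user values win
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("bootstrap.servers", "broker-0:9092"); // derived from the endpoint
        Map<String, Object> user = new HashMap<>();
        user.put("bootstrap.servers", "remote-cluster:9092"); // explicit user override
        Map<String, Object> merged = buildProps(defaults, user);
        System.out.println(merged.get("bootstrap.servers")); // prints remote-cluster:9092
    }
}
```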

Besides, the current producer clientId for __remote_log_metadata conflicts with the following check, which would cause its writes to be rejected:

//KafkaApis.scala
val internalTopicsAllowed = request.header.clientId == "__admin_client"
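The check above reduces to a simple predicate: produce requests to internal topics are only allowed for the reserved "__admin_client" client id, so a standard producer with any other client id would be rejected. A stand-alone sketch (my own illustration; the non-admin client id below is hypothetical):

```java
public class InternalTopicGate {
    // Sketch of the KafkaApis check quoted above: produce requests to internal
    // topics are only allowed when the request's clientId is "__admin_client".
    static boolean internalTopicsAllowed(String clientId) {
        return "__admin_client".equals(clientId);
    }

    public static void main(String[] args) {
        System.out.println(internalTopicsAllowed("__admin_client")); // true
        // An ordinary producer client id (hypothetical name) would be rejected:
        System.out.println(internalTopicsAllowed("remote-log-metadata-producer")); // false
    }
}
```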

So maybe we shouldn't make it an "internal" topic in this KIP while we're unsure it fits the current design; that also keeps this KIP simple and clear. WDYT?

also cc @chia7712 @kamalcph for more thoughts.

Thanks!

@AndrewJSchofield (Member) commented:

@jiafu1115 OK, thanks for investigating the internal topic question. I agree that it's best not to make it an internal topic, because of the pluggable nature of this topic's use. Keeping your KIP simple is fine with me.

@jiafu1115 (Contributor, PR author) commented Nov 4, 2025:

@AndrewJSchofield
My pleasure! I also learned a lot from the investigation. I've already opened a PR (#20822) for it, and it doesn't need a KIP. We can continue the discussion there later. After all, the topic really feels like an internal one, and making it internal could have some nice benefits.
For this PR, if you think the approach looks good, I can prepare a tiny KIP now for further discussion. Thanks!

@AndrewJSchofield (Member) commented:

Yes, I think a separate tiny KIP for the internal topic question seems like a good way to get the community to decide whether it's a good thing. Your PR for it looks about right. Of course, we'd need the KIP to pass before taking the PR.

@jiafu1115 (Contributor, PR author) replied:

@AndrewJSchofield OK. For this min.insync.replicas change, it's time for me to create a KIP, right? Thanks.

@AndrewJSchofield (Member) commented:

I think you need a KIP for the min.insync change. If you want to pursue the internal-topic question, you could open a second KIP. There's no real reason to tangle the two questions together. Up to you, though.

@github-actions github-actions bot removed the triage PRs from the community label Nov 4, 2025
@kamalcph (Contributor) commented Nov 4, 2025

If min-isr is set to 2, then when two nodes go down in the cluster, all upload/delete operations stop making progress for the unavailable metadata partitions. It is better to leave it at 1 by default; the user can change the setting dynamically, based on the topic's replication factor, using the kafka-configs.sh script.

@jiafu1115 (Contributor, PR author) commented Nov 4, 2025

> If min-isr is set to 2, then when two nodes go down in the cluster, all upload/delete operations stop making progress for the unavailable metadata partitions. It is better to leave it at 1 by default; the user can change the setting dynamically, based on the topic's replication factor, using the kafka-configs.sh script.

@kamalcph

Thanks for your comments. I think it comes down to one broker down vs. two brokers down. Here are some extra points on why we should consider changing the default:

Metadata loss is worse than unavailability
With min.isr=1, a leader crash can mean permanent metadata loss, making the remote data unreachable. Silent data loss is worse than temporary write blocking.

Edge case vs. common risk
Single-node failure (common): min.isr=2 still works.
Two nodes down (rare): writes are blocked.
The correct trade-off is to block writes in the extreme case rather than risk data loss.

Industry standard / best practice
Kafka's own metadata topics use min.isr=2:
__transaction_state → min.isr=2

Secure by default
Most users won't tune internal configs, so defaults must be safe. Users who need higher availability can explicitly lower it.

Thanks for your discussion!
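To make the trade-off discussed above concrete, here is a toy model (my own illustration, not Kafka code): with acks=all, a write is accepted only while the in-sync replica count stays at or above min.insync.replicas.

```java
public class MinIsrTradeoff {
    // Toy model: with replication.factor=3, how many broker failures can a
    // partition absorb before acks=all writes start failing?
    static boolean writeAccepted(int replicationFactor, int failedReplicas, int minIsr) {
        int isr = replicationFactor - failedReplicas; // assume failed replicas leave the ISR
        return isr >= minIsr;
    }

    public static void main(String[] args) {
        // min.isr=2: survives one failure, blocks on two (the availability cost)
        System.out.println(writeAccepted(3, 1, 2)); // true
        System.out.println(writeAccepted(3, 2, 2)); // false
        // min.isr=1: still writable with two failures, but an acknowledged write
        // may live on a single replica; losing that broker loses the metadata
        System.out.println(writeAccepted(3, 2, 1)); // true
    }
}
```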

@jiafu1115 jiafu1115 changed the title KAFKA-19858: Set default min.insync.replicas=2 for __remote_log_metadata topic to prevent data loss KAFKA-19858: Set default min.insync.replicas=2 for __remote_log_metadata topic to prevent data loss (KIP-1235) Nov 5, 2025