Include kafka_internal usage in Manage Disk Space #1257
Conversation
✅ Deploy Preview for redpanda-docs-preview ready!
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI. You can disable this status message in your CodeRabbit configuration.

📝 Walkthrough

The documentation for Redpanda transactions and disk utilization was reorganized and expanded. The transactions documentation now includes clearer explanations of multi-partition transactions, atomicity, exactly-once semantics, and transaction failure handling. New best practices and configuration guidance were added, along with practical code examples. Additionally, a new section was added to the disk utilization documentation, explaining how transaction metadata in the `kafka_internal/tx` topic consumes disk space.

Sequence Diagram(s)

Not applicable: changes are limited to documentation content and do not introduce or modify control flow or feature logic.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes
Assessment against linked issues: Out-of-scope changes
Suggested reviewers
Actionable comments posted: 0
🔭 Outside diff range comments (1)
modules/develop/pages/transactions.adoc (1)
150-158: Minor Java sample issues: missing semicolon and non-string UUID

`var target = "target-topic"` is missing its terminating semicolon. `UUID.newUUID()` should be `UUID.randomUUID().toString()` to produce the required `String`.

```diff
-var target = "target-topic"
+var target = "target-topic";
-pprops.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.newUUID());
+pprops.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.randomUUID().toString());
```
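A minimal, self-contained sketch of the corrected pattern (illustrative only: the plain string key `"transactional.id"` is used here instead of `ProducerConfig.TRANSACTIONAL_ID_CONFIG` so the snippet compiles without the kafka-clients dependency):

```java
import java.util.Properties;
import java.util.UUID;

public class TransactionalIdExample {
    // Builds producer properties with a unique transactional.id,
    // using UUID.randomUUID().toString() to get the required String value.
    static Properties buildProps() {
        var target = "target-topic"; // terminating semicolon included

        Properties props = new Properties();
        props.put("transactional.id", UUID.randomUUID().toString());
        // A real producer would also set bootstrap.servers, serializers, etc.
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps().getProperty("transactional.id"));
    }
}
```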
🧹 Nitpick comments (2)
modules/manage/pages/cluster-maintenance/disk-utilization.adoc (1)
178-182: `rpk cluster config set` example sets two keys in one call; double-check CLI behaviour

`rpk cluster config set` historically accepts a single `<property> <value>` pair per invocation. If the current version still follows that pattern, the example may mislead users:

```
rpk cluster config set transaction_coordinator_delete_retention_ms <milliseconds>
rpk cluster config set transactional_id_expiration_ms <milliseconds>
```

Confirm whether multi-key support exists; if not, split the example accordingly.
modules/develop/pages/transactions.adoc (1)
267-268: Order requirement wording may confuse readersThe bullet says:
`transaction_coordinator_delete_retention_ms` is not lower than `transactional_id_expiration_ms`If the intent is “delete retention ≥ id expiration” consider stating it positively (“must be greater than or equal to”) to avoid misconfiguration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
modules/develop/pages/transactions.adoc (6 hunks)
modules/manage/pages/cluster-maintenance/disk-utilization.adoc (1 hunk)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Feediver1
PR: redpanda-data/docs#1153
File: modules/reference/pages/properties/topic-properties.adoc:45-50
Timestamp: 2025-07-16T19:33:20.420Z
Learning: In the Redpanda documentation, topic property cross-references like <<max.compaction.lag.ms>> and <<min.compaction.lag.ms>> require corresponding property definition sections with anchors like [[maxcompactionlagms]] and [[mincompactionlagms]] to prevent broken links.
Learnt from: JakeSCahill
PR: redpanda-data/docs#1192
File: modules/deploy/partials/requirements.adoc:91-93
Timestamp: 2025-07-02T14:54:03.506Z
Learning: In Redpanda documentation, use GiB (binary units, powers of 2) for Kubernetes-specific memory requirements because Kubernetes treats memory units like Mi, Gi as binary units. Use GB (decimal units, powers of 10) for general broker memory requirements in non-Kubernetes contexts.
Learnt from: paulohtb6
PR: redpanda-data/docs#0
File: :0-0
Timestamp: 2025-07-15T20:38:27.458Z
Learning: In Redpanda documentation, "Redpanda Data" refers to the company name, while "Redpanda" refers to the product name. These terms should be used appropriately based on context.
modules/manage/pages/cluster-maintenance/disk-utilization.adoc (2)
Learnt from: Feediver1
PR: #1153
File: modules/reference/pages/properties/topic-properties.adoc:45-50
Timestamp: 2025-07-16T19:33:20.420Z
Learning: In the Redpanda documentation, topic property cross-references like <<max.compaction.lag.ms>> and <<min.compaction.lag.ms>> require corresponding property definition sections with anchors like [[maxcompactionlagms]] and [[mincompactionlagms]] to prevent broken links.
Learnt from: JakeSCahill
PR: #1192
File: modules/deploy/partials/requirements.adoc:91-93
Timestamp: 2025-07-02T14:54:03.506Z
Learning: In Redpanda documentation, use GiB (binary units, powers of 2) for Kubernetes-specific memory requirements because Kubernetes treats memory units like Mi, Gi as binary units. Use GB (decimal units, powers of 10) for general broker memory requirements in non-Kubernetes contexts.
modules/develop/pages/transactions.adoc (2)
Learnt from: Feediver1
PR: #1153
File: modules/reference/pages/properties/topic-properties.adoc:45-50
Timestamp: 2025-07-16T19:33:20.420Z
Learning: In the Redpanda documentation, topic property cross-references like <<max.compaction.lag.ms>> and <<min.compaction.lag.ms>> require corresponding property definition sections with anchors like [[maxcompactionlagms]] and [[mincompactionlagms]] to prevent broken links.
Learnt from: kbatuigas
PR: #1113
File: modules/manage/partials/iceberg/use-iceberg-catalogs.adoc:100-107
Timestamp: 2025-05-07T01:06:00.937Z
Learning: In AsciiDoc documentation for Redpanda, the syntax + and -- around content blocks within a [tabs] section are valid AsciiDoc formatting elements for tabbed content. The + after a tab name (like rpk::) indicates that the following block belongs to that tab, and the -- markers enclose the content for that tab. These are not diff artifacts and should not be removed.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Redirect rules - redpanda-docs-preview
- GitHub Check: Header rules - redpanda-docs-preview
- GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (2)
modules/manage/pages/cluster-maintenance/disk-utilization.adoc (1)
172-174: Possible brokenconfig_reflinks due to path segmentproperties/cluster-propertiesThe
config_ref:macro typically points toreference:cluster-properties.adoc#<property>.
Usingproperties/cluster-propertiesmay generate a 404 in the rendered docs:* config_ref:transaction_coordinator_delete_retention_ms,true,reference:cluster-properties[`transaction_coordinator_delete_retention_ms`] * config_ref:transactional_id_expiration_ms,true,reference:cluster-properties[`transactional_id_expiration_ms`]Please verify the path and update to avoid broken cross-references.
modules/develop/pages/transactions.adoc (1)
274-282: Cross-reference anchor check for `max_transactions_per_coordinator`

`xref:reference:cluster-properties#max_transactions_per_coordinator` assumes the property anchor exists. If the anchor in `cluster-properties.adoc` is autogenerated, it will be `max_transactions_per_coordinator` only if the heading exactly matches the property name. Please verify to prevent a dead link.
lgtm. Who is the SME reviewer for this one?
Feediver1
left a comment
still needs sme approval
* When upgrading a self-managed deployment, make sure to use maintenance mode with a glossterm:rolling upgrade[].
endif::[]
The required `transactional.id` property acts as a producer identity. It enables reliability semantics that span multiple producer sessions by allowing the client to guarantee that all transactions issued by the client with the same ID have completed prior to starting any new transactions.
Is transactional.id a property set by a client, so not a Redpanda config property?
correct.
bharathv
left a comment
lgtm
Co-authored-by: Joyce Fee <[email protected]>
d9d51dd to 62c6d78
The required `transactional.id` property acts as a producer identity. It enables reliability semantics that span multiple producer sessions by allowing the client to guarantee that all transactions issued by the client with the same ID have completed prior to starting any new transactions.

-The two primary use cases for transactions are:
+By default, the `enable_transactions` cluster configuration property is set to true. However, in the following use cases, clients must explicitly use the Transactions API to perform operations within a transaction:
consider linking to the property
Added
=== Atomic publishing of multiple messages

-With its event sourcing microservice architecture, a banking IT system illustrates the necessity for transactions well. A bank has multiple branches, and each branch is an independent microservice that manages its own non-intersecting set of accounts. Each branch keeps its own ledger, which is represented as a Redpanda partition. When a branch representing a microservice starts, it replays its ledger to reconstruct the actual state.
+A banking IT system with an event-sourcing microservice architecture illustrates the necessity for transactions. A bank has multiple branches, and each branch is an independent microservice that manages its own non-intersecting set of accounts. Each branch keeps its own ledger, represented as a Redpanda partition. When a branch (microservice) starts, it replays its ledger to reconstruct the actual state.
this could be simplified. Some terms here might not be widely recognized, especially by non-native speakers. One thing that makes me question this is "ledger". It's unclear if it's a tech term or a banking term.
Check if the following makes sense in this context:
Suggested change:

-A banking IT system with an event-sourcing microservice architecture illustrates the necessity for transactions. A bank has multiple branches, and each branch is an independent microservice that manages its own non-intersecting set of accounts. Each branch keeps its own ledger, represented as a Redpanda partition. When a branch (microservice) starts, it replays its ledger to reconstruct the actual state.
+A banking system with an event-sourcing microservice architecture illustrates the necessity for transactions. A bank has multiple branches, and each branch is an independent microservice that manages its own accounts. Each branch stores its transaction history in a Redpanda partition. When a branch starts, it replays this history to reconstruct the current account balances.
Updated
To help avoid common pitfalls and optimize performance, consider the following when configuring transactional workloads in Redpanda:

* Ongoing transactions can prevent consumers from advancing. To avoid this, don't set transaction timeout (`transaction.timeout.ms` in Java client) to high values: the longer the timeout, the longer consumers may be blocked. By default, it's about a minute, but it's a client setting that depends on the client.
Ongoing transactions can prevent consumers from advancing.
Advancing what? offsets?
also, why do we mention the java client specifically? This is a kafka setting, so it's likely that all supported clients have this value for their producers.
it's about a minute,
This is vague. We could be precise here.
it's a client setting that depends on the client.
Could be simplified.
Suggested change:

-* Ongoing transactions can prevent consumers from advancing. To avoid this, don't set transaction timeout (`transaction.timeout.ms` in Java client) to high values: the longer the timeout, the longer consumers may be blocked. By default, it's about a minute, but it's a client setting that depends on the client.
+* Ongoing transactions can prevent consumers from advancing their offsets. To avoid this, don't set transaction timeout (`transaction.timeout.ms`) to high values: the longer the timeout, the longer consumers may be blocked. The default is 60,000 ms (60 seconds).
@bharathv would you mind confirming if the suggested changes are correct?
Advancing what? offsets?
When a Kafka consumer is configured to use the read_committed isolation level, it will only process messages that have been part of successfully committed transactions. This mechanism ensures data consistency, preventing consumers from acting on incomplete or potentially aborted transactions. However, a stuck transaction can prevent this process from moving forward, impacting consumer progress and potentially causing a backlog.
A stuck transaction with a large timeout can block other committed transactions. This is by design, so large timeouts combined with stuck transactions result in seemingly stuck consumers.
also, why do we mention the java client specifically? This is a kafka setting, so it's likely that all supported clients have this value for their producers.
Just to be clear, that is a Kafka client setting. It is named `transaction.timeout.ms` in the Kafka Java client implementation but could be named something else in a different implementation. I think the text refers to the Java client because it is the most popular one.
it's about a minute,
Exactly a minute: 60,000 ms (the configuration is in milliseconds).
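As a sketch of the client-side setting being discussed (the key name follows the Java client; other clients may name it differently, and the broker address and transactional ID here are illustrative):

```java
import java.util.Properties;

public class TransactionTimeoutExample {
    // Producer properties with an explicit transaction timeout.
    // 60000 ms is the Java client default; tune it down if long-open
    // transactions would block read_committed consumers.
    static Properties buildProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("transactional.id", "demo-tx-id");      // illustrative
        props.put("transaction.timeout.ms", "60000");     // default; lower for short transactions
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps().getProperty("transaction.timeout.ms"));
    }
}
```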
ifndef::env-cloud[]
* When running transactional workloads from clients, tune xref:reference:cluster-properties#max_transactions_per_coordinator[`max_transactions_per_coordinator`] to the number of active transactions that you expect your clients to run at any given time (if your client transaction IDs are not reused).
+
The total number of transactions in the cluster at any one time is `max_transactions_per_coordinator * transaction_coordinator_partitions` (`transaction_coordinator_partitions` default is 50). When the threshold is exceeded, Redpanda terminates old sessions. If an idle producer corresponding to the terminated session wakes up and produces, its batches are rejected with the message `invalid producer epoch` or `invalid_producer_id_mapping`, depending on where it is in the transaction execution phase.
Suggested change:

-The total number of transactions in the cluster at any one time is `max_transactions_per_coordinator * transaction_coordinator_partitions` (`transaction_coordinator_partitions` default is 50). When the threshold is exceeded, Redpanda terminates old sessions. If an idle producer corresponding to the terminated session wakes up and produces, its batches are rejected with the message `invalid producer epoch` or `invalid_producer_id_mapping`, depending on where it is in the transaction execution phase.
+The cluster's total transaction limit is `max_transactions_per_coordinator * transaction_coordinator_partitions` (default is 50 partitions). When this limit is exceeded, Redpanda terminates old sessions. If a terminated producer later tries to produce data, Redpanda rejects its batches with `invalid producer epoch` or `invalid_producer_id_mapping` errors.

This phrase flow is kinda awkward. Here's a suggestion. Feel free to pick whatever works best.
Updated
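To make the capacity arithmetic above concrete, a tiny sketch (the per-coordinator value is illustrative; only the 50-partition default for `transaction_coordinator_partitions` comes from the text):

```java
public class TransactionCapacity {
    // Cluster-wide cap on concurrent transactions, per the guidance above:
    // max_transactions_per_coordinator * transaction_coordinator_partitions
    static long clusterLimit(long maxPerCoordinator, long coordinatorPartitions) {
        return maxPerCoordinator * coordinatorPartitions;
    }

    public static void main(String[] args) {
        long maxPerCoordinator = 1000; // illustrative value
        long partitions = 50;          // transaction_coordinator_partitions default
        System.out.println(clusterLimit(maxPerCoordinator, partitions)); // prints 50000
    }
}
```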
paulohtb6
left a comment
a few comments to improve wording. But overall LGTM
Co-authored-by: Paulo Borges <[email protected]>
In the customer ticket (where the inspiration for creating this doc request came from), I'm not sure that their processing and the 15-minute transactions were a particularly niche case. They did not understand why the tx directory was getting so big. That was down to the fact that the kafka_internal tx files contain ALL historical transactions, e.g. complete/committed and currently open ones. <<< I would like this to be called out in the doc. And they had the default for `transaction_coordinator_delete_retention_ms` (604800000 ms, 7 days), which is completely over the top given their transactions were only open 15 minutes to 1 hour max. Maybe we could have an example: if your typical transactions run for 1 hour, then you can consider setting `transaction_coordinator_delete_retention_ms` to 90 minutes (or something).
Description
This pull request enhances the documentation for Redpanda's transaction capabilities, focusing on improving clarity, adding examples, and introducing best practices for managing transactional workloads and disk usage. The changes include updates to the transactional API, examples of use cases, and guidance for optimizing performance and handling transaction failures.
Enhancements to Transaction Documentation:

Expanded Transaction Capabilities:
* Documented the `transactional.id` property for producer identity and reliability across sessions.

Examples and Use Cases:

Best Practices and Configuration Guidance:

Transaction Management:

Disk Usage Optimization:
* Added guidance on managing disk usage for the `kafka_internal/tx` topic, including tuning the `transaction_coordinator_delete_retention_ms` and `transactional_id_expiration_ms` properties.

Resolves https://redpandadata.atlassian.net/browse/
Review deadline:
Page previews
Manage Disk Space
Transactions
transaction_coordinator_disk_usage

Checks