Change status of intra broker balancing from beta to GA #855
✅ Deploy Preview for redpanda-docs-preview ready!
@daisukebe @wzzzrd86 added you as reviewers so you can eyeball and compare with #850, thanks!
In Redpanda, every partition replica is assigned to a CPU core on a broker. While Redpanda's default <<partition-replica-balancing,partition balancing>> monitors cluster-level events, such as the addition of new brokers or a broker failure, to balance partition assignments across brokers, it does not account for the distribution of partitions _within_ an individual broker.
Prior to Redpanda version 24.2, this meant that some cores on a broker could inadvertently host many partitions of heavily used topics and cause the CPU to be xref:manage:monitoring.adoc#cpu-usage[overburdened]. Additionally, when partition balancing moved some partitions away from a broker, the remaining partitions were not necessarily rebalanced across the broker's cores. And if a broker's core count was increased, Redpanda did not assign any partitions to the new cores until new partitions were created or existing partitions were moved out.
Starting in v24.2, topic-aware intra-broker partition balancing makes it possible to dynamically reassign partitions within a broker. Redpanda prioritizes an even distribution of each topic's partition replicas across all cores in a broker. If a broker's core count changes, Redpanda can check partition assignments across the broker's cores when the broker restarts, and reassign partitions so that assignments stay balanced across all cores. Redpanda can also check partition assignments when partitions are added to or removed from a broker, and rebalance the remaining partitions between cores.
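To make the "even distribution of each topic's partition replicas" goal concrete, the following bash sketch spreads one topic's partitions across a broker's cores round-robin and reports how many land on each core. It is an illustration only, not Redpanda's actual balancing algorithm, and the function names `assign_round_robin` and `per_core_counts` are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration only: this is NOT Redpanda's balancing algorithm.
# It spreads one topic's partition replicas over a broker's cores
# round-robin, the simplest way to get an even per-topic distribution.

# assign_round_robin PARTITIONS CORES (hypothetical helper)
# Prints one "partition core" pair per line, placing partition p on core p % CORES.
assign_round_robin() {
  local partitions=$1 cores=$2 p
  for ((p = 0; p < partitions; p++)); do
    echo "$p $((p % cores))"
  done
}

# per_core_counts PARTITIONS CORES (hypothetical helper)
# Prints the number of partitions assigned to each core, core 0 first.
per_core_counts() {
  local partitions=$1 cores=$2 c core
  local -a counts=()
  for ((c = 0; c < cores; c++)); do
    counts[c]=0
  done
  while read -r _ core; do
    counts[core]=$((counts[core] + 1))
  done < <(assign_round_robin "$partitions" "$cores")
  echo "${counts[*]}"
}
```

For example, `per_core_counts 10 4` prints `3 3 2 2`. Recomputing with 3 cores, as after decreasing a broker's core count, prints `4 3 3`: no core ever holds more than one partition of the topic above any other, so a heavily used topic cannot pile up on a single core.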
NOTE: Decreasing the number of CPU cores in a running cluster is supported only in v24.2 and later.
Will the 24.2 docs still contain info about the curl command? Otherwise, users have no way to enable it in 24.2.
Yes, 24.2 is currently what's in `main`. Before we release 24.3, we cut a new maintenance branch off `main` for 24.2 and then merge `v-WIP/24.3` into `main`.
I think we can remove this note. Add the ability to decrease core count in production to ‘What’s New’.
Agreed. End users might not realize that the ability to decrease core count is tied to intra-broker partition balancing. Additionally, this is a significant improvement and deserves a mention in the 'What's New' section.
Thank you @JakeSCahill @daisukebe , we'll add a note in our upcoming What's New: https://github.com/redpanda-data/docs/pull/865/files#diff-7e9028daec2320182888e36f1be6d5a941c79362fab72135786f5b0cb0456a30R20
Excellent!
```bash
curl -X PUT -d '{"state": "active"}' http://127.0.0.1:9644/v1/features/node_local_core_assignment
```
==== | ||
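Because the `node_local_core_assignment` flag is enabled by default as of 24.3, the PUT request above should no longer be needed to turn it on. To confirm the flag's state instead, a sketch like the following could query the Admin API. It assumes the Admin API is reachable at the same address as in the curl command, that `GET /v1/features` returns a JSON object with a `features` array of `{"name": ..., "state": ...}` entries, and that `jq` is installed; `feature_state`, `query_feature_state`, and the `ADMIN_URL` override are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only. Assumptions: Admin API at 127.0.0.1:9644 (override with ADMIN_URL),
# GET /v1/features returns {"features": [{"name": ..., "state": ...}, ...]},
# and jq is installed. Function names are hypothetical.

ADMIN_URL="${ADMIN_URL:-http://127.0.0.1:9644}"

# feature_state NAME: read a /v1/features JSON document on stdin and print
# the "state" of the feature called NAME (prints nothing if it is absent).
feature_state() {
  jq -r --arg name "$1" '.features[] | select(.name == $name) | .state'
}

# query_feature_state NAME: fetch the feature list from the Admin API and
# print NAME's state; -f makes curl fail on HTTP error responses.
query_feature_state() {
  curl -sf "$ADMIN_URL/v1/features" | feature_state "$1"
}
```

Running `query_feature_state node_local_core_assignment` against a cluster where the flag is enabled is expected to print `active`, provided the response shape matches the assumption above.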
Prior to Redpanda version 24.2, this meant that some cores on a broker could inadvertently host many partitions of heavily used topics and cause the CPU to be xref:manage:monitoring.adoc#cpu-usage[overburdened]. Additionally, when partition balancing moved some partitions away from a broker, the remaining partitions were not necessarily rebalanced across the broker's cores. And if a broker's core count was increased, Redpanda did not assign any partitions to the new cores until new partitions were created or existing partitions were moved out.
I think we should remove this. Anything prior to 24.2 is irrelevant for 24.3 users.
Starting in v24.2, topic-aware intra-broker partition balancing makes it possible to dynamically reassign partitions within a broker. Redpanda prioritizes an even distribution of each topic's partition replicas across all cores in a broker. If a broker's core count changes, Redpanda can check partition assignments across the broker's cores when the broker restarts, and reassign partitions so that assignments stay balanced across all cores. Redpanda can also check partition assignments when partitions are added to or removed from a broker, and rebalance the remaining partitions between cores.
Again, referencing the version in versioned docs is distracting.
lgtm
Description
`node_local_core_assignment` feature flag. The flag is now enabled by default as of 24.3.
`node_local_core_assignment` is enabled, the same feature flag that enables intra-broker partition balancing.
Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline: 15 Nov
Page previews
24.3 > Cluster balancing > Intra-broker partition balancing