Sharding strategy and recommendations #355
Merged
15 commits
11e641f
Sharding: Add section "Sizing considerations"
WalBeh 8375f1c
Sharding: Modernize and refactor, just regular copy-editing
amotl 27b5c39
Sharding: Remove "Smaller shards also result in reduced efficiency..."
amotl fdbf9bd
Sharding: Remove 3-70 GB recommendation. Recommend 10-50 GB.
amotl 710e7c5
Sharding: Explain "1000 shards per node" limit as "protection limit"
amotl 08c76fd
Sharding: Relocate notes about finding the right balance to "general"
amotl 20aa42a
Sharding: Rename bottom section and relocate subsection
amotl bcd13a6
Sharding: Add miniature section about "segments"
amotl 1227850
Sharding: Remove statement about ignoring replica partitions
amotl ca082cc
Sharding: Implement suggestions by CodeRabbit
amotl d4ca40a
Sharding: Implement suggestions by Marios
amotl 580221a
Chore: Fix broken link reference
amotl fdc8076
Sharding/Replicas: Implement suggestions by Marios and CodeRabbit
amotl 2849eb7
Sharding/Shard-per-CPU: Implement suggestions by Marios
amotl 1d73a79
Sharding: Use "5-50 GB" size recommendation
amotl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,41 +1,139 @@ | ||
| (sharding-guide)= | ||
|
|
||
| (sharding-performance)= | ||
|
|
||
| # Sharding Performance Guide | ||
| # Sharding performance guide | ||
|
|
||
| :::{div} sd-text-muted | ||
| Applying sharding can drastically improve the performance on large datasets. | ||
| ::: | ||
|
|
||
| This document is a sharding best practice guide for CrateDB. | ||
| A brief recap: CrateDB tables are split into a configured number of shards. | ||
| These shards are distributed across the cluster to optimize concurrent and | ||
| parallel data processing. | ||
|
|
||
| Whenever possible, CrateDB will parallelize query workloads and distribute them | ||
| across the whole cluster. The more CPUs this query workload can be distributed | ||
| across, the faster the query will run. | ||
|
|
||
| :::{seealso} | ||
| This guide assumes you know the basics. | ||
| If you are looking for an intro to sharding, see also the | ||
| {ref}`sharding-partitioning` and the | ||
| {ref}`sharding reference <crate-reference:ddl-sharding>` documentation. | ||
| ::: | ||
|
|
||
|
|
||
| ## General recommendations | ||
|
|
||
| To avoid running your clusters with too many shards or too large shards, | ||
| implement the following guidelines as a rule of thumb: | ||
|
|
||
| - Use shard sizes between 5 GB and 50 GB. | ||
|
|
||
| A brief recap: CrateDB tables are split into a configured number of shards, and | ||
| then these shards are distributed across the cluster. | ||
| - Keep the number of records on each shard below 200 million. | ||
|
|
||
| Finding the right balance when it comes to sharding depends on many | ||
| factors. While it is generally advisable to slightly over-allocate, we | ||
| recommend benchmarking your particular setup to find the sweet spot and | ||
| implement an appropriate sharding strategy. | ||
|
|
||
| Figuring out how many shards to use for your tables requires you to think about | ||
| the type of data you're processing, the types of queries you're running, and | ||
| the type of hardware you're using. | ||
| the type of data you are processing, the types of queries you are running, and | ||
| the type of hardware you are using. | ||
|
|
||
| :::{NOTE} | ||
| This guide assumes you know the basics. | ||
| - Too many shards can degrade search performance and make the cluster unstable. | ||
| This is referred to as _oversharding_. | ||
|
|
||
| If you are looking for an intro to sharding, see {ref}`sharding | ||
| <crate-reference:ddl-sharding>`. | ||
| ::: | ||
| - Very large shards can slow down cluster operations and prolong recovery times | ||
| after failures. | ||
|
|
||
| ## Optimising for query performance | ||
| ## Sizing considerations | ||
|
|
||
| (sharding-under-allocation)= | ||
| Applying these general principles requires careful consideration of cluster | ||
| sizing and architecture. | ||
| Keep the following things in mind when building your sharding strategy. | ||
| Each shard incurs overhead in terms of open files, RAM allocation, and CPU cycles | ||
| for maintenance operations. | ||
|
|
||
| ### Shard size vs. number of shards | ||
|
|
||
| The optimal approach balances shard count with shard size. Individual shards should | ||
| typically contain 5-50 GB of data, which is the sweet spot for most | ||
| workloads. In large clusters, this often means fewer shards than total CPU cores, | ||
| as larger shards can still be processed efficiently by multiple CPU cores during | ||
| query execution. | ||
|
||
|
|
||
| ### Shard-per-CPU ratio | ||
|
|
||
| If most nodes have more shards per table than they have CPUs, the cluster can | ||
| experience performance degradations. | ||
| For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs | ||
| = 256 total CPUs), creating 256+ shards per table often proves counterproductive. | ||
| If you don't manually set the number of shards per table, CrateDB will make a | ||
| best guess, based on the assumption that your nodes have two CPUs each. | ||
| The general advice is to calculate with 1 shard per CPU as a starting point. | ||
|
|
||
| ### 1000 shards per node limit | ||
|
|
||
| To avoid _oversharding_, CrateDB by default limits the number of shards per node to | ||
| 1_000 as a protection limit. Any operation that would exceed that limit | ||
| leads to an exception. | ||
| For an 8-node cluster, this allows up to 8_000 total shards across all tables. | ||
| Approaching this limit typically indicates a suboptimal sharding strategy rather | ||
| than optimal performance tuning. See also the documentation about | ||
| {ref}`table reconfiguration <number-of-shards>` regarding sharding options. | ||
|
|
||
| ### Partitions | ||
|
|
||
| If you are using {ref}`partitioned tables <crate-reference:partitioned-tables>`, | ||
| note that each partition is clustered into as many shards as you configure | ||
| for the table. | ||
|
|
||
| For example, a table with four shards and two partitions will have eight | ||
| shards that can be commonly queried across. But a query that only touches | ||
| one partition will only query across four shards. | ||
|
|
||
| How this factors into balancing your shard allocation will depend on the | ||
| types of queries you intend to run. | ||
|
|
||
| ### Under-allocation is bad | ||
| ### Replicas | ||
|
|
||
| CrateDB uses replicas for both data durability and query performance. When a | ||
| node goes down, replicas ensure no data is lost. For read operations, CrateDB | ||
| randomly distributes queries across both primary and replica shards, improving | ||
| concurrent read throughput. | ||
|
|
||
| Each replica adds to the total shard count in the cluster. By default, CrateDB | ||
| uses the replica setting `0-1` on newly created tables, resulting in up to twice | ||
| the number of configured shards. The more replicas you add, the higher the | ||
| multiplier (x3, x4, etc.) for capacity planning. | ||
|
|
||
| See the {ref}`replication reference <crate-reference:ddl-replication>` | ||
| documentation for more details. | ||
|
|
||
| ### Segments | ||
|
|
||
| The number of segments within a shard affects query performance because more | ||
| segments have to be visited. | ||
|
|
||
|
|
||
| ## Notes | ||
|
|
||
| :::{caution} | ||
| :class: hero | ||
| Balancing the number and size of your shards is important for the performance | ||
| and stability of your CrateDB clusters. | ||
| ::: | ||
|
|
||
| (sharding-under-allocation)= | ||
| ### Avoid under-allocation | ||
|
|
||
| :::{CAUTION} | ||
| If you have fewer shards than CPUs in the cluster, this is called | ||
| *under-allocation*, and it means you're not getting the best performance out | ||
| of CrateDB. | ||
| ::: | ||
|
|
||
| Whenever possible, CrateDB will parallelize query workloads and distribute them | ||
| across the whole cluster. The more CPUs this query workload can be distributed | ||
| across, the faster the query will run. | ||
|
|
||
| To increase the chances that a query can be parallelized and distributed | ||
| maximally, there should be at least as many shards for a table as there are | ||
| CPUs in the cluster. This is because CrateDB will automatically balance shards | ||
|
|
@@ -45,7 +143,8 @@ In summary: the smaller your shards are, the more of them you will have, and so | |
| the more likely it is that they will be distributed across the whole cluster, | ||
| and hence across all of your CPUs, and hence the faster your queries will run. | ||
|
|
||
| ### Significant over-allocation is bad | ||
| (sharding-over-allocation)= | ||
| ### Avoid extensive over-allocation | ||
|
|
||
| :::{CAUTION} | ||
| If you have more shards per table than CPUs, this is called *over-allocation*. A | ||
|
|
@@ -57,48 +156,10 @@ When you have slightly more shards per table than CPUs, you ensure that query | |
| workloads can be parallelized and distributed maximally, which in turn ensures | ||
| maximal query performance. | ||
|
|
||
| However, if most nodes have more shards per table than they have CPUs, you | ||
| could actually see performance degradation. Each shard comes with a cost in | ||
| terms of open files, RAM, and CPU cycles. Smaller shards also means small shard | ||
| indexes, which can adversely affect computed search term relevance. | ||
|
|
||
| For performance reasons, one thousand shards per table per node is considered | ||
| the highest recommended configuration. If you exceed this you will experience a | ||
| failing cluster check. | ||
|
|
||
| ### Balancing allocation | ||
|
|
||
| Finding the right balance when it comes to sharding will vary on a lot of | ||
| things. And while it's generally advisable to slightly over-allocate, it's also | ||
| a good idea to benchmark your particular setup so as to find the sweet spot. | ||
|
|
||
| If you don't manually set the number of shards per table, CrateDB will make a best guess, | ||
| based on the assumption that your nodes have two CPUs each. | ||
|
|
||
| :::{TIP} | ||
| For the purposes of calculating how many shards a table should be clustered | ||
| into, you can typically ignore replica partitions as these are not usually | ||
| queried across for reads. | ||
|
||
| ::: | ||
|
|
||
| :::{CAUTION} | ||
| If you are using {ref}`partitioned tables <crate-reference:partitioned-tables>`, | ||
| note that each partition is | ||
| clustered into as many shards as you configure for the table. | ||
|
|
||
| For example, a table with four shards and two partitions will have eight | ||
| shards that can be commonly queried across. But a query that only touches | ||
| one partition will only query across four shards. | ||
|
|
||
| How this factors into balancing your shard allocation will depend on the | ||
| types of queries you intend to run. | ||
| ::: | ||
|
|
||
| (sharding-ingestion)= | ||
| ### Optimize for ingestion | ||
|
|
||
| ## Optimising for ingestion performance | ||
|
|
||
| As with [Optimising for query performance], when doing heavy ingestion, it is | ||
| When doing heavy ingestion, it is | ||
| good to cluster a table across as many nodes as possible. However, [we have | ||
| found][we have found] that ingestion throughput can often increase as the table shard per CPU | ||
| ratio on each node *decreases*. | ||
|
|
@@ -108,7 +169,7 @@ sizes, batch insert size, and the hardware. In particular: using solid-state | |
| drives (SSDs) instead of hard-disk drives (HDDs) can massively increase | ||
| ingestion throughput. | ||
|
|
||
| It's a good idea to benchmark your particular setup so as to find the sweet | ||
| We recommend benchmarking your particular ingest workload to find the sweet | ||
| spot. | ||
|
|
||
| [we have found]: https://cratedb.com/blog/big-cluster-insights-ingesting | ||
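
The capacity-planning arithmetic described in the guide (shards multiplied by partitions and replicas, the 5-50 GB shard size range, and the 1000 shards per node protection limit) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the pull request or of CrateDB: the names `TablePlan`, `shard_count_range`, and `fits_node_limit` are hypothetical, the thresholds are copied from the text above, and the node limit check assumes shards are spread evenly across nodes.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Rule-of-thumb values taken from the guide text; not read from a live cluster.
MIN_SHARD_GB = 5
MAX_SHARD_GB = 50
MAX_SHARDS_PER_NODE = 1000  # default protection limit mentioned in the guide


@dataclass
class TablePlan:
    """Hypothetical description of one table's sharding configuration."""

    shards: int          # configured number of shards for the table
    partitions: int = 1  # 1 for a non-partitioned table
    replicas: int = 1    # replica copies per primary shard

    def total_shards(self) -> int:
        """Primary plus replica shards across all partitions."""
        return self.shards * self.partitions * (1 + self.replicas)


def shard_count_range(table_size_gb: float) -> tuple[int, int]:
    """Return (fewest, most) shards that keep each shard within 5-50 GB.

    For tables smaller than 5 GB this returns (1, 1).
    """
    fewest = max(1, math.ceil(table_size_gb / MAX_SHARD_GB))
    most = max(fewest, math.floor(table_size_gb / MIN_SHARD_GB))
    return fewest, most


def fits_node_limit(plans: list[TablePlan], nodes: int) -> bool:
    """Check the cluster-wide shard total against nodes * 1000.

    Assumption: shards are balanced evenly, so the per-node limit is
    approximated by a cluster-wide budget.
    """
    total = sum(plan.total_shards() for plan in plans)
    return total <= nodes * MAX_SHARDS_PER_NODE


if __name__ == "__main__":
    # The guide's example: four shards and two partitions, replicas ignored.
    example = TablePlan(shards=4, partitions=2, replicas=0)
    print(example.total_shards())
    print(shard_count_range(1000))
    print(fits_node_limit([example], nodes=8))
```

With `replicas=0`, the guide's example of four shards and two partitions yields eight shards; setting `replicas=1` doubles that. Treat the output as a starting point for benchmarking rather than a recommendation.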