44 changes: 20 additions & 24 deletions docs/admin/sharding-partitioning.md
@@ -1,6 +1,6 @@
(sharding-partitioning)=

# Sharding and Partitioning
# Sharding and Partitioning 101

## Introduction

@@ -92,23 +92,7 @@ INSERT INTO second_table (ts, val) VALUES (1620415701974, 2.31);
We can see that there are now 8 shards for the table `second_table` in the
cluster.
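You can verify the shard allocation yourself by querying the `sys.shards` system table. The following sketch counts the shards of `second_table`, split into primaries and replicas:

```sql
-- Count shards of `second_table`, grouped into primaries and replicas
SELECT "primary", count(*) AS shards
FROM sys.shards
WHERE table_name = 'second_table'
GROUP BY "primary";
```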

:::{danger}
**Over-sharding and over-partitioning**

Sharding can drastically improve the performance on large datasets.
However, having too many small shards will most likely degrade performance.
Over-sharding and over-partitioning are common flaws leading to an overall
poor performance.

**As a rule of thumb, a single shard should hold somewhere between 5 - 50
GB of data.**

To avoid oversharding, CrateDB by default limits the number of shards per
node to 1000. Any operation that would exceed that limit, leads to an
exception.
:::

## How to choose your sharding and partitioning strategy
## Strategy

An optimal sharding and partitioning strategy always depends on the specific
use case and should typically be determined by conducting
@@ -119,13 +103,25 @@ for a benchmark.
- Identify the record size
- Calculate the throughput

Then, to calculate the number of shards, you should consider that the size of each
shard should roughly be between 5 - 50 GB, and that each node can only manage
up to 1000 shards.
Then, to calculate the number of shards, consider that each shard should
hold roughly 5-50 GB of data, and that each node can manage
up to 1,000 shards by default.
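As a back-of-the-envelope sketch, the shard count for a given data volume can be derived directly in SQL; the 300 GB figure is illustrative:

```sql
-- Shards needed for ~300 GB of data at a 50 GB target shard size
SELECT ceil(300.0 / 50.0) AS shards_needed;
```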

:::{caution}
**Over-sharding and over-partitioning**

Sharding can drastically improve the performance on large datasets.
However, having too many small shards will most likely degrade performance.
Over-sharding and over-partitioning are common flaws leading to an overall
poor performance.

Learn how to discover an optimal sharding strategy for your dataset
in the {ref}`sharding-guide`.
:::

### Time series example
## Example

To illustrate the steps above, let's use them on behalf of an example. Imagine
Let's create a basic sharding strategy by using a concrete example. Imagine
you want to create a *partitioned table* on a *three-node cluster* to store
time series data with the following assumptions:

@@ -136,7 +132,7 @@ time series data with the following assumptions:
Given the daily throughput is around 10 GB/day, the monthly throughput is 30 times
that (~ 300 GB). The partition column can be day, week, month, quarter, etc. So,
assuming a monthly partition, the next step is to calculate the number of shards
with the **shard size recommendation** (5 - 50 GB) and the **number of nodes** in
with the **shard size recommendation** (5-50 GB) and the **number of nodes** in
the cluster in mind.

With three shards, each shard would hold 100 GB (300 GB / 3 shards), which is above
2 changes: 1 addition & 1 deletion docs/feature/cluster/index.md
@@ -109,7 +109,7 @@ data loss, and to improve read performance.
## Synopsis
With a monthly throughput of 300 GB, partitioning your table by month,
and using six shards, each shard will manage 50 GB of data, which is
within the recommended size range (5 - 50 GB).
within the recommended size range (5-50 GB).

Through replication, the table will store three copies of your data,
in order to reduce the chance of permanent data loss.
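Put together, the synopsis above might translate into a table definition like the following sketch; the table and column names are illustrative, and `number_of_replicas = 2` yields the three copies mentioned above:

```sql
CREATE TABLE metrics (
    ts TIMESTAMP WITH TIME ZONE,
    val DOUBLE PRECISION,
    -- Generated partition column: one partition per month
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) CLUSTERED INTO 6 SHARDS
  PARTITIONED BY (month)
  WITH (number_of_replicas = 2);
```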
2 changes: 1 addition & 1 deletion docs/integrate/locust/tutorial.md
@@ -479,7 +479,7 @@ If you want to download the locust data, you can do that on the last tab.
When you want to run a load test against a CrateDB Cluster with multiple queries, Locust is a great and flexible tool that lets you quickly define a load test and see what numbers regarding users and RPS are possible for that particular setup.


[CrateDB CLI tools]: https://cratedb.com/docs/crate/clients-tools/en/latest/connect/cli.html#cli
[CrateDB CLI tools]: project:#cli
[DBeaver]: https://dbeaver.io
[fully-managed]: https://console.cratedb.cloud/
[Locust]: https://locust.io
16 changes: 6 additions & 10 deletions docs/performance/scaling.md
@@ -48,25 +48,21 @@ large result sets from CrateDB][fetching large result sets from cratedb].

## Number of shards

In CrateDB data in tables and partitions is distributed in storage units that we
call shards.
In CrateDB data in tables and partitions is distributed in storage units
called "shards".

If we do not specify how many shards we want for a table/partition CrateDB will
If we do not specify how many shards we want for a table/partition, CrateDB will
derive a default from the number of nodes.

CrateDB also has replicas of data and this results in additional shards in the
cluster.

Having too many or too few shards has performance implications, so it is very
important to get familiar with the {ref}`Sharding Performance Guide
<sharding-guide>`.
important to get familiar with the {ref}`sharding-guide`.

In particular, there is a soft limit of 1000 shards per node, so table schemas,
partitioning strategy, and number of nodes need to be planned to stay well below
this limit. One strategy is to aim for a configuration where even if one node
in the cluster is lost the remaining nodes would still have less than 1000 shards.
in the cluster is lost, the remaining nodes would still have less than 1000 shards.

If this was not considered when initially defining the tables we have the
If this was not taken into account when initially defining the tables, we have the
following considerations:

- changing the partitioning strategy requires creating a new table and copying
181 changes: 121 additions & 60 deletions docs/performance/sharding.md
@@ -1,41 +1,139 @@
(sharding-guide)=

(sharding-performance)=

# Sharding Performance Guide
# Sharding performance guide

:::{div} sd-text-muted
Applying sharding can drastically improve the performance on large datasets.
:::

This document is a sharding best practice guide for CrateDB.
A brief recap: CrateDB tables are split into a configured number of shards.
These shards are distributed across the cluster to optimize concurrent and
parallel data processing.

Whenever possible, CrateDB will parallelize query workloads and distribute them
across the whole cluster. The more CPUs this query workload can be distributed
across, the faster the query will run.

:::{seealso}
This guide assumes you know the basics.
If you are looking for an intro to sharding, see also the
{ref}`sharding-partitioning` and the
{ref}`sharding reference <crate-reference:ddl-sharding>` documentation.
:::


## General recommendations

To avoid running your clusters with too many shards or too large shards,
implement the following guidelines as a rule of thumb:

- Use shard sizes between 5 GB and 50 GB.

A brief recap: CrateDB tables are split into a configured number of shards, and
then these shards are distributed across the cluster.
- Keep the number of records on each shard below 200 million.

Finding the right balance when it comes to sharding depends on many
factors. While it is generally advisable to slightly over-allocate, we
recommend benchmarking your particular setup to find the sweet spot and
implement an appropriate sharding strategy.

Figuring out how many shards to use for your tables requires you to think about
the type of data you're processing, the types of queries you're running, and
the type of hardware you're using.
the type of data you are processing, the types of queries you are running, and
the type of hardware you are using.

:::{NOTE}
This guide assumes you know the basics.
- Too many shards can degrade search performance and make the cluster unstable.
This is referred to as _oversharding_.

If you are looking for an intro to sharding, see {ref}`sharding
<crate-reference:ddl-sharding>`.
:::
- Very large shards can slow down cluster operations and prolong recovery times
after failures.

## Optimising for query performance
## Sizing considerations

(sharding-under-allocation)=
A sound sharding strategy requires careful consideration of cluster
sizing and architecture, because each shard incurs overhead in terms of
open files, RAM allocation, and CPU cycles for maintenance operations.
Keep the following things in mind when building your sharding strategy.

### Shard size vs. number of shards

The optimal approach balances shard count with shard size. Individual shards
should typically contain 5-50 GB of data, the sweet spot for most
workloads. In large clusters, this often means fewer shards than total CPU cores,
as larger shards can still be processed efficiently by multiple CPU cores during
query execution.

### Shard-per-CPU ratio

If most nodes have more shards per table than they have CPUs, the cluster can
experience performance degradation.
For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs
= 256 total CPUs), creating 256+ shards per table often proves counterproductive.
If you don't manually set the number of shards per table, CrateDB will make a
best guess, based on the assumption that your nodes have two CPUs each.
The general advice is to calculate with 1 shard per CPU as a starting point.
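To apply the one-shard-per-CPU starting point, you can look up the cluster's total CPU count from the `sys.nodes` system table; this sketch uses the `os_info['available_processors']` column, which reports the processors available to each node:

```sql
-- Total number of CPUs across the cluster, as a starting point
-- for the number of shards per table
SELECT sum(os_info['available_processors']) AS total_cpus
FROM sys.nodes;
```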

### 1000 shards per node limit

To avoid _oversharding_, CrateDB by default limits the number of shards per
node to 1,000. Any operation that would exceed that limit
leads to an exception.
For an 8-node cluster, this allows up to 8,000 total shards across all tables.
Approaching this limit typically indicates a suboptimal sharding strategy rather
than optimal performance tuning. See also the documentation about
{ref}`table reconfiguration <number-of-shards>` regarding sharding options.
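To see how close each node is to the limit, you can count shards per node via `sys.shards`:

```sql
-- Current number of shards per node, vs. the default limit of 1,000
SELECT node['name'] AS node_name, count(*) AS shards
FROM sys.shards
GROUP BY node['name']
ORDER BY shards DESC;
```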

### Partitions

If you are using {ref}`partitioned tables <crate-reference:partitioned-tables>`,
note that each partition is clustered into as many shards as you configure
for the table.

For example, a table with four shards and two partitions will have eight
shards that can be commonly queried across. But a query that only touches
one partition will only query across four shards.

How this factors into balancing your shard allocation will depend on the
types of queries you intend to run.
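The four-shards/two-partitions arithmetic above can be sketched as follows; table and column names are illustrative:

```sql
-- Each partition is clustered into four shards
CREATE TABLE events (
    ts TIMESTAMP WITH TIME ZONE,
    day TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts)
) CLUSTERED INTO 4 SHARDS
  PARTITIONED BY (day);

-- Spans all partitions: with two partitions, eight shards are involved
SELECT count(*) FROM events;

-- Restricted to a single partition: only four shards are involved
SELECT count(*) FROM events WHERE day = '2025-06-01';
```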

### Under-allocation is bad
### Replicas

CrateDB uses replicas for both data durability and query performance. When a
node goes down, replicas ensure no data is lost. For read operations, CrateDB
randomly distributes queries across both primary and replica shards, improving
concurrent read throughput.

Each replica adds to the total shard count in the cluster. By default, CrateDB
uses the replica setting `0-1` on newly created tables, resulting in twice the
number of configured shards. The more replicas you add, the higher the
multiplier (x3, x4, etc.) for capacity planning.

See the {ref}`replication reference <crate-reference:ddl-replication>`
documentation for more details.
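For capacity planning, the replica setting can be adjusted per table; the statement below is a sketch using an illustrative table name:

```sql
-- One to two replicas: two to three copies of the data in total,
-- i.e. a x2-x3 multiplier on the configured shard count
ALTER TABLE metrics SET (number_of_replicas = '1-2');
```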

### Segments

The number of segments within a shard affects query performance because more
segments have to be visited.
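Segments are merged in the background, but you can also force a merge down to a smaller number of segments with `OPTIMIZE TABLE`; the table name is illustrative, and merging large shards is an expensive operation, so prefer running it during quiet periods:

```sql
-- Merge the segments of each shard down to a single segment
OPTIMIZE TABLE metrics WITH (max_num_segments = 1);
```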

Comment on lines +115 to +119, by @amotl (Member, Author), Oct 7, 2025:

@matriv, @seut: This section is a bit flat. Do you have any suggestions to unflatten it slightly, to possibly provide better insights and guidance?
## Notes

:::{caution}
:class: hero
Balancing the number and size of your shards is important for the performance
and stability of your CrateDB clusters.
:::

(sharding-under-allocation)=
### Avoid under-allocation

:::{CAUTION}
If you have fewer shards than CPUs in the cluster, this is called
*under-allocation*, and it means you're not getting the best performance out
of CrateDB.
:::

Whenever possible, CrateDB will parallelize query workloads and distribute them
across the whole cluster. The more CPUs this query workload can be distributed
across, the faster the query will run.

To increase the chances that a query can be parallelized and distributed
maximally, there should be at least as many shards for a table as there are
CPUs in the cluster. This is because CrateDB will automatically balance shards
@@ -45,7 +143,8 @@ In summary: the smaller your shards are, the more of them you will have, and so
the more likely it is that they will be distributed across the whole cluster,
and hence across all of your CPUs, and hence the faster your queries will run.
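Whether a table is under-allocated can be estimated by comparing its primary shard count against the cluster's CPU count; this sketch assumes an illustrative table name:

```sql
-- Fewer primary shards than CPUs indicates under-allocation
SELECT
    (SELECT count(*) FROM sys.shards
     WHERE table_name = 'metrics' AND "primary") AS primary_shards,
    (SELECT sum(os_info['available_processors'])
     FROM sys.nodes) AS total_cpus;
```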

### Significant over-allocation is bad
(sharding-over-allocation)=
### Avoid excessive over-allocation

:::{CAUTION}
If you have more shards per table than CPUs, this is called *over-allocation*. A
@@ -57,48 +156,10 @@ When you have slightly more shards per table than CPUs, you ensure that query
workloads can be parallelized and distributed maximally, which in turn ensures
maximal query performance.

However, if most nodes have more shards per table than they have CPUs, you
could actually see performance degradation. Each shard comes with a cost in
terms of open files, RAM, and CPU cycles. Smaller shards also means small shard
indexes, which can adversely affect computed search term relevance.

For performance reasons, one thousand shards per table per node is considered
the highest recommended configuration. If you exceed this you will experience a
failing cluster check.

### Balancing allocation

Finding the right balance when it comes to sharding will vary on a lot of
things. And while it's generally advisable to slightly over-allocate, it's also
a good idea to benchmark your particular setup so as to find the sweet spot.

If you don't manually set the number of shards per table, CrateDB will make a best guess,
based on the assumption that your nodes have two CPUs each.

:::{TIP}
For the purposes of calculating how many shards a table should be clustered
into, you can typically ignore replica partitions as these are not usually
queried across for reads.
:::

:::{CAUTION}
If you are using {ref}`partitioned tables <crate-reference:partitioned-tables>`,
note that each partition is
clustered into as many shards as you configure for the table.

For example, a table with four shards and two partitions will have eight
shards that can be commonly queried across. But a query that only touches
one partition will only query across four shards.

How this factors into balancing your shard allocation will depend on the
types of queries you intend to run.
:::

(sharding-ingestion)=
### Optimize for ingestion

## Optimising for ingestion performance

As with [Optimising for query performance], when doing heavy ingestion, it is
When doing heavy ingestion, it is
good to cluster a table across as many nodes as possible. However, [we have
found][we have found] that ingestion throughput can often increase as the table shard per CPU
ratio on each node *decreases*.
Expand All @@ -108,7 +169,7 @@ sizes, batch insert size, and the hardware. In particular: using solid-state
drives (SSDs) instead of hard-disk drives (HDDs) can massively increase
ingestion throughput.

It's a good idea to benchmark your particular setup so as to find the sweet
We recommend benchmarking your particular ingest workload to find the sweet
spot.

[we have found]: https://cratedb.com/blog/big-cluster-insights-ingesting