ScalienDB Sizing and Tuning
For a smoothly running database installation, review this document and tune the configuration parameters below on all shard servers. It is highly recommended that all shard servers in the same quorum have the same physical configuration (disk, RAM, CPU) and identical settings in their scaliendb.conf file:
PARAMETER                      WHERE          DEFAULT VALUE   TUNE?   NOTE
-----------------------------------------------------------------------------------------------------
shardSplitSize                 Controller     500M            no      (should be shardSize)
database.chunkSize             Shard Server   64M             no
database.logSegmentSize        Shard Server   64M             no
database.fileChunkCacheSize    Shard Server   256M            yes
database.memoChunkCacheSize    Shard Server   1G              yes
database.logSize               Shard Server   20G             yes
database.replicatedLogSize     Shard Server   10G             yes
In the paragraphs below some other variables will appear. These cannot be set in the scaliendb.conf file; they are only an aid for the calculations (e.g. chunksPerShard).
To tune the parameters properly, first decide how much data you want to store per shard server (quorum); call this dataSize. The calculations below then determine how much RAM you need; call this memorySize. ScalienDB puts data into different chunk files, and ideally you want to keep this fragmentation low. The lower the fragmentation, the fewer disk seeks ScalienDB has to perform to serve GET and LIST requests. Fragmentation is determined by chunksPerShard, which you want to keep at about 4-8 (lower is better).
chunksPerShard = shardSplitSize / chunkSize (should be 4-8)
dataSize = numShards * shardSplitSize
memoChunkCacheSize = numShards * chunkSize
fileChunkCacheSize > numShards * chunksPerShard * 64K
memorySize = memoChunkCacheSize + fileChunkCacheSize + (Operating System) + (Safety Margin)
logSize > memoChunkCacheSize
diskSize > (dataSize * 1.5) + logSize + replicatedLogSize + (Operating System) + (Safety Margin)
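The formulas above can be evaluated mechanically. The following is a minimal Python sketch, not part of ScalienDB: the function name `sizing_plan`, the OS and safety-margin reserves, and the choice of binary units (64M read as 64 * 2^20 bytes) are all assumptions made for illustration.

```python
# Binary units are an assumption; the document does not say whether 64M means 64 * 2**20.
K = 1024
M = 1024 * K
G = 1024 * M


def sizing_plan(num_shards, shard_split_size=500 * M, chunk_size=64 * M,
                replicated_log_size=10 * G,
                os_ram=2 * G, ram_margin=2 * G,       # hypothetical reserves
                os_disk=20 * G, disk_margin=50 * G):  # hypothetical reserves
    """Evaluate the sizing formulas; every size in the result is in bytes."""
    chunks_per_shard = shard_split_size / chunk_size
    data_size = num_shards * shard_split_size
    memo_chunk_cache_size = num_shards * chunk_size
    # Lower bounds: the document states these with ">" rather than "=".
    file_chunk_cache_min = num_shards * chunks_per_shard * 64 * K
    log_size_min = memo_chunk_cache_size
    memory_min = memo_chunk_cache_size + file_chunk_cache_min + os_ram + ram_margin
    disk_min = (data_size * 1.5 + log_size_min + replicated_log_size
                + os_disk + disk_margin)
    return {
        "chunks_per_shard": chunks_per_shard,
        "data_size": data_size,
        "memo_chunk_cache_size": memo_chunk_cache_size,
        "file_chunk_cache_min": file_chunk_cache_min,
        "log_size_min": log_size_min,
        "memory_min": memory_min,
        "disk_min": disk_min,
    }


plan = sizing_plan(num_shards=960)
for name, value in plan.items():
    # chunks_per_shard is a ratio; everything else is a byte count shown in G.
    print(name, value if name == "chunks_per_shard" else f"{value / G:.1f}G")
```

The reserves are placeholders; substitute whatever your operating system and operational practice actually require.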
At the end of the day:
chunksPerShard = dataSize / memoChunkCacheSize
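This identity follows from the definitions above by cancelling numShards, assuming memoChunkCacheSize is sized exactly as numShards * chunkSize:

```latex
\[
\frac{\text{dataSize}}{\text{memoChunkCacheSize}}
  = \frac{\text{numShards} \cdot \text{shardSplitSize}}{\text{numShards} \cdot \text{chunkSize}}
  = \frac{\text{shardSplitSize}}{\text{chunkSize}}
  = \text{chunksPerShard}
\]
```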
So if you want low fragmentation, that is chunksPerShard = 8, and want to store dataSize = 480G of data, you should set memoChunkCacheSize = 60G, so you'll probably need to buy 64G of RAM. If you only have 32G, no worries: you'll end up with roughly double the fragmentation, which results in a linear slowdown (that's the good kind).
It's best to start by leaving database.chunkSize = 64M and database.logSegmentSize = 64M. Since these are the defaults, you can skip them altogether. Continuing the previous example, if you have 64G of RAM, you set database.memoChunkCacheSize = 60G. Leaving shardSplitSize at 500M, you'll end up with numShards = 960. So you'll need fileChunkCacheSize > 960 * 8 * 64K ~ 500M, so we set database.fileChunkCacheSize = 1G. This leaves ample room for the OS and a safety margin. Note that database.memoChunkCacheSize is not pre-allocated, and will only be fully used if you get close to your total database size.
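The numbers in this example can be checked with a few lines of Python. This sketch assumes binary units and that numShards is obtained by rearranging memoChunkCacheSize = numShards * chunkSize; the variable names are illustrative only.

```python
K, M, G = 1024, 1024**2, 1024**3  # assumed binary units

memo_chunk_cache_size = 60 * G
chunk_size = 64 * M

# Rearranged from memoChunkCacheSize = numShards * chunkSize.
num_shards = memo_chunk_cache_size // chunk_size

# chunksPerShard is rounded up to 8 here, as in the text (500M / 64M is about 7.8).
file_cache_floor = num_shards * 8 * 64 * K

print(num_shards)            # 960
print(file_cache_floor / M)  # 480.0
```

The floor of about 480M is what the text rounds to 500M, and the chosen setting of 1G leaves roughly a factor of two of headroom over it.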
The database.replicatedLogSize is used for log-based catchup if a shard server falls behind in the quorum. If you're writing at 1MB/s into this quorum, the default of 10G should give about 10,000 seconds, or roughly 2.8 hours, of breathing room. If you have the disk space, you can set this to a higher number.
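The breathing room is simply the replicated log size divided by the sustained write rate. A small sketch, using decimal units to match the round figure of 10,000 seconds (the function name is hypothetical):

```python
def catchup_window_seconds(replicated_log_size_bytes, write_rate_bytes_per_sec):
    """Time until the replicated log is fully overwritten at a constant write rate."""
    return replicated_log_size_bytes / write_rate_bytes_per_sec


MB = 10**6  # decimal units, an assumption made to match the round numbers in the text
GB = 10**9

seconds = catchup_window_seconds(10 * GB, 1 * MB)
hours = seconds / 3600
print(seconds, round(hours, 2))
```

Plugging in your own peak write rate instead of 1MB/s gives a more conservative estimate of how long a shard server can be down and still catch up from the log.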
The database.logSize setting caps the total size of the log files (in the logs folder). In our example, since memoChunkCacheSize = 60G, we also set database.logSize = 60G. Note that logs are replayed when you start (or restart) ScalienDB, which may take several minutes for long logs.
So, we set on all the shard servers:
database.fileChunkCacheSize = 1G
database.memoChunkCacheSize = 60G
database.logSize = 60G
Everything else can be left at the defaults.
If we want to store 480G but only have 32G of RAM, we'd set:
database.fileChunkCacheSize = 1G
database.memoChunkCacheSize = 28G
database.logSize = 28G
In this case we expect roughly a 2x slowdown in GET and LIST requests, due to about twice as many seeks compared to the 64G configuration.
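The expected slowdown can be estimated from chunksPerShard = dataSize / memoChunkCacheSize. The sketch below assumes, as the text does, that seek cost grows linearly with chunksPerShard; the helper name is illustrative.

```python
def chunks_per_shard(data_size_g, memo_chunk_cache_size_g):
    """chunksPerShard = dataSize / memoChunkCacheSize, both sizes given in G."""
    return data_size_g / memo_chunk_cache_size_g


with_64g_ram = chunks_per_shard(480, 60)  # memoChunkCacheSize = 60G
with_32g_ram = chunks_per_shard(480, 28)  # memoChunkCacheSize = 28G

# Relative fragmentation, taken here as a proxy for the GET/LIST slowdown.
slowdown = with_32g_ram / with_64g_ram
print(with_64g_ram, round(with_32g_ram, 1), round(slowdown, 2))
```

The ratio comes out slightly above 2 because 28G is a little less than half of 60G.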