
[BUG] The engine isn't usable when running with a large number of databases. #1597

Open
xbasel opened this issue Jan 21, 2025 · 11 comments · May be fixed by #1609
xbasel (Member) commented Jan 21, 2025

Valkey supports up to INT_MAX databases. While most customers use the default (16 databases), if Valkey is started with a large number of databases, the engine's CPU usage increases significantly, rendering it almost unusable, even when the databases are empty and there is no incoming traffic.

For example, an engine started with

valkey-server --databases 10000000

exhibits a latency of over 200 milliseconds:

✗ time bin/valkey-cli ping
PONG
bin/valkey-cli ping  0.00s user 0.00s system 1% cpu 0.225 total

Perf shows:

# Children      Self  Command        Shared Object     Symbol
# ........  ........  .............  ................  .......................................
#
    97.78%     0.00%  valkey-server  valkey-server     [.] _start
            |
            ---_start
               __libc_start_main_impl (inlined)
               __libc_start_call_main
               main
               aeMain
               aeProcessEvents
               |
                --97.77%--processTimeEvents
                          serverCron
                          |
                           --97.73%--databasesCron
                                     |
                                     |--95.31%--activeExpireCycle
                                     |          |
                                     |           --76.50%--kvstoreSize
                                     |
                                      --2.42%--kvstoreSize

We can see that the CPU is busy (~70% utilization):

bin/valkey-cli info all |egrep "uptime|used_cpu_"
uptime_in_seconds:410
uptime_in_days:0
used_cpu_sys:1.842625
used_cpu_user:290.709860
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:1.842629
used_cpu_user_main_thread:290.709501

Expected Behavior
The engine should remain responsive with low CPU usage when handling empty databases without any load.

Proposed Solutions

Lazy Initialization: Avoid allocating memory and data structures for databases that are never accessed. Initialize databases only when they are explicitly used.
Limit Maximum Databases: If the current architecture makes efficient handling of large numbers of databases infeasible, reduce the maximum number of databases supported or clearly document this limitation.
enjoy-binbin (Member) commented Jan 21, 2025

This is something mentioned in this thread: redis/redis#12738 (comment). @soloestoy, thoughts?

ranshid (Member) commented Jan 21, 2025

Can't we just keep a linked list of DBs with expiry? Or use the BIT, as @zuiderkwast suggested before?

soloestoy (Member) commented

I think we should set a reasonable limit on the number of databases. Even when using bitmaps, a bitmap for INT_MAX databases would still require 256MB of space, which is clearly not suitable. Moreover, with such a large bitmap, the cost of scanning it each time would also be an issue.

xbasel (Member, Author) commented Jan 22, 2025

@ranshid, replying to #1319 (comment):
I'm not sure how jemalloc handles calloc, but I ran a simple experiment: allocating a large block of memory with calloc, which ended up issuing an mmap for this page region:

709783000000-709834000000 rw-p 00000000 00:00 0
Size:            2899968 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
ProtectionKey:         0
VmFlags: rd wr mr mw me nr sd

Private_Dirty is 0, so it seems memset isn't actually invoked on these pages; they are only committed on first write. This should be verified further, though.

Regarding the bitmap, I'm not sure why we need it. Is changing databases that common and latency-sensitive? I thought maybe we could use a hashtable directly. If there's a reason to use a bitmap and we're worried about memory usage, we could use a bloom filter; occasionally getting false positives shouldn't be an issue.

But the real question is whether this bug is worth fixing.

zuiderkwast (Contributor) commented

Personally, I'm only concerned that if we enable databases for cluster mode, the used memory for an empty node jumps 10 times, from 1MB to 10MB. It can be seen as a regression, which sticks out.

If users use thousands of databases, it's their own problem IMAO.

xbasel (Member, Author) commented Jan 22, 2025

If we think #1601 is worth implementing, this change could be further justified.

madolson (Member) commented

If users use thousands of databases, it's their own problem IMAO.

I kind of agree with this. As long as the default behavior stays the same, it should be OK that setting databases to some high value causes issues. The only concern I have is that the default database value is 16, which we artificially rewrite to be 1, but some users could be passing in 16 at startup. We shouldn't regress too much in that case.

ranshid (Member) commented Jan 22, 2025

Regarding the bitmap, I'm not sure why we need it. Is changing databases that common and latency-sensitive? I thought maybe we could use a hashtable directly. If there's a reason to use a bitmap and we're worried about memory usage, we could use a bloom filter; occasionally getting false positives shouldn't be an issue.

I am not sure what you are referring to. I meant using a Binary Indexed Tree (not exactly a bitmap) to manage the sizes of the DBs and their expiry, in order to drive activeExpireCycle better. I am not sure, though, that a simple linked list of DBs with expiry wouldn't still do the job.

If each DB uses 3 kvstores (keys, expiry, sharded pubsub), and each allocates 256KB upon creation, then each DB ends up at about 770KB of zmalloc'd size, right? That doesn't sound like a disaster for a single DB, but it scales badly to thousands.
I agree this might not be so bad for the rare case of a large number of DBs. I only thought that if we create the BIT and hashtable arrays lazily, and free them when a DB is empty, we might easily solve the problem indicated by @madolson:

The only concern I have is that the default database value is 16, which we artificially rewrite to be 1, but some users could be passing in 16 at startup. We shouldn't regress too much in that case.

soloestoy (Member) commented

IMHO, if we support multiple databases in cluster mode, an empty node's memory jumping from 1MB to 10MB is not a big deal. Empty is just a temporary state; it's not as if users create empty nodes for fun. And remember that we already changed the default repl-backlog-size from 1MB to 10MB, so the memory increase here is acceptable.

xbasel self-assigned this Jan 23, 2025
ranshid (Member) commented Jan 23, 2025

IMHO, if we support multiple databases in cluster mode, an empty node's memory jumping from 1MB to 10MB is not a big deal. Empty is just a temporary state; it's not as if users create empty nodes for fun. And remember that we already changed the default repl-backlog-size from 1MB to 10MB, so the memory increase here is acceptable.

I agree that I would not invest in a complicated change just to solve this. I think that if we have a quick and simple way to better support the default config, it might be valid to do it, but we could also do it as a follow-up.

madolson (Member) commented

IMHO, if we support multiple databases in cluster mode, an empty node's memory jumping from 1MB to 10MB is not a big deal. Empty is just a temporary state; it's not as if users create empty nodes for fun. And remember that we already changed the default repl-backlog-size from 1MB to 10MB, so the memory increase here is acceptable.

I think we're all coming at this from a managed-service-provider perspective, where 9MB is not a big deal. I've seen self-hosted folks complain about memory jumps of this size, though, since they often run multi-tenant and sometimes on limited hardware. If it's easy to do, it's better not to have to announce a big memory jump in a new version; it would be better for the engine to behave the same or better when a user upgrades.
