
[BUG] The engine isn't usable when running with a large number of databases. #1597

Open
xbasel opened this issue Jan 21, 2025 · 11 comments · May be fixed by #1609
xbasel (Member) commented Jan 21, 2025

Valkey supports up to INT_MAX databases. While most customers use the default (16 databases), if Valkey is started with a large number of databases, the engine's CPU usage increases significantly, rendering it almost unusable, even when the databases are empty and there is no incoming traffic.

For example, an engine started with

valkey-server --databases 10000000

exhibits a latency of over 200 milliseconds:

✗ time bin/valkey-cli ping
PONG
bin/valkey-cli ping  0.00s user 0.00s system 1% cpu 0.225 total

Perf shows:

# Children      Self  Command        Shared Object     Symbol
# ........  ........  .............  ................  .......................................
#
    97.78%     0.00%  valkey-server  valkey-server     [.] _start
            |
            ---_start
               __libc_start_main_impl (inlined)
               __libc_start_call_main
               main
               aeMain
               aeProcessEvents
               |
                --97.77%--processTimeEvents
                          serverCron
                          |
                           --97.73%--databasesCron
                                     |
                                     |--95.31%--activeExpireCycle
                                     |          |
                                     |           --76.50%--kvstoreSize
                                     |
                                      --2.42%--kvstoreSize

We can see that the CPU is busy (~70% utilization):

bin/valkey-cli info all |egrep "uptime|used_cpu_"
uptime_in_seconds:410
uptime_in_days:0
used_cpu_sys:1.842625
used_cpu_user:290.709860
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:1.842629
used_cpu_user_main_thread:290.709501

Expected Behavior
The engine should remain responsive with low CPU usage when handling empty databases without any load.

Proposed Solutions

Lazy Initialization: Avoid allocating memory and data structures for databases that are never accessed. Initialize databases only when they are explicitly used.
Limit Maximum Databases: If the current architecture makes efficient handling of large numbers of databases infeasible, reduce the maximum number of databases supported or clearly document this limitation.
enjoy-binbin (Member) commented Jan 21, 2025

This is something mentioned in this thread: redis/redis#12738 (comment). @soloestoy, thoughts?

ranshid (Member) commented Jan 21, 2025

Can't we just keep a linked list of DBs with expiry? Or use the BIT, as @zuiderkwast suggested before?

soloestoy (Member) commented

I think we should set a reasonable limit on the number of databases. Even when using bitmaps, a bitmap for INT_MAX databases would still require 256MB of space, which is clearly not suitable. Moreover, with such a large bitmap, the cost of scanning it each time would also be an issue.

xbasel (Member, Author) commented Jan 22, 2025

@ranshid, replying to #1319 (comment):
I'm not sure how jemalloc handles calloc, but I ran a simple experiment: allocating a large block of memory with calloc, which ended up issuing an mmap for this page region:

709783000000-709834000000 rw-p 00000000 00:00 0
Size:            2899968 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
ProtectionKey:         0
VmFlags: rd wr mr mw me nr sd

Private_Dirty is 0, so it seems memset isn't actually invoked on these pages; they are only committed on first write. This should be verified further, though.

Regarding the bitmap, I'm not sure why we need it. Is changing databases that common and latency-sensitive? I thought maybe we could use a hashtable directly. If there's a reason to use a bitmap and we're worried about memory usage, we could use a bloom filter; occasionally getting false positives shouldn't be an issue.

But the real question is whether this bug is worth fixing.

zuiderkwast (Contributor) commented

Personally, I'm only concerned that if we enable databases for cluster mode, the used memory for an empty node jumps 10 times, from 1MB to 10MB. It can be seen as a regression, which sticks out.

If users use thousands of databases, it's their own problem IMAO.

xbasel (Member, Author) commented Jan 22, 2025

If we think #1601 is worth implementing, this change could be further justified.

madolson (Member) commented

If users use thousands of databases, it's their own problem IMAO.

I kind of agree with this. As long as the default behavior stays the same, it should be OK that setting databases to some high value causes issues. The only concern I have is that the default database value is 16, which we artificially rewrite to be 1, but some users could be passing in 16 at startup. We shouldn't regress too much in that case.

ranshid (Member) commented Jan 22, 2025

Regarding the bitmap, I'm not sure why we need it. Is changing databases that common and latency-sensitive? I thought maybe we could use a hashtable directly. If there's a reason to use a bitmap and we're worried about memory usage, we could use a bloom filter; occasionally getting false positives shouldn't be an issue.

I am not sure what you are referring to. I meant using a Binary Indexed Tree (not exactly a bitmap) to manage the sizes of the DBs and their expiry, in order to drive activeExpireCycle better. I am not sure, though, that a simple linked list of DBs with expiry wouldn't still do the job.

If each DB uses 3 kvstores (keys, expiry, sharded pubsub), and each allocates 256KB upon creation, then each DB ends up at about 770KB of zmalloc'd size, right? That doesn't sound like a disaster for a single DB, but it scales badly to thousands.
I agree this might not be so bad for the rare case of a large number of DBs. I only thought that if we create the BIT and hashtable arrays lazily, and free them when a DB is empty, we might easily solve the problem indicated by @madolson:

The only concern I have is that the default database value is 16, which we artificially rewrite to be 1, but some users could be passing in 16 at startup. We shouldn't regress too much in that case.

soloestoy (Member) commented

IMHO, if we support multiple databases in cluster mode, an empty node's memory jumping from 1MB to 10MB is not a big deal. Empty is just a temporary state; it's not as if users create empty nodes for fun. And remember that we already changed the default repl-backlog-size from 1MB to 10MB, so the memory increase here is acceptable.

xbasel self-assigned this Jan 23, 2025
ranshid (Member) commented Jan 23, 2025

IMHO, if we support multiple databases in cluster mode, an empty node's memory jumping from 1MB to 10MB is not a big deal. Empty is just a temporary state; it's not as if users create empty nodes for fun. And remember that we already changed the default repl-backlog-size from 1MB to 10MB, so the memory increase here is acceptable.

I agree that I would not invest in a complicated change just to solve this. I think that if we have a quick and simple way to better support the default config, it might be valid to do it, but we could also do it as a follow-up.

madolson (Member) commented

IMHO, if we support multiple databases in cluster mode, an empty node's memory jumping from 1MB to 10MB is not a big deal. Empty is just a temporary state; it's not as if users create empty nodes for fun. And remember that we already changed the default repl-backlog-size from 1MB to 10MB, so the memory increase here is acceptable.

I think we're all coming at this from a managed-service-provider perspective, where 9MB is not a big deal. I've seen self-hosted folks complain about memory jumps of this size, though, since they often run multi-tenant and sometimes on limited hardware. If it's easy to do, it's better not to have to announce a big memory jump in a new version; it would be better for the engine to behave the same or better when a user upgrades.
